
Quality Score

Principles

  • Make code legible to agents with semantic names and explicit boundaries.
  • Prefer small, testable modules over files that require broad context to edit.
  • Enforce style, architecture, and reliability rules mechanically where possible.
  • Keep a cleanup loop for stale docs, generated artifacts, and accumulated implementation debt.

Fast Checks (run on every change)

just typecheck    # tsc --project tsconfig.resources.json, no emit
just lint         # eslint across src/

Both must pass before any commit. Typecheck catches type drift early; lint enforces the import rules that protect the Pi clean seam (ADR-010).
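A git pre-commit hook can gate commits on both fast checks. A minimal sketch follows; the `true` placeholders are assumptions standing in for the real `just typecheck` and `just lint` invocations so the example is self-contained:

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit: run each fast check in order and
# block the commit if any fails. `true` is a placeholder for the real
# `just typecheck` / `just lint` commands.

run_check() {
  label="$1"; shift
  if "$@"; then
    echo "ok: $label"
  else
    echo "FAIL: $label" >&2
    return 1
  fi
}

run_check typecheck true && run_check lint true
```

Because the hook exits nonzero on the first failure, git aborts the commit before any broken code lands.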

Slow Checks (run before shipping)

just test         # full unit suite — node --test runner, no coverage overhead
just test-smoke   # sf --version, sf --help, sf --print — all three must pass
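The three smoke invocations can be rolled into a single fail-fast gate. A sketch, with `true` standing in for the real `sf` binary (an assumption, since `sf` is not available here):

```shell
# Sketch: run all three smoke invocations and fail on the first broken
# one. `true` is a placeholder for the real `sf` binary.
smoke() {
  for args in "--version" "--help" "--print"; do
    if ! true; then        # real check: sf $args >/dev/null 2>&1
      echo "smoke FAIL: sf $args" >&2
      return 1
    fi
  done
  echo "smoke ok: 3/3"
}

smoke
```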

Coverage thresholds (enforced by npm run test:coverage):

  • Statements: 40% minimum
  • Lines: 40% minimum
  • Branches: 20% minimum
  • Functions: 20% minimum
  • Autonomous path overrides:
    • src/resources/extensions/sf/auto/**: 60% statements/lines/functions, 40% branches
    • src/resources/extensions/sf/uok/**: 60% statements/lines/functions, 40% branches

These are floors, not targets. The real quality bar is purposeful tests that assert behavior contracts (see docs/SPEC_FIRST_TDD.md).
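The global floors can be expressed as a small gate script. A sketch with hard-coded percentages (an assumption; in practice the numbers would be read from the coverage report, e.g. a coverage summary JSON):

```shell
# Sketch: compare measured coverage against the global floors and
# report any metric that falls below its minimum. The percentages are
# hard-coded placeholders so the example is self-contained.
statements=55 lines=55 branches=25 functions=30

fail=0
check_floor() {  # $1 = metric, $2 = measured %, $3 = floor %
  if [ "$2" -lt "$3" ]; then
    echo "FAIL: $1 $2% < floor $3%"
    fail=1
  fi
}

check_floor statements "$statements" 40
check_floor lines      "$lines"      40
check_floor branches   "$branches"   20
check_floor functions  "$functions"  20
[ "$fail" -eq 0 ] && echo "coverage floors met"
```

The per-path overrides for the autonomous directories would add a second set of `check_floor` calls scoped to those paths.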

Evals (ad-hoc, not yet automated)

No automated eval suite exists yet. ADR-018 Phase 3 defines the eval runner contract. Until then, quality for autonomous behavior is measured by:

  • Smoke test pass rate across providers
  • Manual milestone runs with trace inspection (.sf/traces/)
  • Decision register review at milestone close
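Until the ADR-018 eval runner exists, the first bullet can be approximated manually. A sketch that aggregates per-provider smoke results into a pass rate; the provider names are placeholders, and `true` stands in for the real per-provider smoke invocation:

```shell
# Sketch: aggregate smoke results across providers into a pass rate.
# Provider names are hypothetical; `true` is a placeholder for running
# the smoke suite against that provider.
pass=0; total=0
for p in provider-a provider-b provider-c; do
  total=$((total + 1))
  if true; then pass=$((pass + 1)); fi
done
echo "smoke pass rate: $pass/$total"
```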

Known Blind Spots

| Area | Gap | Risk |
| --- | --- | --- |
| headless.ts | RPC lifecycle (spawn → event stream → restart) is not covered by unit tests; only integration-tested manually | High: crash recovery correctness |
| Parallel milestone orchestration | No tests for concurrent STATE.md mutations | Medium: data loss under parallelism |
| Notification routing | Text-matching classification has no per-pattern unit tests | Low: wrong exit code on wording change |
| Stuck detection | Sliding-window logic tested, but real-loop replay is not | Medium: false positives under unusual patterns |
| Provider fallback | Model routing under simulated provider failure not covered | Medium: silent routing to wrong tier |

Doc Quality Signal

grep -r "TODO\|placeholder\|Describe the\|Document.*here\|Record.*here\|Use this as\|Capture.*here\|Track cleanup" \
  docs/ --include="*.md"

This command should produce no output. Any match is a placeholder doc that needs real content.
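For CI, the grep can be wrapped in a check that fails on any match. A sketch using a throwaway docs tree so the example is self-contained (an assumption; in the repo it would point at docs/, with the full pattern list from above):

```shell
# Sketch: fail the build if any placeholder marker survives in docs.
# A temporary docs tree stands in for the real docs/ directory, and
# the pattern is abbreviated from the full list.
dir="$(mktemp -d)"
mkdir -p "$dir/docs"
printf 'Real content only.\n' > "$dir/docs/ok.md"

if grep -r "TODO\|placeholder" "$dir/docs" --include="*.md" >/dev/null; then
  result="placeholder docs found"
else
  result="docs clean"
fi
echo "$result"
rm -rf "$dir"
```

Inverting the grep this way gives CI a clean exit code to gate on, instead of relying on a human to notice output.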