# Repository Guidelines

## Setup Checklist for New Contributors

- [ ] Install dev dependencies: `npm install`
- [ ] Install pre-commit hooks: `npm run secret-scan:install-hook`
- [ ] Apply GitHub labels: `gh label create priority/P0 --color B60205 --description "Critical"` (see `.github/labels.yml` for the full list)
- [ ] Verify devcontainer: `devcontainer build --workspace-folder .`
- [ ] Run first tech-debt scan: `node scripts/tech-debt-scan.mjs`

## Purpose-First Doctrine

SF follows **spec-first TDD**: see [`docs/SPEC_FIRST_TDD.md`](docs/SPEC_FIRST_TDD.md) for the full constitution.

SF's foundational architecture decision is [`ADR-0000: SF Is a Purpose-to-Software Compiler`](docs/adr/0000-purpose-to-software-compiler.md).
Treat this as the product contract for all planning and implementation:

1. capture bounded intent
2. translate intent into the eight PDD fields
3. research missing context and name assumptions
4. apply run-control policy from confidence, risk, reversibility, blast radius, cost, legal/compliance scope, and production/customer impact
5. generate milestone/slice/task contracts from structured state
6. write failing tests or executable evidence before implementation
7. implement the smallest code change that satisfies the contract
8. verify, record evidence, retain useful memory, and continue

Iron Law:

```
THE TEST IS THE SPEC. THE JSDOC IS THE PURPOSE. CODE EXISTS TO FULFILL PURPOSE.

NO BEHAVIOR CHANGE WITHOUT A FAILING TEST FIRST.
NO COMPLETION WITHOUT A REAL CONSUMER.
NO JUDGMENT CALL WITHOUT A CONFIDENCE AND FALSIFIER.
```

Every artifact (slice plan, task plan, function, test, ADR) must answer:

- **why** this behaviour exists
- **what value** it creates or protects
- **who** uses it in production (real consumer, not just tests)
- **what breaks** if it returns the wrong answer

If any answer is missing, stop and report `BLOCKED: purpose unclear — [field]`. Surfacing the gap beats rationalising past it.

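
The four questions above can be treated as a mechanical gate. The following is a minimal sketch, assuming an artifact's answers are held in a plain object; `PurposeAnswers` and `checkPurpose` are hypothetical names used for illustration and are not part of SF's API.

```ts
// Hypothetical shape: the four questions every artifact must answer.
type PurposeAnswers = {
  why?: string;
  whatValue?: string;
  who?: string;
  whatBreaks?: string;
};

const REQUIRED_FIELDS = ["why", "whatValue", "who", "whatBreaks"] as const;

/**
 * Return "OK" when every purpose field is filled in; otherwise return a
 * BLOCKED message naming the first missing field.
 */
function checkPurpose(answers: PurposeAnswers): string {
  for (const field of REQUIRED_FIELDS) {
    const value = answers[field];
    if (value === undefined || value.trim() === "") {
      return `BLOCKED: purpose unclear — ${field}`;
    }
  }
  return "OK";
}

// A task plan that never names its production consumer is blocked on "who".
console.log(checkPurpose({ why: "prevent double dispatch", whatValue: "no duplicate work" }));
```

Reporting only the first missing field keeps the message in the single-field form the doctrine prescribes; a real gate might list every gap at once.
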
## Project Structure

This is a TypeScript monorepo with npm workspaces. The main entry point is `dist/loader.js` (bin: `sf`).

- `src/` — Main CLI source (sf-run core, extensions, agents)
- `packages/` — Workspace packages (7 total): pi-tui, pi-ai, pi-agent-core, pi-coding-agent, daemon, native, rpc-client
- `web/` — Next.js web frontend (optional web host mode)
- `rust-engine/` — Rust N-API bindings for performance-critical operations
- `scripts/` — Build, dev, release, and CI helper scripts
- `tests/` — Fixtures, smoke tests, live tests, live-regression tests
- `docs/` — User guides and developer documentation
- `docker/` — Docker sandbox and builder configurations

## Build, Test, and Development Commands

```bash
# Full build (core + web)
npm run build

# Build core only (packages + tsc + resources)
npm run build:core

# Dev mode with hot reload
npm run dev

# Run all tests (unit + integration)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Coverage check (Vitest V8 provider; thresholds: statements 40%, lines 40%, branches 20%, functions 20%)
npm run test:coverage

# Type check extensions (no emit)
npm run typecheck:extensions

# Native Rust build
npm run build:native

# Root lint checks (Biome over src/)
npm run lint
npm run lint:fix

# Web lint (Next.js ESLint; separate package)
npm --prefix web run lint

# Release workflow (changelog + version bump)
npm run release:changelog
npm run release:bump
```

## Coding Style & Naming Conventions

- **Language**: TypeScript with `"strict": true` enabled in all packages
- **Module resolution**: NodeNext
- **Target**: ES2022
- **Package manager**: npm (canonical; do not commit `bun.lock` or `pnpm-lock.yaml`)
- **Commit format**: Conventional Commits enforced via commit-msg hook
- **Branch naming**: `<type>/<short-description>` — e.g. `feat/new-command`, `fix/login-bug`
  - Types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `infra`, `ci`, `perf`, `build`, `revert`

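
The branch naming rule above can be checked mechanically. This is a minimal sketch, not the repository's actual hook: `isValidBranchName` is a hypothetical helper, and the restriction of the description to lowercase kebab-case is an assumption inferred from the two examples.

```ts
// Branch types listed in the convention above.
const BRANCH_TYPES = [
  "feat", "fix", "docs", "chore", "refactor", "test",
  "infra", "ci", "perf", "build", "revert",
] as const;

// Assumption: the short description is lowercase kebab-case
// (letters and digits separated by single hyphens).
const BRANCH_PATTERN = new RegExp(
  `^(${BRANCH_TYPES.join("|")})/[a-z0-9]+(-[a-z0-9]+)*$`,
);

/** Return true when `name` follows `<type>/<short-description>`. */
function isValidBranchName(name: string): boolean {
  return BRANCH_PATTERN.test(name);
}

console.log(isValidBranchName("feat/new-command")); // true
console.log(isValidBranchName("feature/new-command")); // false: "feature" is not a listed type
```
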
### JSDoc Purpose Convention

Every exported function, type, class, and module-level constant opens with a JSDoc block whose first sentence is its **purpose** — the consumer-facing reason it exists. Not what it does (the signature shows that), but **why**.

```ts
/**
 * Acquire a unit claim atomically. Returns true on success, false if another worker
 * already holds an unexpired lease.
 *
 * Purpose: prevent two workers from dispatching the same unit when the run-lock is
 * unavailable (shared NFS, broken filesystem semantics) — the conditional UPDATE in
 * SQLite is the safety net.
 *
 * Consumer: autonomous dispatch.ts when picking the next eligible unit per poll tick.
 */
export function claimUnit(unitId: string, leaseMs: number): boolean { ... }
```

Required for every exported symbol whose behaviour is non-trivial:

- **First line** — what it returns / does, in the present tense.
- **Purpose:** — why it exists; the value it protects.
- **Consumer:** — who calls it in production. If you can't name a consumer, the symbol shouldn't exist yet.

A bare `/** Helper. */` is a code smell. Either write the purpose or delete the symbol.

For module-level JSDoc (file headers): keep the existing `module-name.ts — short description` opening, then a `Purpose:` line stating why the module exists as a separable unit.

## Testing Guidelines

- **Primary test runner**: Vitest via `npm run test:unit`, `npm run test:integration`, and `npm test`
- **Node test runner**: used only by specific package/native/browser-tool scripts where `package.json` says `node --test`
- **Coverage tool**: Vitest coverage with `@vitest/coverage-v8`; thresholds are enforced in CI
- **Naming**: `*.test.ts` and `*.test.mjs` patterns
- **Smoke tests**: `npm run test:smoke`
- **Live tests**: `npm run test:live` (requires environment variables)

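
For orientation, the coverage thresholds quoted in this document (statements 40%, lines 40%, branches 20%, functions 20%) could be expressed in a Vitest config roughly as follows. This is a sketch that assumes the `coverage.thresholds` layout of recent Vitest releases; the repository's actual config file and any other options it sets are not shown in this document and may differ.

```ts
// vitest.config.ts (illustrative sketch, not the repository's real config)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    coverage: {
      provider: "v8", // backed by @vitest/coverage-v8
      thresholds: {
        statements: 40,
        lines: 40,
        branches: 20,
        functions: 20,
      },
    },
  },
});
```
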
### Purposeful Tests

Test names are contract claims. Use the form `<what>_<when>_<expected>`:

| Good | Bad |
|---|---|
| `claim_when_lease_expired_returns_true` | `test claim` |
| `dispatch_when_blocker_unresolved_skips_unit` | `test dispatch logic` |

Three-tier organisation:

1. **Behaviour contracts** (primary) — what the consumer receives. The spec. A different implementation that passes these is equally correct.
2. **Degradation contracts** — what happens when dependencies fail. Consumer must always get a useful response; failure must degrade, not crash.
3. **Implementation guards** (secondary, labelled `// guard:`) — protect specific failure modes (resource leaks, infinite loops). Refactors update guards, not behaviour contracts.

Write behaviour contracts first. They are the work order.

A test that asserts call counts or mock interactions is **mechanical**, not purposeful — it should be a labelled implementation guard, not a primary contract test. A test that breaks on a refactor without behaviour change is mechanical too. Fix the test or relabel it.

**Bug = missing correct-behaviour test.** When fixing a bug, write a test for the *correct* behaviour first — it must fail (RED) because the bug exists. If it passes immediately, the test is testing the broken behaviour; fix the test, not the code.

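
To make the naming form concrete, here is a self-contained sketch of behaviour contracts for the `claimUnit` example from the JSDoc section. Everything in it is a stand-in: `createClaimStore` is a hypothetical in-memory replacement for the SQLite-backed claim, the injected clock exists only to keep the contracts deterministic, and the tiny `contract` runner substitutes for Vitest's `test(...)` so the snippet runs on its own.

```ts
// Hypothetical in-memory stand-in for the lease store.
function createClaimStore(now: () => number) {
  const leases = new Map<string, number>(); // unitId -> lease expiry (ms)
  return {
    /** Return true when the claim succeeds, false if an unexpired lease exists. */
    claimUnit(unitId: string, leaseMs: number): boolean {
      const expiry = leases.get(unitId);
      if (expiry !== undefined && expiry > now()) return false;
      leases.set(unitId, now() + leaseMs);
      return true;
    },
  };
}

// Minimal runner so the sketch needs no test framework.
const results: Record<string, boolean> = {};
function contract(name: string, body: () => boolean): void {
  results[name] = body();
}

// Behaviour contracts, named <what>_<when>_<expected>.
contract("claim_when_unit_unclaimed_returns_true", () => {
  const store = createClaimStore(() => 0);
  return store.claimUnit("u1", 1000) === true;
});

contract("claim_when_lease_unexpired_returns_false", () => {
  let t = 0;
  const store = createClaimStore(() => t);
  store.claimUnit("u1", 1000);
  t = 999; // one millisecond before the lease runs out
  return store.claimUnit("u1", 1000) === false;
});

contract("claim_when_lease_expired_returns_true", () => {
  let t = 0;
  const store = createClaimStore(() => t);
  store.claimUnit("u1", 1000);
  t = 1000; // lease has run out
  return store.claimUnit("u1", 1000) === true;
});

console.log(results);
```

None of these contracts inspects the `Map` or counts calls, so a different implementation with the same observable behaviour would pass them unchanged.
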
## Extension Development

Extensions live in `src/resources/extensions/`. Each extension should:
- Export a manifest with `name`, `version`, `tools[]`, and `agents[]`
- Include tests in `src/resources/extensions/<name>/tests/`
- Register tools via the extension API

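
As an illustration of the manifest shape, here is a minimal sketch. Only the four field names (`name`, `version`, `tools`, `agents`) come from the list above; the entry types, the `validateManifest` helper, and the example values are assumptions, and the real extension API may define these differently.

```ts
// Hypothetical entry shapes; the real extension API may differ.
type ToolEntry = { name: string; description: string };
type AgentEntry = { name: string; description: string };

type ExtensionManifest = {
  name: string;
  version: string;
  tools: ToolEntry[];
  agents: AgentEntry[];
};

/** Return the problems found in a manifest; an empty list means all four fields are present. */
function validateManifest(manifest: Partial<ExtensionManifest>): string[] {
  const problems: string[] = [];
  if (!manifest.name) problems.push("name is required");
  if (!manifest.version) problems.push("version is required");
  if (!Array.isArray(manifest.tools)) problems.push("tools[] is required");
  if (!Array.isArray(manifest.agents)) problems.push("agents[] is required");
  return problems;
}

// Example values are invented for illustration.
const exampleManifest: ExtensionManifest = {
  name: "example-extension",
  version: "0.1.0",
  tools: [{ name: "example_tool", description: "Illustrative tool entry" }],
  agents: [],
};

console.log(validateManifest(exampleManifest)); // []
```
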
## Pull Request Guidelines

1. **Link an issue** — PRs without a linked issue will be closed without review
2. **One concern per PR** — don't bundle unrelated changes
3. **No drive-by formatting** — don't reformat code you didn't touch
4. **CI must pass** — fix failing tests before requesting review
5. **Rebase onto main** — do not merge main into your feature branch
6. Use the PR template at `.github/PULL_REQUEST_TEMPLATE.md`

## Environment Setup

Copy `docker/.env.example` to `.env` and fill in API keys. At minimum you need one LLM provider key (Anthropic, OpenAI, Google, or OpenRouter).

## Architecture Notes

- State lives on disk in `.sf/` — no in-memory state survives across sessions
- Bundled extensions/agents sync to `~/.sf/agent/` on every launch
- LLM providers are lazy-loaded on first use to reduce cold-start time
- Native Rust engine handles grep, glob, ps, highlight, ast, diff

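
The lazy-loading note above follows a common pattern: wrap the expensive load so it runs at most once, on first use. The sketch below shows that pattern in general terms, not SF's actual provider loader; `lazy`, `Provider`, and `getProvider` are hypothetical names, and a real loader would likely be an asynchronous dynamic import rather than a synchronous function.

```ts
// Hypothetical provider shape, for illustration only.
type Provider = { name: string };

/** Wrap a loader so it runs at most once, on first use. */
function lazy<T>(load: () => T): () => T {
  let loaded = false;
  let value: T | undefined;
  return () => {
    if (!loaded) {
      value = load();
      loaded = true;
    }
    return value as T;
  };
}

let loadCount = 0;
const getProvider = lazy<Provider>(() => {
  loadCount += 1; // stands in for the expensive module load
  return { name: "example-provider" };
});

// Nothing is loaded at startup; the first call pays the cost, later calls reuse it.
getProvider();
getProvider();
console.log(loadCount); // 1
```
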
## SF Planning State

SQLite (`.sf/sf.db`) is the canonical structured store for SF agent state whenever schema, ordering, priority, joins, or validation matter. Runtime files under `.sf/` are working artifacts, generated projections, evidence, or recovery inputs.

**Promote-only rule:** Agent runtime state (`.sf/milestones/`, `.sf/evals/`, `.sf/harness/`, locks, journals, and generated manifests) is transient and gitignored — never committed directly. Project `.sf/` files tracked in the repo root are limited to deliberate human-authored guidance such as `PRINCIPLES.md`, `TASTE.md`, `ANTI-GOALS.md`, `DECISIONS.md`, `KNOWLEDGE.md`, `REQUIREMENTS.md`, and `ROADMAP.md`.

SF keeps the working spec contract in `.sf`, database first. Root-level `SPEC.md`, `BASE_SPEC.md`, product spec files, and `docs/specs/` are human exports, reports, review surfaces, or external evidence, not a competing planning model. SF can read any repo file as source evidence, but information required for SF's own future operation must be analyzed into `.sf`/DB-backed state. New plans must state purpose on every milestone, slice, and task before implementation detail.

SF has one flow engine across TUI, CLI, web, editor, and machine entrypoints.
Keep integration language separated: **surface** means TUI/CLI/web/editor/machine,
**protocol** means ACP/RPC/stdio JSON-RPC/HTTP/wire, **output format** means
text/json/stream-json, **run control** means manual/assisted/autonomous, and
**permission profile** means restricted/normal/trusted/unrestricted.
`sf headless` is the current machine-surface command, not a separate flow and
not a synonym for JSON. See `docs/specs/sf-operating-model.md`.

Source placement follows the same model. `src/resources/extensions/sf/` owns the
SF flow extension, `src/headless*.ts` owns the `sf headless` machine-surface
command path, `web/` owns the browser surface, `vscode-extension/` owns the
editor surface, `packages/rpc-client/` owns reusable RPC adapter code, and
`packages/*` own reusable workspace packages. See
`docs/specs/sf-operating-model.md`.

Promoted artifacts — milestone summaries, architecture decision records (ADRs), and durable specifications — belong in tracked documentation directories:

- `docs/plans/` — reviewed implementation plans promoted from `.sf/` milestone planning
- `docs/adr/` — accepted architectural decisions promoted from `.sf/DECISIONS.md`
- `docs/specs/` — human-readable behavior/API contract exports and reports

**Naming conventions:**
- Milestone IDs: `M001`, `M002`, …
- Slice IDs: `S01`, `S02`, …
- Task IDs: `T01`, `T02`, …

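
The ID scheme above is a prefix plus a zero-padded counter. A minimal sketch of formatting and checking such IDs follows; `formatPlanId` and `isPlanId` are hypothetical helpers, and the fixed widths (three digits for milestones, two for slices and tasks) are an assumption read off the examples, so behaviour beyond `M999` or `S99` is not defined here.

```ts
type PlanKind = "milestone" | "slice" | "task";

// Prefix and zero-padded width per kind, matching M001 / S01 / T01.
const ID_FORMAT: Record<PlanKind, { prefix: string; width: number }> = {
  milestone: { prefix: "M", width: 3 },
  slice: { prefix: "S", width: 2 },
  task: { prefix: "T", width: 2 },
};

/** Build an ID such as "M001" from a kind and a 1-based sequence number. */
function formatPlanId(kind: PlanKind, n: number): string {
  const { prefix, width } = ID_FORMAT[kind];
  return prefix + String(n).padStart(width, "0");
}

/** Return true when `id` has the shape of a milestone, slice, or task ID. */
function isPlanId(id: string): boolean {
  return /^(M\d{3}|[ST]\d{2})$/.test(id);
}

console.log(formatPlanId("milestone", 1)); // "M001"
console.log(formatPlanId("slice", 2)); // "S02"
console.log(isPlanId("T1")); // false: task IDs are zero-padded to two digits
```
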
**Commands:**
- `sf plan promote <source>` — copy a file from `.sf/` to `docs/plans/`, `docs/adr/`, or `docs/specs/`
- `sf plan list` — list active milestone and slice records/artifacts
- `sf plan diff` — compare runtime planning state with promoted `docs/` artifacts
- `sf plan specs generate|diff|check` — regenerate or verify human `docs/specs/` exports from `.sf` state

See [`docs/plans/README.md`](docs/plans/README.md), [`docs/adr/README.md`](docs/adr/README.md), and [`docs/specs/README.md`](docs/specs/README.md) for directory-specific conventions.

## SF Schedule

The SF schedule system (`/sf schedule`) stores project time-bound reminders in the repo SQLite DB (`.sf/sf.db`, `schedule_entries`) and global reminders in `~/.sf/sf.db`. Legacy `.sf/schedule.jsonl` rows are import-only compatibility input when a project has no schedule rows yet. Items surface on their due date via pull queries at launch and autonomous mode boundaries — there is no background daemon.

**When to use `sf schedule` vs backlog:**
- **`sf schedule`** — time-bound items that must surface at a future date: a 2-week adoption review after shipping a feature, a 1-month audit of an architectural decision, a 30-minute reminder to run a command. Use when the *timing* matters, not just the *priority*.
- **Backlog** (milestone/slice queue) — priority-ordered items with no specific timing. Items are dispatched in sequence by the autonomous controller based on readiness and dependency, not wall-clock time.

**Examples:**
```
sf schedule add --in 2w "Review feature adoption metrics"
sf schedule add --in 1mo --kind audit "Audit ADR-007 decision implementation"
sf schedule add --in 30m --kind reminder "Run integration tests"
```

For the full specification, see [`docs/specs/sf-schedule.md`](docs/specs/sf-schedule.md).

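
To show how an offset such as `2w` could become a due date that a pull query compares against, here is a rough sketch. It is not SF's parser: `parseOffsetMs` and `dueDate` are hypothetical, only the three units seen in the examples (`m`, `w`, `mo`) are handled, and treating a month as 30 days is an assumption. The linked specification is the authority on the real grammar.

```ts
const MINUTE_MS = 60 * 1000;
const DAY_MS = 24 * 60 * MINUTE_MS;

// Assumption: only the units that appear in the examples, with a 30-day month.
const UNIT_MS: Record<string, number> = {
  m: MINUTE_MS,
  w: 7 * DAY_MS,
  mo: 30 * DAY_MS,
};

/** Convert an offset such as "30m", "2w", or "1mo" to milliseconds, or null if unrecognised. */
function parseOffsetMs(spec: string): number | null {
  const match = /^(\d+)(mo|m|w)$/.exec(spec);
  if (!match) return null;
  const amount = Number(match[1]);
  const unitMs = UNIT_MS[match[2] ?? ""];
  if (unitMs === undefined) return null;
  return amount * unitMs;
}

/** Compute the due date for an offset relative to `now`, or null if the offset is unrecognised. */
function dueDate(spec: string, now: Date): Date | null {
  const offset = parseOffsetMs(spec);
  return offset === null ? null : new Date(now.getTime() + offset);
}

console.log(parseOffsetMs("30m")); // 1800000
console.log(dueDate("2w", new Date(0))?.toISOString()); // "1970-01-15T00:00:00.000Z"
```

Storing the computed due date rather than the offset suits the pull-query model described above: at launch, the check is a comparison against the current time, with no timer or daemon involved.
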
## Eval Dump Inbox

SF/Pi automatically loads `AGENTS.md` and `CLAUDE.md` from the repo tree at
startup. It does not automatically load `TODO.md`, but this repo uses root
`TODO.md` as a temporary human dump inbox for eval and self-evolution ideas.

When a repo contains a root `TODO.md`, treat it as a temporary dump inbox and
read it before planning substantive work in that repo. This applies even when
the user does not explicitly mention evals. Treat the `Raw Dump Inbox` section
as untriaged source material, not as durable instructions. Triage it into
reviewable artifacts: concrete eval cases, harness gaps, memory extraction
requirements, docs, tests, or follow-up implementation tasks. After triage,
remove the processed dump notes from `TODO.md` so the file returns to an empty
inbox/template state. Do not treat dumped notes as runtime memory or approved
behavior until they are converted into tested, versioned project artifacts.

## CI/CD

- `ci.yml` — builds, tests, gates merges to main
- `pipeline.yml` — three-stage release (dev → test → prod)
- `pr-risk.yml` — PR risk classification
- `ai-triage.yml` — AI-based issue/PR triage

## Code Quality Tooling

The repository uses the following quality tools:

- **Biome** — root source linting via `npm run lint` and autofix via `npm run lint:fix`
  - Scope: `src/` plus versioned JSON checks
  - Config: `biome.json`
  - Format touched files with `npx biome check --write <paths>`; full-repo formatting is not the current CI gate.
- **ESLint** — web app linting via `npm --prefix web run lint`
  - Scope: `web/`
  - Config: `web/eslint.config.mjs`
- **TypeScript** — Strict mode enabled; run `npm run typecheck:extensions`
- **Knip** — Detect unused code and dependencies: `npx knip` (config at `knip.json`)
- **jscpd** — Detect duplicate code: `npx jscpd` (config at `.jscpd.json`)
- **Tech Debt Scanner** — `node scripts/tech-debt-scan.mjs`
  - Tracks TODO/FIXME/HACK/XXX counts against thresholds
- **Secret Scan** — `npm run secret-scan` (pre-commit hook available via `npm run secret-scan:install-hook`)
- **Coverage** — `npm run test:coverage` (Vitest V8 coverage with 40/40/20/20 thresholds)

## Dev Container

A Dev Container configuration is available at `.devcontainer/devcontainer.json`.
Open the repository in VS Code with the Dev Containers extension, or run:

```bash
devcontainer up --workspace-folder .
```

The container includes Node 26, Rust, GitHub CLI, Docker-in-Docker, and recommended VS Code extensions.

## Dependency Updates

Dependabot is configured at `.github/dependabot.yml` for:
- Root npm dependencies (weekly, grouped by ecosystem)
- Web app dependencies (weekly)
- GitHub Actions (weekly)

## Issue Labels

Label definitions are at `.github/labels.yml`. Apply labels using:

```bash
# Create a single label
gh label create priority/P0 --color B60205 --description "Critical — blocks release"

# Or use a label management action in CI
```