singularity-forge/AGENTS.md

# Repository Guidelines

## Setup Checklist for New Contributors

- [ ] Install dev dependencies: `npm install`
- [ ] Install pre-commit hooks: `npm run secret-scan:install-hook`
- [ ] Apply GitHub labels: `gh label create priority/P0 --color B60205 --description "Critical"` (see .github/labels.yml for full list)
- [ ] Verify devcontainer: `devcontainer build --workspace-folder .`
- [ ] Run first tech-debt scan: `node scripts/tech-debt-scan.mjs`

## Purpose-First Doctrine

sf follows **spec-first TDD**: see [`docs/SPEC_FIRST_TDD.md`](docs/SPEC_FIRST_TDD.md) for the full constitution.

SF's foundational architecture decision is [`ADR-0000: SF Is a Purpose-to-Software Compiler`](docs/adr/0000-purpose-to-software-compiler.md).
Treat this as the product contract for all planning and implementation:

1. capture bounded intent
2. translate intent into the eight PDD fields
3. research missing context and name assumptions
4. apply run-control policy from confidence, risk, reversibility, blast radius, cost, legal/compliance scope, and production/customer impact
5. generate milestone/slice/task contracts from structured state
6. write failing tests or executable evidence before implementation
7. implement the smallest code change that satisfies the contract
8. verify, record evidence, retain useful memory, and continue

Iron Law:

```
THE TEST IS THE SPEC.  THE JSDOC IS THE PURPOSE.  CODE EXISTS TO FULFILL PURPOSE.

NO BEHAVIOR CHANGE WITHOUT A FAILING TEST FIRST.
NO COMPLETION WITHOUT A REAL CONSUMER.
NO JUDGMENT CALL WITHOUT A CONFIDENCE AND FALSIFIER.
```

Every artifact (slice plan, task plan, function, test, ADR) must answer:

- **why** this behaviour exists
- **what value** it creates or protects
- **who** uses it in production (real consumer, not just tests)
- **what breaks** if it returns the wrong answer

If any answer is missing: `BLOCKED: purpose unclear — [field]`. Surfacing the gap beats rationalising past it.

## Project Structure

This is a TypeScript monorepo with npm workspaces. The main entry point is `dist/loader.js` (bin: `sf`).

- `src/` — Main CLI source (sf-run core, extensions, agents)
- `packages/` — Workspace packages (7 total): pi-tui, pi-ai, pi-agent-core, pi-coding-agent, daemon, native, rpc-client
- `web/` — Next.js web frontend (optional web host mode)
- `rust-engine/` — Rust N-API bindings for performance-critical operations
- `scripts/` — Build, dev, release, and CI helper scripts
- `tests/` — Fixtures, smoke tests, live tests, live-regression tests
- `docs/` — User guides and developer documentation
- `docker/` — Docker sandbox and builder configurations

## Build, Test, and Development Commands

```bash
# Full build (core + web)
npm run build

# Build core only (packages + tsc + resources)
npm run build:core

# Dev mode with hot reload
npm run dev

# Run all tests (unit + integration)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Coverage check (Vitest V8 provider; thresholds: statements 40%, lines 40%, branches 20%, functions 20%)
npm run test:coverage

# Type check extensions (no emit)
npm run typecheck:extensions

# Native Rust build
npm run build:native

# Root lint checks (Biome over src/)
npm run lint
npm run lint:fix

# Web lint (Next.js ESLint; separate package)
npm --prefix web run lint

# Release workflow (changelog + version bump)
npm run release:changelog
npm run release:bump
```

## Coding Style & Naming Conventions

- **Language**: TypeScript with `"strict": true` enabled in all packages
- **Module resolution**: NodeNext
- **Target**: ES2022
- **Package manager**: npm (canonical; do not commit `bun.lock` or `pnpm-lock.yaml`)
- **Commit format**: Conventional Commits enforced via commit-msg hook
- **Branch naming**: `<type>/<short-description>` — e.g. `feat/new-command`, `fix/login-bug`
  - Types: `feat`, `fix`, `docs`, `chore`, `refactor`, `test`, `infra`, `ci`, `perf`, `build`, `revert`

### JSDoc Purpose Convention

Every exported function, type, class, and module-level constant opens with a JSDoc block whose first sentence is its **purpose** — the consumer-facing reason it exists. Not what it does (the signature shows that), but **why**.

```ts
/**
 * Acquire a unit claim atomically. Returns true on success, false if another worker
 * already holds an unexpired lease.
 *
 * Purpose: prevent two workers from dispatching the same unit when the run-lock is
 * unavailable (shared NFS, broken filesystem semantics) — the conditional UPDATE in
 * SQLite is the safety net.
 *
 * Consumer: autonomous dispatch.ts when picking the next eligible unit per poll tick.
 */
export function claimUnit(unitId: string, leaseMs: number): boolean { ... }
```

Required for every exported symbol whose behaviour is non-trivial:

- **First line** — what it returns / does, in the present tense.
- **Purpose:** — why it exists; the value it protects.
- **Consumer:** — who calls it in production. If you can't name a consumer, the symbol shouldn't exist yet.

A bare `/** Helper. */` is a code smell. Either write the purpose or delete the symbol.

For module-level JSDoc (file headers): keep the existing `module-name.ts — short description` opening, then a `Purpose:` line stating why the module exists as a separable unit.

## Testing Guidelines

- **Primary test runner**: Vitest via `npm run test:unit`, `npm run test:integration`, and `npm test`
- **Node test runner**: used only by specific package/native/browser-tool scripts where `package.json` says `node --test`
- **Coverage tool**: Vitest coverage with `@vitest/coverage-v8`; thresholds are enforced in CI
- **Naming**: `*.test.ts` and `*.test.mjs` patterns
- **Smoke tests**: `npm run test:smoke`
- **Live tests**: `npm run test:live` (requires environment variables)

### Purposeful Tests

Test names are contract claims. Use the form `<what>_<when>_<expected>`:

| Good | Bad |
|---|---|
| `claim_when_lease_expired_returns_true` | `test claim` |
| `dispatch_when_blocker_unresolved_skips_unit` | `test dispatch logic` |

Three-tier organisation:

1. **Behaviour contracts** (primary) — what the consumer receives. The spec. A different implementation that passes these is equally correct.
2. **Degradation contracts** — what happens when dependencies fail. Consumer must always get a useful response; failure must degrade, not crash.
3. **Implementation guards** (secondary, labelled `// guard:`) — protect specific failure modes (resource leaks, infinite loops). Refactors update guards, not behaviour contracts.

Write behaviour contracts first. They are the work order.

A test that asserts call counts or mock interactions is **mechanical**, not purposeful — it should be a labelled implementation guard, not a primary contract test. A test that breaks on a refactor without behaviour change is mechanical too. Fix the test or relabel it.

**Bug = missing correct-behaviour test.** When fixing a bug, write a test for the *correct* behaviour first — it must fail (RED) because the bug exists. If it passes immediately, the test is testing the broken behaviour; fix the test, not the code.

## Extension Development

Extensions live in `src/resources/extensions/`. Each extension should:
- Export a manifest with `name`, `version`, `tools[]`, and `agents[]`
- Include tests in `src/resources/extensions/<name>/tests/`
- Register tools via the extension API

## Pull Request Guidelines

1. **Link an issue** — PRs without a linked issue will be closed without review
2. **One concern per PR** — don't bundle unrelated changes
3. **No drive-by formatting** — don't reformat code you didn't touch
4. **CI must pass** — fix failing tests before requesting review
5. **Rebase onto main** — do not merge main into your feature branch
6. Use the PR template at `.github/PULL_REQUEST_TEMPLATE.md`

## Environment Setup

Copy `docker/.env.example` to `.env` and fill in API keys. At minimum you need one LLM provider key (Anthropic, OpenAI, Google, or OpenRouter).

## Architecture Notes

- State lives on disk in `.sf/` — no in-memory state survives across sessions
- Bundled extensions/agents sync to `~/.sf/agent/` on every launch
- LLM providers are lazy-loaded on first use to reduce cold-start time
- Native Rust engine handles grep, glob, ps, highlight, ast, diff

## SF Planning State

SQLite (`.sf/sf.db`) is the canonical structured store for SF agent state whenever schema, ordering, priority, joins, or validation matter. Runtime files under `.sf/` are working artifacts, generated projections, evidence, or recovery inputs.

**Promote-only rule:** Agent runtime state (`.sf/milestones/`, `.sf/evals/`, `.sf/harness/`, locks, journals, and generated manifests) is transient and gitignored — never committed directly. Project `.sf/` files tracked in the repo root are limited to deliberate human-authored guidance such as `PRINCIPLES.md`, `TASTE.md`, `ANTI-GOALS.md`, `DECISIONS.md`, `KNOWLEDGE.md`, `REQUIREMENTS.md`, and `ROADMAP.md`.

SF keeps the working spec contract in `.sf`, database first. Root-level `SPEC.md`, `BASE_SPEC.md`, product spec files, and `docs/specs/` are human exports, reports, review surfaces, or external evidence, not a competing planning model. SF can read any repo file as source evidence, but information required for SF's own future operation must be analyzed into `.sf`/DB-backed state. New plans must state purpose on every milestone, slice, and task before implementation detail.

SF has one flow engine across TUI, CLI, web, editor, and machine entrypoints.
Keep integration language separated: **surface** means TUI/CLI/web/editor/machine,
**protocol** means ACP/RPC/stdio JSON-RPC/HTTP/wire, **output format** means
text/json/stream-json, **run control** means manual/assisted/autonomous, and
**permission profile** means restricted/normal/trusted/unrestricted.
`sf headless` is the current machine-surface command, not a separate flow and
not a synonym for JSON. See `docs/specs/sf-operating-model.md`.

Source placement follows the same model. `src/resources/extensions/sf/` owns the
SF flow extension, `src/headless*.ts` owns the `sf headless` machine-surface
command path, `web/` owns the browser surface, `vscode-extension/` owns the
editor surface, `packages/rpc-client/` owns reusable RPC adapter code, and
`packages/*` own reusable workspace packages. See
`docs/specs/sf-operating-model.md`.

Promoted artifacts — milestone summaries, architecture decision records (ADRs), and durable specifications — belong in tracked documentation directories:

- `docs/plans/` — reviewed implementation plans promoted from `.sf/` milestone planning
- `docs/adr/` — accepted architectural decisions promoted from `.sf/DECISIONS.md`
- `docs/specs/` — human-readable behavior/API contract exports and reports

**Naming conventions:**
- Milestone IDs: `M001`, `M002`, …
- Slice IDs: `S01`, `S02`, …
- Task IDs: `T01`, `T02`, …

**Commands:**
- `sf plan promote <source>` — copy a file from `.sf/` to `docs/plans/`, `docs/adr/`, or `docs/specs/`
- `sf plan list` — list active milestone and slice records/artifacts
- `sf plan diff` — compare runtime planning state with promoted `docs/` artifacts
- `sf plan specs generate|diff|check` — regenerate or verify human `docs/specs/` exports from `.sf` state

See [`docs/plans/README.md`](docs/plans/README.md), [`docs/adr/README.md`](docs/adr/README.md), and [`docs/specs/README.md`](docs/specs/README.md) for directory-specific conventions.

## SF Schedule

The SF schedule system (`/sf schedule`) stores project time-bound reminders in the repo SQLite DB (`.sf/sf.db`, `schedule_entries`) and global reminders in `~/.sf/sf.db`. Legacy `.sf/schedule.jsonl` rows are import-only compatibility input when a project has no schedule rows yet. Items surface on their due date via pull queries at launch and autonomous mode boundaries — there is no background daemon.

**When to use `sf schedule` vs backlog:**
- **`sf schedule`** — time-bound items that must surface at a future date: a 2-week adoption review after shipping a feature, a 1-month audit of an architectural decision, a 30-minute reminder to run a command. Use when the *timing* matters, not just the *priority*.
- **Backlog** (milestone/slice queue) — priority-ordered items with no specific timing. Items are dispatched in sequence by the autonomous controller based on readiness and dependency, not wall-clock time.

**Examples:**
```
sf schedule add --in 2w "Review feature adoption metrics"
sf schedule add --in 1mo --kind audit "Audit ADR-007 decision implementation"
sf schedule add --in 30m --kind reminder "Run integration tests"
```

For the full specification, see [`docs/specs/sf-schedule.md`](docs/specs/sf-schedule.md).

## Eval Dump Inbox

SF/Pi automatically loads `AGENTS.md` and `CLAUDE.md` from the repo tree at
startup. It does not automatically load `TODO.md`, but this repo uses root
`TODO.md` as a temporary human dump inbox for eval and self-evolution ideas.

When a repo contains a root `TODO.md`, treat it as a temporary dump inbox and
read it before planning substantive work in that repo. This applies even when
the user does not explicitly mention evals. Treat the `Raw Dump Inbox` section
as untriaged source material, not as durable instructions. Triage it into
reviewable artifacts: concrete eval cases, harness gaps, memory extraction
requirements, docs, tests, or follow-up implementation tasks. After triage,
remove the processed dump notes from `TODO.md` so the file returns to an empty
inbox/template state. Do not treat dumped notes as runtime memory or approved
behavior until they are converted into tested, versioned project artifacts.

## CI/CD

- `ci.yml` — builds, tests, gates merges to main
- `pipeline.yml` — three-stage release (dev → test → prod)
- `pr-risk.yml` — PR risk classification
- `ai-triage.yml` — AI-based issue/PR triage

## Code Quality Tooling

The repository uses the following quality tools:

- **Biome** — root source linting via `npm run lint` and autofix via `npm run lint:fix`
  - Scope: `src/` plus versioned JSON checks
  - Config: `biome.json`
  - Format touched files with `npx biome check --write <paths>`; full-repo formatting is not the current CI gate.
- **ESLint** — web app linting via `npm --prefix web run lint`
  - Scope: `web/`
  - Config: `web/eslint.config.mjs`
- **TypeScript** — Strict mode enabled; run `npm run typecheck:extensions`
- **Knip** — Detect unused code and dependencies: `npx knip` (config at `knip.json`)
- **jscpd** — Detect duplicate code: `npx jscpd` (config at `.jscpd.json`)
- **Tech Debt Scanner** — `node scripts/tech-debt-scan.mjs`
  - Tracks TODO/FIXME/HACK/XXX counts against thresholds
- **Secret Scan** — `npm run secret-scan` (pre-commit hook available via `npm run secret-scan:install-hook`)
- **Coverage** — `npm run test:coverage` (Vitest V8 coverage with 40/40/20/20 thresholds)

## Dev Container

A Dev Container configuration is available at `.devcontainer/devcontainer.json`.
Open the repository in VS Code with the Dev Containers extension, or run:

```bash
devcontainer up --workspace-folder .
```

The container includes Node 26, Rust, GitHub CLI, Docker-in-Docker, and recommended VS Code extensions.

## Dependency Updates

Dependabot is configured at `.github/dependabot.yml` for:
- Root npm dependencies (weekly, grouped by ecosystem)
- Web app dependencies (weekly)
- GitHub Actions (weekly)

## Issue Labels

Label definitions are at `.github/labels.yml`. Apply labels using:

```bash
# Create a single label
gh label create priority/P0 --color B60205 --description "Critical — blocks release"

# Or use a label management action in CI
```