singularity-forge/AGENTS.md
2026-05-04 23:27:20 +02:00

Repository Guidelines

Setup Checklist for New Contributors

  • Install dev dependencies: npm install
  • Install pre-commit hooks: npm run secret-scan:install-hook
  • Apply GitHub labels: gh label create priority/P0 --color B60205 --description "Critical" (see .github/labels.yml for full list)
  • Verify devcontainer: devcontainer build --workspace-folder .
  • Run first tech-debt scan: node scripts/tech-debt-scan.mjs

Purpose-First Doctrine

sf follows spec-first TDD: see docs/SPEC_FIRST_TDD.md for the full constitution.

Iron Law:

THE TEST IS THE SPEC.  THE JSDOC IS THE PURPOSE.  CODE EXISTS TO FULFILL PURPOSE.

NO BEHAVIOR CHANGE WITHOUT A FAILING TEST FIRST.
NO COMPLETION WITHOUT A REAL CONSUMER.
NO JUDGMENT CALL WITHOUT A CONFIDENCE AND FALSIFIER.

Every artifact (slice plan, task plan, function, test, ADR) must answer:

  • why this behaviour exists
  • what value it creates or protects
  • who uses it in production (real consumer, not just tests)
  • what breaks if it returns the wrong answer

If any answer is missing: BLOCKED: purpose unclear — [field]. Surfacing the gap beats rationalising past it.
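
The four-question check above can be sketched as a small gate. This is a minimal illustration, not sf's implementation: the field names (`why`, `value`, `consumer`, `failureImpact`) and the function name are hypothetical.

```typescript
// Hypothetical field names for the four questions; sf itself may record them differently.
const PURPOSE_FIELDS = ["why", "value", "consumer", "failureImpact"] as const;
type PurposeField = (typeof PURPOSE_FIELDS)[number];
type PurposeAnswers = Partial<Record<PurposeField, string>>;

/** Lists the questions an artifact leaves unanswered (absent or whitespace-only). */
function missingPurposeFields(answers: PurposeAnswers): PurposeField[] {
  return PURPOSE_FIELDS.filter((field) => (answers[field] ?? "").trim() === "");
}
```

Each returned field would fill the [field] slot of one BLOCKED line. An empty result means the artifact answers all four questions.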

Project Structure

This is a TypeScript monorepo with npm workspaces. The main entry point is dist/loader.js (bin: sf).

  • src/ — Main CLI source (sf-run core, extensions, agents)
  • packages/ — Workspace packages (8 total): pi-tui, pi-ai, pi-agent-core, pi-coding-agent, daemon, mcp-server, native, rpc-client
  • web/ — Next.js web frontend (optional web host mode)
  • rust-engine/ — Rust N-API bindings for performance-critical operations
  • scripts/ — Build, dev, release, and CI helper scripts
  • tests/ — Fixtures, smoke tests, live tests, live-regression tests
  • docs/ — User guides and developer documentation
  • docker/ — Docker sandbox and builder configurations

Build, Test, and Development Commands

# Full build (core + web)
npm run build

# Build core only (packages + tsc + resources)
npm run build:core

# Dev mode with hot reload
npm run dev

# Run all tests (unit + integration)
npm test

# Unit tests only
npm run test:unit

# Integration tests only
npm run test:integration

# Coverage check (Vitest V8 provider; thresholds: statements 40%, lines 40%, branches 20%, functions 20%)
npm run test:coverage

# Type check extensions (no emit)
npm run typecheck:extensions

# Native Rust build
npm run build:native

# Root lint checks (Biome over src/)
npm run lint
npm run lint:fix

# Web lint (Next.js ESLint; separate package)
npm --prefix web run lint

# Release workflow (changelog + version bump)
npm run release:changelog
npm run release:bump

Coding Style & Naming Conventions

  • Language: TypeScript with "strict": true enabled in all packages
  • Module resolution: NodeNext
  • Target: ES2022
  • Package manager: npm (canonical; do not commit bun.lock or pnpm-lock.yaml)
  • Commit format: Conventional Commits enforced via commit-msg hook
  • Branch naming: <type>/<short-description> — e.g. feat/new-command, fix/login-bug
    • Types: feat, fix, docs, chore, refactor, test, infra, ci, perf, build, revert
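
A branch-name check matching the convention above might look like the following sketch. The type list is taken from the guideline. The rule that descriptions are lowercase words joined by single hyphens is an assumption inferred from the two examples, and `isValidBranchName` is a hypothetical helper.

```typescript
const BRANCH_TYPES = [
  "feat", "fix", "docs", "chore", "refactor", "test",
  "infra", "ci", "perf", "build", "revert",
] as const;

// Assumption: <short-description> is lowercase alphanumeric words separated by single hyphens.
const BRANCH_PATTERN = new RegExp(
  `^(${BRANCH_TYPES.join("|")})/[a-z0-9]+(-[a-z0-9]+)*$`,
);

/** Returns true when the name has the form <type>/<short-description>. */
function isValidBranchName(name: string): boolean {
  return BRANCH_PATTERN.test(name);
}
```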

JSDoc Purpose Convention

Every exported function, type, class, and module-level constant opens with a JSDoc block whose first sentence is its purpose — the consumer-facing reason it exists. Not what it does (the signature shows that), but why.

/**
 * Acquire a unit claim atomically. Returns true on success, false if another worker
 * already holds an unexpired lease.
 *
 * Purpose: prevent two workers from dispatching the same unit when the run-lock is
 * unavailable (shared NFS, broken filesystem semantics) — the conditional UPDATE in
 * SQLite is the safety net.
 *
 * Consumer: auto-dispatch.ts when picking the next eligible unit per poll tick.
 */
export function claimUnit(unitId: string, leaseMs: number): boolean { ... }

Required for every exported symbol whose behaviour is non-trivial:

  • First line — what it returns / does, in the present tense.
  • Purpose: — why it exists; the value it protects.
  • Consumer: — who calls it in production. If you can't name a consumer, the symbol shouldn't exist yet.

A bare /** Helper. */ is a code smell. Either write the purpose or delete the symbol.

For module-level JSDoc (file headers): keep the existing module-name.ts — short description opening, then a Purpose: line stating why the module exists as a separable unit.
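
The three required parts can be checked mechanically on the raw comment text. The sketch below is illustrative only: `jsdocGaps` is a hypothetical helper, not a tool shipped in this repository, and it deliberately ignores JSDoc tags such as @param.

```typescript
/** Reports which required parts are missing from one JSDoc block given as raw text. */
function jsdocGaps(block: string): string[] {
  // Strip the comment delimiters and each line's leading asterisk, then drop blank lines.
  const lines = block
    .replace(/^\s*\/\*\*/, "")
    .replace(/\*\/\s*$/, "")
    .split("\n")
    .map((line) => line.replace(/^\s*\*?\s?/, "").trim())
    .filter((line) => line.length > 0);

  const first = lines[0] ?? "";
  const gaps: string[] = [];
  // The first line must be a summary sentence, not one of the labelled fields.
  if (first === "" || /^(Purpose|Consumer):/.test(first)) gaps.push("first line");
  if (!lines.some((line) => line.startsWith("Purpose:"))) gaps.push("Purpose:");
  if (!lines.some((line) => line.startsWith("Consumer:"))) gaps.push("Consumer:");
  return gaps;
}
```

Under this sketch a bare /** Helper. */ is reported as missing both Purpose: and Consumer:.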

Testing Guidelines

  • Primary test runner: Vitest via npm run test:unit, npm run test:integration, and npm test
  • Node test runner: used only by specific package/native/browser-tool scripts where package.json says node --test
  • Coverage tool: Vitest coverage with @vitest/coverage-v8; thresholds are enforced in CI
  • Naming: *.test.ts and *.test.mjs patterns
  • Smoke tests: npm run test:smoke
  • Live tests: npm run test:live (requires environment variables)
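
To make the coverage gate concrete, the sketch below compares measured percentages against the thresholds stated in this guide (statements 40, lines 40, branches 20, functions 20). Vitest performs this comparison itself; `failingMetrics` is a hypothetical helper that only illustrates what the gate means.

```typescript
type Metric = "statements" | "lines" | "branches" | "functions";

// Threshold percentages as documented in this guide.
const THRESHOLDS: Record<Metric, number> = {
  statements: 40,
  lines: 40,
  branches: 20,
  functions: 20,
};

/** Returns the metrics whose measured percentage falls below its threshold. */
function failingMetrics(measured: Record<Metric, number>): Metric[] {
  return (Object.keys(THRESHOLDS) as Metric[]).filter(
    (metric) => measured[metric] < THRESHOLDS[metric],
  );
}
```

A value exactly equal to its threshold passes in this sketch.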

Purposeful Tests

Test names are contract claims. Use the form <what>_<when>_<expected>:

Good                                          Bad
claim_when_lease_expired_returns_true         test claim
dispatch_when_blocker_unresolved_skips_unit   test dispatch logic

Three-tier organisation:

  1. Behaviour contracts (primary) — what the consumer receives. The spec. A different implementation that passes these is equally correct.
  2. Degradation contracts — what happens when dependencies fail. Consumer must always get a useful response; failure must degrade, not crash.
  3. Implementation guards (secondary, labelled // guard:) — protect specific failure modes (resource leaks, infinite loops). Refactors update guards, not behaviour contracts.

Write behaviour contracts first. They are the work order.

A test that asserts call counts or mock interactions is mechanical, not purposeful — it should be a labelled implementation guard, not a primary contract test. A test that breaks on a refactor without behaviour change is mechanical too. Fix the test or relabel it.

Bug = missing correct-behaviour test. When fixing a bug, write a test for the correct behaviour first — it must fail (RED) because the bug exists. If it passes immediately, the test is testing the broken behaviour; fix the test, not the code.
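
As an illustration of behaviour contracts named in the <what>_<when>_<expected> form, the sketch below exercises a hypothetical in-memory stand-in for the claimUnit function from the JSDoc example. The real function is described as SQLite-backed. The injected clock, the `createClaimStore` factory, and the rule that a lease counts as expired once the clock reaches its expiry are all assumptions of this sketch. In the repository these would be Vitest test cases; plain functions are used here so the block runs on its own.

```typescript
// Hypothetical in-memory stand-in; the clock is injected so expiry needs no real waiting.
function createClaimStore(now: () => number) {
  const leaseExpiry = new Map<string, number>();
  return function claimUnit(unitId: string, leaseMs: number): boolean {
    const expiry = leaseExpiry.get(unitId);
    if (expiry !== undefined && expiry > now()) return false; // another unexpired lease exists
    leaseExpiry.set(unitId, now() + leaseMs);
    return true;
  };
}

// Behaviour contracts: each name states what the consumer receives and when.
const contracts: Record<string, () => boolean> = {
  claim_when_unit_unclaimed_returns_true: () => {
    const claimUnit = createClaimStore(() => 0);
    return claimUnit("u1", 1000) === true;
  },
  claim_when_lease_unexpired_returns_false: () => {
    const claimUnit = createClaimStore(() => 0);
    claimUnit("u1", 1000);
    return claimUnit("u1", 1000) === false;
  },
  claim_when_lease_expired_returns_true: () => {
    let clock = 0;
    const claimUnit = createClaimStore(() => clock);
    claimUnit("u1", 1000);
    clock = 1000; // the first lease has now run out
    return claimUnit("u1", 1000) === true;
  },
};

/** Runs every contract and returns the names of those that do not hold. */
function failedContracts(): string[] {
  return Object.entries(contracts)
    .filter(([, run]) => !run())
    .map(([name]) => name);
}
```

None of these contracts inspects the Map or counts calls, so a different implementation that honours the same leases would pass them unchanged.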

Extension Development

Extensions live in src/resources/extensions/. Each extension should:

  • Export a manifest with name, version, tools[], and agents[]
  • Include tests in src/resources/extensions/<name>/tests/
  • Register tools via the extension API
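
A manifest with the four fields listed above could be typed and checked roughly as follows. Only the field names name, version, tools, and agents come from this guide. The element types and the `manifestProblems` helper are hypothetical, and the actual extension API may define richer shapes.

```typescript
// Hypothetical shape: element types are left open because this guide does not define them.
interface ExtensionManifest {
  name: string;
  version: string;
  tools: unknown[];
  agents: unknown[];
}

/** Returns the names of required manifest fields that are missing or of the wrong kind. */
function manifestProblems(candidate: unknown): string[] {
  if (typeof candidate !== "object" || candidate === null) return ["manifest"];
  const fields = candidate as Record<string, unknown>;
  const problems: string[] = [];
  if (typeof fields["name"] !== "string" || fields["name"] === "") problems.push("name");
  if (typeof fields["version"] !== "string" || fields["version"] === "") problems.push("version");
  if (!Array.isArray(fields["tools"])) problems.push("tools");
  if (!Array.isArray(fields["agents"])) problems.push("agents");
  return problems;
}

// Example manifest for an imaginary extension with no tools or agents yet.
const exampleManifest: ExtensionManifest = {
  name: "demo-extension",
  version: "0.1.0",
  tools: [],
  agents: [],
};
```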

Pull Request Guidelines

  1. Link an issue — PRs without a linked issue will be closed without review
  2. One concern per PR — don't bundle unrelated changes
  3. No drive-by formatting — don't reformat code you didn't touch
  4. CI must pass — fix failing tests before requesting review
  5. Rebase onto main — do not merge main into your feature branch
  6. Use the PR template at .github/PULL_REQUEST_TEMPLATE.md

Environment Setup

Copy docker/.env.example to .env and fill in API keys. At minimum you need one LLM provider key (Anthropic, OpenAI, Google, or OpenRouter).
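
A startup check for the "at least one provider key" requirement might look like the sketch below. The variable names are assumptions based on common provider conventions; docker/.env.example is the authority on the names this repository actually reads. `configuredProviders` is a hypothetical helper.

```typescript
// Assumed variable names; confirm them against docker/.env.example.
const PROVIDER_KEYS: Record<string, string> = {
  Anthropic: "ANTHROPIC_API_KEY",
  OpenAI: "OPENAI_API_KEY",
  Google: "GOOGLE_API_KEY",
  OpenRouter: "OPENROUTER_API_KEY",
};

/** Lists the providers whose key is present and non-blank in the given environment. */
function configuredProviders(env: Record<string, string | undefined>): string[] {
  return Object.entries(PROVIDER_KEYS)
    .filter(([, variable]) => (env[variable] ?? "").trim() !== "")
    .map(([provider]) => provider);
}
```

Passing process.env as the argument and treating an empty result as a configuration error would enforce the minimum described above.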

Architecture Notes

  • State lives on disk in .sf/ — no in-memory state survives across sessions
  • Bundled extensions/agents sync to ~/.sf/agent/ on every launch
  • LLM providers are lazy-loaded on first use to reduce cold-start time
  • Native Rust engine handles grep, glob, ps, highlight, ast, diff

SF Planning State

.sf/ is the canonical home for SF agent state. It contains milestone plans, slice plans, task plans, and ephemeral working files under .sf/milestones/, .sf/STATE.md, .sf/QUEUE.md, and related artifacts.

Promote-only rule: Agent state (the .sf/ directory under ~/.sf/projects/<hash>/) is transient and gitignored — never committed directly. Project state (.sf/ tracked in the repo root) contains only human-authored artifacts such as DECISIONS.md, KNOWLEDGE.md, REQUIREMENTS.md, ROADMAP.md, and STATE.md.

Promoted artifacts — milestone summaries, architecture decision records (ADRs), and durable specifications — belong in tracked documentation directories:

  • docs/plans/ — reviewed implementation plans promoted from .sf/ milestone planning
  • docs/adr/ — accepted architectural decisions promoted from .sf/DECISIONS.md
  • docs/specs/ — long-lived behavior contracts and API specifications

Naming conventions:

  • Milestone IDs: M001, M002, …
  • Slice IDs: S01, S02, …
  • Task IDs: T01, T02, …
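
The ID conventions above can be captured in a small formatter and parser. The digit widths (three for milestones, two for slices and tasks) are inferred from the examples, and both function names are hypothetical.

```typescript
// Widths inferred from the examples M001, S01, T01.
const ID_FORMATS = {
  milestone: { prefix: "M", digits: 3 },
  slice: { prefix: "S", digits: 2 },
  task: { prefix: "T", digits: 2 },
} as const;
type PlanKind = keyof typeof ID_FORMATS;

/** Builds an ID such as M001 or T07 from a kind and a 1-based index. */
function formatPlanId(kind: PlanKind, index: number): string {
  const { prefix, digits } = ID_FORMATS[kind];
  return prefix + String(index).padStart(digits, "0");
}

/** Parses an ID back into its kind and index, or returns null if it does not fit any format. */
function parsePlanId(id: string): { kind: PlanKind; index: number } | null {
  for (const kind of Object.keys(ID_FORMATS) as PlanKind[]) {
    const { prefix, digits } = ID_FORMATS[kind];
    const numberPart = id.slice(prefix.length);
    if (id.startsWith(prefix) && numberPart.length === digits && /^\d+$/.test(numberPart)) {
      return { kind, index: Number(numberPart) };
    }
  }
  return null;
}
```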

Commands:

  • sf plan promote <source> — copy a file from .sf/ to docs/plans/, docs/adr/, or docs/specs/
  • sf plan list — list milestone and slice files in .sf/
  • sf plan diff — compare .sf/ state with promoted docs/ artifacts

See docs/plans/README.md, docs/adr/README.md, and docs/specs/README.md for directory-specific conventions.

Eval Dump Inbox

SF/Pi automatically loads AGENTS.md and CLAUDE.md from the repo tree at startup. It does not automatically load TODO.md, but this repo uses root TODO.md as a temporary human dump inbox for eval and self-evolution ideas.

When a repo contains a root TODO.md, treat it as a temporary dump inbox and read it before planning substantive work in that repo. This applies even when the user does not explicitly mention evals. Treat the Raw Dump Inbox section as untriaged source material, not as durable instructions. Triage it into reviewable artifacts: concrete eval cases, harness gaps, memory extraction requirements, docs, tests, or follow-up implementation tasks. After triage, remove the processed dump notes from TODO.md so the file returns to an empty inbox/template state. Do not treat dumped notes as runtime memory or approved behavior until they are converted into tested, versioned project artifacts.

CI/CD

  • ci.yml — builds, tests, gates merges to main
  • pipeline.yml — three-stage release (dev → test → prod)
  • pr-risk.yml — PR risk classification
  • ai-triage.yml — AI-based issue/PR triage

Code Quality Tooling

The repository uses the following quality tools:

  • Biome — root source linting via npm run lint and autofix via npm run lint:fix
    • Scope: src/ plus versioned JSON checks
    • Config: biome.json
    • Format touched files with npx biome check --write <paths>; full-repo formatting is not the current CI gate.
  • ESLint — web app linting via npm --prefix web run lint
    • Scope: web/
    • Config: web/eslint.config.mjs
  • TypeScript — Strict mode enabled; run npm run typecheck:extensions
  • Knip — Detect unused code and dependencies: npx knip (config at knip.json)
  • jscpd — Detect duplicate code: npx jscpd (config at .jscpd.json)
  • Tech Debt Scanner: node scripts/tech-debt-scan.mjs
    • Tracks TODO/FIXME/HACK/XXX counts against thresholds
  • Secret Scan: npm run secret-scan (pre-commit hook available via npm run secret-scan:install-hook)
  • Coverage: npm run test:coverage (Vitest V8 coverage with 40/40/20/20 thresholds)
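
The marker tracking done by the tech-debt scanner can be pictured with the sketch below. It is not the code in scripts/tech-debt-scan.mjs: the function names are hypothetical, thresholds are supplied by the caller, and the real script may match markers differently.

```typescript
const MARKERS = ["TODO", "FIXME", "HACK", "XXX"] as const;
type Marker = (typeof MARKERS)[number];

/** Counts whole-word occurrences of each marker in a piece of source text. */
function countMarkers(source: string): Record<Marker, number> {
  const counts: Record<Marker, number> = { TODO: 0, FIXME: 0, HACK: 0, XXX: 0 };
  for (const match of source.matchAll(/\b(TODO|FIXME|HACK|XXX)\b/g)) {
    counts[match[1] as Marker] += 1;
  }
  return counts;
}

/** Returns the markers whose count exceeds the caller-supplied threshold. */
function markersOverThreshold(
  counts: Record<Marker, number>,
  thresholds: Record<Marker, number>,
): Marker[] {
  return MARKERS.filter((marker) => counts[marker] > thresholds[marker]);
}
```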

Dev Container

A Dev Container configuration is available at .devcontainer/devcontainer.json. Open the repository in VS Code with the Dev Containers extension, or run:

devcontainer up --workspace-folder .

The container includes Node 24, Rust, GitHub CLI, Docker-in-Docker, and recommended VS Code extensions.

Dependency Updates

Dependabot is configured at .github/dependabot.yml for:

  • Root npm dependencies (weekly, grouped by ecosystem)
  • Web app dependencies (weekly)
  • GitHub Actions (weekly)

Issue Labels

Label definitions are at .github/labels.yml. Apply labels using:

# Create a single label
gh label create priority/P0 --color B60205 --description "Critical — blocks release"

# Or use a label management action in CI