sf snapshot: uncommitted changes after 110m inactivity

This commit is contained in:
Mikael Hugo 2026-05-08 00:17:47 +02:00
parent d05e7164a9
commit 89677b7e9b
102 changed files with 2055 additions and 830 deletions


@ -2,5 +2,7 @@
- SF must not ship or revive an MCP server package or runtime endpoint. SF may consume external MCP servers as a client, but its own tools remain native SF/pi tools.
- Runtime state files under `.sf/` must not become a peer source of truth when SQLite can hold the structured state. JSON, JSONL, and Markdown runtime artifacts are generated evidence, projections, or legacy import inputs.
- Do not design new SF repo state around "maybe no database." Initialized Forge repos always have SQLite; no-DB handling is bootstrap, import, or recovery code.
- Do not add direct `sqlite3 .sf/sf.db` workflows to docs or agent guidance. Database access should go through runtime-owned SF commands, tools, or adapters so schema and validation rules stay centralized.
- Do not commit transient `.sf` runtime directories such as eval outputs, harness scaffolds, milestone workspaces, locks, journals, or migration worktrees. Promote durable decisions and reviewed plans into `docs/`.
- Do not add a second source tree for machine, web, editor, or protocol behavior when the existing axis-owned placement fits. Extend the current surface/protocol/package boundary instead of creating parallel implementations.


@ -1,6 +1,10 @@
# Principles
- SQLite is the canonical structured store for SF runtime state when the data needs schema, ordering, priority, joins, or validation. File artifacts may be generated from the DB or imported once from legacy state, but they should not become competing authorities.
- Durable project knowledge belongs in reviewed, versioned artifacts. Promote plans, specs, and ADRs to `docs/`; keep `.sf` runtime output transient unless a file is explicitly human-authored project guidance.
- SQLite is the canonical structured store for initialized SF repos. Treat `.sf/sf.db` as the first place for planning hierarchy, ordering, priority, gates, ledgers, schedules, and validation-sensitive state; a missing DB is bootstrap/recovery, not a parallel normal mode.
- `.sf` is the working model boundary. Keep operational state, project knowledge, preferences, decisions, requirements, roadmap state, and generated projections there first; promote only reviewed plans, specs, and ADRs to `docs/`.
- Generated docs are human-facing exports and reports. They may change because Git keeps their review history; SF-owned operational history belongs in `.sf`/SQLite when SF needs it for future behavior.
- File artifacts may be generated from the DB or imported once from legacy state, but they should not become competing authorities.
- Native SF/pi tools are the product boundary. Integrations may call external MCP servers as clients, but SF-owned capabilities should not be exposed by an SF MCP server.
- Prioritization should be represented as structured state, not filename order or prose position. Prefer explicit priority/order fields in DB-backed roadmap and task records.
- Forge has one flow engine across surfaces. Source placement should name the axis it implements: `src/resources/extensions/sf/` for the SF flow extension, `src/headless*.ts` for the `sf headless` machine surface command path, `src/cli.ts` and `src/help-text.ts` for CLI/session I/O, `web/` for the web surface, `vscode-extension/` for the editor surface, `packages/rpc-client/` for protocol adapters, and `packages/*` for reusable workspace packages.
- Keep run control and permission profile separate in planning state. Run control is manual, assisted, or autonomous. Permission profile is restricted, normal, trusted, or unrestricted.


@ -1,6 +1,8 @@
# Taste
- Prefer runtime adapters over ad hoc file parsing when reading SF state. For example, query solver eval history through `sf-db.js` helpers rather than reading `.sf/evals/**/report.json`.
- Make DB-backed tools the pleasant path. If a human-readable file mirrors structured state, prefer a tool that mutates the DB and regenerates the file over hand-editing the projection.
- Keep generated artifacts clearly named, ignored, and reproducible. A committed doc should read like reviewed source, not like a cached run output with host-local paths.
- Use precise boundary names in files and symbols. Avoid stale `mcp` names for native workflow tools; reserve MCP wording for client-side integration with external servers.
- Make migrations one-way and observable. Legacy JSON, JSONL, or Markdown should be imported into SQLite with schema/version checks, then left as ignored fallback or removed when the cutover is complete.
- Prefer product terms that reveal the axis: surface, protocol, output format, run control, permission profile. Do not use `headless`, JSON, or autonomous as catch-all words when a narrower term fits.


@ -18,7 +18,7 @@ Treat this as the product contract for all planning and implementation:
1. capture bounded intent
2. translate intent into the eight PDD fields
3. research missing context and name assumptions
4. apply autonomy policy from confidence, risk, reversibility, blast radius, cost, legal/compliance scope, and production/customer impact
4. apply run-control policy from confidence, risk, reversibility, blast radius, cost, legal/compliance scope, and production/customer impact
5. generate milestone/slice/task contracts from structured state
6. write failing tests or executable evidence before implementation
7. implement the smallest code change that satisfies the contract
@ -195,15 +195,32 @@ Copy `docker/.env.example` to `.env` and fill in API keys. At minimum you need o
## SF Planning State
SQLite (`.sf/sf.db`) is the canonical structured store for SF agent state whenever schema, ordering, priority, joins, or validation matter. Runtime files under `.sf/` are working artifacts, generated projections, evidence, or legacy import inputs.
SQLite (`.sf/sf.db`) is the canonical structured store for SF agent state whenever schema, ordering, priority, joins, or validation matter. Runtime files under `.sf/` are working artifacts, generated projections, evidence, or recovery inputs.
**Promote-only rule:** Agent runtime state (`.sf/milestones/`, `.sf/evals/`, `.sf/harness/`, locks, journals, and generated manifests) is transient and gitignored — never committed directly. Project `.sf/` files tracked in the repo root are limited to deliberate human-authored guidance such as `PRINCIPLES.md`, `TASTE.md`, `ANTI-GOALS.md`, `DECISIONS.md`, `KNOWLEDGE.md`, `REQUIREMENTS.md`, and `ROADMAP.md`.
SF keeps the working spec contract in `.sf`, database first. Root-level `SPEC.md`, `BASE_SPEC.md`, product spec files, and `docs/specs/` are human exports, reports, review surfaces, or external evidence, not a competing planning model. SF can read any repo file as source evidence, but information required for SF's own future operation must be analyzed into `.sf`/DB-backed state. New plans must state purpose on every milestone, slice, and task before implementation detail.
SF has one flow engine across TUI, CLI, web, editor, and machine entrypoints.
Keep integration terminology distinct: **surface** means TUI/CLI/web/editor/machine,
**protocol** means ACP/RPC/stdio JSON-RPC/HTTP/wire, **output format** means
text/json/stream-json, **run control** means manual/assisted/autonomous, and
**permission profile** means restricted/normal/trusted/unrestricted.
`sf headless` is the current machine-surface command, not a separate flow and
not a synonym for JSON. See `docs/specs/sf-operating-model.md`.
Source placement follows the same model. `src/resources/extensions/sf/` owns the
SF flow extension, `src/headless*.ts` owns the `sf headless` machine-surface
command path, `web/` owns the browser surface, `vscode-extension/` owns the
editor surface, `packages/rpc-client/` owns reusable RPC adapter code, and
`packages/*` own reusable workspace packages. See
`docs/specs/sf-operating-model.md`.
Promoted artifacts — milestone summaries, architecture decision records (ADRs), and durable specifications — belong in tracked documentation directories:
- `docs/plans/` — reviewed implementation plans promoted from `.sf/` milestone planning
- `docs/adr/` — accepted architectural decisions promoted from `.sf/DECISIONS.md`
- `docs/specs/` — long-lived behavior contracts and API specifications
- `docs/specs/` — human-readable behavior/API contract exports and reports
**Naming conventions:**
- Milestone IDs: `M001`, `M002`, …
@ -214,6 +231,7 @@ Promoted artifacts — milestone summaries, architecture decision records (ADRs)
- `sf plan promote <source>` — copy a file from `.sf/` to `docs/plans/`, `docs/adr/`, or `docs/specs/`
- `sf plan list` — list active milestone and slice records/artifacts
- `sf plan diff` — compare runtime planning state with promoted `docs/` artifacts
- `sf plan specs generate|diff|check` — regenerate or verify human `docs/specs/` exports from `.sf` state
See [`docs/plans/README.md`](docs/plans/README.md), [`docs/adr/README.md`](docs/adr/README.md), and [`docs/specs/README.md`](docs/specs/README.md) for directory-specific conventions.
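As a sketch of the promote flow's effect described above (copy a file from `.sf/` into a tracked docs directory), assuming a local stub in place of the real `sf plan promote` command:

```bash
# Stub modeling `sf plan promote <source>` semantics only; `promote` here is a
# stand-in, not the real SF command, and the paths are example data.
workdir=$(mktemp -d); cd "$workdir"
mkdir -p .sf/milestones docs/plans
printf '# M001 plan\n' > .sf/milestones/M001-PLAN.md

promote() { cp ".sf/$1" "docs/plans/$(basename "$1")"; }

promote milestones/M001-PLAN.md
ls docs/plans   # M001-PLAN.md
```

The runtime file stays in `.sf/`; only the reviewed copy lands under `docs/plans/` for version control.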


@ -65,7 +65,7 @@ One command. Walk away. Come back to a built project with clean git history.
- **Full OAuth login URLs** — OAuth login URLs are now displayed in full instead of being truncated.
- **MiniMax bearer auth** — MiniMax Anthropic API requests use proper bearer authentication.
- **Case-insensitive tool rendering** — renderable tool matching is now case-insensitive, fixing missed tool output.
- **Headless idle timeout** — idle timeout is kept off during interactive tool execution in headless mode.
- **Machine-surface idle timeout** — idle timeout is kept off during interactive tool execution in `sf headless`.
### Reliability & Internals
@ -243,9 +243,9 @@ Autonomous mode is governed by the Unified Operation Kernel (UOK), not by the LL
2. **Context pre-loading** — The dispatch prompt includes inlined task plans, slice plans, prior task summaries, dependency summaries, roadmap excerpts, and decisions register. The LLM starts with everything it needs instead of spending tool calls reading files.
3. **Git isolation** — When `git.isolation` is set to `worktree` or `branch`, each milestone runs on its own `milestone/<MID>` branch (in a worktree or in-place). All slice work commits sequentially — no branch switching, no merge conflicts. When the milestone completes, it's squash-merged to main as one clean commit. The default is `none` (work on the current branch), configurable via preferences.
3. **Git isolation** — When `git.isolation` is set to `worktree` or `branch`, each milestone runs on its own `milestone/<MID>` branch (in a worktree or in-place). All slice work commits sequentially — no branch switching, no merge conflicts. When the milestone completes, it's squash-merged to main as one clean commit. The default is `worktree`, configurable via preferences.
4. **Crash recovery** — A lock file tracks the current unit. If the session dies, the next `/sf autonomous` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context. Parallel orchestrator state is persisted to disk with PID liveness detection, so multi-worker sessions survive crashes too. In headless mode, crashes trigger automatic restart with exponential backoff (default 3 attempts).
4. **Crash recovery** — A lock file tracks the current unit. If the session dies, the next `/sf autonomous` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context. Parallel orchestrator state is persisted to disk with PID liveness detection, so multi-worker sessions survive crashes too. Through the machine surface, crashes trigger automatic restart with exponential backoff (default 3 attempts).
5. **Provider error recovery** — Transient provider errors (rate limits, 500/503 server errors, overloaded) auto-resume after a delay. Permanent errors (auth, billing) pause for manual review. The model fallback chain retries transient network errors before switching models.
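The worktree isolation flow in point 3 can be reproduced with plain git; the branch name follows the documented `milestone/<MID>` pattern, while the paths and commit messages here are illustrative, not SF output:

```bash
# milestone branch in a worktree, sequential commits, squash merge to main
repo=$(mktemp -d); cd "$repo"
git init -q
git checkout -qb main
git config user.email you@example.com
git config user.name you
echo base > README.md; git add .; git commit -qm "init"

git worktree add -q ../m001-wt -b milestone/M001
( cd ../m001-wt
  echo a > a.txt; git add .; git commit -qm "feat(S01/T01): core types"
  echo b > b.txt; git add .; git commit -qm "feat(S01/T02): parser" )

git merge --squash -q milestone/M001
git commit -qm "feat(M001/S01): data model and type system"
git worktree remove ../m001-wt
git branch -qD milestone/M001
git log --oneline    # one squash commit on top of init
```

No branch switching happens in the main checkout; the milestone's commits arrive as a single squashed commit.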
@ -339,13 +339,16 @@ sf
Both terminals read and write the same `.sf/` files on disk. Your decisions in terminal 2 are picked up automatically at the next phase boundary — no need to stop autonomous mode.
### Headless mode — CI and scripts
### Machine surface — CI and scripts
`sf headless` runs any `/sf` command without a TUI. Designed for CI pipelines, cron jobs, and scripted automation.
`sf headless` is the current command for SF's machine surface: it runs the same
SF flow as the TUI, but without rendering the TUI. It is designed for CI
pipelines, cron jobs, parent processes, and scripted automation. Headless is a
surface, not a run-control mode, a permission profile, or an output format.
```bash
# Run autonomous mode in CI
sf headless --timeout 600000
sf headless --timeout 600000 autonomous
# Create and execute a milestone end-to-end
sf headless new-milestone --context spec.md --auto
@ -356,13 +359,32 @@ sf headless next
# Instant JSON snapshot (no LLM, ~50ms)
sf headless query
# Stream structured events as JSONL
sf headless --output-format stream-json autonomous
# Force a specific pipeline phase
sf headless dispatch plan
```
Headless auto-responds to interactive prompts, detects completion, and exits with structured codes: `0` complete, `1` error/timeout, `2` blocked. Auto-restarts on crash with exponential backoff. Use `sf headless query` for instant, machine-readable state inspection — returns phase, next dispatch preview, and parallel worker costs as a single JSON object without spawning an LLM session. Pair with [remote questions](./docs/user-docs/remote-questions.md) to route decisions to Slack or Discord when human input is needed.
The machine surface handles prompts according to the configured run control and
permission profile, detects completion, and exits with structured codes:
`0` complete, `1` error/timeout, `10` blocked, `11` cancelled, and `12` reload.
Auto-restarts on crash with
exponential backoff. Use `sf headless query` for instant, machine-readable state
inspection — returns phase, next dispatch preview, and parallel worker costs as
a single JSON object without spawning an LLM session. Use `--output-format json`
for one batch result object, `--output-format stream-json` for event JSONL, and
the default text output for human logs. Pair with [remote questions](./docs/user-docs/remote-questions.md) to route decisions to Slack or Discord when human input is needed.
**Multi-session orchestration** — headless mode supports file-based IPC in `.sf/parallel/` for coordinating multiple SF workers across milestones. Build orchestrators that spawn, monitor, and budget-cap a fleet of SF workers.
**Multi-session orchestration** — the machine surface supports file-based IPC in `.sf/parallel/` for coordinating multiple SF workers across milestones. Build orchestrators that spawn, monitor, and budget-cap a fleet of SF workers.
**Terminology:** SF has one flow engine. TUI, CLI, web, editor adapters, and the
machine surface are entrypoints around that flow. ACP/RPC/stdio/HTTP are
protocols. `text`, `json`, and `stream-json` are output formats. Manual,
assisted, and autonomous are run-control modes. Restricted, normal, trusted,
and unrestricted are permission profiles. See
[SF operating model](./docs/specs/sf-operating-model.md), a generated human
export from `.sf` working state and source evidence.
### First launch
@ -404,8 +426,8 @@ On first run, SF launches a branded setup wizard that walks you through LLM prov
| `Alt+V` | Paste clipboard image (macOS) |
| `sf config` | Re-run the setup wizard (LLM provider + tool keys) |
| `sf update` | Update SF to the latest version |
| `sf headless [cmd]` | Run `/sf` commands without TUI (CI, cron, scripts) |
| `sf headless query` | Instant JSON snapshot — state, next dispatch, costs (no LLM) |
| `sf headless [cmd]` | Machine surface for `/sf` commands (CI, cron, scripts) |
| `sf headless query` | Instant machine snapshot — JSON state, next dispatch, costs (no LLM) |
| `sf --continue` (`-c`) | Resume the most recent session for the current directory |
| `sf --worktree` (`-w`) | Launch an isolated worktree session for the active milestone |
| `sf sessions` | Interactive session picker — browse and resume any saved session |
@ -433,9 +455,16 @@ Every dispatch is carefully constructed. The LLM never wastes tool calls on orie
| `T01-SUMMARY.md` | What happened — YAML frontmatter + narrative |
| `S01-UAT.md` | Human test script derived from slice outcomes |
SF's working spec/state model is `.sf`-native. If an inherited repo has
`SPEC.md`, `BASE_SPEC.md`, or product spec docs, SF treats them as external
evidence and projects useful facts into `.sf/PROJECT.md`, `.sf/REQUIREMENTS.md`,
milestones, slices, tasks, decisions, and evidence. New work should not create
a second root-level spec system. Every milestone, slice, and task plan starts
with its purpose before implementation details.
### Git Strategy
Branch-per-slice with squash merge. Fully automated.
Branch-per-milestone with sequential task commits and squash merge. Fully automated.
```
main:
@ -444,7 +473,7 @@ main:
feat(M001/S02): API endpoints and middleware
feat(M001/S01): data model and type system
sf/M001/S01 (deleted after merge):
milestone/M001 (deleted after merge):
feat(S01/T03): file writer with round-trip fidelity
feat(S01/T02): markdown parser for plan files
feat(S01/T01): core types and interfaces
@ -528,7 +557,7 @@ auto_report: true
| `skill_rules` | Situational rules for skill routing |
| `skill_staleness_days` | Skills unused for N days get deprioritized (default: 60, 0 = disabled) |
| `unique_milestone_ids` | Uses unique milestone names to avoid clashes when working in teams |
| `git.isolation` | `none` (default), `worktree`, or `branch` — enable worktree or branch isolation for milestone work |
| `git.isolation` | `worktree` (default), `branch`, or `none` — enable worktree or branch isolation for milestone work |
| `git.manage_gitignore` | Set `false` to prevent SF from modifying `.gitignore` |
| `verification_commands`| Array of shell commands to run after task execution (e.g., `["npm run lint", "npm run test"]`) |
| `verification_auto_fix`| Auto-retry on verification failures (default: true) |
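A minimal preferences fragment combining the git and verification keys above; the YAML nesting is assumed from the `git.isolation` dotted key and the `auto_report: true` example earlier, and may not match SF's actual preferences format exactly:

```yaml
# assumed preferences fragment; key names come from the table above
git:
  isolation: worktree        # worktree (default) | branch | none
  manage_gitignore: true
verification_commands:
  - npm run lint
  - npm run test
verification_auto_fix: true
unique_milestone_ids: false
```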
@ -702,7 +731,7 @@ sf (CLI binary)
- **`pkg/` shim directory** — `PI_PACKAGE_DIR` points here (not project root) to avoid Pi's theme resolution collision with our `src/` directory. Contains only `piConfig` and theme assets.
- **Two-file loader pattern** — `loader.ts` sets all env vars with zero SDK imports, then dynamic-imports `cli.ts` which does static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
- **Always-overwrite sync** — `npm update -g` takes effect immediately. Bundled extensions and agents are synced to `~/.sf/agent/` on every launch, not just first run.
- **State lives on disk** — `.sf/sf.db` is the structured source of truth for migrated runtime state, including planning hierarchy, ordering, validation, gates, UOK lifecycle, backlog, and schedule rows. Markdown/JSON files under `.sf/` are human views, generated projections, evidence, or explicit legacy import/recovery inputs. No in-memory state survives across sessions.
- **State lives on disk** — `.sf/sf.db` is the structured source of truth for runtime state, including planning hierarchy, ordering, validation, gates, UOK lifecycle, backlog, and schedule rows. Markdown/JSON files under `.sf/` are human views, generated projections, evidence, or explicit recovery inputs. No in-memory state survives across sessions.
---


@ -44,7 +44,7 @@ export SF_HOME=/opt/sf
# Disable RTK compression
export SF_RTK_DISABLED=1
# Enable headless mode with prompt tracing
# Enable the machine surface with prompt tracing
export SF_HEADLESS=1
export SF_HEADLESS_PROMPT_TRACE=1
```
@ -93,8 +93,8 @@ All debug flags are **0 or 1** (disabled or enabled):
- `SF_DEBUG` — Enable verbose logging
- `SF_DEBUG_EXTENSIONS` — Enable extension debug logging
- `SF_TRACE_ENABLED` — Collect execution traces
- `SF_HEADLESS` — Suppress TUI, use stdio only
- `SF_HEADLESS_PROMPT_TRACE` — Trace prompts in headless mode
- `SF_HEADLESS` — Suppress TUI for the machine surface, use stdio only
- `SF_HEADLESS_PROMPT_TRACE` — Trace prompts in the machine surface
- `SF_STARTUP_TIMING` — Measure cold-start latency
- `SF_SHOW_TOKEN_COST` — Show LLM token costs
- `SF_FIRST_RUN_BANNER` — Show first-run welcome


@ -13,7 +13,7 @@ Guides for installing, configuring, and using SF day-to-day. Located in [`user-d
| [Getting Started](./user-docs/getting-started.md) | Installation, first run, and basic usage |
| [Autonomous Mode](./user-docs/auto-mode.md) | How autonomous execution works — the state machine, crash recovery, and steering |
| [Commands Reference](./user-docs/commands.md) | All commands, keyboard shortcuts, and CLI flags |
| [Remote Questions](./user-docs/remote-questions.md) | Discord and Slack integration for headless autonomous-mode |
| [Remote Questions](./user-docs/remote-questions.md) | Discord and Slack delivery for run-control-gated questions |
| [Configuration](./user-docs/configuration.md) | Preferences, model selection, git settings, and token profiles |
| [Provider Setup](./user-docs/providers.md) | Step-by-step setup for OpenRouter, Ollama, LM Studio, vLLM, and all supported providers |
| [Custom Models](./user-docs/custom-models.md) | Advanced model configuration — models.json schema, compat flags, overrides |


@ -1,6 +1,10 @@
# Reliability
## Exit Codes (headless mode)
## Exit Codes (machine surface)
`sf headless` is the current machine-surface command. These codes describe the
non-interactive runner and are independent of output format: text, a single JSON
result, and streaming JSONL share the same completion semantics.
| Code | Meaning |
|------|---------|
@ -22,7 +26,7 @@
4. If the session JSONL is unreadable, fall back to starting the unit fresh
### Timeout
**Detection:** Headless parent receives no heartbeat within `HEADLESS_HEARTBEAT_INTERVAL_MS` (60 000 ms), or the unit wall-clock exceeds the configured timeout.
**Detection:** Machine-surface parent receives no heartbeat within `HEADLESS_HEARTBEAT_INTERVAL_MS` (60 000 ms), or the unit wall-clock exceeds the configured timeout.
**Recovery path:** `auto-timeout-recovery.ts` writes a timeout summary, marks the unit `needs_fix`, and advances the loop. The parent exits with code 1 unless `--max-restarts` allows a retry.
@ -46,7 +50,7 @@
**Recovery path:** Auto-mode pauses; user must explicitly approve resumption. The current unit is not retried.
## Restart Loop (headless daemon mode)
## Restart Loop (machine surface)
`sf headless autonomous --max-restarts 3` applies exponential backoff: 5 s → 10 s → 30 s (cap). After exhausting restarts the parent exits with code 1. Each restart resumes via crash recovery above.
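The documented backoff schedule (5 s, then 10 s, then a 30 s cap) as a tiny helper; the function name is illustrative, not SF's internal symbol:

```bash
# exponential backoff with a cap, matching the 5 s -> 10 s -> 30 s schedule
backoff_delay() {
  case "$1" in
    1) echo 5 ;;
    2) echo 10 ;;
    *) echo 30 ;;   # capped at 30 s for every later attempt
  esac
}

for attempt in 1 2 3; do
  echo "restart $attempt: wait $(backoff_delay "$attempt")s"
done
```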
@ -57,7 +61,7 @@
| Structured trace | `.sf/traces/trace-<timestamp>.json` — full session span tree with tokens, cost, duration |
| Event audit log | `.sf/event-log.jsonl` — every unit completion, tool call, decision save (v2 format) |
| Desktop notifications | OS-native; configurable via preferences (`notifications.*`) |
| Stderr progress | All headless output goes to stderr; stdout carries JSON result when `--output-format json` |
| Stderr progress | Human-readable machine-surface progress goes to stderr; stdout carries the batch JSON result for `--output-format json` or JSONL events for `--output-format stream-json` |
| Heartbeat | Emitted every 60 s to detect hung parent/child communication |
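The stderr/stdout split in the table means a caller can capture the machine result cleanly; `fake_sf` below is a stand-in for `sf headless ... --output-format json`, and its output shape is an assumption for illustration:

```bash
# progress goes to stderr, the batch JSON result goes to stdout
fake_sf() {
  echo "phase: executing" >&2       # human-readable progress -> stderr
  echo '{"status":"complete"}'      # machine result -> stdout
}

result=$(fake_sf 2>/dev/null)       # capture stdout only
echo "$result"                      # {"status":"complete"}
```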
## Release Checks


@ -50,7 +50,7 @@ Write purposeful tests first. They are the spec. A different implementation that
## Decomposition Path
`Vision (SPEC.md / VISION.md) → Milestone → Slice → Task → contract test → code → evidence`
`.sf working model + DB roadmap → Milestone → Slice → Task → contract test → code → evidence`
Reject: `prompt → files → hope`.
@ -79,7 +79,7 @@ Treat the contract as a **falsifiable hypothesis**: name the evidence that would
Before any plan:
- Where does this sit in `SPEC.md` / `VISION.md` / `REQUIREMENTS.md`?
- Where does this sit in `.sf/PROJECT.md`, `.sf/REQUIREMENTS.md`, `.sf/DECISIONS.md`, or DB-backed roadmap state?
- Why is it useful, who needs it, what does it enable?
- What breaks if wrong, what is out of scope?
@ -227,7 +227,7 @@ When a test fails against existing code, fix the code. The test told you what wa
sf modifies its own codebase via the auto-loop. Without a protected zone, constitutional drift is silent.
**Protected files (human approval required):**
`SPEC.md`, `BUILD_PLAN.md`, `UPSTREAM_PORT_GUIDE.md`, `AGENTS.md`, `CLAUDE.md`, `CONTRIBUTING.md`, `docs/SPEC_FIRST_TDD.md`, every `docs/dev/ADR-*.md`.
`.sf/PRINCIPLES.md`, `.sf/TASTE.md`, `.sf/ANTI-GOALS.md`, `.sf/REQUIREMENTS.md`, `.sf/DECISIONS.md`, `BUILD_PLAN.md`, `UPSTREAM_PORT_GUIDE.md`, `AGENTS.md`, `CLAUDE.md`, `CONTRIBUTING.md`, `docs/SPEC_FIRST_TDD.md`, every `docs/dev/ADR-*.md`.
Autonomous agents may propose changes but must not merge to these without human review.
@ -249,7 +249,7 @@ Persist learning: when a unit produces a gotcha or anti-pattern, write to sf's m
| Dependency down | Behaviour |
|---|---|
| Native engine (`forge_engine.node`) | Fall back to JS implementations; log degraded mode. Never silently proceed without confirming fallback path is wired. |
| `node:sqlite` and `better-sqlite3` both unavailable | Filesystem-derived state (no DB); log degraded discovery. Block any operation that requires durable state. |
| `node:sqlite` and `better-sqlite3` both unavailable | Block DB-owned operations; there is no normal no-DB planning mode. Read files only as human evidence. |
| LLM provider | Try next allowed provider per `~/.sf/preferences.md`; if exhausted, halt unit with `ErrModelUnavailable` (no silent skip). |
| SOPS unavailable | Use already-exported env vars; log that secret refresh is unavailable. Block secret-touching commands. |
@ -266,7 +266,7 @@ If a task cannot be described this way, it is underspecified.
## See Also
- [`AGENTS.md`](../AGENTS.md) — repo guidelines, build/test/lint commands.
- [`SPEC.md`](../SPEC.md) — sf v3 specification (what we're building).
- [`docs/specs/sf-operating-model.md`](./specs/sf-operating-model.md) — generated operating-model export for human review.
- [`UPSTREAM_PORT_GUIDE.md`](../UPSTREAM_PORT_GUIDE.md) — porting from pi-mono legacy port.
- [`src/resources/extensions/sf/skills/advisory-partner/SKILL.md`](../src/resources/extensions/sf/skills/advisory-partner/SKILL.md) — adversarial review framework.
- [`src/resources/extensions/sf/skills/code-review/SKILL.md`](../src/resources/extensions/sf/skills/code-review/SKILL.md) — multi-lens review skill.


@ -7,13 +7,13 @@
## Context
SF has enough moving parts that it can be mistaken for a generic coding agent: a TUI,
headless mode, autonomous mode, model routing, memory, Sift, doctor, milestones,
machine surface, autonomous mode, model routing, memory, Sift, doctor, milestones,
slices, workers, and generated project state. That framing is too weak. A generic
coding agent can still accept vague intent, write code early, and call the result done
because tests or lint happen to pass.
SF's stronger product shape is: take a bounded intent, turn it into a falsifiable
purpose contract, research missing context, decide whether autonomy is allowed, then
purpose contract, research missing context, decide whether autonomous run control is allowed, then
generate tests and implementation work from that contract.
The eight PDD fields are the purpose gate:
@ -40,21 +40,21 @@ The canonical pipeline is:
1. Capture bounded intent.
2. Translate intent into PDD fields.
3. Research missing context and mark unresolved assumptions.
4. Apply an autonomy policy based on confidence, risk, reversibility, blast radius,
4. Apply a run-control policy based on confidence, risk, reversibility, blast radius,
cost, legal/compliance scope, and production/customer impact.
5. Generate milestone, slice, task, and artifact contracts from structured state.
6. Write failing tests or executable evidence before implementation.
7. Implement the smallest code change that satisfies the contract.
8. Verify, record evidence, retain useful memory, and continue.
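Step 4 can be pictured as a pure decision over the listed inputs; the function, its inputs, and the thresholds below are a hypothetical sketch, not SF's actual policy:

```bash
# illustrative run-control policy over a subset of the listed inputs
run_control() {
  confidence=$1; risk=$2; reversible=$3
  if [ "$risk" = "high" ] || [ "$reversible" = "no" ]; then
    echo manual          # high risk or irreversible: keep a human in the loop
  elif [ "$confidence" = "high" ]; then
    echo autonomous      # confident and reversible: autonomous run control
  else
    echo assisted        # otherwise default to assisted
  fi
}

run_control high low yes   # prints: autonomous
run_control low high yes   # prints: manual
```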
Structured state is authoritative. Markdown is a projection for humans, reviews, and
promotion into `docs/`. Runtime planning state belongs in `.sf`/`sf.db`; durable
architecture decisions and specifications are promoted into tracked `docs/adr/`,
Structured state is authoritative. Markdown is a projection for humans, reviews,
reports, and git history. Runtime planning state belongs in `.sf`/`sf.db`;
durable human-facing exports are promoted into tracked `docs/adr/`,
`docs/specs/`, and `docs/plans/`.
TUI, headless, slash commands, autonomous mode, workers, and future frontends are
different interaction surfaces over the same planner/executor contract. They must not
invent separate planning semantics.
TUI, CLI, web, editor integrations, machine automation, workers, and future frontends
are different surfaces over the same planner/executor contract. Protocols and output
formats must not invent separate planning semantics.
## Enforcement
@ -67,7 +67,7 @@ SF must prefer enforcement over recommendation:
falsifier are missing.
- Research claims are cited, linked to repo evidence, or explicitly marked as
assumptions.
- Autonomy proceeds only when the configured policy allows it; otherwise SF researches
- Autonomous run control proceeds only when the configured policy allows it; otherwise SF researches
more, parks the work, or asks for a human decision.
- Memory stores facts, decisions, failures, and falsifiers that improve future
decisions. It must not become unverified lore.


@ -31,7 +31,7 @@ The SF schedule system is **pull-based**:
- There is no background daemon or timer process.
- Entries are queried ("pulled") at defined integration points:
1. **Launch** — `loader.ts` calls `findDue()` and prints a banner if items are overdue
2. **Auto-mode boundaries** — `sf headless query` populates a `schedule` field with `due` and `upcoming` entries
2. **Auto-mode boundaries** — `sf headless query` populates a machine snapshot `schedule` field with `due` and `upcoming` entries
3. **CLI** — `sf schedule list --due` for explicit human query
4. **TUI status overlay** — displays due/upcoming schedule entries in the dashboard
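The pull-based idea (no daemon, no timer; due entries are computed only when an integration point asks) can be sketched like this; the `epoch:name` entry format is invented for the example, since SF actually keeps schedule rows in `.sf/sf.db`:

```bash
# compute due entries on demand from stored epoch timestamps
find_due() {
  now=$1; shift
  for entry in "$@"; do
    when=${entry%%:*}
    name=${entry#*:}
    if [ "$when" -le "$now" ]; then echo "due: $name"; fi
  done
}

find_due 1700000000 "1699990000:weekly-report" "1700090000:retro"
# prints: due: weekly-report
```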


@ -144,7 +144,7 @@ export { main } from './main.js'
|---|---|
| `modes/interactive/` | Full TUI interactive mode (~30 component files) |
| `modes/rpc/` | RPC server, client, JSON protocol, remote terminal |
| `modes/print/` | Print/headless mode |
| `modes/print/` | Print and machine-surface output |
| `modes/shared/` | Shared mode utilities and UI context setup |
| `cli/args.ts` | CLI argument parsing |
| `cli/config-selector.ts` | Config directory selection |


@ -24,8 +24,8 @@ Round-robin debate mode is implemented on the existing `subagent` tool as
participant the prior rounds' transcript and keeps the parent as synthesiser.
Full inbox-based swarm chat remains deferred until the persistent-agent layer
(`agents`, `agent_messages`, `agent_inbox`, `send_message` tool) lands. That
machinery is still shared with SPEC §17-18 and should not be rebuilt inside the
(`agents`, `agent_messages`, `agent_inbox`, `send_message` tool) lands in
current `.sf`/DB-backed state. That machinery should not be rebuilt inside the
ephemeral subagent extension.
## Alternatives Considered
@ -64,7 +64,7 @@ Subagents share a JSON scratchpad written between turns. Each subagent reads wha
### Option C — Ephemeral swarm with inbox (long-term target)
Reuse the persistent-agent infrastructure from `SPEC.md` §17-18 (`agent_inbox`, `agent_messages`, `send_message` tool — currently NEW, not implemented) but scope each ephemeral swarm by `swarm_id` with a TTL. Swarm agents can `send_message` to each other freely during the task; on `synthesize()`, the swarm's rows get archived.
Reuse the future persistent-agent infrastructure once it exists in current `.sf`/DB-backed state, but scope each ephemeral swarm by `swarm_id` with a TTL. Swarm agents can `send_message` to each other freely during the task; on `synthesize()`, the swarm's rows get archived.
```
swarm({
@ -108,7 +108,7 @@ swarm({
## Out of Scope
- **Persistent inter-agent messaging across runs**belongs in current `.sf`/DB-backed persistent-agent state when that layer exists; orthogonal to ephemeral swarms.
- **Cross-session swarm replay** — a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
- **Human-in-the-loop debate** — swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing `discuss` flow).
@ -131,12 +131,11 @@ swarm({
|---|---|
| Now | Option A is available as bounded debate mode on `subagent`. |
| Six months of Option A in production | Decide whether full swarm-chat with inbox is worth the build. |
| Persistent-agent layer projected into `.sf`/DB state | Revisit Option C when inbox/message persistence exists as current runtime state. |
## References
- Older persistent-agent and inter-agent messaging SPEC notes — external design evidence only; project accepted facts into `.sf`/DB-backed state before treating them as operational.
- `src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md` — current single/parallel/debate/chain guidance.
- `src/resources/extensions/sf/skills/advisory-partner/SKILL.md` — primary consumer of adversarial dispatch today.
- `src/resources/extensions/sf/prompts/gate-evaluate.md` — pre-execution Q3/Q4 gates.

View file

@ -19,7 +19,7 @@ This ADR maps the federation surfaces, takes a position on each, and sequences t
### Surface 1 — Knowledge (anti-patterns, learnings, contracts)
**Status:** captured in older SPEC notes as §16; treat that as external design evidence, not current operational authority. The current SF working model must project accepted federation facts into `.sf`/DB-backed state. Singularity Memory (`sm`) is the proposed cross-instance knowledge layer: an HTTP API holding memories, learnings, and anti-patterns.
**Code reality:** not yet wired. `src/resources/extensions/sf/memory-store.ts` and `memory-extractor.ts` write to a local SQLite `memories` table. The spec's "remote-mode" isn't connected.
@ -44,7 +44,7 @@ The pain ceiling is bounded today (per-instance discovery is at worst a few wast
### Surface 4 — Federated persistent agents
**Status:** not designed. Older SPEC notes sketched persistent agents scoped to a single project's DB; those notes are evidence only until projected into current `.sf`/DB state.
**Decision:** **defer.** Per-instance for v3. If Mikki has a "code-reviewer" persistent agent, it lives in Mikki's DB. Federation requires:
@ -57,7 +57,7 @@ None of this earns its keep until we have a concrete use case where one agent sh
### Surface 5 — Distributed execution (clarifying note, not federation)
**Status:** captured in older SPEC notes; not built. SSH workers mean one daemon dispatches units to remote worker hosts.
**Decision:** **clarify that this is NOT federation.** Distributed execution = one daemon owns many workers (parallel scaling). Federation = many daemons share state across hosts (knowledge sharing). Different problems. The spec already separates them; this ADR just affirms the line.
@ -78,7 +78,7 @@ None of this earns its keep until we have a concrete use case where one agent sh
**Risks and mitigations**
- *Risk:* Singularity Memory becomes a bottleneck — sf can't dispatch when memory is down.
- *Mitigation:* sf MUST treat memory as best-effort. A memory-fetch failure logs degraded-mode and proceeds with empty recall. Local SQLite stays as the authoritative scheduler state.
- *Risk:* federated benchmarks make sf overconfident in stale data.
- *Mitigation:* every benchmark observation carries `recorded_at` and `host`. Consumers weight by recency and reject stale data older than `circuit_breaker_resets_at + N`.
- *Risk:* cross-instance attacker plants poisoned anti-patterns to steer agent behaviour.
@ -95,16 +95,14 @@ None of this earns its keep until we have a concrete use case where one agent sh
| When | Action |
|---|---|
| Tier 1+ (next 1–3 months) | Wire Singularity Memory remote-mode in `memory-store.ts`. Provider chain fallback: remote → embedded → local-only. Promote accepted runtime requirements into `.sf`/DB-backed state once landed. |
| After Singularity Memory in production for 1+ month | Decide whether to ride it for benchmarks (Surface 2) or build a separate service. Decision driven by observed cost of duplicated benchmark discovery. |
| If/when concrete cross-instance agent pain shows up | Reopen Surface 4 (federated persistent agents). Don't pre-build. |
| Never in sf | Surface 3 (cross-repo unit deps) — that's a separate product. |
## References
- Older SPEC notes for Singularity Memory, persistent agents, inter-agent messaging, and distributed execution — external design evidence only; project accepted facts into `.sf`/DB-backed state before treating them as operational.
- `src/resources/extensions/sf/memory-store.ts` — current local-only memory store.
- `packages/daemon/src/daemon.ts` — single-host daemon process.
- `docs/dev/ADR-011-swarm-chat-and-debate-mode.md` — related: ephemeral swarms within a single instance.

View file

@ -7,8 +7,8 @@
sf today runs as a single daemon per host. Three forces push it toward a multi-host topology:
- **SSH workers**: the orchestrator dispatches unit attempts to remote hosts (GPU, Windows, parallel scaling) — needs an SSH-served worker process.
- **Singularity Memory remote-mode** (ADR-012, ADR-014): the cross-instance knowledge layer runs as a service on the tailnet, reachable from SF and other clients.
- **Multi-instance federation** (ADR-012): future federated agents and benchmarks ride the same network substrate.
This ADR fixes the network and SSH-execution layer the above all depend on.
@ -53,8 +53,8 @@ This ADR fixes the network and SSH-execution layer the above all depend on.
**Risks and mitigations**
- *Risk:* SSH disconnect mid-turn produces zombie agent processes.
- *Mitigation:* worker cleanup script on disconnect; `--sf-run-id=<id>` marker on the agent process for `pgrep` / `kill`.
- *Risk:* `wish` API churn pre-1.0.
- *Mitigation:* pin a version; planned upgrade window once per quarter.
- *Risk:* `xpty` / `conpty` edge cases on niche shells.
@ -90,11 +90,11 @@ This ADR fixes the network and SSH-execution layer the above all depend on.
└── /healthz, /readyz simple HTTP for orchestrator health checks
```
The worker is **stateless** — claim, lease, retry, and persistence are all the orchestrator's job; the worker just executes one attempt at a time and streams output. Older SPEC notes captured this as distributed-execution evidence; the current implementation must persist accepted requirements through `.sf`/DB-backed state.
## References
- Older distributed-execution SPEC notes — external design evidence only; project accepted facts into `.sf`/DB-backed state before treating them as operational.
- `ADR-012` — Multi-instance federation (this ADR provides the substrate).
- `ADR-014` — Singularity Knowledge + Agent Platform (deploys onto this substrate).
- `ADR-016` — Charm AI stack adoption strategy (frames why Go for new services).

View file

@ -6,7 +6,7 @@
## Context
Older SPEC notes define a cross-instance knowledge layer (Singularity Memory) and sketch persistent agents plus inter-agent messaging. Treat those notes as historical input, not current SF source of truth. sf instances today carry their own local memory store (`memory-store.ts`); persistent agents are not implemented at all.
Two trajectories converge:
@ -36,7 +36,7 @@ The implementation arm of this ADR lives in [`singularity-memory/MIGRATION.md`](
### Stack
- **Stay Python + FastAPI + Postgres.** Status quo. Works today.
- *Rejected:* misses the foundation bet for central persistent agents from the older SPEC notes. Building those on Python + raw OpenAI/Anthropic SDK calls means retrofitting fantasy-style agent semantics later — real refactor cost. The trigger to migrate isn't pain in the current server; it's foundation laying for what comes next.
- **Rust + axum + Postgres.** Uniformly fast, but Charm's agentic ecosystem (fantasy, catwalk, wish, charm-server, the entire Bubble Tea family) is Go-native. Rust on the server side would mean reimplementing those abstractions or shelling out. Rejected — wrong ecosystem.
- **TypeScript + Node + Postgres.** Keeps language alignment with sf core. But sf is moving toward parallel-build (ADR-016): TS in sf core, Go in new services. The Node ecosystem doesn't have an equivalent to fantasy + charm-server + Wish. Rejected.
@ -108,17 +108,17 @@ Phase 4 was originally planned as a "central persistent-agent runtime" built on
> *The content below is the original Phase 4 design, preserved as a historical record. It is **not** the current plan.*
The original Phase 4 called for singularity-memory's Go server to host a central persistent-agent runtime using `charmbracelet/fantasy`. Long-lived cross-project agents (code-reviewer, memory-curator, security-auditor, build-watch) would run there, with their state managed by the same Postgres store. This depended on the older persistent-agent notes being fully scoped ("status NEW" at ADR-014's writing date).
The rationale for building this in singularity-memory was ecosystem alignment with `fantasy` + `charm-server` + `wish` and avoiding per-project agent redundancy. The timeline was listed as "variable" because SPEC §17 had not been fully scoped.
The rationale for building this in singularity-memory was ecosystem alignment with `fantasy` + `charm-server` + `wish` and avoiding per-project agent redundancy. The timeline was listed as "variable" because persistent-agent scope had not been fully defined.
ADR-019 made this moot by choosing a cleaner isolation model (hypervisor-level VM snapshots) that is language-agnostic inside the VM, multi-tenant by construction, and owned by ACE rather than a shared Go server.
## References
- `MIGRATION.md` (singularity-memory repo) — implementation arm.
- Older SPEC notes §16 — Knowledge Layer historical input.
- Older SPEC notes §17–18 — Persistent Agents and Inter-Agent Messaging historical input.
- `ADR-012` — Multi-instance federation (this is one of its surfaces).
- `ADR-013` — Network and remote-execution (deployment substrate).
- `ADR-016` — Charm AI stack adoption (frames the polyglot decision).

View file

@ -5,7 +5,7 @@
## Context
Older SPEC notes retargeted sf v3 from Go-on-Crush to TypeScript-on-pi-mono. That decision still stands for sf core. But over the past year the Charm ecosystem has matured to a point that *adjacent* services would be better served by it:
- **`fantasy`** (730 stars, pushed today) — multi-provider AI agent SDK in Go. Equivalent of `pi-ai`. Wasn't this complete at retarget time.
- **`catwalk`** (688 stars) — provider/model registry used by `crush`.
@ -17,13 +17,13 @@
- **`pony` + `ultraviolet`** — next-gen declarative TUI markup. Pre-1.0 / experimental.
- **`anthropic-sdk-go`, `openai-go`, `go-genai`** — Charm-maintained Go LLM SDKs.
The question the historical retarget didn't have to answer: **now that this much is here, do we migrate?**
## Decision
**Option A — Parallel build, no core migration.**
- **sf core (TypeScript on pi-mono): unchanged.** The historical retarget rationale stands. Pi-mono SDK alignment, MCP client boundary, ~200+ TS files, real production users — none of it justifies a 3–6 month rewrite.
- **New services: Go on Charm, comprehensively.** sf-worker (ADR-013), Singularity Knowledge + Agent Platform (ADR-014), flight recorder (ADR-015), Charm TUI client (ADR-017) — all in Go using the Charm ecosystem.
- **Native engine (Rust): permanent.** ~11k LOC in `rust-engine/` (git, text, forge_parser, grep, highlight, ast, diff, etc.) is best-of-breed and not re-implementable in Go without losing performance. Bindings (napi-rs from TS today; cgo from Go for new services if needed) flex per consumer.
- **Pony adoption: now, not deferred.** Reversed from initial conservative stance. Adopting pony from day one in Phase-3 admin surfaces (Singularity Memory admin UI, future audit dashboards) — admin tolerates churn better than user-facing surfaces, and the foundation bet pays back if pony stabilises.
@ -101,7 +101,7 @@ Three languages, three clean boundaries. Each layer using the stack that fits.
## References
- Older SPEC notes §1 — original retarget rationale (TS-on-pi-mono over Go-on-Crush).
- `ADR-013` — Network + remote execution (concrete: sf-worker).
- `ADR-014` — Singularity Knowledge + Agent Platform (concrete: SM rewrite).
- `ADR-015` — Flight recorder (concrete: x/vcr-based).

View file

@ -3,6 +3,17 @@
---
## Source Placement Axes (current)
- **Flow engine**: core workflow/control logic in `sf/*`, `auto*`, `core/*`, and `agent-loop.ts`-style paths.
- **Surfaces**: CLI/session I/O (`src/cli.ts`, `src/help-text.ts`), machine surface (`src/headless*.ts` via `sf headless`), web (`web/`), editor (`vscode-extension/`), and interactive TUI (`packages/pi-tui/*`).
- **Protocols / adapters**: `packages/rpc-client/*`, `packages/pi-coding-agent/src/modes/rpc/*`, `src/web/*`, and HTTP route surfaces.
- **Output formats**: `text`, `json`, `stream-json` with JSONL event framing in headless/RPC paths.
- **Run control + permissions**: manual/assisted/autonomous and restricted/normal/trusted/unrestricted are independent axes selected and enforced by command/session policy.
- **Workspace packages**: all `packages/*` entries are workspace packages (shared modules, including `rpc-client` as protocol adapter package).
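The run-control and permission axes above are independent; a minimal TypeScript sketch of keeping them separate might look like the following. The axis value names mirror the document; the `SessionPolicy` type and the example rule are assumptions, not real SF policy code.

```typescript
// The two policy axes, kept independent as the text requires.
type RunControl = "manual" | "assisted" | "autonomous";
type PermissionProfile = "restricted" | "normal" | "trusted" | "unrestricted";

interface SessionPolicy {
  runControl: RunControl;   // selected per command/session
  permissions: PermissionProfile; // enforced per command/session
}

// Illustrative rule only: even an autonomous run under a restricted
// profile is denied network access; run control never widens permissions.
function canUseNetwork(p: SessionPolicy): boolean {
  return p.permissions !== "restricted";
}
```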
---
## System Labels Reference
| Label | Description |
@ -30,7 +41,7 @@
| **File Search** | grep, glob, fd — file and content discovery |
| **SF Workflow** | Core SF planning/execution workflow engine |
| **Google Search** | Web search via Google API |
| **Machine Surface** | Non-interactive / scripted command execution |
| **Image Processing** | Image decode, resize, encode, clipboard images |
| **Integration Tests** | Smoke, fixture, live, regression test suites |
| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
@ -45,12 +56,14 @@
| **Node.js Bindings** | TypeScript wrappers around Rust N-API modules |
| **Onboarding** | First-run wizard and setup flows |
| **Permissions** | Permission management for tools and trust |
| **Protocol / Adapter** | RPC/JSON-RPC/HTTP transport adapters and protocol glue |
| **Remote Questions** | Remote prompting via Slack, Discord, Telegram |
| **Search the Web** | Brave/Jina/Tavily-based web search extension |
| **Session Management** | Session file I/O, branches, fork trees |
| **Skills** | Skill tool registration, health, telemetry |
| **Slash Commands** | Command boilerplate generators extension |
| **State Machine** | State, history, persistence, reactive graph |
| **Output Format** | Runtime result encoding (`text`, `json`, `stream-json`, JSONL framing) |
| **Subagent** | Parallel/serial subagent delegation |
| **Syntax Highlighting** | Syntect-backed ANSI code coloring |
| **Text Processing** | Diff, truncation, HTML→MD, ANSI, JSON parse |
@ -74,17 +87,18 @@
| src/app-paths.js | Config | Compiled JS version |
| src/bundled-extension-paths.ts | Extension Registry | Serializes/parses bundled extension directory paths |
| src/bundled-resource-path.ts | Loader/Bootstrap, Extension Registry | Resolves bundled raw resource files from package root |
| src/cli.ts | CLI | CLI/session I/O entry point for command handling and mode dispatch |
| src/cli-web-branch.ts | CLI, Web Mode | Web CLI branch; session dir resolution, legacy migration |
| src/extension-discovery.ts | Extension Registry | Discovers extension entry points from FS and package.json |
| src/extension-registry.ts | Extension Registry | Extension manifests, registry persistence, enable/disable |
| src/headless-answers.ts | Machine Surface | Pre-supply answers to extension UI requests in `sf headless` |
| src/headless-context.ts | Machine Surface | Context loading from stdin/files; project bootstrapping |
| src/headless-events.ts | Machine Surface | Event classification, terminal detection, idle timeouts |
| src/headless-query.ts | Machine Surface, CLI | Read-only machine snapshot query (`state`, `dispatch preview`, `costs`) |
| src/headless-types.ts | Output Format | Output-format contract (`text` / `json` / `stream-json`) for machine I/O |
| src/headless-ui.ts | Machine Surface | Extension UI auto-response, progress formatting |
| src/headless.ts | Machine Surface | Machine-surface command implementation for `sf headless` |
| src/help-text.ts | CLI, Output Format | Generates usage/help text and CLI-facing output wording |
| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to SF |
@ -831,6 +845,18 @@
---
## packages/rpc-client/src/ — Protocol / Adapter
| File | System Label(s) | Description |
|------|-----------------|-------------|
| index.ts | Protocol / Adapter | Public exports for the SF RPC client package |
| rpc-client.ts | Protocol / Adapter | JSON-RPC client transport implementation |
| jsonl.ts | Protocol / Adapter, Output Format | JSONL encode/decode and framing helpers |
| rpc-types.ts | Protocol / Adapter | Typed request/response models for RPC calls |
| rpc-client.test.ts | Protocol / Adapter | Contract tests for serialization and JSONL behavior |
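The JSONL framing that `jsonl.ts` covers is simple to sketch: one JSON value per newline-terminated line. The helper names below are illustrative assumptions, not the package's real exports.

```typescript
// Minimal JSONL framing sketch: one JSON value per newline-terminated line.
// Helper names are illustrative, not the real packages/rpc-client exports.
function encodeJsonl(events: unknown[]): string {
  return events.map((e) => JSON.stringify(e) + "\n").join("");
}

function decodeJsonl(framed: string): unknown[] {
  return framed
    .split("\n")
    .filter((line) => line.trim().length > 0) // tolerate trailing newline
    .map((line) => JSON.parse(line));
}
```

Because each event is its own line, a reader can parse frames incrementally as chunks arrive, which is what makes JSONL suitable as the event transport framing for machine and adapter paths.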
---
## rust-engine/ — Rust Engine
| File | System Label(s) | Description |
@ -968,7 +994,7 @@ Quick lookup: which files are part of each system?
| **File Search** | rust-engine/crates/engine/src/grep.rs, glob.rs, fd.rs, fs_cache.rs, packages/rust-engine/src/grep/*, fd/*, core/tools/grep.ts, find.ts |
| **SF Workflow** | src/resources/extensions/sf/* (non-auto), sf/reports.ts, sf/notifications.ts, sf/prompts/*, sf/workflow-templates/* |
| **Google Search** | src/resources/extensions/google-search/index.ts |
| **Machine Surface** | src/headless*.ts |
| **Image Processing** | rust-engine/crates/engine/src/image.rs, packages/rust-engine/src/image/*, utils/image-*.ts, web/lib/image-utils.ts |
| **Integration Tests** | tests/**/* |
| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* |
@ -979,8 +1005,10 @@ Quick lookup: which files are part of each system?
| **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
| **Modes** | pi-coding-agent/src/modes/* |
| **Model System** | pi-coding-agent/src/core/model-*.ts, pi-ai/src/models*.ts, pi-ai/src/api-registry.ts, sf/model-router.ts |
| **Output Format** | src/headless-types.ts, src/headless.ts, packages/rpc-client/src/jsonl.ts |
| **Native / Rust Tools** | rust-engine/crates/engine/src/* |
| **Node.js Bindings** | packages/rust-engine/src/* |
| **Protocol / Adapter** | packages/rpc-client/*, packages/pi-coding-agent/src/modes/rpc/* |
| **Onboarding** | src/onboarding.ts, src/wizard.ts, web/components/sf/onboarding/*, web/app/api/onboarding/* |
| **Permissions** | core/extensions/project-trust.ts, core/auth-storage.ts |
| **Remote Questions** | src/resources/extensions/remote-questions/* |

View file

@ -73,7 +73,7 @@ A new workspace package at `packages/sf-agent-modes/` that owns all run-mode and
Must contain:
- `modes/interactive/` (full TUI interactive mode and all components)
- `modes/rpc/` (RPC server, RPC client, JSON protocol)
- `modes/print/` (print and machine-surface output)
- `cli/` (arg parsing, config selector, session picker, model lister, file processor)
- `main.ts` entry point logic

View file

@ -8,22 +8,21 @@ The foundational decision is [ADR-0000: SF Is a Purpose-to-Software Compiler](..
```
sf (CLI binary)
├─ loader.ts            Sets PI_PACKAGE_DIR, SF env vars, dynamic-imports cli.ts
├─ cli.ts               CLI + session I/O entrypoint
├─ help-text.ts         CLI/session I/O contract and usage text
└─ resource-loader.ts   Syncs bundled extension + agent assets from src/resources/
   └─ src/resources/
      ├─ extensions/sf/     Core SF extension
      ├─ extensions/        Bundled extensions (sf + others)
      ├─ agents/            scout, researcher, worker
      ├─ AGENTS.md          Agent routing instructions
      └─ SF-WORKFLOW.md     Manual bootstrap protocol
sf headless Machine surface command (`src/headless*.ts`)
web/ Web surface (web mode + web UI runtime bridge)
vscode-extension/ Editor surface integration (VS Code chat + sidebar + RPC)
```
## Key Design Decisions
@ -32,6 +31,32 @@ vscode-extension/ VS Code extension — chat participant (@sf), sidebar
Structured `.sf` state is the runtime source of truth. For migrated milestones, `.sf/sf.db` is authoritative for hierarchy, `sequence` priority/order, validation assessments, gates, UOK lifecycle, and outcome ledgers. Markdown and JSON planning files are generated views, exports, or explicit recovery/import inputs; normal auto mode does not fall back to them when the DB exists and opens. No in-memory state survives across sessions. This enables crash recovery, multi-terminal steering, and session resumption.
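The "generated views" direction can be sketched in TypeScript: DB-held rows are projected into a Markdown export that never becomes authoritative. The row fields and function name below are assumptions, not the real `.sf/sf.db` schema.

```typescript
// Sketch only: projects DB-held milestone rows into a generated Markdown
// view. Row fields are assumptions, not the real .sf/sf.db schema.
interface MilestoneRow {
  sequence: number; // DB-owned ordering
  title: string;
  status: "pending" | "active" | "done";
}

function renderMilestoneView(rows: MilestoneRow[]): string {
  // Ordering comes from the DB's `sequence`, never from file position.
  const sorted = [...rows].sort((a, b) => a.sequence - b.sequence);
  const lines = sorted.map(
    (r) => `- [${r.status === "done" ? "x" : " "}] ${r.sequence}. ${r.title}`
  );
  // Header marks the file as a projection, not a peer source of truth.
  return ["<!-- generated from .sf/sf.db; do not hand-edit -->", ...lines].join("\n");
}
```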
### Flow, Surfaces, Protocols, Output, Run Control, and Permissions
SF has one flow engine. TUI, CLI, web, editor integrations, and `sf headless`
must follow the same planning/execution/policy/evidence model. `sf headless`
is the machine-surface command, implemented in `src/headless*.ts`; `src/cli.ts`
and `src/help-text.ts` are the CLI/session I/O surface.

The placement across these axes is:
- **Flow engine:** `sf` core and auto/workflow modules (`auto*`, `sf/*`, `core/*`).
- **Surfaces:** CLI via `src/cli.ts`, machine via `src/headless*.ts`, web via `web/*`,
editor/editor-adjacent via `vscode-extension/*`, and TUI via `packages/pi-tui/*`.
- **Workspace packages:** `packages/*` contains shared modules (AI, agent core, tui, native, rpc-client), with protocol adapters in `packages/rpc-client/*`.
- **Protocols / adapters:** ACP, RPC, stdio JSON-RPC, HTTP, and adapter modules in
`packages/rpc-client/*` plus `packages/pi-coding-agent/src/modes/rpc/*`.
- **Output formats:** `text`, `json`, and `stream-json` are the active output contracts.
JSONL is the event transport framing used by machine/adapter paths.
- **Run control:** Manual, assisted, and autonomous map to mode/dispatch decision points.
- **Permission profiles:** Restricted, normal, trusted, unrestricted are enforced at the
surface + command + session decision boundaries.
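The output-format axis above can be sketched as a single selector over the three active contracts. The type and function names are assumptions; only the format names and the JSONL framing rule come from the text.

```typescript
// Sketch of the three active output contracts. Names are assumed.
type OutputFormat = "text" | "json" | "stream-json";

interface RunEvent {
  kind: string;
  message: string;
}

function renderEvents(events: RunEvent[], format: OutputFormat): string {
  switch (format) {
    case "text":
      // Human-oriented lines.
      return events.map((e) => `${e.kind}: ${e.message}`).join("\n");
    case "json":
      // One JSON document for the whole run.
      return JSON.stringify({ events });
    case "stream-json":
      // JSONL event framing: one event object per line.
      return events.map((e) => JSON.stringify(e) + "\n").join("");
  }
}
```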
The human-readable export lives in
[`docs/specs/sf-operating-model.md`](../specs/sf-operating-model.md). The
runtime source of truth remains `.sf/sf.db` plus `.sf` working guidance.
### Two-File Loader Pattern
`loader.ts` sets all environment variables with zero SDK imports, then dynamically imports `cli.ts` which does static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
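The ordering guarantee can be sketched as follows. This is an illustrative reconstruction, not the real `loader.ts`: the `boot` name is invented, and a `data:` URL module stands in for the real `import('./cli.js')` so the sketch is self-contained.

```typescript
// Sketch of the two-file loader pattern: mutate the environment with zero
// SDK imports, then dynamically import the module whose static SDK imports
// read that environment at evaluation time. Names/paths are illustrative.
export async function boot(packageDir: string): Promise<{ seenDir?: string }> {
  // 1. Environment first: nothing SDK-flavored has evaluated yet.
  process.env.PI_PACKAGE_DIR = packageDir;
  // 2. Only now evaluate the downstream module. A data: URL module stands
  //    in for the real `import('./cli.js')` to keep this runnable alone.
  const probe =
    "data:text/javascript,export const seenDir = process.env.PI_PACKAGE_DIR;";
  return import(probe);
}
```

A static `import './cli.js'` at the top of the loader would defeat this, because static imports evaluate before any statement in the importing module runs.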
@ -70,7 +95,7 @@ Every dispatch creates a new agent session. The LLM starts with a clean context
| **Ask User Questions** | Structured user input with single/multi-select |
| **Secure Env Collect** | Masked secret collection |
| **Async Jobs** | Background command execution with `async_bash`, `await_job`, `cancel_job` |
| **Remote Questions** | Discord, Slack, and Telegram delivery for run-control-gated questions |
| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |
| **Universal Config** | Discovery of external tool configurations for import or migration |
| **AWS Auth** | AWS credential management and authentication |

View file

@ -7,8 +7,11 @@ explicitly promotes them into durable project documentation.
- `.sf/` stores SF-local operational state, generated harness notes, scaffold
manifests, runtime caches, locks, and temporary agent files.
- `docs/plans/`, `docs/specs/`, and `docs/adr/` store promoted or generated
human-facing exports for review and git history.
- Generated docs may change by design. Git keeps their human-facing history;
SF-owned operational history belongs in `.sf`/SQLite when runtime replay,
ledgers, memory, or drift analysis matter.
- Root files such as `AGENTS.md`, `ARCHITECTURE.md`, and `.siftignore` are
allowed only when they are part of the versioned scaffold contract.

View file

@ -43,14 +43,20 @@ Additional coder references:
- `symphony`
- `singularity/machine` (`codemachine`)
Spec-system references:
- `spec-kit`
- `OpenSpec`
- `spec-kitty`
- `cc-sdd`
Indexed-only references to include in future passes:
- `kimi-cli` / Kimi Code
- Spec Kit
- upstream CodeMachine CLI (`moazbuilds/CodeMachine-CLI`)
Claude Code references in this survey are limited to public documentation and
observed product behavior. Private or unverified mirrors are out of scope.
## SF + ACE Full-Stack Reference Map
@ -65,13 +71,16 @@ it is strongest.
| `symphony` | Work orchestration | Linear polling, isolated per-issue workspaces, `WORKFLOW.md`, Codex app-server, retries, PR review/landing | local source + Context7 `/openai/symphony` | `README.md`, `SPEC.md`, `elixir/WORKFLOW.md`, `elixir/AGENTS.md`, `.codex/skills/` |
| `codemachine` | Multi-agent workflow engine | Engine matrix, SmartRouter, spec-to-code workflow templates, feature flags, tool health | local fork/source + web upstream | `README.md`, `docs/architecture/`, `templates/workflows/`, `prompts/agents/`, `prompts/moderator/` |
| Amplication | Platform/golden paths | Live templates, service catalog, plugin codegen, generated service lifecycle, compliance/drift | web/GitHub; clone before local planning | `docs/`, `packages/*/src/`, plugin/codegen packages if cloned |
| Spec Kit | Spec-driven artifacts | Constitution, scenarios, FR/SC IDs, spec -> plan -> tasks -> analyze -> implement | Context7 `/github/spec-kit` | templates/docs/spec workflows if cloned |
| `spec-kit` | Spec-driven artifacts | Constitution, scenarios, FR/SC IDs, spec -> plan -> tasks -> analyze -> implement, generated checklists | local source + Context7 `/github/spec-kit` | `templates/commands/`, `templates/*-template.md`, `scripts/`, `src/specify_cli/` |
| `OpenSpec` | Change/spec artifact graph | Current specs vs proposed changes, artifact dependency chain, workspace links, JSON/no-interactive command behavior | local source | `docs/`, `openspec/specs/`, `openspec/changes/`, CLI source |
| `spec-kitty` | Spec runtime governance | machine/project boundary, manifest-last generated artifacts, package-bundled template source, command-owned action context | local source | `architecture/adrs/`, `src/`, `.kittify/`, `kitty-specs/`, `docs/` |
| `cc-sdd` | Kiro-style phase-gated SDD | steering vs specs, discovery routing, approvals, boundary-first tasks, per-task implement/review/debug roles | local source | `AGENTS.md`, `CLAUDE.md`, `docs/guides/`, `tools/cc-sdd/src/` |
| `plandex` | Large-task implementation | Cumulative diff sandbox, plan versioning, context loading, apply/debug loop | local source + Context7 | `README.md`, `app/cli/lib/`, `app/server/db/`, first-party docs |
| `aider` | Edit loop/context map | Repo-map ranking, edit formats, lint/test repair, benchmark metadata | local source + Context7 | `aider/`, `benchmark/`, `tests/`, docs; avoid generated website data unless needed |
| `Agentless` | Bug repair/evals | Localization -> repair -> patch validation, reproduction tests, reranking | local source | `agentless/fl/`, `agentless/repair/`, `agentless/test/`, benchmark docs |
| SWE-agent/OpenHands | Bug repair/runtime research | issue-to-patch loops, sandbox/runtime harnesses, SWE-bench evaluation | Context7/web or local clone if added | source/docs/evals only when cloned |
| `codex` | Execution substrate | Sandbox profiles, approval policy, app-server protocol, typed events, AGENTS scope | local source + Context7 `/openai/codex` | `docs/`, `codex-rs/protocol/src/`, `codex-rs/exec/src/`, `codex-rs/linux-sandbox/`; avoid `vendor/` |
| `claude-code` | UX reference | Permissions, commands, plugins, MCP client UX, subagent UX | local source only; leaked mirror caveat | `src/commands/`, `src/services/mcp/`, `src/tools/`, `src/components/` |
| Claude Code | UX reference | Permissions, commands, plugins, MCP client UX, subagent UX | public docs + observed behavior | public docs, command UX, transcript/output behavior |
| `qwen-code` | Terminal workflow | trusted folders, subagent fork design, terminal-capture tests, provider config | local source + Context7 | `docs/`, `packages/*/src/`, `integration-tests/terminal-capture/` |
| Kimi Code | Model-specific coding agent | long-context coding, Kimi CLI/IDE flow, model-plan comparison | Context7 `/moonshotai/kimi-cli` | docs/source if cloned |
| CodeGeeX2 | Model capability | multilingual code model, HumanEval-X/DS1000, local deployment/quantization | web/GitHub | benchmark/evaluation/docs if cloned |
@ -90,24 +99,27 @@ it is strongest.
|---|---|---|---|
| `plandex` | Large task planning and review workflow | Cumulative diff sandbox, plan versioning, explicit chat-to-plan-to-apply flow, context indexing for big repos | Server/product coupling and any cloud-hosting assumptions |
| `codex` | Execution hardening and protocol boundaries | Typed non-interactive event stream, sandbox permission profiles, app-server protocol shape, Rust crate boundaries, config schema rigor, plugin/skill manager discipline | Treating MCP server code as a Forge product direction |
| `claude-code` | Interactive ergonomics | Permission UX, command discoverability, plugin surfaces, subagent UX, memory/context commands, MCP client config flows | Copying leaked-source implementation or making it an upstream dependency |
| Claude Code | Interactive ergonomics | Permission UX, command discoverability, plugin surfaces, subagent UX, memory/context commands, MCP client config flows | Copying private implementation details or making it an upstream dependency |
| `ace-coder` | Owned multi-repo governance | Reviewer roles, hard quality gates, skill/subagent routing policy, explicit MCP-client contract style | Collapsing ACE and Forge into one product surface |
| `aider` | Tight edit loop, repo maps, and benchmark culture | Token-budgeted repo-map ranking, reproducible benchmark reports with model, edit format, commit hash, dirty state, pass rates, malformed output counts | Early auto-commit posture before validation and commit gates |
| `Agentless` | Bug-fix eval pipeline | Localization, candidate repair, regression-test selection, reproduction-test generation, validation-based patch reranking | SWE-bench-specific harness assumptions |
| `RA.Aid` | Stage boundaries and trajectory records | Explicit research/planning/implementation phases, research-only mode, durable session/tool trajectory records | Broad autonomous shell posture and external Aider outsourcing |
| `goose` | Desktop/CLI/API distribution and diagnostics | Provider breadth, diagnostics/reporting, API embedding, extension packaging, MCP client lifecycle patterns | Built-in/re-exported MCP servers or broad general-agent scope |
| `gemini-cli` | Release/test/docs automation | Release channels, generated settings schema/docs, behavioral eval incubation, sandbox integration tests, perf/memory baselines, GitHub Action workflows | Provider-specific product assumptions or unstable evals as hard CI gates |
| `qwen-code` | Claude-like terminal workflow | Skills/subagents, forked subagent design, trust-gated workspace config, terminal-capture regressions, flexible provider config | OAuth/provider policy coupling or ungated project-local config |
| `qwen-code` | Claude-like terminal workflow and machine I/O | Skills/subagents, forked subagent design, trust-gated workspace config, bidirectional stream-json, JSON fd/file side channels, terminal-capture regressions, flexible provider config | OAuth/provider policy coupling, ungated project-local config, or mixing channel names with surface/protocol names |
| `opencode` | Mode split and schema boundary | Read-only `plan` mode vs full-access `build` mode, client/server framing, LSP opt-in, project-local commands/tools, schema-first domain boundary | Bun-specific implementation style for Forge |
| `crush` | SQLite state, hooks, permissions, TUI | SQLite migrations/query discipline, hook engine, permission layering, session DB, tool markdown descriptions, LSP, pub/sub, MCP client status UX | Replacing Forge's TypeScript extension architecture with Go |
| `crush` | SQLite state, hooks, permissions, TUI | TUI as client over backend/session services, SQLite migrations/query discipline, hook engine, permission layering, session DB, tool markdown descriptions, LSP, pub/sub, MCP client status UX | Replacing Forge's TypeScript extension architecture with Go or hiding machine protocol behind env vars |
| `letta-code` | Long-lived memory-agent UX | Memory lifecycle, skill learning, approval recovery tests, channel/remote control ideas, MCP OAuth/connect UX | Treating memory as unstructured product magic instead of DB-backed state |
| `neovate-code` | Design-doc and terminal UX iteration | Small design records, queued-message designs, subagent design notes, command/terminal UX records | Pulling in provider-specific branding or immature UX churn |
| `amazon-q-developer-cli` | Rust auth/security reference | Auth/security/workspace patterns and Rust CLI lessons where applicable | Product direction; local README says the open source project is no longer actively maintained |
| `neovate-code` | Design-doc and terminal UX iteration | Small design records, queued-message designs, JSONL session replay, high-risk command classification, command/terminal UX records | Quiet/headless auto-approval, global-only session memory, provider-specific branding, or immature UX churn |
| `amazon-q-developer-cli` | Declarative agent and UI event reference | Agent manifest schema, hooks/resources/tools, AG-UI-like lifecycle/text/tool/state event taxonomy, auth/security/workspace patterns | Product direction, recursive delegate subprocesses, legacy raw passthrough as protocol, trust-all delegate defaults |
| `open-codex` | Older/forked approval-mode comparison | Approval-mode vocabulary and provider abstraction history | Fork-specific Chat Completions direction as a primary architecture |
| `symphony` | Work orchestration above individual agents | Issue-tracker polling, per-issue isolated workspaces, repo-owned `WORKFLOW.md`, Codex app-server lifecycle, retries, operator state, CI/PR review and landing loops | High-trust unattended defaults without Forge's UOK gates and DB-first runtime evidence |
| `codemachine` | Multi-agent spec-to-code orchestration | Engine matrix, SmartRouter routing, heterogeneous agents, spec-to-code templates, feature flags, tool health, local workflow examples, upstream repeatable long-running workflow model | Optional MCP-server/tooling posture and Bun-specific implementation assumptions |
| Kimi Code | Long-context model-specific coding agent | Kimi CLI/IDE workflow, long-context coding, subagent-oriented terminal automation, model-plan comparison | Treating provider-specific subscription/API behavior as a Forge architecture |
| Spec Kit | Spec-driven development workflow | Constitution, prioritized user scenarios, acceptance criteria, functional requirements, measurable success criteria, spec -> plan -> tasks -> implement -> analyze loop | Replacing Forge PDD/UOK with a generic spec template instead of mapping useful pieces into PDD fields |
| `spec-kit` | Spec-driven development workflow | Constitution, prioritized user scenarios, acceptance criteria, functional requirements, measurable success criteria, spec -> plan -> tasks -> implement -> analyze loop | Replacing Forge PDD/UOK with a generic spec template instead of mapping useful pieces into PDD fields |
| `OpenSpec` | Brownfield change planning | Clear split between current behavior specs and proposed changes, dependency-aware artifact continuation, workspace link/local mapping | Treating docs/specs as Forge's operational source of truth instead of generated/reviewed exports from `.sf`/SQLite |
| `spec-kitty` | Runtime and generated-artifact governance | Global runtime vs project overlay, package-bundled template source, manifest-last promotion, command-owned context resolver | Per-project full runtime copies, hidden alternate template sources, or prompt-level context discovery |
| `cc-sdd` | Agentic SDD operating loop | Steering/spec split, discovery router, approval gates, boundary annotations, per-task implementer/reviewer/debugger, implementation-note propagation | Markdown-only operational state for SF, or requiring heavy gates for trivial direct fixes |
## Forge Already Has
@ -136,12 +148,15 @@ surfaces instead of adding parallel state systems.
mode, and clearer mode transitions in auto/headless.
3. **Typed headless event stream**
- Use Codex and OpenCode as the references.
- Use Codex, Gemini CLI, Qwen Code, Amazon Q, and OpenCode as the references.
- Target Forge surfaces: stable machine-readable events such as
`thread.started`, `turn.started`, `turn.completed`, `turn.failed`,
`item.started`, `item.updated`, and `item.completed`, with typed payloads
for commands, patches, MCP calls, web/context lookups, todos, and UOK
evidence.
- Qwen-style bidirectional machine contracts are especially relevant:
stream-json input, stream-json output, JSON fd/file side channels, and
long-lived session control.
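A minimal TypeScript sketch of such a typed event envelope, using the event names above; the payload fields and item shapes are illustrative assumptions, not SF's real schema:

```typescript
// Hypothetical machine-surface event envelope. Event names come from the
// roadmap item above; payload fields are assumptions for illustration.
type MachineEvent =
  | { type: "thread.started"; threadId: string }
  | { type: "turn.started"; threadId: string; turnId: string }
  | { type: "turn.completed"; threadId: string; turnId: string }
  | { type: "turn.failed"; threadId: string; turnId: string; error: string }
  | {
      type: "item.started" | "item.updated" | "item.completed";
      threadId: string;
      itemId: string;
      item: CommandItem | PatchItem | McpCallItem | TodoItem;
    };

interface CommandItem { kind: "command"; command: string; exitCode?: number }
interface PatchItem { kind: "patch"; files: string[] }
interface McpCallItem { kind: "mcp_call"; server: string; tool: string }
interface TodoItem { kind: "todo"; text: string; done: boolean }

// A machine consumer narrows on `type` and reads one JSON object per line.
function encodeEvent(ev: MachineEvent): string {
  return JSON.stringify(ev);
}
```

The point of the union type is that every surface and protocol adapter serializes the same discriminated events rather than inventing its own shapes.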
4. **Reviewable cumulative diffs**
- Use Plandex and Aider as the references.
@ -191,11 +206,46 @@ surfaces instead of adding parallel state systems.
bypassing Forge's safety gates.
11. **Spec-driven artifact pipeline**
- Use Spec Kit and CodeMachine as references.
- Use Spec Kit, OpenSpec, spec-kitty, cc-sdd, and CodeMachine as references.
- Target Forge surfaces: convert intent into PDD fields, prioritized slices,
acceptance criteria, functional requirements, measurable success criteria,
task generation, and consistency analysis before implementation.
12. **Generated human exports and drift checks**
- Use spec-kitty and Spec Kit as references, but keep Forge database-first.
- Target Forge surfaces: generated `docs/specs/` exports, `check` commands
that fail on stale projections, manifest-backed generated artifacts where
promotion needs auditability, and command-owned context resolution rather
than prompt heuristics.
- Stop rule: generated docs may be reviewed and tracked by Git, but SF-owned
operational history and future-use knowledge stay in `.sf`/SQLite.
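As an illustration of the stale-projection check, a sketch under assumed conventions: the hash marker format and function names here are hypothetical, not SF's real generator contract.

```typescript
import { createHash } from "node:crypto";

// Hypothetical drift check: a generated doc embeds a short hash of the DB
// projection it was rendered from, and `check` fails when they diverge.
function projectionHash(projection: string): string {
  return createHash("sha256").update(projection).digest("hex").slice(0, 12);
}

function isStale(generatedDoc: string, currentProjection: string): boolean {
  const match = generatedDoc.match(/<!-- sf:projection-hash:([0-9a-f]+) -->/);
  if (!match) return true; // unmarked exports are treated as stale
  return match[1] !== projectionHash(currentProjection);
}
```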
13. **Batch input and project-state relocation**
- Use Aider and RA.Aid as references.
- Target Forge surfaces: prompt-file batch input, dry-run previews, explicit
`--yes`/confirmation gates, and an overrideable project-state directory
for CI/sandboxes or migrated workspaces.
- Stop rule: history/prompt buffers are convenience, not cross-repo memory or
operational authority.
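The dry-run and confirmation posture can be sketched as a small fail-closed decision; the flag names follow common CLI convention and are assumptions here, not SF's real flags:

```typescript
// Hypothetical gate for batch runs: dry-run previews never mutate, and a
// mutating batch run must carry an explicit confirmation flag.
interface BatchOptions { dryRun: boolean; yes: boolean }

type BatchDecision = "preview" | "apply" | "refuse";

function decideBatchRun(opts: BatchOptions): BatchDecision {
  if (opts.dryRun) return "preview"; // show the plan, mutate nothing
  if (opts.yes) return "apply";      // explicit confirmation was given
  return "refuse";                   // fail closed without confirmation
}
```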
14. **Decomposed autonomy and high-risk command classification**
- Use Plandex, Goose, Gemini CLI, Qwen Code, and Neovate Code as references.
- Target Forge surfaces: separate choices for context loading, apply,
execution, commit behavior, run control, and permission profile; keep
static high-risk command classification as a first-pass guard; fail closed
when a non-interactive run reaches approval-required tool use without an
explicit permission profile that allows it.
- Stop rule: no broad always-yes mode and no destructive Git cleanup as a
default recovery path.
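A minimal sketch of the static first-pass classifier and the fail-closed headless check; the command patterns and the permission-profile field are illustrative assumptions:

```typescript
// Hypothetical first-pass static classifier. A static list is only a guard,
// not a substitute for the permission profile itself.
const HIGH_RISK = [
  /\brm\s+-rf\b/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+clean\b/,
];

function isHighRisk(command: string): boolean {
  return HIGH_RISK.some((re) => re.test(command));
}

interface PermissionProfile { allowHighRisk: boolean }

// In a non-interactive run, approval-required tool use must be explicitly
// allowed by the profile; with no profile, the run fails closed.
function allowInHeadless(command: string, profile?: PermissionProfile): boolean {
  if (!isHighRisk(command)) return true;
  return profile?.allowHighRisk === true;
}
```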
15. **Declarative agent/run manifests**
- Use Amazon Q and Goose recipes as references.
- Target Forge surfaces: reviewed agent/run manifests with prompt context,
file resources, hooks, visible/allowed tools, model policy, extension
requirements, parameters, and expected response/evidence contracts.
- Stop rule: manifests feed UOK and `.sf/sf.db`; they do not bypass SF
permission or evidence gates.
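A hedged sketch of what such a reviewed run manifest might look like as a type, loosely modeled on the Amazon Q agent schema and Goose recipes; every field name here is an assumption:

```typescript
// Hypothetical reviewed run-manifest shape. Field names are illustrative.
interface RunManifest {
  name: string;
  prompt: string;                     // prompt context
  resources: string[];                // file resources loaded into context
  tools: { allowed: string[]; visible?: string[] };
  hooks?: { preRun?: string; postRun?: string };
  model?: { id: string };
  parameters?: Record<string, string>;
  evidence?: { required: string[] };  // expected response/evidence contract
}

// Manifests are validated before a run; they feed the gates, never bypass them.
function validateManifest(m: RunManifest): string[] {
  const errors: string[] = [];
  if (!m.name) errors.push("name is required");
  if (!m.prompt) errors.push("prompt is required");
  if (m.tools.allowed.length === 0) errors.push("at least one allowed tool is required");
  return errors;
}
```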
## Priority Order
P0:
@ -231,37 +281,67 @@ P2:
The follow-up subagent pass inspected these concrete local paths:
- `aider/aider/repomap.py`, `aider/aider/coders/base_coder.py`,
`aider/aider/linter.py`, and `aider/benchmark/README.md`.
`aider/aider/linter.py`, `aider/aider/args.py`, `aider/aider/io.py`,
and `aider/benchmark/README.md`.
- `Agentless/agentless/fl/localize.py`,
`Agentless/agentless/repair/rerank.py`,
`Agentless/agentless/test/generate_reproduction_tests.py`.
`Agentless/agentless/repair/repair.py`,
`Agentless/agentless/test/generate_reproduction_tests.py`, and
`Agentless/agentless/test/run_tests.py`.
- `RA.Aid/ra_aid/agents/`, `RA.Aid/ra_aid/tools/programmer.py`,
`RA.Aid/ra_aid/database/models.py`.
`RA.Aid/ra_aid/database/models.py`, `RA.Aid/ra_aid/config.py`,
`RA.Aid/ra_aid/database/connection.py`, and `RA.Aid/ra_aid/tools/shell.py`.
- `plandex/app/cli/lib/apply.go`, `plandex/app/cli/lib/rewind.go`,
`plandex/app/server/db/diff_helpers.go`.
`plandex/app/cli/lib/git.go`, `plandex/app/cli/lib/repl.go`,
`plandex/app/cli/cmd/plan_exec_helpers.go`,
`plandex/app/cli/cmd/plan_start_helpers.go`,
`plandex/app/server/db/diff_helpers.go`, and
`plandex/app/server/db/plan_config_helpers.go`.
- `codex/codex-rs/exec/src/exec_events.rs`,
`codex/codex-rs/linux-sandbox/README.md`,
`codex/codex-rs/linux-sandbox/src/linux_run_main.rs`.
- `gemini-cli/evals/README.md`, `gemini-cli/perf-tests/README.md`,
`gemini-cli/memory-tests/`.
`gemini-cli/memory-tests/`, `gemini-cli/packages/cli/src/config/config.ts`,
`gemini-cli/packages/cli/src/nonInteractiveCli.ts`,
`gemini-cli/packages/core/src/output/types.ts`,
`gemini-cli/packages/core/src/policy/types.ts`, and
`gemini-cli/packages/core/src/config/storage.ts`.
- `qwen-code/docs/users/configuration/trusted-folders.md`,
`qwen-code/docs/design/fork-subagent/fork-subagent-design.md`,
`qwen-code/integration-tests/terminal-capture/`.
`qwen-code/integration-tests/terminal-capture/`,
`qwen-code/packages/cli/src/config/config.ts`,
`qwen-code/packages/cli/src/nonInteractiveCli.ts`,
`qwen-code/packages/cli/src/nonInteractive/session.ts`,
`qwen-code/packages/channels/base/README.md`, and
`qwen-code/packages/core/src/permissions/types.ts`.
- `opencode/.opencode/`, `opencode/specs/v2/session.md`,
`opencode/packages/opencode/specs/effect/schema.md`,
`opencode/packages/opencode/src/session/schema.ts`.
- `crush/internal/db/`, `crush/internal/hooks/`,
`crush/internal/permission/`, `crush/internal/agent/tools/mcp/`.
- `claude-code/src/services/mcp/`, `claude-code/src/commands/mcp/`,
`claude-code/src/tools/ListMcpResourcesTool/`,
`claude-code/src/tools/ReadMcpResourceTool/`.
- Claude Code public documentation and observed command/transcript behavior.
- `letta-code/src/cli/components/McpConnectFlow.tsx`,
`letta-code/src/cli/helpers/mcpOauth.ts`,
`letta-code/src/agent/approval-recovery.ts`.
- `neovate-code/src/mcp.ts`, `neovate-code/src/commands/mcp.ts`,
`neovate-code/src/slash-commands/builtin/mcp.tsx`.
`neovate-code/src/slash-commands/builtin/mcp.tsx`,
`neovate-code/src/session.ts`, `neovate-code/src/config.ts`,
`neovate-code/src/tools/bash.ts`, and `neovate-code/src/ui/ApprovalModal.tsx`.
- `amazon-q-developer-cli/crates/agent/src/agent/mcp/`,
`amazon-q-developer-cli/crates/chat-cli/src/cli/mcp.rs`.
`amazon-q-developer-cli/crates/chat-cli/src/cli/mcp.rs`,
`amazon-q-developer-cli/schemas/agent-v1.json`,
`amazon-q-developer-cli/crates/chat-cli-ui/src/protocol.rs`, and
`amazon-q-developer-cli/crates/chat-cli/src/cli/chat/tools/delegate.rs`.
- `goose/crates/goose/src/config/goose_mode.rs`,
`goose/crates/goose/src/config/permission.rs`,
`goose/crates/goose-cli/src/session/mod.rs`,
`goose/crates/goose-cli/src/cli.rs`,
`goose/crates/goose/src/session/session_manager.rs`, and
`goose/crates/goose/src/recipe/mod.rs`.
- `crush/internal/cmd/root.go`, `crush/internal/cmd/run.go`,
`crush/internal/proto/proto.go`, `crush/internal/backend/permission.go`,
`crush/internal/db/migrations/20250424200609_initial.sql`,
`crush/internal/cmd/session.go`, and `crush/internal/config/config.go`.
- `ace-coder/docs/MCP_SERVER.md`,
`ace-coder/docs/plans/2026-04-05-mcp-daemon-refactor.md`,
`ace-coder/python/ai_dev/mcp/`.
@ -269,6 +349,17 @@ The follow-up subagent pass inspected these concrete local paths:
`symphony/elixir/AGENTS.md`, and `.codex/skills/land/SKILL.md`.
- `singularity/machine/README.md`, `package.json`, `templates/workflows/`,
`docs/architecture/engine-matrix.md`, and `docs/OPENAI_SPECS_DOWNLOAD.md`.
- `spec-kit/README.md`, `templates/commands/specify.md`,
`templates/commands/plan.md`, `templates/commands/tasks.md`, and
`scripts/bash/common.sh`.
- `OpenSpec/docs/concepts.md`, `OpenSpec/docs/commands.md`,
`OpenSpec/openspec/specs/`, and `OpenSpec/openspec/changes/`.
- `spec-kitty/architecture/adrs/2026-04-08-1-global-kittify-machine-level-runtime.md`,
`spec-kitty/architecture/adrs/2026-04-08-2-package-bundled-templates-sole-source.md`,
and `spec-kitty/architecture/adrs/2026-03-09-1-prompts-do-not-discover-context-commands-do.md`.
- `cc-sdd/AGENTS.md`, `cc-sdd/README.md`,
`cc-sdd/docs/guides/spec-driven.md`, and
`cc-sdd/docs/guides/skill-reference.md`.
## Context7 Cross-Check
@ -281,7 +372,7 @@ snapshot available on this machine.
approval policy, and sandbox modes (`read-only`, `workspace-write`,
`danger-full-access`) with writable roots and network controls.
- `/plandex-ai/plandex` confirmed the relevant Plandex public patterns:
semi/full autonomy levels, smart context loading, cumulative diff review
semi/full run-control levels, smart context loading, cumulative diff review
sandbox, review/apply/debug workflow, and large multi-file task focus.
- `/ai-christianson/ra.aid` confirmed the relevant RA.Aid public patterns:
research-only and research-and-plan-only modes, the research -> planning ->
@ -297,7 +388,7 @@ snapshot available on this machine.
- OpenAI Symphony: `/openai/symphony`.
- Kimi Code: `/moonshotai/kimi-cli`,
`/websites/moonshotai_github_io_kimi-cli_en`, and `/websites/kimi_code`.
- Spec Kit: `/github/spec-kit` and `/websites/github_github_io_spec-kit`.
- Upstream CodeMachine CLI did not resolve by name in Context7 during this
pass, but GitHub confirms `https://github.com/moazbuilds/CodeMachine-CLI`
as the public upstream-style repo for CodeMachine CLI. The local checkout
@ -358,6 +449,92 @@ The targeted `sift` pass found:
- Forge `src/resources/extensions/sf/tests/no-sf-mcp-server.test.mjs`: confirms
there is executable guard coverage preventing recreation of SF MCP server
paths.
## Local Surface/State Cross-Check
The detailed coder-agent pass supports Forge's five-axis operating model.
- Codex has the cleanest split: TUI flags, non-interactive `exec`, app-server
protocol, sandbox mode, approval policy, persistent state, and input history
are separate code paths and types. Forge should copy the typed separation, not
the exact crate structure.
- OpenCode treats the TUI as one client of a server, exposes HTTP/OpenAPI, and
bridges ACP as a protocol adapter over its server. Forge should adapt the
client/server clarity and generated SDK idea without making HTTP the only
mental model for machine access.
- Claude Code is strong on structured headless output, rich JSONL transcripts,
entrypoint metadata, permission modes, and sandbox schemas. Forge should adapt
the stream discipline and transcript fields while avoiding hidden feature-flag
sprawl.
- The old `open-codex` fork is the cautionary example: autonomy, approval policy, output,
and sandbox behavior collapse into a small set of flags. Forge should keep
run control, output format, protocol, and permission profile independent.
The second coder-agent pass adds state/history guardrails:
- Aider's prompt-file, dry-run, and explicit yes primitives are useful batch
affordances, but its flat history files should remain convenience only.
- RA.Aid's overrideable project state directory is useful for CI/sandboxes, but
broad shell-approval bypasses are not.
- Plandex's decomposed context/apply/execution/commit modes map well to Forge's
need to keep run control separate from permission profile and Git policy.
- Agentless's stage JSONL artifacts are good eval/evidence inputs; import them
into DB-backed contracts instead of making JSONL the live state model.
- Neovate's JSONL session replay and high-risk bash classifier are useful; its
quiet-mode auto-approval is not.
The terminal-agent pass adds concrete machine/API patterns:
- Gemini and Qwen both expose `text`, `json`, and `stream-json` output formats.
Qwen goes further with stream-json input plus JSON fd/file side channels.
Forge should adopt the bidirectional contract shape for parent processes.
- Goose cleanly names run-control and permission behavior, and its
non-interactive path refuses approval-required tool calls unless the run was
explicitly allowed to proceed. Forge should copy that fail-closed posture.
- Crush shows the right architecture shape for TUI-over-backend/session-store
while keeping session/message/file history in SQLite. Forge already wants this
DB-first boundary; the lesson is to avoid making the machine protocol hidden
or text-only.
- Amazon Q has the richest declarative agent schema and a useful
lifecycle/text/tool/state event taxonomy. Forge should adapt manifests and
event taxonomy, not recursive delegate subprocesses or raw passthrough
protocol events.
## Local Spec-System Cross-Check
The spec-system pass reinforces the current Forge direction: specs and docs are
valuable contracts for humans, but command/runtime state must own execution.
- Spec Kit creates `specs/<feature>/spec.md`, stores the active feature pointer
in `.specify/feature.json`, runs plan/task setup scripts that return JSON, and
generates quality checklists before planning. The useful pattern is explicit
generated artifacts plus machine-readable path discovery, not a second SF
planning database. Its `analyze` command is a useful read-only consistency
shape for comparing spec, plan, tasks, and constitution; in Forge that should
compare `.sf`/DB state, generated docs, evidence, and code diffs.
- OpenSpec separates `openspec/specs/` as current behavior from
`openspec/changes/` as proposed modifications, then uses commands like
`propose`, `continue`, `ff`, `verify`, `sync`, and `archive` to move artifacts
through a schema-defined dependency graph. Its best pattern is deterministic
ready/blocked artifact queries, apply requirements, and delta validation.
Forge should keep the change/spec distinction as a human review model while
storing operational order/gates in `.sf/sf.db`.
- spec-kitty is strongest on runtime boundaries: one global machine runtime,
thin project overlay, package-bundled templates as the sole end-user source,
manifest-last generated artifact promotion, and command-owned action context.
Its newer runtime pattern is also important: append-only status events,
materialized status, expected-artifact manifests with blocking semantics, step
contracts, explicit write scopes, review evidence, and commit hooks that keep
planning artifacts out of lane branches. Forge should copy the boundary
discipline and avoid prompt-level discovery for any context a command can
resolve.
- cc-sdd is strongest on phase gates and role separation: steering vs feature
specs, discovery routing, requirements/design/tasks approvals, boundary-first
task contracts, and per-task implementer/reviewer/debugger contexts. Forge
should adapt the contract discipline into PDD fields and UOK gates without
making markdown checkboxes the operational state store. The most reusable
pieces are boundary/dependency annotations, observable completion, approval
booleans as explicit state, implementation-note propagation, and a manifest
planner for generated agent/startup artifacts with dry-run/conflict policy.
- Forge `docs/records/2026-05-07-cli-agent-code-survey.md`: now records the
MCP-client-only product boundary and roadmap pull-through.
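spec-kitty's append-only-events-plus-materialized-status pattern, noted above, reduces to a simple fold over the event log; this sketch uses assumed event names:

```typescript
// Hypothetical reduction of append-only status events into a materialized
// per-task status. The event log stays the source of record; the map is
// a projection that can always be rebuilt.
type StatusEvent =
  | { kind: "started"; task: string }
  | { kind: "blocked"; task: string; reason: string }
  | { kind: "completed"; task: string };

function materializeStatus(events: StatusEvent[]): Map<string, string> {
  const status = new Map<string, string>();
  for (const ev of events) status.set(ev.task, ev.kind); // last event wins
  return status;
}
```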
@ -385,7 +562,7 @@ evidence contract before using them for planning.
The first execution-policy vocabulary slice also landed:
- `execution-policy.js` defines named `plan`, `build`, and `unrestricted`
- `execution-policy.js` defines named `plan`, `build`, `trusted`, and `unrestricted`
profiles with filesystem, network, git, and mutation posture.
- The `plan` profile reuses the existing queue-mode write gate, so read-only
commands and `.sf/` planning artifacts are allowed while source mutations are
@ -396,13 +573,13 @@ The first execution-policy vocabulary slice also landed:
calls, recording the profile, allow/deny result, risk, destructive labels,
tool name, call id, and policy-relevant command/path only.
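The named profiles can be pictured as rows in a posture matrix. This sketch only mirrors the axes described above; the real `execution-policy.js` fields may differ:

```typescript
// Illustrative posture matrix for the named profiles. Field names and
// values are assumptions drawn from the description above.
interface ExecutionProfile {
  filesystem: "read-only" | "workspace-write" | "full";
  network: boolean;
  git: "none" | "branch-only" | "full";
  sourceMutations: boolean;
}

const PROFILES: Record<string, ExecutionProfile> = {
  plan:         { filesystem: "read-only",       network: false, git: "none",        sourceMutations: false },
  build:        { filesystem: "workspace-write", network: false, git: "branch-only", sourceMutations: true },
  trusted:      { filesystem: "workspace-write", network: true,  git: "full",        sourceMutations: true },
  unrestricted: { filesystem: "full",            network: true,  git: "full",        sourceMutations: true },
};

// `plan` keeps the queue-mode write gate: `.sf/` planning artifacts are
// writable while source mutations stay blocked; unknown profiles fail closed.
function allowsWrite(profile: string, path: string): boolean {
  const p = PROFILES[profile];
  if (!p) return false;
  if (p.filesystem === "read-only") return path.startsWith(".sf/");
  return true;
}
```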
Next slices should project these profile decisions into UOK evidence and
headless JSON output before broad enforcement.
Next slices should project these profile decisions into UOK evidence and the
machine-surface JSON/JSONL projections before broad enforcement.
## Resulting Direction
Forge should absorb proven patterns into UOK and the existing DB-first runtime:
structured state, explicit modes, stronger permissions, reproducible evidence,
and better review UX. The goal is not feature parity with every coder. The goal
is a purpose-to-software compiler whose autonomy is inspectable, recoverable,
is a purpose-to-software compiler whose run control and permission profile are inspectable, recoverable,
and safe enough to run repeatedly.


@ -7,5 +7,5 @@ Repo-memory audits, decision ledgers, context-gardening notes, and records-keepe
| Date | Note | Summary |
|------|------|---------|
| 2026-05-01 | [repo-vcs and notifications](./2026-05-01-repo-vcs-and-notifications.md) | repo-vcs skill landed; notification specs drafted; JSDoc annotations added; placeholder docs filled |
| 2026-05-07 | [SF + ACE full-stack reference survey](./2026-05-07-cli-agent-code-survey.md) | repo-wise map of coding agents, orchestration systems, retrieval tools, model references, and platform/golden-path systems; priority pulls are execution permissions, typed headless events, DB-first state, trust gating, orchestration, cumulative diffs, eval pipelines, and MCP-client-only lifecycle hardening |
| 2026-05-07 | [SF + ACE full-stack reference survey](./2026-05-07-cli-agent-code-survey.md) | repo-wise map of coding agents, spec systems, orchestration systems, retrieval tools, model references, and platform/golden-path systems; priority pulls are execution permissions, typed machine events, DB-first state, generated human exports, trust gating, orchestration, cumulative diffs, eval pipelines, and MCP-client-only lifecycle hardening |
| 2026-05-07 | [strategy alignment](./2026-05-07-strategy-alignment.md) | aligned top-level docs and roadmap framing around Forge as product, UOK as kernel, and external CLIs as sharpening inputs |


@ -1,16 +1,36 @@
# docs/specs/
Durable specifications and contracts.
Human-readable specifications and exported contracts.
SF plans and executes from `.sf`, database first. `.sf/sf.db` is the canonical
structured store for initialized repos; `.sf/*.md` files are working guidance,
knowledge, generated projections, or recovery inputs.
`docs/specs/` is for humans: reviewable exports, reports, public contracts, and
git-history artifacts. SF may read these files as repo evidence during analysis,
but any fact required for SF's own future operation must be captured into
`.sf`/DB-backed state before it is treated as operational knowledge.
Generated docs in this directory are allowed to change. That is part of their
purpose: Git keeps the human-facing history of exported contracts and reports.
SF can keep its own operational history in `.sf`/SQLite when it needs runtime
memory, ledgers, replay, or drift analysis.
## What belongs here
- Long-lived spec documents describing behavior contracts, APIs, or protocols.
- Documents that outlive any single implementation plan.
- Long-lived human-readable behavior contracts, APIs, or protocols.
- Generated or promoted exports from `.sf` working state.
- Reports that should be visible in git history for reviewers and future humans.
- Documents that outlive any single implementation plan but remain downstream
from `.sf` when SF needs to use them operationally.
## What does NOT belong here
- SF working plans, state, knowledge, or task control files. Keep those in
`.sf` and use runtime tools to mutate DB-owned records.
- Architecture decisions (use `docs/adr/`).
- Implementation plans (use `docs/plans/`).
- A parallel `BASE_SPEC.md` or product-spec hierarchy for new SF work.
## Naming convention
@ -18,4 +38,5 @@ Durable specifications and contracts.
## See also
- [AGENTS.md#sf-planning-state](../AGENTS.md#sf-planning-state)
- [SF operating model export](./sf-operating-model.md)
- [AGENTS.md#sf-planning-state](../../AGENTS.md#sf-planning-state)


@ -0,0 +1,152 @@
# SF Operating Model
## Working Model Inputs
- Generator: `sf-spec-projections@1`
- Database: `.sf/sf.db` (present)
- Guidance: `.sf/PRINCIPLES.md` (present)
- Guidance: `.sf/TASTE.md` (present)
- Guidance: `.sf/ANTI-GOALS.md` (present)
- Optional knowledge: `.sf/KNOWLEDGE.md` (missing)
- Optional preferences: `.sf/PREFERENCES.md` (present)
- Database schema version: 42
- DB planning rows: milestones=1, slices=0, tasks=0
- DB spec rows: milestone_specs=1, slice_specs=0, task_specs=0
- Source roots analyzed as implementation evidence: `src/resources/extensions/sf/`, `src/headless*.ts`, `src/cli.ts`, `src/help-text.ts`, `web/`, `vscode-extension/`, `packages/`
This file is a human export for review, navigation, and git history. Generated docs are allowed to change because Git keeps the human-facing history. If SF needs operational history or future-use knowledge, store it in `.sf`/DB-backed state instead of relying on this export.
SF has one workflow engine and one database-first working model. TUI, CLI, web, editor integrations, and non-interactive automation must all drive the same flow:
```text
intent -> structured state/evidence -> UOK/policy gates -> execution -> journal/evidence -> projected status
```
The names below are separate axes. Do not use one as a synonym for another.
## Flow
The flow is the product behavior: how SF captures intent, plans work, applies policy, executes tasks, records evidence, and reports status. Flow behavior must not fork by UI surface. If a TUI run and a non-interactive run receive the same state, run control, and permission profile, they should follow the same control model.
## Surface
A surface is where a person or program drives or observes the flow.
- **TUI surface** — interactive terminal UI.
- **CLI surface** — explicit commands and single-shot prompt entrypoints.
- **Web surface** — browser UI around the same state and flow.
- **Editor surface** — IDE/editor integration, usually through an adapter.
- **Machine surface** — non-interactive runner for CI, scripts, schedulers, and parent processes.
`sf headless` is the current command name for the machine surface. It means "without the TUI"; it does not mean "JSON", "autonomous", or a separate flow.
## Protocol
A protocol is how a surface or adapter talks to SF or to another agent process.
- **RPC** — SF's current child-process control channel.
- **stdio JSON-RPC** — process transport used by existing RPC-style adapters.
- **ACP** — editor/client protocol adapter for agent/editor interoperability.
- **HTTP/RPC** — possible daemon/web integration protocol.
- **Wire** — a low-level internal message layer, if introduced, below surfaces and adapters.
Protocols are adapters around the flow. They should translate messages, not invent policy or planning semantics.
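As a sketch of this boundary (method names and the intent shape are hypothetical, not SF's actual API), an adapter's whole job is message translation; policy stays in the flow:

```typescript
// Hypothetical sketch: a protocol adapter maps wire messages onto flow
// intents. It never invents policy or planning semantics of its own.
type FlowIntent =
  | { kind: "run_unit"; unitId: string }
  | { kind: "query_status" };

interface JsonRpcRequest {
  method: string;
  params?: Record<string, unknown>;
}

// Translation only: unknown methods are rejected, not improvised around.
function translate(req: JsonRpcRequest): FlowIntent {
  switch (req.method) {
    case "sf/runUnit":
      return { kind: "run_unit", unitId: String(req.params?.unitId ?? "") };
    case "sf/status":
      return { kind: "query_status" };
    default:
      throw new Error(`unknown method: ${req.method}`);
  }
}
```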
## Output Format
An output format is only the encoding of a response or event stream.
- `text` — human-readable progress and results.
- `json` — one machine-readable result object.
- `stream-json` / JSONL — event stream for parent processes and monitors.
JSON is not a surface, run control, or permission profile. It is one output format that the machine surface can emit.
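For illustration, a parent process can treat `stream-json` output as plain JSONL. This sketch assumes only one JSON object per line; the event field names are placeholders, not a documented SF event schema:

```typescript
// Minimal JSONL consumer sketch. The output format is just an encoding;
// it implies nothing about surface, run control, or permission profile.
interface StreamEvent {
  type: string;
  [key: string]: unknown;
}

function parseJsonlChunk(chunk: string): StreamEvent[] {
  return chunk
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as StreamEvent);
}
```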
## Run Control
Run control describes how far SF continues through the flow before stopping for the operator.
- **manual** — user approves each consequential step.
- **assisted** — SF proposes and executes bounded steps, with human approval for important or uncertain actions.
- **autonomous** — SF continues through the flow until policy, evidence, budget, or completion stops it.
`auto` is acceptable only as shorthand in commands, flags, and UI labels. The concept name is **autonomous**.
## Permission Profile
A permission profile describes what SF is allowed to touch when a run-control mode asks it to act.
- **restricted** — read-mostly or planning-artifact-only permissions.
- **normal** — default workspace permissions for ordinary project work.
- **trusted** — wider local permissions for a trusted operator/workspace.
- **unrestricted** — explicit danger profile; use for logs, policy records, or deliberate bypass flows, not as friendly default product language.
Run control and permission profile are independent. For example, `autonomous + restricted` can keep going with narrow permissions, while `manual + trusted` still asks before each consequential step but can perform broader approved actions.
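The two axes can be sketched as independent types. The value names come from this document; the `RunConfig` shape itself is illustrative, not SF's real configuration type:

```typescript
// Run control: how far the loop continues before stopping for the operator.
type RunControl = "manual" | "assisted" | "autonomous";
// Permission profile: what SF may touch when it does act.
type PermissionProfile = "restricted" | "normal" | "trusted" | "unrestricted";

interface RunConfig {
  runControl: RunControl;
  permissionProfile: PermissionProfile;
}

// Keeps looping, but may only touch a narrow slice of the workspace:
const cautiousOvernight: RunConfig = {
  runControl: "autonomous",
  permissionProfile: "restricted",
};

// Asks before each consequential step, but approved steps reach further:
const wideManual: RunConfig = {
  runControl: "manual",
  permissionProfile: "trusted",
};
```

Because neither value implies the other, every combination is representable and must be reasoned about separately.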
## Naming Rules
- Say **flow** for the shared planning/execution engine.
- Say **surface** for TUI, CLI, web, editor, or machine entrypoints.
- Say **protocol** for ACP, RPC, stdio JSON-RPC, HTTP, or wire messages.
- Say **output format** for `text`, `json`, and `stream-json`.
- Say **run control** for `manual`, `assisted`, and `autonomous`.
- Say **permission profile** for `restricted`, `normal`, `trusted`, and `unrestricted`.
- Use **headless** only for the current `sf headless` command and implementation path. Product docs should explain it as the machine surface.
## Working State Contract
SF working state is database-first. An initialized SF repo has `.sf/sf.db`, and runtime tools use it as the canonical structured store for planning hierarchy, ordering, gates, ledgers, schedules, and validation-sensitive state.
Markdown under `.sf/` has two roles:
- working guidance and knowledge that the runtime loads, such as `PRINCIPLES.md`, `TASTE.md`, `ANTI-GOALS.md`, `KNOWLEDGE.md`, and `PREFERENCES.md`;
- human-readable projections from DB-owned records, such as rendered decisions, requirements, roadmap, plan, summary, and state files.
Markdown under `docs/specs/` is a human export for review, navigation, and git history. Generated docs can change; Git records that human-facing history. If SF needs its own operational history, it should store that in `.sf`/DB-backed state. Plans should record any surface, protocol, output-format, run-control, or permission-profile impact explicitly when a milestone changes integration behavior.
## Source Placement
SF source placement follows the same axis model. New code should extend the owning axis instead of creating parallel trees.
### Core Flow
- `src/resources/extensions/sf/` owns the SF workflow extension: planning tools, UOK/runtime state, `/sf` commands, prompts, templates, doctors, schedule, and DB-backed state.
- `src/resources/extensions/` owns bundled extension packages loaded into the runtime.
- `src/resources/agents/`, `src/resources/skills/`, and `src/resources/workflows/` own bundled runtime resources, not independent product flows.
### Surfaces
- `src/cli.ts` and `src/help-text.ts` own CLI/session entrypoint behavior and command help.
- `src/headless*.ts` owns the existing `sf headless` machine-surface command path. Keep the command name; describe it as the machine surface in product language.
- `web/` owns the browser surface.
- `vscode-extension/` owns the editor surface.
- `packages/pi-tui/` owns reusable TUI primitives and terminal UI components.
### Protocols And Adapters
- `packages/rpc-client/` owns reusable RPC client protocol code.
- RPC child-process orchestration stays in the machine-surface path unless promoted into a reusable protocol package.
- ACP, stdio JSON-RPC, HTTP, and future wire layers are protocol/adapters. They should translate messages to the same SF flow, not fork planning semantics.
### Workspace Packages
- `packages/pi-agent-core/` owns reusable agent-core primitives.
- `packages/pi-ai/` owns provider/model integration.
- `packages/pi-coding-agent/` owns reusable coding-agent substrate inherited from Pi.
- `packages/daemon/` owns daemonized background service code.
- `packages/native/` and `rust-engine/` own native/Rust performance paths.
### State And Projections
- `.sf/sf.db` is the canonical structured runtime state store for initialized SF repos. Treat a missing or unreadable DB as bootstrap/recovery, not a normal alternate source of truth.
- `.sf/DECISIONS.md`, `.sf/REQUIREMENTS.md`, milestone roadmaps, and similar files are rendered working projections when database-backed tools own the data. They are useful to humans and agents but must not compete with DB rows.
- `.sf/PRINCIPLES.md`, `.sf/TASTE.md`, `.sf/ANTI-GOALS.md`, `.sf/KNOWLEDGE.md`, and `.sf/PREFERENCES.md` are repo-local working guidance files when present.
- Generated `.sf/` runtime files are evidence, projections, or import/recovery artifacts.
- Durable human-facing exports belong in `docs/specs/`, `docs/adr/`, or `docs/plans/`. They are reviewable projections and git-history artifacts, not a second planning database.
### Placement Rules
- Do not create a second implementation because a feature appears in another surface. Add an adapter to the same flow.
- Do not name output encodings as surfaces. JSON belongs to output formats.
- Do not name permission expansion as run control. `autonomous` means the loop continues; `trusted` or `unrestricted` means the permission profile widened.
- Do not route or suppress human questions based on the `headless` surface. Questions come from run-control and permission-policy gates; the surface only determines delivery.

View file

@ -24,7 +24,7 @@ Use `sf schedule` when something needs to happen at a specific future time but c
SF has no long-running daemon. Entries are not "fired" by a timer. Instead, the schedule store is queried at specific integration points:
1. **On launch** — `loader.ts` calls `findDue()` and prints a banner if items are due
2. **Auto-mode boundaries** — `sf headless query` (and the TUI status overlay) includes due/upcoming entries in its output
2. **Auto-mode boundaries** — `sf headless query` (machine snapshot) and the TUI status overlay include due/upcoming entries in their output
3. **CLI query** — `sf schedule list --due` shows items whose `due_at <= now`
This means: if an item is scheduled for 3 AM and you open SF at 9 AM, you will see the item as overdue. There is no fire-at-exact-time guarantee. This is an explicit trade-off — see the [pull-based ADR](../adr/0002-sf-schedule-pull-based.md) for the full decision record.
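The pull model reduces to a plain predicate over stored entries. This sketch assumes a simplified in-memory entry type; the real store is DB-backed:

```typescript
interface ScheduleEntry {
  id: string;
  dueAt: number; // epoch milliseconds
}

// Nothing fires a timer. Each integration point asks "what is due as of
// now?", so an item due at 3 AM simply appears as overdue at the 9 AM query.
function findDue(entries: ScheduleEntry[], now: number): ScheduleEntry[] {
  return entries.filter((e) => e.dueAt <= now); // same shape as `due_at <= now`
}
```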
@ -210,7 +210,7 @@ On every SF startup, `loader.ts` calls `findDue()` for both project and global s
[forge] N scheduled item(s) due now. Manage: /sf schedule list
```
### Headless Query (`sf headless query`)
### Machine Snapshot (`sf headless query`)
`headless-query.ts` populates a `schedule` field in `QuerySnapshot`:

View file

@ -61,7 +61,7 @@ When your project has independent milestones, you can run them simultaneously. E
A lock file tracks the current unit. If the session dies, the next `/sf autonomous` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.
**Headless auto-restart (v2.26):** When running `sf headless autonomous`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Configure with `--max-restarts N`. SIGINT/SIGTERM bypasses restart. Combined with crash recovery, this enables true overnight "run until done" execution.
**Machine-surface auto-restart (v2.26):** When running `sf headless autonomous`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Configure with `--max-restarts N`. SIGINT/SIGTERM bypasses restart. Combined with crash recovery, this enables true overnight "run until done" execution. `headless` selects the non-interactive surface; `autonomous` selects run control.
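The documented restart schedule (5s, 10s, then a 30s cap) can be sketched directly; the actual internal implementation is assumed:

```typescript
// Backoff schedule for crash restarts: 5s -> 10s -> 30s cap.
// `attempt` is zero-based; anything past the explicit steps hits the cap.
function restartDelayMs(attempt: number): number {
  const schedule = [5_000, 10_000];
  return attempt < schedule.length ? schedule[attempt] : 30_000;
}
```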
### Provider Error Recovery

View file

@ -30,7 +30,7 @@
| `/sf rate` | Rate last unit's model tier (over/ok/under) — improves adaptive routing |
| `/sf changelog` | Show categorized release notes |
| `/sf logs` | Browse activity logs, debug logs, and metrics |
| `/sf remote` | Control remote autonomous mode |
| `/sf remote` | Configure remote question delivery |
| `/sf help` | Categorized command reference with descriptions for all SF subcommands |
## Configuration & Diagnostics
@ -99,7 +99,7 @@ See [Parallel Orchestration](./parallel-orchestration.md) for full documentation
| `/sf workflow pause` | Pause custom workflow autonomous mode |
| `/sf workflow resume` | Resume paused custom workflow autonomous mode |
`/sf autonomous` is the product-development loop that chooses the next useful unit from project state. `/sf start` is guided workflow kickoff and may ask clarifying questions. `/sf workflow run` executes an explicit YAML workflow definition. `/sf auto` remains supported as shorthand for `/sf autonomous`.
`/sf autonomous` is the product-development loop that chooses the next useful unit from project state. `/sf start` is guided workflow kickoff and may ask clarifying questions. `/sf workflow run` executes an explicit YAML workflow definition. There is no separate `/sf auto` mode.
## Extensions
@ -168,7 +168,7 @@ Enable with `github.enabled: true` in preferences. Requires `gh` CLI installed a
| `sf --continue` (`-c`) | Resume the most recent session for the current directory |
| `sf --model <id>` | Override the default model for this session |
| `sf --print "msg"` (`-p`) | Single-shot prompt mode (no TUI) |
| `sf --mode <text\|json\|rpc>` | Output mode for non-interactive use |
| `sf --mode <text\|json\|rpc>` | Session I/O mode: text/json print format or RPC protocol |
| `sf --list-models [search]` | List available models and exit |
| `sf --web [path]` | Start browser-based web interface (optional project path) |
| `sf --worktree` (`-w`) [name] | Start session in a git worktree (auto-generates name if omitted) |
@ -182,17 +182,23 @@ Enable with `github.enabled: true` in preferences. Requires `gh` CLI installed a
| `sf --debug` | Enable structured JSONL diagnostic logging for troubleshooting dispatch and state issues |
| `sf config` | Set up global API keys for search and docs tools (saved to `~/.sf/agent/auth.json`, applies to all projects). See [Global API Keys](./configuration.md#global-api-keys-sf-config). |
| `sf update` | Update SF to the latest version |
| `sf headless new-milestone` | Create a new milestone from a context file (headless — no TUI required) |
| `sf headless new-milestone` | Create a new milestone from a context file through the machine surface |
## Headless Mode
## Machine Surface (`sf headless`)
`sf headless` runs `/sf` commands without a TUI — designed for CI, cron jobs, and scripted automation. It spawns a child process in RPC mode, auto-responds to interactive prompts, detects completion, and exits with meaningful exit codes.
`sf headless` is SF's current machine surface. It runs the same SF flow as the
TUI, but without rendering the TUI, so CI jobs, cron jobs, parent processes, and
scripts can drive it. Headless is not JSON, run control, or a permission profile; it is a
non-interactive surface. RPC/ACP/stdio/HTTP are protocols around the flow.
`text`, `json`, and `stream-json` are output formats. Manual, assisted, and
autonomous are run-control modes. Restricted, normal, trusted, and unrestricted
are permission profiles.
```bash
# Show headless command help
sf headless
# Run autonomous mode without the TUI
# Run autonomous mode through the machine surface
sf headless autonomous
# Run a single unit
@ -221,7 +227,8 @@ echo "Build a CLI tool" | sf headless new-milestone --context -
|------|-------------|
| `--timeout N` | Overall timeout in milliseconds (default: 300000 / 5 min) |
| `--max-restarts N` | Auto-restart on crash with exponential backoff (default: 3). Set 0 to disable |
| `--json` | Stream all events as JSONL to stdout |
| `--json` | Alias for streaming events as JSONL to stdout |
| `--output-format text\|json\|stream-json` | Select human text, one batch JSON result, or streaming JSONL events |
| `--model ID` | Override the model for the headless session |
| `--context <file>` | Context file for `new-milestone` (use `-` for stdin) |
| `--context-text <text>` | Inline context text for `new-milestone` |
@ -233,7 +240,7 @@ Any `/sf` subcommand works as a positional argument — `sf headless status`, `s
### `sf headless query`
Returns a single JSON object with the full project snapshot — no LLM session, no RPC child, instant response (~50ms). This is the recommended way for orchestrators and scripts to inspect SF state.
Returns a single JSON object with the full project snapshot — no LLM session, no RPC child, instant response (~50ms). This is the recommended machine-readable projection for orchestrators and scripts to inspect SF state.
```bash
sf headless query | jq '.state.phase'

View file

@ -540,7 +540,8 @@ notifications:
### `remote_questions`
Route interactive questions to Slack or Discord for headless autonomous mode:
Route interactive questions to Slack or Discord when run control and the
permission profile require human input:
```yaml
remote_questions:

View file

@ -1,6 +1,6 @@
# Remote Questions
Remote questions allow SF to ask for user input via Slack, Discord, or Telegram when running in headless autonomous-mode. When SF encounters a decision point that needs human input, it posts the question to your configured channel and polls for a response.
Remote questions allow SF to ask for user input via Slack, Discord, or Telegram when the current run-control mode and permission profile require a human decision. When SF encounters a decision point that needs human input, it posts the question to your configured channel and polls for a response.
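The post-then-poll shape can be sketched as follows. The function names, signatures, and interval are illustrative, not SF's actual remote-questions API:

```typescript
// Hypothetical sketch: post a question to a channel, then poll for an
// answer until one arrives or the deadline passes.
async function askRemote(
  post: (question: string) => Promise<void>,
  poll: () => Promise<string | null>,
  question: string,
  timeoutMs: number,
  intervalMs = 1000,
): Promise<string | null> {
  await post(question);
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const answer = await poll();
    if (answer !== null) return answer;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // no human answer before the deadline
}
```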
## Setup

View file

@ -93,7 +93,7 @@ models:
- openrouter/minimax/minimax-m2.5
```
**Headless mode:** `sf headless autonomous` auto-restarts the entire process on crash (default 3 attempts with exponential backoff). Combined with provider error auto-resume, this enables true overnight unattended execution.
**Machine surface:** `sf headless autonomous` auto-restarts the entire process on crash (default 3 attempts with exponential backoff). Combined with provider error auto-resume, this enables true overnight unattended execution.
For common provider setup issues (role errors, streaming errors, model ID mismatches), see the [Provider Setup Guide — Common Pitfalls](./providers.md#common-pitfalls).

View file

@ -101,7 +101,7 @@ function printNonTtyErrorAndExit(
);
process.stderr.write("[sf] Non-interactive alternatives:\n");
process.stderr.write(
"[sf] sf autonomous Autonomous mode (pipeable, no TUI)\n",
"[sf] sf autonomous Autonomous mode via machine surface\n",
);
process.stderr.write(
'[sf] sf --print "your message" Single-shot prompt\n',
@ -112,14 +112,14 @@ function printNonTtyErrorAndExit(
);
}
process.stderr.write(
"[sf] sf --mode rpc JSON-RPC over stdin/stdout\n",
"[sf] sf --mode rpc Session I/O mode: JSON-RPC over stdin/stdout\n",
);
process.stderr.write(
'[sf] sf --mode text "message" Text output mode\n',
'[sf] sf --mode text "message" Session I/O mode: text print format\n',
);
if (includeWebHint) {
process.stderr.write(
"[sf] sf headless autonomous Autonomous mode without TUI\n",
"[sf] sf headless autonomous Machine surface for /sf autonomous\n",
);
}
process.exit(1);
@ -561,7 +561,7 @@ if (cliFlags.messages[0] === "sessions") {
cliFlags._selectedSessionPath = selected.path;
}
// `sf headless ...` — run explicit /sf commands without the TUI
// `sf headless ...` — machine surface for explicit /sf commands
if (cliFlags.messages[0] === "headless") {
await ensureRtkBootstrap();
// Sync bundled resources before headless runs (#3471). Without this,
@ -588,11 +588,17 @@ async function runHeadlessFromAutonomous(
process.exit(0);
}
// `sf autonomous [args...]` — shorthand for headless autonomous mode (#2732).
function rawArgsAfterSubcommand(subcommand: string): string[] {
const args = process.argv.slice(2);
const index = args.indexOf(subcommand);
return index >= 0 ? args.slice(index + 1) : [];
}
// `sf autonomous [args...]` — shorthand for autonomous mode via the machine surface (#2732).
if (cliFlags.messages[0] === "autonomous") {
await runHeadlessFromAutonomous([
"autonomous",
...cliFlags.messages.slice(1),
...rawArgsAfterSubcommand("autonomous"),
]);
}
@ -898,18 +904,18 @@ if (!cliFlags.worktree && !isPrintMode) {
}
// ---------------------------------------------------------------------------
// Autonomous redirect: autonomous mode with piped stdout → headless mode (#2732)
// Autonomous redirect: autonomous mode with piped stdout -> machine surface (#2732)
// When stdout is not a TTY (e.g. `sf autonomous | cat`),
// the TUI cannot render and the process hangs. Redirect to headless mode
// the TUI cannot render and the process hangs. Redirect to the machine surface
// which handles non-interactive output gracefully.
// ---------------------------------------------------------------------------
if (cliFlags.messages[0] === "autonomous" && !process.stdout.isTTY) {
process.stderr.write(
"[forge] stdout is not a terminal — running autonomous mode in headless mode.\n",
"[forge] stdout is not a terminal — running autonomous mode through the machine surface.\n",
);
await runHeadlessFromAutonomous([
"autonomous",
...cliFlags.messages.slice(1),
...rawArgsAfterSubcommand("autonomous"),
]);
}

View file

@ -11,9 +11,7 @@ import {
mkdirSync,
readdirSync,
readFileSync,
renameSync,
statSync,
writeFileSync,
} from "node:fs";
import { join, relative, resolve } from "node:path";
import { ensureAgenticDocsScaffold } from "./resources/extensions/sf/agentic-docs-scaffold.js";
@ -45,20 +43,30 @@ interface ContextOptions {
const AUTO_BOOTSTRAP_MAX_BYTES = 180_000;
const AUTO_BOOTSTRAP_MAX_FILE_BYTES = 40_000;
const AUTO_BOOTSTRAP_ROOT_FILES = [
const AUTO_BOOTSTRAP_SF_SPEC_FILES = [
".sf/PROJECT.md",
".sf/REQUIREMENTS.md",
".sf/DECISIONS.md",
".sf/KNOWLEDGE.md",
".sf/RUNTIME.md",
".sf/STATE.md",
".sf/PRINCIPLES.md",
".sf/TASTE.md",
".sf/PREFERENCES.md",
".sf/ANTI-GOALS.md",
".sf/CODEBASE.md",
];
const AUTO_BOOTSTRAP_ORIENTATION_FILES = [
"TODO.md",
"SPEC.md",
"VISION.md",
"PURPOSE.md",
"MISSION.md",
"ROADMAP.md",
"ARCHITECTURE.md",
"BUILD_PLAN.md",
"README.md",
"AGENTS.md",
"CLAUDE.md",
"CONTRIBUTING.md",
];
const AUTO_BOOTSTRAP_PRIORITY_FILES = [
...AUTO_BOOTSTRAP_SF_SPEC_FILES,
...AUTO_BOOTSTRAP_ORIENTATION_FILES,
];
const AUTO_BOOTSTRAP_SOURCE_EXTENSIONS = new Set([
".go",
".ts",
@ -184,15 +192,19 @@ export function buildAutoBootstrapContext(basePath: string): string {
"# Autonomous Repo Bootstrap",
"",
"SF headless autonomous found no milestones. Use the repository files below as the seed context.",
"Research every relevant markdown document and every source file path before creating the initial milestone plan.",
"Research SF working specs first, then every relevant markdown document and every source file path before creating the initial milestone plan.",
"Use tool-based repository inspection for source contents; do not assume the seed excerpt is complete.",
"Treat .sf/PROJECT.md, .sf/REQUIREMENTS.md, .sf/DECISIONS.md, .sf/KNOWLEDGE.md, and .sf/RUNTIME.md as SF's canonical working spec/state docs when present.",
"Treat any root-level SPEC.md, BASE_SPEC.md, PRODUCT_SPEC.md, docs/specs files, or other docs as repo evidence for humans. Project facts SF needs later into SF's .sf working model and DB-backed state; do not create a parallel base-spec system.",
"For product-facing or workflow-facing work, research the product category and representative competitors before locking requirements or slices. Capture table stakes, differentiators, common failure modes, and what not to copy.",
"Extract the project purpose, vision, architecture, constraints, current TODOs, risks, eval/gate ideas, and implementation backlog.",
"Apply the ACE spec-first TDD shape when planning: purpose and consumer first, behavior contract before implementation, tests as specs, evidence after gates.",
"Every proposed milestone, slice, and task must state its purpose before implementation detail.",
"For each proposed slice, capture Observed/Inferred/Proposed facts, a falsifier, acceptance criteria, and the verification command or eval that proves it.",
"Use explorer-style subagents or equivalent high-context research passes before planning when the runtime supports them.",
"Recommended explorer passes: docs/purpose/vision; source architecture and dependency map; tests/gates/tooling; risks/backlog/eval candidates.",
"Merge explorer findings into one repo map with cited file paths before creating milestones.",
"Follow harness-engineering principles: keep AGENTS.md short as a table of contents, make docs/ the system of record, create versioned plans/evals, prefer mechanically enforced architecture/taste rules, and add cleanup/gardening work when repo knowledge is stale.",
"Follow harness-engineering principles: keep AGENTS.md short as a table of contents, use docs/ for human exports and git-history reports, capture operational knowledge in .sf/DB-backed state, prefer mechanically enforced architecture/taste rules, and add cleanup/gardening work when repo knowledge is stale.",
"Optimize for agent legibility: every milestone should improve the next agent's ability to understand, validate, and safely modify the repo.",
"Create actionable milestones and slices from the repo's docs and source tree rather than asking the user to restate them.",
"",
@ -251,7 +263,7 @@ function collectAutoBootstrapFiles(basePath: string): string[] {
const seen = new Set<string>();
const files: string[] = [];
for (const name of AUTO_BOOTSTRAP_ROOT_FILES) {
for (const name of AUTO_BOOTSTRAP_PRIORITY_FILES) {
const path = join(basePath, name);
if (existsMarkdownFile(path)) {
seen.add(path);
@ -319,129 +331,11 @@ function walkFiles(
return found;
}
// ---------------------------------------------------------------------------
// Serena MCP Auto-Enrollment
// ---------------------------------------------------------------------------
/**
* Register the project in Serena's global config and add it to .sf/mcp.json.
* Called from bootstrapProject so every `sf init` or auto-bootstrap enrolls
 * the repo in Serena MCP automatically — no extra flags needed.
*
* Uses `claude-code` context: disables tools SF already provides (read_file,
* execute_shell_command, etc.) so only Serena's unique symbol-level code
* intelligence tools are exposed via MCP.
*
 * Availability check is deferred to first MCP connection — `uvx --from serena-agent`
* resolves lazily. If Serena is not installed, the MCP client will surface the
* error; the user can then run `uv tool install serena-agent`.
*/
function ensureSerenaMcp(basePath: string): void {
// 1. Register project path in ~/.serena/serena_config.yml (no-op if already present)
const serenaConfigPath = join(
process.env.HOME ?? "/root",
".serena",
"serena_config.yml",
);
const projectPath = resolve(basePath);
if (existsSync(serenaConfigPath)) {
const content = readFileSync(serenaConfigPath, "utf-8");
const lines = content.split("\n");
// Check if project already registered
const _projectRe = /^(\s*)-\s*(.+)$/;
let inProjects = false;
let alreadyListed = false;
for (const line of lines) {
if (line.trim() === "projects:") {
inProjects = true;
} else if (inProjects) {
if (line.trim().startsWith("- ")) {
if (line.trim().slice(2).trim() === projectPath) {
alreadyListed = true;
}
} else if (
!line.trim().startsWith("-") &&
line.trim() !== "" &&
!line.startsWith("#") &&
!line.startsWith(" ")
) {
// End of projects list (next top-level key)
break;
}
}
}
if (!alreadyListed) {
// Find the projects: line and add our path after it
const newLines: string[] = [];
for (const line of lines) {
newLines.push(line);
if (line.trim() === "projects:") {
newLines.push(`- ${projectPath}`);
}
}
writeFileSync(serenaConfigPath, newLines.join("\n"), "utf-8");
}
} else {
// Create minimal global config with projects list
const serenaDir = join(serenaConfigPath, "..");
mkdirSync(serenaDir, { recursive: true });
writeFileSync(serenaConfigPath, `projects:\n- ${projectPath}\n`, "utf-8");
}
// 2. Add/update serena MCP server in .sf/mcp.json
const sfDir = join(basePath, ".sf");
const mcpPath = join(sfDir, "mcp.json");
let mcpConfig: Record<string, unknown> = {};
if (existsSync(mcpPath)) {
try {
mcpConfig = JSON.parse(readFileSync(mcpPath, "utf-8"));
} catch {
// Corrupt JSON — overwrite below
}
}
// Avoid overwriting if already configured
const servers = (mcpConfig.mcpServers ?? mcpConfig.servers ?? {}) as Record<
string,
unknown
>;
if (!servers["serena"]) {
servers["serena"] = {
command: "uvx",
args: [
"--from",
"serena-agent",
"serena",
"start-mcp-server",
"--transport",
"stdio",
"--project-from-cwd",
"--context",
"desktop-app",
],
};
mcpConfig.mcpServers = servers;
writeFileSync(
mcpPath,
JSON.stringify(mcpConfig, null, "\t") + "\n",
"utf-8",
);
}
}
/**
* Bootstrap .sf/ directory structure for headless new-milestone.
* Mirrors the bootstrap logic from guided-flow.ts showSmartEntry().
* Auto-migrates legacy project state directories to .sf/ on first encounter.
*/
export function bootstrapProject(basePath: string): void {
const sfDir = join(basePath, ".sf");
const legacyDir = join(basePath, "." + ["g", "sd"].join(""));
if (!existsSync(sfDir) && existsSync(legacyDir)) {
renameSync(legacyDir, sfDir);
process.stderr.write("[headless] Migrated legacy project state to .sf/\n");
}
if (!nativeIsRepo(basePath) || isInheritedRepo(basePath)) {
nativeInit(basePath, "main");
}
@ -454,7 +348,6 @@ export function bootstrapProject(basePath: string): void {
ensurePreferences(basePath);
ensureAgenticDocsScaffold(basePath);
ensureSiftIndexWarmup(basePath, {});
ensureSerenaMcp(basePath);
untrackRuntimeFiles(basePath);
// Run scaffold check after init — surfaces which files need real content

View file

@ -1,11 +1,11 @@
/**
* headless-types.ts shared types for the headless orchestrator surface.
* headless-types.ts shared types for the machine orchestrator surface.
*
* Purpose: provide a single source of truth for the structured result type
* emitted in --output-format json mode and the output format discriminator,
* emitted by the `sf headless` machine surface and the output format discriminator,
* so headless.ts, consumers, and tests agree on shape without circular deps.
*
* Consumer: headless.ts (orchestrator), external CI scripts parsing batch JSON.
* Consumer: headless.ts (machine orchestrator), external CI scripts parsing batch JSON.
*/
// ---------------------------------------------------------------------------
@ -37,11 +37,11 @@ export interface NotificationMetadata {
// ---------------------------------------------------------------------------
/**
* Discriminates the three headless output modes.
* Discriminates the three machine-surface output formats.
*
* Purpose: let callers declare how they want to receive session results
* (human-readable text, single JSON blob, or streaming JSONL) so the
* orchestrator can select the right serializer upfront.
* Purpose: let callers declare the encoding of session results
* (human-readable text, single JSON blob, or streaming JSONL) without coupling
* output format to surface, protocol, run control, or permission profile.
*
* Consumer: parseHeadlessArgs in headless.ts when handling --output-format.
*/

View file

@ -16,7 +16,6 @@ import {
existsSync,
mkdirSync,
readFileSync,
renameSync,
unlinkSync,
writeFileSync,
} from "node:fs";
@ -452,6 +451,8 @@ export function parseHeadlessArgs(argv: string[]): HeadlessOptions {
options.resumeSession = args[++i];
} else if (arg === "--bare") {
options.bare = true;
} else if (commandSeen) {
options.commandArgs.push(arg);
}
} else if (!commandSeen) {
if (arg === "autonomous") {
@ -728,14 +729,8 @@ async function runHeadlessOnce(
// Validate .sf/ directory (skip for new-milestone since we just bootstrapped it)
const sfDir = join(process.cwd(), ".sf");
const legacyDir = join(process.cwd(), "." + ["g", "sd"].join(""));
if (!isNewMilestone && !existsSync(sfDir)) {
if (existsSync(legacyDir)) {
renameSync(legacyDir, sfDir);
process.stderr.write(
"[headless] Migrated legacy project state to .sf/\n",
);
} else if (repairMissingSfSymlinkForHeadless(process.cwd())) {
if (repairMissingSfSymlinkForHeadless(process.cwd())) {
if (!options.json) {
process.stderr.write(
"[headless] Re-linked .sf to existing external project state\n",

View file

@ -158,12 +158,13 @@ const SUBCOMMAND_HELP: Record<string, string> = {
plan: [
"Usage: sf plan <command>",
"",
"Manage SF milestone planning artifacts and promote state to docs/.",
"Manage SF milestone planning artifacts and human docs exports.",
"",
"Commands:",
" promote <source> Copy a file from .sf/ to docs/plans/, docs/adr/, or docs/specs/",
" list List milestone and slice files in .sf/",
" diff Compare .sf/ state with promoted docs/ artifacts",
" specs <cmd> Generate, diff, or check docs/specs exports from .sf state",
"",
"See docs/plans/README.md, docs/adr/README.md, and docs/specs/README.md for conventions.",
].join("\n"),
@ -202,7 +203,7 @@ const SUBCOMMAND_HELP: Record<string, string> = {
headless: [
"Usage: sf headless [flags] [command] [args...]",
"",
"Run /sf commands without the TUI. Pass an explicit command such as autonomous, next, status, or query.",
"Machine surface for /sf commands. Runs the same SF flow without rendering the TUI.",
"",
"Flags:",
" --timeout N Overall timeout in ms (default: 300000)",
@ -221,7 +222,7 @@ const SUBCOMMAND_HELP: Record<string, string> = {
" next Run one unit",
" status Show progress dashboard",
" new-milestone Create a milestone from a specification document",
" query JSON snapshot: state + next dispatch + costs (no LLM)",
" query Machine snapshot: JSON state + next dispatch + costs (no LLM)",
"",
"new-milestone flags:",
" --context <path> Path to spec/PRD file (use '-' for stdin)",
@ -231,16 +232,16 @@ const SUBCOMMAND_HELP: Record<string, string> = {
"",
"Output formats:",
" text Human-readable progress on stderr (default)",
" json Collect events silently, emit structured HeadlessJsonResult on stdout at exit",
" json Collect events silently, emit one structured result on stdout at exit",
" stream-json Stream JSONL events to stdout in real time (same as --json)",
"",
"Examples:",
" sf headless Show this help",
" sf headless autonomous Run /sf autonomous without the TUI",
" sf headless autonomous Run /sf autonomous through the machine surface",
" sf headless next Run one unit",
" sf headless --output-format json autonomous Structured JSON result on stdout",
" sf headless --json status Machine-readable JSONL stream",
" sf headless --timeout 60000 With 1-minute timeout",
" sf headless --timeout 60000 autonomous Run autonomous with 1-minute timeout",
" sf headless --bare autonomous Minimal context (CI/ecosystem use)",
" sf headless --resume abc123 autonomous Resume a prior session",
" sf headless new-milestone --context spec.md Create milestone from file",
@ -249,7 +250,7 @@ const SUBCOMMAND_HELP: Record<string, string> = {
" sf headless --supervised autonomous Supervised orchestrator mode",
" sf headless --answers answers.json autonomous With pre-supplied answers",
" sf headless --events agent_end,extension_ui_request autonomous Filtered event stream",
" sf headless query Instant JSON state snapshot",
" sf headless query Instant machine JSON state snapshot",
"",
"Exit codes: 0 = success, 1 = error/timeout, 10 = blocked, 11 = cancelled",
].join("\n"),
@ -263,7 +264,7 @@ export function printHelp(version: string): void {
process.stdout.write("Usage: sf [options] [message...]\n\n");
process.stdout.write("Options:\n");
process.stdout.write(
" --mode <text|json|rpc> Output mode (default: interactive)\n",
" --mode <text|json|rpc> Session I/O mode: text/json print format or RPC protocol\n",
);
process.stdout.write(" --print, -p Single-shot print mode\n");
process.stdout.write(
@ -317,16 +318,16 @@ export function printHelp(version: string): void {
" worktree <cmd> Manage worktrees (list, merge, clean, remove)\n",
);
process.stdout.write(
" autonomous [args] Run autonomous mode without TUI (pipeable)\n",
" autonomous [args] Run autonomous mode through the machine surface (pipeable)\n",
);
process.stdout.write(
" headless [cmd] [args] Run explicit /sf commands without TUI\n",
" headless [cmd] [args] Machine surface for explicit /sf commands\n",
);
process.stdout.write(
" graph <subcommand> Manage knowledge graph (build, query, status, diff)\n",
);
process.stdout.write(
" plan <cmd> Manage SF planning artifacts (promote, list, diff)\n",
" plan <cmd> Manage SF planning artifacts and docs exports\n",
);
process.stdout.write(
" schedule <cmd> Manage time-bound reminders (add, list, done, cancel, snooze, run)\n",


@ -1,27 +1,27 @@
# SF Workflow — Manual Bootstrap Protocol
# SF Workflow — DB-First Operating Protocol
> This document teaches you how to operate the SF planning methodology manually using files on disk.
> This document teaches the SF planning methodology from runtime projections on disk. SQLite (`.sf/sf.db`) is the canonical structured store; markdown files are working projections, evidence, or human-readable exports.
>
> **When to read this:** At the start of any session working on SF-managed work, or when loaded by `/sf`.
>
> **After reading this, always read `.sf/STATE.md` to find out what's next.**
> If the milestone has a `M###-CONTEXT.md`, read that too. If the active slice has an `S##-CONTEXT.md`, read that as well — these files contain project-specific decisions, reference paths, and implementation guidance that this generic methodology doc does not.
> **After reading this, read `.sf/STATE.md` only as a generated dashboard.**
> If the milestone has a projected roadmap, context, or plan file, read it for working context, but persist planning changes through SF engine tools so the DB and projections stay aligned.
---
## Quick Start: "What's next?"
Read these files in order and act on what they say:
Read these projections in order and act on what they say:
1. **`.sf/STATE.md`** — Where are we? What's the next action?
2. **`.sf/milestones/<active>/M###-ROADMAP.md`** — What's the plan? Which slices are done? (`STATE.md` tells you which milestone is active)
1. **`.sf/STATE.md`** — Generated dashboard: where are we, what's the next action?
2. **`.sf/milestones/<active>/M###-ROADMAP.md`** — Projected roadmap for the active milestone.
3. **`.sf/milestones/<active>/M###-CONTEXT.md`** — Milestone-level project decisions, reference paths, constraints. Read this before doing implementation work.
4. If a slice is active and has one, read **`S##-CONTEXT.md`** — Slice-specific decisions and constraints.
5. If a slice is active, read its **`S##-PLAN.md`** — Which tasks exist? Which are done?
6. If `.sf/CODEBASE.md` exists, skim it for fast structural orientation before broad code exploration.
7. If a task was interrupted, check for **`continue.md`** in the active slice directory — Resume from there.
Then do the thing `STATE.md` says to do next.
Then do the next engine-owned action. Do not edit `.sf/STATE.md` or `.sf/sf.db` directly.
---
@ -39,7 +39,7 @@ Milestone → a shippable version (4-10 slices)
## File Locations
All artifacts live in `.sf/` at the project root:
SF-owned working state lives in `.sf/` at the project root. DB rows are canonical when schema, ordering, priority, joins, or validation matter; files below are generated projections, evidence, or human-authored guidance:
```text
.sf/
@ -48,7 +48,7 @@ All artifacts live in `.sf/` at the project root:
CODEBASE.md # Generated codebase map cache (auto-refreshed by SF)
milestones/
M001/
M001-ROADMAP.md # Milestone plan (checkboxes = state)
M001-ROADMAP.md # Milestone plan projection (DB-owned state)
M001-CONTEXT.md # Optional: user decisions from discuss phase
M001-RESEARCH.md # Optional: codebase/tech research
M001-SUMMARY.md # Milestone rollup (updated as slices complete)
@ -192,7 +192,7 @@ Critical wiring between artifacts:
**Must-haves are what make verification mechanically checkable.** Truths are checked by running commands or reading output. Artifacts are checked by confirming files exist with real content. Key links are checked by confirming imports/references actually connect the pieces.
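As a rough sketch of that mechanical checking, assuming Node.js and a hypothetical `MustHave` shape (the real SF contract schema differs):

```typescript
import { spawnSync } from "node:child_process";
import { existsSync, statSync } from "node:fs";

// Hypothetical shape for illustration; the SF engine owns its own schema.
interface MustHave {
  kind: "truth" | "artifact";
  check: string; // a shell command for truths, a file path for artifacts
}

function verifyMustHave(m: MustHave): boolean {
  if (m.kind === "artifact") {
    // Artifact: the file must exist with real content, not merely exist.
    return existsSync(m.check) && statSync(m.check).size > 0;
  }
  // Truth: run the command and require a clean exit.
  return spawnSync(m.check, { shell: true }).status === 0;
}
```

Key links follow the same pattern: a check is only a must-have if a command or file inspection can settle it without human judgment.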
### `STATE.md`
### `STATE.md` projection
```markdown
# SF State
@ -479,7 +479,7 @@ The one-liner must be substantive: "JWT auth with refresh rotation using jose" n
1. Write slice `S##-SUMMARY.md` (compresses all task summaries).
2. Write slice `S##-UAT.md` — a non-blocking human test script derived from the slice's must-haves and demo sentence. The agent does NOT wait for UAT results.
3. Mark the slice checkbox in `M###-ROADMAP.md` as `[x]`.
4. Update `STATE.md` with new position.
4. Regenerate `STATE.md` from the engine-owned state.
5. Update milestone `M###-SUMMARY.md` with the completed slice's contributions.
6. Continue to next slice immediately. The user tests the UAT whenever convenient.
7. If the user reports UAT failures later, create fix tasks in the current or a new slice.
@ -536,14 +536,18 @@ The EXACT first thing to do when resuming. Not vague. Specific.
It is NOT the source of truth. It's a convenience dashboard.
**Sources of truth:**
**Canonical source:**
- `.sf/sf.db` → milestones, slices, task contracts, evidence, scheduling, and ordering when structured validation matters.
**Generated or supporting projections:**
- `M###-ROADMAP.md` → which slices exist and which are done
- `S##-PLAN.md` → which tasks exist within a slice
- `T##-SUMMARY.md` → what happened during a task
- `S##-SUMMARY.md` and `M###-SUMMARY.md` → compressed slice and milestone outcomes
**Update `STATE.md`** after every significant action:
**Regenerate `STATE.md`** through SF after every significant action:
- Active milestone/slice/task
- Recent decisions (last 3-5)
@ -557,54 +561,54 @@ If files disagree, **pause and surface to the user**:
- Roadmap says slice done but task summaries missing → inconsistency
- Task marked done but no summary → treat as incomplete
- Continue file exists for completed task → delete continue file
- State points to nonexistent slice/task → rebuild state from files
- State points to nonexistent slice/task → regenerate projections from the engine-owned state
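A minimal sketch of such a consistency pass; the `TaskRow` shape and helper are hypothetical, shown only to make the rules above concrete:

```typescript
// Illustrative only: the row shape and helper name are not the real SF schema.
interface TaskRow {
  id: string;
  done: boolean;
  hasSummary: boolean;
  hasContinueFile: boolean;
}

function findInconsistencies(tasks: TaskRow[]): string[] {
  const issues: string[] = [];
  for (const t of tasks) {
    if (t.done && !t.hasSummary) {
      issues.push(`${t.id}: marked done but no summary; treat as incomplete`);
    }
    if (t.done && t.hasContinueFile) {
      issues.push(`${t.id}: continue file exists for a completed task; delete it`);
    }
  }
  return issues; // any entries mean: pause and surface to the user
}
```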
---
## Git Strategy: Branch-Per-Slice with Squash Merge
## Git Strategy: Branch-Per-Milestone with Squash Merge
**Principle:** Main is always clean and working. Each slice gets an isolated branch. The user never runs a git command — the agent handles everything.
**Principle:** Main is always clean and working. Each milestone gets an isolated branch or worktree. Slices and tasks commit sequentially inside that milestone workspace. The user never runs a git command — the agent handles everything.
### Branch Lifecycle
1. **Slice starts** → create branch `sf/M001/S01` from main
2. **Per-task commits** on the branch — atomic, descriptive, bisectable
3. **Slice completes** → squash merge to main as one clean commit
4. **Branch deleted** — squash commit on main is the permanent record
1. **Milestone starts** → create `milestone/M001` from main, usually in `.sf/worktrees/M001/`
2. **Per-task commits** on the milestone branch — atomic, descriptive, bisectable
3. **Milestone completes** → squash merge to main as one clean commit
4. **Branch/worktree deleted** — squash commit on main is the permanent record
### What Main Looks Like
```git
feat(M001/S03): milestone and slice discuss commands
feat(M001/S02): extension scaffold and command routing
feat(M001/S01): file I/O foundation
feat(M003): autonomous dispatch contract
feat(M002): extension scaffold and command routing
feat(M001): file I/O foundation
```
One commit per slice. Individually revertable. Reads like a changelog.
One commit per milestone on main. Individually revertable. Reads like a changelog.
### What the Branch Looks Like
```git
sf/M001/S01:
test(S01/T03): round-trip tests passing
feat(S01/T03): file writer with round-trip fidelity
feat(S01/T02): markdown parser for plan files
feat(S01/T01): core types and interfaces
docs(S01): add slice plan
milestone/M001:
test(M001/S01/T03): round-trip tests passing
feat(M001/S01/T03): file writer with round-trip fidelity
feat(M001/S01/T02): markdown parser for plan files
feat(M001/S01/T01): core types and interfaces
docs(M001/S01): add slice plan projection
```
### Commit Conventions
| When | Format | Example |
|------|--------|---------|
| Task completed | `{type}(S01/T02): <one-liner from summary>` | Type inferred from title (`feat`, `fix`, `test`, etc.) |
| Plan/docs committed | `docs(S01): add slice plan` | Planning artifacts |
| Slice squash to main | `type(M001/S01): <slice title>` | Type inferred from title |
| State rebuild | `chore(S01/T02): auto-commit after state-rebuild` | Bookkeeping only |
| Task completed | `{type}(M001/S01/T02): <one-liner from summary>` | Type inferred from title (`feat`, `fix`, `test`, etc.) |
| Plan/docs committed | `docs(M001/S01): refresh slice plan projection` | Planning artifacts |
| Milestone squash to main | `type(M001): <milestone title>` | Type inferred from title |
| Projection rebuild | `chore(M001/S01/T02): refresh generated SF projections` | Bookkeeping only |
The system reads the task summary after execution and builds a meaningful commit message:
- **Subject**: `{type}({sliceId}/{taskId}): {one-liner}` — the one-liner from the summary frontmatter
- **Subject**: `{type}({milestoneId}/{sliceId}/{taskId}): {one-liner}` — the one-liner from the summary frontmatter
- **Type**: Inferred from the task title and one-liner (`feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`)
- **Body**: Key files from the summary frontmatter (up to 8 files listed)
@ -613,7 +617,7 @@ Commit types: `feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`
### Squash Merge Message
```git
feat(M001/S01): file I/O foundation
feat(M001): file I/O foundation
Agent can parse, format, load, and save all SF file types with round-trip fidelity.
@ -627,9 +631,9 @@ Tasks completed:
| Problem | Fix |
|---------|-----|
| Bad task | `git reset --hard HEAD~1` to previous commit on the branch |
| Bad slice | `git revert <squash commit>` on main |
| UAT failure after merge | Fix tasks on `sf/M001/S01-fix` branch, squash as `fix(M001/S01): <fix>` |
| Bad task | revert the task commit on the milestone branch |
| Bad milestone | `git revert <squash commit>` on main |
| UAT failure after merge | create follow-up fix tasks on a new milestone branch, squash as `fix(M001): <fix>` |
---
@ -668,7 +672,7 @@ This methodology doc is generic. Project-specific guidance belongs in the milest
## Checklist for a Fresh Session
1. Read `.sf/STATE.md` — what's the next action?
1. Read `.sf/STATE.md` — generated dashboard for the next action.
2. Check for `continue.md` in the active slice — is there interrupted work?
3. If resuming: read `continue.md`, delete it, pick up from "Next Action".
4. If starting fresh: read the active slice's `S##-PLAN.md`, find the next incomplete task.
@ -677,13 +681,13 @@ This methodology doc is generic. Project-specific guidance belongs in the milest
7. Do the work.
8. Verify the must-haves.
9. Write the summary.
10. Mark done, update `STATE.md`, advance.
11. If context is getting full or you're done for now: write `continue.md` if mid-task, or update `STATE.md` with next action if between tasks.
10. Mark done through the engine path, regenerate projections, advance.
11. If context is getting full or you're done for now: write `continue.md` if mid-task, or let SF regenerate `STATE.md` with the next action if between tasks.
## When Context Gets Large
If you sense context pressure (many files read, long execution, lots of tool output):
1. **If mid-task:** Write `continue.md` with exact resume state. Tell the user: "Context is getting full. I've saved progress to continue.md. Start a new session and run `/sf` to pick up where you left off, or `/sf auto` to resume in auto-execution mode."
2. **If between tasks:** Just update `STATE.md` with the next action. No continue file needed — the next session will read STATE.md and pick up the next task cleanly.
1. **If mid-task:** Write `continue.md` with exact resume state. Tell the user: "Context is getting full. I've saved progress to continue.md. Start a new session and run `/sf` to pick up where you left off, or `/sf autonomous` to resume in autonomous mode."
2. **If between tasks:** Let SF regenerate `STATE.md` with the next action. No continue file needed — the next session will read STATE.md and pick up the next task cleanly.
3. **Don't fight it.** The whole system is designed for this. A fresh session with the right files loaded is better than a stale session with degraded reasoning.


@ -8,10 +8,7 @@ import {
writeFileSync,
} from "node:fs";
import { dirname, join } from "node:path";
import {
migrateLegacyGsdScaffold,
migrateLegacyScaffold,
} from "./scaffold-drift.js";
import { migrateLegacyScaffold } from "./scaffold-drift.js";
import {
bodyHash,
extractMarker,
@ -585,13 +582,6 @@ export function ensureAgenticDocsScaffold(basePath) {
error: err.message,
});
}
try {
migrateLegacyGsdScaffold(basePath);
} catch (err) {
logWarning("scaffold", "legacy GSD migration failed", {
error: err.message,
});
}
removeLegacyRootHarnessScaffold(basePath);
// Step 2: missing-file creation + pending-state silent upgrade.
for (const file of SCAFFOLD_FILES) {


@ -13,20 +13,30 @@ const AUTO_BOOTSTRAP_MAX_INVENTORY_BYTES = readPositiveIntEnv(
"SF_AUTO_BOOTSTRAP_MAX_INVENTORY_BYTES",
12_000,
);
const AUTO_BOOTSTRAP_ROOT_FILES = [
const AUTO_BOOTSTRAP_SF_SPEC_FILES = [
".sf/PROJECT.md",
".sf/REQUIREMENTS.md",
".sf/DECISIONS.md",
".sf/KNOWLEDGE.md",
".sf/RUNTIME.md",
".sf/STATE.md",
".sf/PRINCIPLES.md",
".sf/TASTE.md",
".sf/PREFERENCES.md",
".sf/ANTI-GOALS.md",
".sf/CODEBASE.md",
];
const AUTO_BOOTSTRAP_ORIENTATION_FILES = [
"TODO.md",
"SPEC.md",
"VISION.md",
"PURPOSE.md",
"MISSION.md",
"ROADMAP.md",
"ARCHITECTURE.md",
"BUILD_PLAN.md",
"README.md",
"AGENTS.md",
"CLAUDE.md",
"CONTRIBUTING.md",
];
const AUTO_BOOTSTRAP_PRIORITY_FILES = [
...AUTO_BOOTSTRAP_SF_SPEC_FILES,
...AUTO_BOOTSTRAP_ORIENTATION_FILES,
];
const AUTO_BOOTSTRAP_SOURCE_EXTENSIONS = new Set([
".go",
".ts",
@ -95,15 +105,19 @@ export function buildAutoBootstrapContext(basePath) {
"# Autonomous Repo Bootstrap",
"",
"SF headless autonomous found no milestones. Use the repository files below as the seed context.",
"Research every relevant markdown document and every source file path before creating the initial milestone plan.",
"Research SF working specs first, then every relevant markdown document and every source file path before creating the initial milestone plan.",
"Use tool-based repository inspection for source contents; do not assume the seed excerpt is complete.",
"Treat .sf/PROJECT.md, .sf/REQUIREMENTS.md, .sf/DECISIONS.md, .sf/KNOWLEDGE.md, and .sf/RUNTIME.md as SF's canonical working spec/state docs when present.",
"Treat any root-level SPEC.md, BASE_SPEC.md, PRODUCT_SPEC.md, docs/specs files, or other docs as repo evidence for humans. Project the facts SF will need later into SF's .sf working model and DB-backed state; do not create a parallel base-spec system.",
"For product-facing or workflow-facing work, research the product category and representative competitors before locking requirements or slices. Capture table stakes, differentiators, common failure modes, and what not to copy.",
"Extract the project purpose, vision, architecture, constraints, current TODOs, risks, eval/gate ideas, and implementation backlog.",
"Apply the ACE spec-first TDD shape when planning: purpose and consumer first, behavior contract before implementation, tests as specs, evidence after gates.",
"Every proposed milestone, slice, and task must state its purpose before implementation detail.",
"For each proposed slice, capture Observed/Inferred/Proposed facts, a falsifier, acceptance criteria, and the verification command or eval that proves it.",
"Use explorer-style subagents or equivalent high-context research passes before planning when the runtime supports them.",
"Recommended explorer passes: docs/purpose/vision; source architecture and dependency map; tests/gates/tooling; risks/backlog/eval candidates.",
"Merge explorer findings into one repo map with cited file paths before creating milestones.",
"Follow harness-engineering principles: keep AGENTS.md short as a table of contents, make docs/ the system of record, create versioned plans/evals, prefer mechanically enforced architecture/taste rules, and add cleanup/gardening work when repo knowledge is stale.",
"Follow harness-engineering principles: keep AGENTS.md short as a table of contents, use docs/ for human exports and git-history reports, capture operational knowledge in .sf/DB-backed state, prefer mechanically enforced architecture/taste rules, and add cleanup/gardening work when repo knowledge is stale.",
"Optimize for agent legibility: every milestone should improve the next agent's ability to understand, validate, and safely modify the repo.",
"Create actionable milestones and slices from the repo's docs and source tree rather than asking the user to restate them.",
"",
@ -163,7 +177,7 @@ function readPositiveIntEnv(name, fallback) {
function collectAutoBootstrapFiles(basePath) {
const seen = new Set();
const files = [];
for (const name of AUTO_BOOTSTRAP_ROOT_FILES) {
for (const name of AUTO_BOOTSTRAP_PRIORITY_FILES) {
const path = join(basePath, name);
if (existsMarkdownFile(path)) {
seen.add(path);


@ -558,11 +558,11 @@ export const DISPATCH_RULES = [
},
},
{
// ADR-011 Phase 2 (gsd-2 ADR): mid-execution escalation handling.
// ADR-011 Phase 2 (SF ADR): mid-execution escalation handling.
// Autonomous mode is autonomous, so by default we accept the agent's
// recommendation and continue — the user can review/override later via
// `/sf escalate list --all`. Set `phases.escalation_auto_accept: false`
// to keep gsd-2's pause-and-ask behavior.
// to keep SF's pause-and-ask behavior.
// Must evaluate FIRST — phase-agnostic rules below (rewrite-docs gate,
// UAT checks, reassess) cannot run while a task is paused.
name: "escalating-task → auto-accept-or-pause",
@ -1123,7 +1123,7 @@ export const DISPATCH_RULES = [
},
},
{
// gsd-2 ADR-011 progressive planning: when a slice was created as a sketch
// SF ADR-011 progressive planning: when a slice was created as a sketch
// (slices.is_sketch=1) and the phases.progressive_planning preference is
// enabled, dispatch refine-slice instead of plan-slice. The refine unit
// expands the stored sketch_scope into a full plan using prior slice


@ -1470,7 +1470,7 @@ export async function buildPlanSlicePrompt(
options,
) {
const prependBlocks = [];
// gsd-2 ADR-011 (progressive planning): when the refining-phase dispatch rule gracefully downgrades to
// SF ADR-011 (progressive planning): when the refining-phase dispatch rule gracefully downgrades to
// plan-slice (progressive_planning was toggled off mid-milestone), it
// forwards the stored sketch_scope as a SOFT hint — context, not a hard
// constraint. The planner is free to expand beyond it.
@ -1510,7 +1510,7 @@ export async function buildPlanSlicePrompt(
});
}
/**
* gsd-2 ADR-011 refine-slice: expand a sketch into a full plan using the current
* SF ADR-011 refine-slice: expand a sketch into a full plan using the current
* codebase state and prior slice summary. Mechanically similar to plan-slice
 * but framed as a *transformation* (sketch → full plan) rather than a
* blank-sheet planning pass. Reuses inlineDependencySummaries for prior
@ -1694,7 +1694,7 @@ export async function buildExecuteTaskPrompt(
).content;
}
let phaseAnchorSection = planAnchor ? formatAnchorForPrompt(planAnchor) : "";
// gsd-2 ADR-011 Phase 2: inject any resolved-but-unapplied escalation override
// SF ADR-011 Phase 2: inject any resolved-but-unapplied escalation override
// into this task's prompt. Claim is atomic via DB UPDATE WHERE IS NULL, so
// if a parallel build already injected it, we skip. Feature-gated by
// phases.mid_execution_escalation. Prepended to phaseAnchorSection so it
@ -1749,7 +1749,7 @@ export async function buildExecuteTaskPrompt(
return "## Project Memories\n(unavailable)";
}
})();
// gsd-2 ADR-011 P2: when the feature is enabled, teach the executor that it can
// SF ADR-011 P2: when the feature is enabled, teach the executor that it can
// surface non-obvious choices via the `escalation` field on sf_task_complete
// rather than silently picking. Auto-mode auto-accepts the recommendation
// (see phases.escalation_auto_accept), so this is low-cost overhead — but


@ -1396,11 +1396,6 @@ export async function startAuto(ctx, pi, base, verboseMode, options) {
if (!s.paused) clearToolBaseline(pi);
const requestedStepMode = options?.step ?? false;
const interruptedAssessment = options?.interrupted ?? null;
// Pin full-autonomy on the session up-front. The branches below that set
// stepMode never override fullAutonomy — it carries through resume paths,
// fresh starts, and crash recovery so the milestone-complete code path can
// consult it without re-reading command-line options.
s.fullAutonomy = options?.fullAutonomy === true;
// Default: agent CAN ask the user. Autonomous mode flips this off so the
// agent must self-resolve via code/web/lookup.
s.canAskUser = options?.canAskUser !== false;


@ -39,13 +39,6 @@ export class AutoSession {
active = false;
paused = false;
stepMode = false;
/**
* Full-autonomy mode: auto-merge milestone branches and chain to the next
* milestone without pausing for human review. Set from the `/sf autonomous full`
* command line. Consumed at milestone-complete to skip the review pause and
* auto-trigger merge + next-milestone dispatch. Git revert is the safety net.
*/
fullAutonomy = false;
/**
* When false, the agent is forbidden from calling ask_user_questions.
* Step mode sets this true; `/sf autonomous` sets it false.


@ -1057,13 +1057,17 @@ export function registerDbTools(pi) {
promptGuidelines: [
"Use sf_plan_milestone for milestone planning instead of writing ROADMAP.md directly.",
"Keep parameters flat and provide the full milestone planning payload. Use either explicit slices or templateId-based scaffolding for common feat/fix/refactor patterns.",
"Use productResearch for product/category/competitor research; do not hide those findings inside visionMeeting.researcher.",
"The tool validates input, writes milestone and slice planning data transactionally, renders ROADMAP.md from DB, and clears both state and parse caches after success.",
],
parameters: Type.Object({
// ── Core identification + content (required) ──────────────────────
milestoneId: Type.String({ description: "Milestone ID (e.g. M001)" }),
title: Type.String({ description: "Milestone title" }),
vision: Type.String({ description: "Milestone vision" }),
vision: Type.String({
description:
"Milestone purpose/vision. Start with why this milestone exists before implementation detail.",
}),
slices: Type.Optional(
Type.Array(
Type.Object({
@ -1076,7 +1080,10 @@ export function registerDbTools(pi) {
demo: Type.String({
description: "Roadmap demo text / After this",
}),
goal: Type.String({ description: "Slice goal" }),
goal: Type.String({
description:
"Slice purpose/goal. Start with the capability this slice makes true before implementation detail.",
}),
successCriteria: Type.String({
description: "Slice success criteria block",
}),
@ -1158,6 +1165,68 @@ export function registerDbTools(pi) {
boundaryMapMarkdown: Type.Optional(
Type.String({ description: "Boundary map markdown block" }),
),
productResearch: Type.Optional(
Type.Object(
{
purpose: Type.String({
description:
"Why this product/category research exists for this milestone",
}),
consumer: Type.String({
description:
"Who consumes the research and what decision it informs",
}),
contract: Type.String({
description:
"What the research must answer before requirements or slices are locked",
}),
failureBoundary: Type.String({
description:
"What must happen if research is weak, stale, contradictory, or not applicable",
}),
evidence: Type.String({
description:
"Concrete sources/comparables inspected and what evidence was accepted",
}),
nonGoals: Type.String({
description:
"What this research intentionally does not decide or copy",
}),
invariants: Type.String({
description:
"Rules that remain true regardless of competitor findings",
}),
assumptions: Type.String({
description:
"Assumptions made where research could not fully resolve uncertainty",
}),
category: Type.String({
description: "Product category or workflow category",
}),
targetJob: Type.String({
description: "Target user job-to-be-done",
}),
comparables: Type.Array(Type.String(), {
description:
"3-5 competitors, comparable OSS tools, or production teams with specific findings",
}),
tableStakes: Type.Array(Type.String(), {
description: "Expected category behaviors users will assume",
}),
differentiators: Type.Array(Type.String(), {
description:
"Differentiators this roadmap should preserve or create",
}),
antiPatterns: Type.Array(Type.String(), {
description: "Failure modes or competitor patterns not to copy",
}),
},
{
description:
"Purpose-shaped product and competitor research. Required when the milestone is product-facing, workflow-facing, developer-experience, or market-positioning; omit only for purely internal mechanical fixes.",
},
),
),
schedule: Type.Optional(
Type.Array(
Type.Object({
@ -1216,7 +1285,7 @@ export function registerDbTools(pi) {
}),
researcher: Type.String({
description:
"Comparable products, OSS tools, market expectations, and external research",
"Product category, competitors/comparable OSS tools, market expectations, and external research with specific findings",
}),
deliveryLead: Type.String({
description:
@ -1346,7 +1415,10 @@ export function registerDbTools(pi) {
// ── Core identification + content (required) ──────────────────────
milestoneId: Type.String({ description: "Milestone ID (e.g. M001)" }),
sliceId: Type.String({ description: "Slice ID (e.g. S01)" }),
goal: Type.String({ description: "Slice goal" }),
goal: Type.String({
description:
"Slice purpose/goal. Start with the capability this slice makes true before implementation detail.",
}),
adversarialReview: Type.Optional(
Type.Object(
{
@ -1434,7 +1506,8 @@ export function registerDbTools(pi) {
taskId: Type.String({ description: "Task ID (e.g. T01)" }),
title: Type.String({ description: "Task title" }),
description: Type.String({
description: "Task description / steps block",
description:
"Task description / steps block. Start with Purpose: why this task exists and which part of the slice contract it closes.",
}),
estimate: Type.String({ description: "Task estimate string" }),
files: Type.Array(Type.String(), {
@ -1549,7 +1622,8 @@ export function registerDbTools(pi) {
taskId: Type.String({ description: "Task ID (e.g. T01)" }),
title: Type.String({ description: "Task title" }),
description: Type.String({
description: "Task description / steps block",
description:
"Task description / steps block. Start with Purpose: why this task exists and which part of the slice contract it closes.",
}),
estimate: Type.String({ description: "Task estimate string" }),
files: Type.Array(Type.String(), { description: "Files likely touched" }),
@ -1633,7 +1707,7 @@ export function registerDbTools(pi) {
description: "Whether a plan-invalidating blocker was discovered",
}),
),
// gsd-2 ADR-011 Phase 2: mid-execution escalation — agent flags an ambiguity
// SF ADR-011 Phase 2: mid-execution escalation — agent flags an ambiguity
// for the user. Only honored when phases.mid_execution_escalation=true.
escalation: Type.Optional(
Type.Object(
@ -1672,7 +1746,7 @@ export function registerDbTools(pi) {
},
{
description:
"gsd-2 ADR-011 P2: optional escalation payload. Only honored when phases.mid_execution_escalation is true.",
"SF ADR-011 P2: optional escalation payload. Only honored when phases.mid_execution_escalation is true.",
},
),
),


@ -151,7 +151,7 @@ export function registerExecTools(pi) {
/**
* Reload the agent snapshot state, restart, and resume the same session.
*
* In headless mode: writes sessionId to a sentinel file and exits with EXIT_RELOAD.
* In the machine surface: writes sessionId to a sentinel file and exits with EXIT_RELOAD.
* The supervisor detects EXIT_RELOAD, reads the sessionId, and restarts with --resume.
* The agent resumes the same session with fresh extension code.
*
@ -172,7 +172,7 @@ export function registerExecTools(pi) {
"Snapshot and reload the pi-agent so it resumes the same session with fresh extension code",
promptGuidelines: [
"Use this to reload extension code, external MCP server config, or native tools without losing the session.",
"The supervisor will resume the same session automatically in headless mode.",
"The machine-surface supervisor will resume the same session automatically.",
"In interactive TUI: the process exits and you restart manually.",
],
parameters: Type.Object({}),


@ -2,10 +2,10 @@
* SF Command /sf backlog
*
* Structured backlog management with 999.x numbering.
* Items live in `.sf/sf.db`; `.sf/WORK-QUEUE.md` is a legacy import fallback.
* Items live in `.sf/sf.db`.
* Items can be promoted to active slices via add-slice.
*/
import { existsSync, mkdirSync, readFileSync } from "node:fs";
import { mkdirSync } from "node:fs";
import { join } from "node:path";
import { sfRoot } from "./paths.js";
import {
@ -22,50 +22,8 @@ function ensureBacklogDb(basePath) {
return openDatabase(join(sfDir, "sf.db"));
}
function legacyBacklogPath(basePath) {
return join(sfRoot(basePath), "WORK-QUEUE.md");
}
function parseLegacyBacklog(basePath) {
const filePath = legacyBacklogPath(basePath);
if (!existsSync(filePath)) return [];
const content = readFileSync(filePath, "utf-8");
const items = [];
for (const line of content.split("\n")) {
const match = line.match(
/^- \[([ x])\] (999\.\d+) — (.+?)(?:\s*\((.+)\))?$/,
);
if (match) {
items.push({
id: match[2],
title: match[3].trim(),
status: match[1] === "x" ? "promoted" : "pending",
note: match[4] ?? "",
});
}
}
return items;
}
function importLegacyBacklogIfNeeded(basePath) {
if (listBacklogItems().length > 0) return 0;
let imported = 0;
for (const item of parseLegacyBacklog(basePath)) {
addBacklogItemToDb({
id: item.id,
title: item.title,
status: item.status,
note: item.note,
source: "legacy-work-queue",
});
imported += 1;
}
return imported;
}
function currentItems(basePath) {
if (!ensureBacklogDb(basePath)) return parseLegacyBacklog(basePath);
importLegacyBacklogIfNeeded(basePath);
if (!ensureBacklogDb(basePath)) return [];
return listBacklogItems();
}
@ -99,7 +57,6 @@ async function addBacklogItem(basePath, title, ctx) {
ctx.ui.notify("Backlog DB is unavailable; cannot add item.", "warning");
return;
}
importLegacyBacklogIfNeeded(basePath);
const cleanTitle = title.replace(/^['"]|['"]$/g, "");
const date = new Date().toISOString().slice(0, 10);
const id = addBacklogItemToDb({
@ -123,7 +80,6 @@ async function promoteBacklogItem(basePath, itemId, ctx, _pi) {
ctx.ui.notify("Backlog DB is unavailable; cannot promote item.", "warning");
return;
}
importLegacyBacklogIfNeeded(basePath);
const item = listBacklogItems().find((i) => i.id === itemId);
if (!item) {
ctx.ui.notify(`Backlog item ${itemId} not found.`, "warning");
@ -153,7 +109,6 @@ async function removeBacklogItem(basePath, itemId, ctx) {
ctx.ui.notify("Backlog DB is unavailable; cannot remove item.", "warning");
return;
}
importLegacyBacklogIfNeeded(basePath);
const item = listBacklogItems().find((i) => i.id === itemId);
if (!item) {
ctx.ui.notify(`Backlog item ${itemId} not found.`, "warning");
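For reference, the retired one-shot import accepted `WORK-QUEUE.md` bullets in the shape the removed parser matched. A standalone sketch of that line format, with the regex copied verbatim from the deleted `parseLegacyBacklog` helper:

```javascript
// Line format the retired legacy import accepted; the regex is copied
// verbatim from the removed parseLegacyBacklog helper.
const LEGACY_BACKLOG_RE = /^- \[([ x])\] (999\.\d+) — (.+?)(?:\s*\((.+)\))?$/;

function parseLegacyLine(line) {
  const m = line.match(LEGACY_BACKLOG_RE);
  if (!m) return null;
  return {
    id: m[2],
    title: m[3].trim(),
    status: m[1] === "x" ? "promoted" : "pending",
    note: m[4] ?? "",
  };
}

console.log(parseLegacyLine("- [x] 999.3 — Ship docs (later)"));
```

Items now live only in `.sf/sf.db`, so this format survives solely as documentation of what old repos may contain.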

View file

@ -41,7 +41,7 @@ const TOP_LEVEL_SUBCOMMANDS = [
{ cmd: "init", desc: "Project init wizard" },
{ cmd: "setup", desc: "Global setup status and configuration" },
{ cmd: "migrate", desc: "Migrate a v1 .planning directory to .sf format" },
{ cmd: "remote", desc: "Control remote autonomous mode" },
{ cmd: "remote", desc: "Configure remote question delivery" },
{ cmd: "steer", desc: "Hard-steer plan documents during execution" },
{ cmd: "inspect", desc: "Show SQLite DB diagnostics" },
{ cmd: "knowledge", desc: "Add persistent project knowledge" },

View file

@ -1,4 +1,4 @@
// SF Command — `/sf escalate` (gsd-2 ADR-011 P2)
// SF Command — `/sf escalate` (SF ADR-011 P2)
//
// Subcommands:
// list [--all] — show active escalations; --all also includes resolved

View file

@ -17,6 +17,7 @@ import {
readdirSync,
readFileSync,
statSync,
writeFileSync,
} from "node:fs";
import { homedir } from "node:os";
import {
@ -30,6 +31,7 @@ import {
} from "node:path";
import { projectRoot } from "./commands/context.js";
import { repoIdentity } from "./repo-identity.js";
import { PROMOTED_SPEC_PROJECTIONS } from "./spec-projections.js";
function getSfHome() {
return process.env.SF_HOME || join(homedir(), ".sf");
@ -157,6 +159,23 @@ function _formatDiffLine(line) {
return ` ${line}`;
}
// Positional line diff: lines are compared at the same index only, so an
// inserted or deleted line misaligns everything after it. Sufficient for
// projection drift reports; not a full unified diff.
function unifiedDiffStrings(labelA, contentA, labelB, contentB) {
if (contentA === contentB) return "";
const aLines = contentA.split("\n");
const bLines = contentB.split("\n");
const lines = [`--- ${labelA}`, `+++ ${labelB}`];
const max = Math.max(aLines.length, bLines.length);
for (let i = 0; i < max; i++) {
const a = aLines[i];
const b = bLines[i];
if (a === b) continue;
lines.push(`@@ line ${i + 1} @@`);
if (a !== undefined) lines.push(`-${a}`);
if (b !== undefined) lines.push(`+${b}`);
}
return `${lines.join("\n")}\n`;
}
// ─── Subcommand: promote ────────────────────────────────────────────────────
export async function handlePlanPromote(args, ctx) {
@ -365,10 +384,98 @@ export async function handlePlanDiff(args, ctx) {
ctx.ui.notify(filtered.join("\n"), "info");
}
// ─── Subcommand: specs ──────────────────────────────────────────────────────
/**
* Generate, diff, or check docs/specs human exports.
*
* Purpose: make docs/specs reproducible human exports from SF's DB-first
* working model instead of unmanaged snapshots.
*
* Consumer: handlePlan router for `/sf plan specs ...`.
*/
export async function handlePlanSpecs(args, ctx) {
const subcmd = args.trim() || "diff";
const root = projectRoot();
const dbPath = join(root, ".sf", "sf.db");
if (!existsSync(dbPath)) {
ctx.ui.notify(
"Cannot project specs: .sf/sf.db is missing. Initialize or recover SF state first.",
"error",
);
return { ok: false, changed: [] };
}
const changed = [];
const diffs = [];
for (const projection of PROMOTED_SPEC_PROJECTIONS) {
const targetPath = join(root, projection.path);
const generated = projection.generate(root);
const existing = existsSync(targetPath)
? readFileSync(targetPath, "utf-8")
: "";
if (existing !== generated) {
changed.push(projection.path);
if (subcmd === "generate" || subcmd === "write") {
mkdirSync(dirname(targetPath), { recursive: true });
writeFileSync(targetPath, generated);
} else {
diffs.push(
unifiedDiffStrings(
projection.path,
existing,
`${projection.path} (generated)`,
generated,
),
);
}
}
}
if (subcmd === "generate" || subcmd === "write") {
ctx.ui.notify(
changed.length === 0
? "Spec exports already match generated projections."
: `Generated spec exports:\n${changed.map((p) => ` ${p}`).join("\n")}`,
"info",
);
return { ok: true, changed };
}
if (subcmd === "check") {
if (changed.length === 0) {
ctx.ui.notify("Spec exports match generated projections.", "info");
return { ok: true, changed };
}
ctx.ui.notify(
`Spec exports are stale:\n${changed.map((p) => ` ${p}`).join("\n")}\nRun /sf plan specs generate.`,
"error",
);
return { ok: false, changed };
}
if (subcmd === "diff") {
ctx.ui.notify(
changed.length === 0
? "Spec exports match generated projections."
: diffs.join("\n"),
changed.length === 0 ? "info" : "warning",
);
return { ok: changed.length === 0, changed };
}
ctx.ui.notify("Usage: /sf plan specs generate|diff|check", "warning");
return { ok: false, changed: [] };
}
// ─── Top-level router ───────────────────────────────────────────────────────
export async function handlePlan(args, ctx) {
const trimmed = args.trim();
if (trimmed === "specs" || trimmed.startsWith("specs ")) {
await handlePlanSpecs(trimmed.replace(/^specs\s*/, ""), ctx);
return true;
}
if (trimmed.startsWith("promote ") || trimmed === "promote") {
await handlePlanPromote(trimmed.replace(/^promote\s*/, ""), ctx);
return true;
@ -382,7 +489,7 @@ export async function handlePlan(args, ctx) {
return true;
}
if (trimmed === "") {
ctx.ui.notify("Usage: /sf plan promote|list|diff ...", "info");
ctx.ui.notify("Usage: /sf plan promote|list|diff|specs ...", "info");
return true;
}
return false;
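The drift-reporting helper above can be exercised standalone. The copy below is verbatim from the module-private function, reproduced only because it is not exported:

```javascript
// Verbatim copy of the module-private helper, for demonstration. It pairs
// lines by index, so an inserted line flags every following line as changed:
// drift reporting, not true unified-diff alignment.
function unifiedDiffStrings(labelA, contentA, labelB, contentB) {
  if (contentA === contentB) return "";
  const aLines = contentA.split("\n");
  const bLines = contentB.split("\n");
  const lines = [`--- ${labelA}`, `+++ ${labelB}`];
  const max = Math.max(aLines.length, bLines.length);
  for (let i = 0; i < max; i++) {
    const a = aLines[i];
    const b = bLines[i];
    if (a === b) continue;
    lines.push(`@@ line ${i + 1} @@`);
    if (a !== undefined) lines.push(`-${a}`);
    if (b !== undefined) lines.push(`+${b}`);
  }
  return `${lines.join("\n")}\n`;
}

// Equal inputs return "", so callers can treat the empty string as "no drift".
console.log(unifiedDiffStrings("specs.md", "a\nb", "specs.md (generated)", "a\nc"));
```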

View file

@ -46,7 +46,7 @@ export const TOP_LEVEL_SUBCOMMANDS = [
{ cmd: "scan", desc: "Run source and project scans" },
{
cmd: "escalate",
desc: "List, show, or resolve task escalations (gsd-2 ADR-011 P2)",
desc: "List, show, or resolve task escalations (SF ADR-011 P2)",
},
{ cmd: "changelog", desc: "Show categorized release notes" },
{ cmd: "triage", desc: "Manually trigger triage of pending captures" },
@ -108,7 +108,7 @@ export const TOP_LEVEL_SUBCOMMANDS = [
},
{ cmd: "setup", desc: "Global setup status and configuration" },
{ cmd: "migrate", desc: "Migrate a v1 .planning directory to .sf format" },
{ cmd: "remote", desc: "Control remote autonomous mode" },
{ cmd: "remote", desc: "Configure remote question delivery" },
{ cmd: "steer", desc: "Hard-steer plan documents during execution" },
{ cmd: "inspect", desc: "Show SQLite DB diagnostics" },
{
@ -199,14 +199,6 @@ export const TOP_LEVEL_SUBCOMMANDS = [
*/
const NESTED_COMPLETIONS = {
autonomous: [
{
cmd: "full",
desc: "Auto-merge milestones; chain end-to-end without review",
},
{
cmd: "--full",
desc: "Auto-merge milestones; chain end-to-end without review",
},
{ cmd: "--verbose", desc: "Show detailed execution output" },
{ cmd: "--debug", desc: "Enable debug logging" },
],

View file

@ -107,7 +107,7 @@ export async function guardRemoteSession(ctx, _pi) {
);
} else if (result.error) {
ctx.ui.notify(
`Failed to stop remote autonomous mode: ${result.error}`,
`Failed to stop remote autonomous run: ${result.error}`,
"error",
);
} else {

View file

@ -46,6 +46,12 @@ export function parseMilestoneTarget(input) {
const rest = input.replace(match[0], "").replace(/\s+/g, " ").trim();
return { milestoneId: match[1], rest };
}
function hasRemovedFullFlag(input) {
return input
.split(/\s+/)
.filter(Boolean)
.some((token) => token === "full" || token === "--full");
}
/**
* Dispatch entry point for the autonomous command family.
*
@ -55,26 +61,27 @@ export function parseMilestoneTarget(input) {
* searching), `false` when the command isn't autonomous-related.
*
* Recognised flags on autonomous:
* - `full` or `--full` full-autonomy mode (auto-merge + chain milestones)
* - `--verbose` verbose execution output
* - `--debug` enable debug logging via SF_DEBUG
* - `M001` (positional) milestone target lock (only run that milestone)
* - `--yolo=<file>` yolo seed; bootstraps a fresh milestone from a brief
* - `--yolo <file>` yolo seed; bootstraps a fresh milestone from a brief
*
* The handler validates milestone targets exist, gates remote sessions, then
* dispatches via `launchAuto` (which routes between headless and detached
* dispatches via `launchAuto` (which routes between machine-surface and detached
* spawn paths).
*/
export async function handleAutoCommand(trimmed, ctx, pi) {
const isAutonomousVerb =
trimmed === "autonomous" || trimmed.startsWith("autonomous ");
/**
* Route an autonomous launch through either the headless (in-process) or
* detached (spawned subprocess) entry point depending on `SF_HEADLESS`.
* Route an autonomous launch through either the machine-surface
* (in-process) or detached (spawned subprocess) entry point depending on
* `SF_HEADLESS`.
*
* Headless mode runs the auto loop in the current process (used by CI,
* tests, and `sf headless`); detached mode forks a long-running child so the
* interactive shell stays responsive while autonomous mode runs.
* The machine surface runs the autonomous loop in the current process
* (used by CI, tests, and `sf headless`); detached mode forks a long-running
 * child so the interactive shell stays responsive while the autonomous
 * run continues.
*/
const launchAuto = async (verboseMode, options) => {
if (process.env.SF_HEADLESS === "1") {
@ -117,12 +124,14 @@ export async function handleAutoCommand(trimmed, ctx, pi) {
parseMilestoneTarget(afterYolo);
const verboseMode = afterMilestone.includes("--verbose");
const debugMode = afterMilestone.includes("--debug");
// `/sf autonomous full` (or `--full`): full-autonomy mode — auto-merges
// milestone branches and chains to the next milestone without pausing
// for human review. Git revert is the safety net.
const fullAutonomy =
/\bfull\b/.test(afterMilestone) || afterMilestone.includes("--full");
const canAskUser = false;
if (hasRemovedFullFlag(afterMilestone)) {
ctx.ui.notify(
"`/sf autonomous full` was removed. Use `/sf autonomous`; autonomous run control already continues through eligible milestones until policy, evidence, budget, blockers, or completion stops it.",
"warning",
);
return true;
}
if (debugMode) enableDebug(projectRoot());
if (!(await guardRemoteSession(ctx, pi))) return true;
// Validate the milestone target exists and is not already complete.
@ -157,11 +166,10 @@ export async function handleAutoCommand(trimmed, ctx, pi) {
} else if (milestoneId) {
await launchAuto(verboseMode, {
milestoneLock: milestoneId,
fullAutonomy,
canAskUser,
});
} else {
await launchAuto(verboseMode, { fullAutonomy, canAskUser });
await launchAuto(verboseMode, { canAskUser });
}
return true;
}
@ -175,7 +183,7 @@ export async function handleAutoCommand(trimmed, ctx, pi) {
);
} else if (result.error) {
ctx.ui.notify(
`Failed to stop remote autonomous mode: ${result.error}`,
`Failed to stop remote autonomous run: ${result.error}`,
"error",
);
} else {

View file

@ -117,7 +117,7 @@ export function showHelp(ctx, args = "") {
" /sf cleanup Remove merged branches or snapshots [branches|snapshots]",
" /sf worktree Manage worktrees from the TUI [list|merge|clean|remove]",
" /sf migrate Migrate .planning/ (v1) to .sf/ (v2) format",
" /sf remote Control remote autonomous mode [slack|discord|status|disconnect]",
" /sf remote Configure remote question delivery [slack|discord|status|disconnect]",
" /sf inspect Show SQLite DB diagnostics (schema, row counts, recent entries)",
" /sf update Update SF to the latest version via npm",
];

View file

@ -137,7 +137,7 @@ export async function handleNotificationsCommand(args, ctx, _pi) {
// Fall through to text output if overlay fails
}
}
// Text fallback (RPC/headless mode)
// Text fallback (RPC/machine surface)
const unread = getUnreadCount();
const entries = readNotifications().slice(0, 10);
if (entries.length === 0) {

View file

@ -431,7 +431,7 @@ Examples:
ctx,
);
if (handled) return true;
ctx.ui.notify("Usage: /sf plan promote|list|diff ...", "info");
ctx.ui.notify("Usage: /sf plan promote|list|diff|specs ...", "info");
return true;
}
return false;

View file

@ -180,7 +180,7 @@ Setting `prefer_skills: []` does **not** disable skill discovery — it just mea
- `indexer_backend`: `"sift"` or `"none"` — codebase-indexer backend used for prompt guidance and `/sf codebase indexer status`. Default: `"sift"`.
- `/sf codebase indexer status` reports Sift status. Install `rupurt/sift` on `PATH` or set `SIFT_PATH`.
- `remote_questions`: route interactive questions to Slack/Discord for headless autonomous mode. Keys:
- `remote_questions`: route interactive questions to Slack/Discord for machine-surface autonomous runs. Keys:
- `channel`: `"slack"` or `"discord"` — channel type.
- `channel_id`: string or number — channel ID.
- `timeout_minutes`: number — question timeout in minutes (clamped 1-30).
@ -661,7 +661,7 @@ remote_questions:
---
```
Routes interactive questions to a Slack channel for headless autonomous mode sessions. Questions time out after 15 minutes if unanswered.
Routes interactive questions to a Slack channel for machine-surface autonomous sessions. Questions time out after 15 minutes if unanswered.
---

View file

@ -312,7 +312,7 @@ function checkRemoteQuestionsProvider() {
category: "remote",
status: autoResolvable ? "unconfigured" : "warning",
message: autoResolvable
? `${label} — token not found (headless questions auto-resolve on timeout)`
? `${label} — token not found (remote questions auto-resolve on timeout)`
: `${label} — channel configured but token not found`,
detail: info?.envVar
? `Set ${info.envVar} or run /sf keys`

View file

@ -78,7 +78,6 @@ const SF_FORM_LINT_SKIP_DIRS = new Set([
"sift",
".sift",
]);
const SF_LEGACY_BRANDING_RE = /\bGSD\b/g;
function pruneEmptyDir(path) {
try {
@ -315,29 +314,6 @@ function checkSfFormSyntax(basePath, issues, fixesApplied, shouldFix) {
fixesApplied.push(`repaired .sf form syntax in .sf/${relPath}`);
}
}
if (
(relPath.endsWith("AGENTS.md") || relPath.endsWith("CLAUDE.md")) &&
SF_LEGACY_BRANDING_RE.test(content)
) {
issues.push({
severity: "warning",
code: "legacy_gsd_scaffold_text",
scope: "project",
unitId: "project",
message: `.sf/${relPath} contains legacy GSD naming; generated SF guidance should use SF naming.`,
file: `.sf/${relPath}`,
fixable: true,
});
if (shouldFix("legacy_gsd_scaffold_text")) {
writeFileSync(
filePath,
content.replace(SF_LEGACY_BRANDING_RE, "SF"),
"utf-8",
);
fixesApplied.push(`rewrote legacy GSD naming in .sf/${relPath}`);
}
}
}
}

View file

@ -1,4 +1,4 @@
// SF Extension — gsd-2 ADR-011 Phase 2 Mid-Execution Escalation
// SF Extension — SF ADR-011 Phase 2 Mid-Execution Escalation
//
// Owns: artifact I/O (read/build/write), detection, producer-side flag
// flips, user-facing resolution, carry-forward injection (claim/format),
@ -8,7 +8,7 @@
// into downstream prompts via getRelevantMemoriesRanked.
//
// SF's local ADR-011 is "Swarm Chat and Debate Mode" — unrelated.
// The reject-blocker choice from gsd-2 is deferred — needs a
// The upstream reject-blocker choice is deferred — needs a
// blocker_source column SF doesn't yet have.
import { existsSync, mkdirSync, readFileSync } from "node:fs";
import { dirname, join } from "node:path";
@ -215,7 +215,7 @@ export function detectPendingEscalation(tasks, _basePath) {
}
return null;
}
/** gsd-2 ADR-011 P2 carry-forward injection: when a previous task in this slice
/** SF ADR-011 P2 carry-forward injection: when a previous task in this slice
* had an escalation that the user resolved, atomically claim the override
* (race-safe via DB UPDATE) and return the markdown block to prepend to
* the next executor's prompt. Returns null when no unapplied override

View file

@ -1,12 +1,13 @@
/**
* execution-policy.js named execution policy profiles for SF tool calls.
* execution-policy.js internal execution permission profiles for SF tool calls.
*
* Purpose: give auto/headless a Codex/Crush-style vocabulary for filesystem,
* network, git, and mutation posture before those policies are enforced at
* every tool boundary.
* Purpose: give autonomous run control and the machine surface a Codex/Crush-style
* vocabulary for filesystem, network, git, and mutation posture before those
* policies are enforced at every tool boundary.
*
* Consumer: write-gate, auto-mode, headless event DTOs, and future DB evidence
* rows that need stable policy names instead of ad hoc booleans.
* Consumer: write-gate, autonomous run-control hooks, machine-surface event DTOs,
* and future DB evidence rows that need stable permission-profile names instead
* of ad hoc booleans.
*/
import { shouldBlockQueueExecutionInSnapshot } from "./bootstrap/write-gate.js";
@ -15,6 +16,7 @@ import { classifyCommand } from "./safety/destructive-guard.js";
export const EXECUTION_POLICY_PROFILES = {
plan: {
id: "plan",
permissionProfile: "restricted",
filesystem: "read-mostly",
network: "read-only",
git: "read-only",
@ -22,6 +24,15 @@ export const EXECUTION_POLICY_PROFILES = {
},
build: {
id: "build",
permissionProfile: "normal",
filesystem: "workspace-write",
network: "allowed",
git: "normal",
mutation: "workspace",
},
trusted: {
id: "trusted",
permissionProfile: "trusted",
filesystem: "workspace-write",
network: "allowed",
git: "normal",
@ -29,6 +40,7 @@ export const EXECUTION_POLICY_PROFILES = {
},
unrestricted: {
id: "unrestricted",
permissionProfile: "unrestricted",
filesystem: "danger-full-access",
network: "allowed",
git: "dangerous",
@ -91,6 +103,7 @@ export function classifyExecutionPolicyCall(profileId, toolName, input = "") {
if (queueDecision.block) {
return {
profile: profile.id,
permissionProfile: profile.permissionProfile,
allowed: false,
reason: queueDecision.reason,
risk: "mutation",
@ -108,6 +121,7 @@ export function classifyExecutionPolicyCall(profileId, toolName, input = "") {
profile.mutation !== "planning-artifacts-only";
return {
profile: profile.id,
permissionProfile: profile.permissionProfile,
allowed,
reason: allowed ? "allowed" : "destructive command requires approval",
risk: bashRisk.risk,
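The new `permissionProfile` field gives each policy a stable name that event DTOs and evidence rows can carry instead of ad hoc booleans. A minimal sketch of resolving that name, with the table trimmed to ids and names; the fallback-to-`plan` behavior is an assumption of this sketch, not the module's documented contract:

```javascript
// Trimmed sketch of the profile table: ids and permissionProfile names
// mirror the module above; the other posture fields are omitted.
const PROFILES = {
  plan: { id: "plan", permissionProfile: "restricted" },
  build: { id: "build", permissionProfile: "normal" },
  trusted: { id: "trusted", permissionProfile: "trusted" },
  unrestricted: { id: "unrestricted", permissionProfile: "unrestricted" },
};

// Unknown ids fall back to the most restrictive posture in this sketch.
function permissionProfileFor(profileId) {
  return (PROFILES[profileId] ?? PROFILES.plan).permissionProfile;
}

console.log(permissionProfileFor("trusted"));
```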

View file

@ -686,10 +686,10 @@ export function bootstrapProject(basePath) {
untrackRuntimeFiles(basePath);
}
/**
* Headless milestone creation from a seed specification document.
* Machine-surface milestone creation from a seed specification document.
* Bootstraps the project if needed, generates the next milestone ID,
* and dispatches the headless discuss prompt. Headless mode may ask only
* the final depth-verification gate before promoting draft knowledge.
* and dispatches the machine-surface discuss prompt. Run-control policy may
* ask only the final depth-verification gate before promoting draft knowledge.
*/
export async function showHeadlessMilestoneCreation(
ctx,

View file

@ -191,6 +191,10 @@ function renderRoadmapMarkdown(milestone, slices) {
lines.push("");
lines.push(`**Vision:** ${milestone.vision}`);
lines.push("");
if (milestone.product_research) {
lines.push(...renderProductResearchLines(milestone.product_research));
lines.push("");
}
if (milestone.vision_meeting) {
lines.push("## Vision Alignment Meeting");
lines.push("");
@ -278,6 +282,39 @@ function renderRoadmapMarkdown(milestone, slices) {
}
return `${lines.join("\n").trimEnd()}\n`;
}
function renderProductResearchLines(research) {
const lines = [
"## Product / Competitor Research",
"",
"### Purpose Contract",
"",
`- Purpose: ${research.purpose}`,
`- Consumer: ${research.consumer}`,
`- Contract: ${research.contract}`,
`- Failure boundary: ${research.failureBoundary}`,
`- Evidence: ${research.evidence}`,
`- Non-goals: ${research.nonGoals}`,
`- Invariants: ${research.invariants}`,
`- Assumptions: ${research.assumptions}`,
"",
"### Category Findings",
"",
`- Category / job: ${research.category} / ${research.targetJob}`,
];
for (const item of research.comparables ?? []) {
lines.push(`- Comparable: ${item}`);
}
for (const item of research.tableStakes ?? []) {
lines.push(`- Table stake: ${item}`);
}
for (const item of research.differentiators ?? []) {
lines.push(`- Differentiator: ${item}`);
}
for (const item of research.antiPatterns ?? []) {
lines.push(`- Do not copy: ${item}`);
}
return lines;
}
function renderTaskPlanMarkdown(task, taskGates = []) {
const estimatedSteps = Math.max(
1,
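A trimmed sketch of the product-research rendering above: the category line plus the repeated list sections, with an explicit ` / ` separator between category and target job. The full renderer also emits the purpose-contract bullets; this cut-down helper is for illustration only.

```javascript
// Trimmed sketch of renderProductResearchLines: the category line plus the
// repeated sections, using an explicit " / " between category and job.
function researchFindingLines(research) {
  const lines = [
    `- Category / job: ${research.category} / ${research.targetJob}`,
  ];
  for (const item of research.comparables ?? []) lines.push(`- Comparable: ${item}`);
  for (const item of research.tableStakes ?? []) lines.push(`- Table stake: ${item}`);
  for (const item of research.differentiators ?? []) lines.push(`- Differentiator: ${item}`);
  for (const item of research.antiPatterns ?? []) lines.push(`- Do not copy: ${item}`);
  return lines;
}

console.log(researchFindingLines({
  category: "agentic dev tools",
  targetJob: "milestone planning",
  comparables: ["hypothetical tool A"],
  antiPatterns: ["unbounded autonomy"],
}));
```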

View file

@ -1,6 +1,6 @@
# Headless Milestone Creation
# Machine-Surface Milestone Creation
You are creating a SF milestone from a provided specification document. This is a **headless** flow: do not ask exploratory user questions. Make best-judgment calls and document them as assumptions. The only allowed user question is the final `depth_verification_{{milestoneId}}_confirm` write gate when your evidence is strong enough to promote a final `CONTEXT.md`; if that gate is unavailable or not confirmed, write a draft and stop.
You are creating an SF milestone from a provided specification document through the machine surface. This uses the same SF planning flow as interactive discussion, but without exploratory user questions. Make best-judgment calls and document them as assumptions. The only allowed user question is the final `depth_verification_{{milestoneId}}_confirm` write gate when your evidence is strong enough to promote a final `CONTEXT.md`; if that gate is unavailable or not confirmed, write a draft and stop.
## Active Skills
@ -133,7 +133,7 @@ Print a structured depth summary in chat covering:
This is your audit trail. Print it — do not skip it.
The final gate is the only question in headless mode. It is not an exploratory question round. Ask it only after printing the compact depth summary, and only to confirm whether the already-investigated context is final enough to write or should remain a draft.
The final gate is the only question in machine-surface milestone creation. It is not an exploratory question round. Ask it only after printing the compact depth summary, and only to confirm whether the already-investigated context is final enough to write or should remain a draft.
Before writing final `CONTEXT.md`, decide confidence:
- **HIGH**: You have verified the project knowledge above from actual files/tests/research, and the milestone scope is specific enough for downstream agents. Call `ask_user_questions` once with question ID `depth_verification_{{milestoneId}}_confirm`; make the recommended first option "Proceed with final context (Recommended)" and the second option "Keep as draft". If the confirmed answer is not received, do not bypass the gate.
@ -307,7 +307,7 @@ After writing final context and roadmap, say exactly: "Milestone {{milestoneId}}
## Critical Rules
- **Do not ask exploratory user questions** — this is headless mode. Make judgment calls and document them. The only allowed user question is `depth_verification_{{milestoneId}}_confirm`, and only when evidence is strong enough to finalize.
- **Do not ask exploratory user questions** — this is machine-surface milestone creation. Make judgment calls and document them. The only allowed user question is `depth_verification_{{milestoneId}}_confirm`, and only when evidence is strong enough to finalize.
- **Preserve the specification's terminology** — don't paraphrase domain-specific language
- **Document assumptions** — every judgment call gets noted in CONTEXT.md under "Assumptions" with reasoning
- **Investigate thoroughly** — scout codebase, check library docs, web search. Same rigor as interactive mode.

View file

@ -1,11 +1,11 @@
Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.sf/DECISIONS.md` if it exists — respect existing decisions. Read `.sf/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, continue in legacy compatibility mode but explicitly note missing requirement coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `sf_plan_milestone`. Call `sf_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.sf/DECISIONS.md`. {{skillActivation}} Fill the Horizontal Checklist section with cross-cutting concerns considered during planning (requirements re-read, decisions re-evaluated, graceful shutdown, revenue paths, auth boundary, shared resources, reconnection). Omit for trivial milestones.
Plan milestone {{milestoneId}} ("{{milestoneTitle}}"). Read `.sf/DECISIONS.md` if it exists — respect existing decisions. Read `.sf/REQUIREMENTS.md` if it exists and treat Active requirements as the capability contract. If `REQUIREMENTS.md` is missing, treat that as a planning gap: derive the minimum requirement coverage from current project evidence, persist it through SF planning tools, and explicitly note missing coverage. Use the **Roadmap** output template below to shape the milestone planning payload you send to `sf_plan_milestone`. Start the `vision` field with the milestone purpose before implementation detail, include the structured `productResearch` payload when the work is product-facing, workflow-facing, developer-experience, or market-positioning, and make each slice `goal` state the slice purpose before mechanics. If the milestone changes how SF is driven, observed, integrated, or automated, keep the axes separate in the roadmap: surface (TUI/CLI/web/editor/machine), protocol (ACP/RPC/stdio/HTTP/wire), output format (text/json/stream-json), run control (manual/assisted/autonomous), and permission profile (restricted/normal/trusted/unrestricted). Call `sf_plan_milestone` to persist the milestone planning fields and render `{{milestoneId}}-ROADMAP.md` from DB state. Do **not** write `{{milestoneId}}-ROADMAP.md`, `ROADMAP.md`, or other planning artifacts manually. If planning produces structural decisions, append them to `.sf/DECISIONS.md`. {{skillActivation}} Fill the Horizontal Checklist section with cross-cutting concerns considered during planning (requirements re-read, decisions re-evaluated, graceful shutdown, revenue paths, auth boundary, shared resources, reconnection). Omit for trivial milestones.
Before calling `sf_plan_milestone`, run a bounded **Vision Alignment Meeting** for the milestone and roadmap as a real multi-agent review. Use the `subagent` tool in `mode: "debate"` with `rounds: 2` and a separate task for each participant lens below. Do **not** merely simulate every participant inside this planner response. Use only supported agent names: `planner`, `reviewer`, `researcher`, and `scout`. Put the stakeholder role name inside the task text; do not invent agent names such as `combatant`, `delivery-lead`, `product-manager`, or `customer-panel`. If the `subagent` tool is unavailable or fails after one retry, record that explicitly in `trigger` and run the structured meeting inline as a degraded fallback. This is allowed to be broader and more nuanced than slice planning. Include at least these participant lenses:
- Product Manager
- User Advocate
- Customer Panel
- Business
- Researcher
- Researcher, including product category and competitor/comparable-system research when applicable
- Delivery Lead
- Partner
- Combatant

View file

@ -1,4 +1,4 @@
Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.sf/DECISIONS.md` if it exists — respect existing decisions. Read `.sf/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. For each task, decide whether execution can safely swarm: mark it swarmable only if it can split into 2-3 independent shards with disjoint file/directory ownership, shard-local verification, and no shared-interface, lockfile, migration, generated-artifact, or sequencing conflict; otherwise make the task explicitly single-agent. Call `sf_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. The `sf_plan_slice` payload MUST include `planningMeeting` as a populated object; empty, null, or missing planningMeeting is not acceptable. Use the canonical M004 meeting roles: Trigger, Product Manager, User Advocate, Customer Panel, Business, Researcher, Delivery Lead, Partner, Combatant, Architect, Moderator, Recommended Route, and Confidence. The tool's Product Manager field is named `pm`, and the Confidence field is named `confidenceSummary`; keep existing tool field names while covering the canonical roles. If you are tempted to skip the meeting because the slice is simple, write a brief one-line per role explaining why it is simple. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. 
If planning produces structural decisions, call `sf_decision_save` for each — the tool auto-assigns IDs and regenerates `.sf/DECISIONS.md` automatically. {{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 25 steps and 38 files (68 steps or 810 files — consider splitting; 10+ steps or 12+ files — must split), any swarmable task has disjoint Expected Output paths/directories and explains shard ownership, the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts, and quality gate coverage — for non-trivial slices, Threat Surface (Q3: abuse, data exposure, input trust) and Requirement Impact (Q4: requirements touched, re-verify, decisions revisited) sections are present. For non-trivial tasks, Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) are filled in task plans.
Plan slice {{sliceId}} ("{{sliceTitle}}") of milestone {{milestoneId}}. Read `.sf/DECISIONS.md` if it exists — respect existing decisions. Read `.sf/REQUIREMENTS.md` if it exists — identify which Active requirements the roadmap says this slice owns or supports, and ensure the plan delivers them. Read the roadmap boundary map, any existing context/research files, and dependency summaries. Use the **Slice Plan** and **Task Plan** output templates below. Decompose into tasks with must-haves. Fill the `Proof Level` and `Integration Closure` sections truthfully so the plan says what class of proof this slice really delivers and what end-to-end wiring still remains. If the slice changes how SF is driven, observed, integrated, or automated, fill `Interface Axes` and keep surface (TUI/CLI/web/editor/machine), protocol (ACP/RPC/stdio/HTTP/wire), output format (text/json/stream-json), run control (manual/assisted/autonomous), and permission profile (restricted/normal/trusted/unrestricted) separate. For each task, decide whether execution can safely swarm: mark it swarmable only if it can split into 2-3 independent shards with disjoint file/directory ownership, shard-local verification, and no shared-interface, lockfile, migration, generated-artifact, or sequencing conflict; otherwise make the task explicitly single-agent. Call `sf_plan_slice` to persist the slice plan — the tool writes `{{sliceId}}-PLAN.md` and individual `T##-PLAN.md` files to disk and persists to DB. The `sf_plan_slice` payload MUST include `planningMeeting` as a populated object; empty, null, or missing planningMeeting is not acceptable. Use the canonical M004 meeting roles: Trigger, Product Manager, User Advocate, Customer Panel, Business, Researcher, Delivery Lead, Partner, Combatant, Architect, Moderator, Recommended Route, and Confidence. 
The tool's Product Manager field is named `pm`, and the Confidence field is named `confidenceSummary`; keep existing tool field names while covering the canonical roles. If you are tempted to skip the meeting because the slice is simple, write a brief one-line per role explaining why it is simple. Do **not** write plan files manually — use the DB-backed tool so state stays consistent. If planning produces structural decisions, call `sf_decision_save` for each — the tool auto-assigns IDs and regenerates `.sf/DECISIONS.md` automatically. {{skillActivation}} Before finishing, self-audit the plan: every must-have maps to at least one task, every task has complete sections (steps, must-haves, verification, observability impact, inputs, and expected output), task ordering is consistent with no circular references, every pair of artifacts that must connect has an explicit wiring step, task scope targets 2-5 steps and 3-8 files (6-8 steps or 8-10 files — consider splitting; 10+ steps or 12+ files — must split), any swarmable task has disjoint Expected Output paths/directories and explains shard ownership, the plan honors locked decisions from context/research/decisions artifacts, the proof-level wording does not overclaim live integration if only fixture/contract proof is planned, every Active requirement this slice owns has at least one task with verification that proves it is met, and every task produces real user-facing progress — if the slice has a UI surface at least one task builds the real UI, if it has an API at least one task connects it to a real data source, and showing the completed result to a non-technical stakeholder would demonstrate real product progress rather than developer artifacts, and quality gate coverage — for non-trivial slices, Threat Surface (Q3: abuse, data exposure, input trust) and Requirement Impact (Q4: requirements touched, re-verify, decisions revisited) sections are present.
For non-trivial tasks, Failure Modes (Q5), Load Profile (Q6), Negative Tests (Q7), and Interface Impact when relevant are filled in task plans.
### Report sf-internal observations

View file

@ -34,7 +34,7 @@ Before decomposing, build your understanding:
2. **Library docs - DeepWiki first.** Use `ask_question` / `read_wiki_structure` / `read_wiki_contents` (DeepWiki) as the default for any GitHub-hosted library. Fall back to `resolve_library` / `get_library_docs` (Context7) for npm/pypi/crates packages DeepWiki doesn't have. Context7 free tier is capped at 1000 req/month - spend those on cases DeepWiki can't cover. Skip both for libraries already used in this codebase.
3. **Skill Discovery ({{skillDiscoveryMode}}):**{{skillDiscoveryInstructions}}
4. **Requirements analysis.** If `.sf/REQUIREMENTS.md` exists, research against it. Identify which Active requirements are table stakes, likely omissions, overbuilt risks, or domain-standard behaviors.
5. **Comparable systems and nuanced stakeholder scan.** Before locking the roadmap, research what similar products, OSS tools, and production teams do in this category. Surface table stakes, common failure modes, customer expectations, business constraints, and differentiators worth preserving.
5. **Product and competitor research.** Before locking requirements or slices for product-facing, workflow-facing, developer-experience, or market-positioning work, research the product category and 3-5 representative competitors, comparable OSS tools, or production teams. Surface table stakes, common failure modes, customer expectations, business constraints, differentiators worth preserving, and patterns not to copy. For purely internal mechanical fixes, explicitly mark this not applicable with the reason.
### Strategic Questions to Answer
@ -61,7 +61,7 @@ Each role has a **purpose** (why it exists in the ceremony) and a **depth contra
- **User Advocate:** what must matter for the user experience and trust surface? *Must name at least one concrete user scenario where the current system fails or frustrates.*
- **Customer Panel:** multiple likely customer viewpoints, not a single flattened "user". *Must present at least 2 distinct customer perspectives with different priorities or constraints.*
- **Business:** wedge, retention, expansion path, or viability concerns. *Must name at least one specific failure mode this milestone addresses and the cost of not addressing it.*
- **Researcher:** comparable products, OSS tools, market expectations, DeepWiki/Context7 findings, and focused web research. *Must cite at least one external source or comparable system by name with a specific finding.*
- **Researcher:** product category, competitors/comparable OSS tools, market expectations, DeepWiki/Context7 findings, and focused web research. *Must cite at least one external source or comparable system by name with a specific finding, or explicitly say why competitor research is not applicable for this milestone.*
- **Delivery Lead:** smallest credible milestone sequence and scope cuts. *Must state the critical path with task counts and wall-clock estimates, and identify what could be cut if timeboxed.*
- **Partner:** strongest case for the roadmap. *Must cite specific evidence (source files, prior milestones, production incidents, or research findings) and name the single highest-impact slice with reasoning.*
- **Combatant:** why the roadmap is wrong, overbuilt, or solving the wrong thing. *Must name at least 2 specific risks with concrete failure scenarios (not vague "might be hard"), and propose a mitigation or alternative for each.*
@ -81,11 +81,31 @@ If confidence stays low after research and weighted discussion, assume the miles
Then:
1. Use the **Roadmap** output template from the inlined context above
2. {{skillActivation}}
3. Create the roadmap: decompose into demoable vertical slices - as many as the work genuinely needs, no more. A simple feature might be 1 slice. Don't decompose for decomposition's sake.
3. Create the roadmap: start with the milestone purpose in the `vision` field, include a structured `productResearch` payload when applicable, then decompose into demoable vertical slices - as many as the work genuinely needs, no more. A simple feature might be 1 slice. Don't decompose for decomposition's sake.
4. Order by risk (high-risk first)
5. Call `sf_plan_milestone` to persist the milestone planning fields, slice rows, and **horizontal checklist** in the DB-backed planning path. Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually - the planning tool owns roadmap rendering and persistence.
5. Call `sf_plan_milestone` to persist the milestone planning fields, slice rows, and **horizontal checklist** in the DB-backed planning path. Every slice `goal` must state the slice purpose before implementation detail. If the milestone changes how SF is driven, observed, integrated, or automated, keep the axes separate in the roadmap: surface (TUI/CLI/web/editor/machine), protocol (ACP/RPC/stdio/HTTP/wire), output format (text/json/stream-json), run control (manual/assisted/autonomous), and permission profile (restricted/normal/trusted/unrestricted). Do **not** write `{{outputPath}}`, `ROADMAP.md`, or other planning artifacts manually - the planning tool owns roadmap rendering and persistence.
6. If planning produced structural decisions (e.g. slice ordering rationale, technology choices, scope exclusions), call `sf_decision_save` for each decision - the tool auto-assigns IDs and regenerates `.sf/DECISIONS.md` automatically.
### productResearch payload
When the milestone involves product-facing, workflow-facing, developer-experience, or market-positioning work, include `productResearch` in the `sf_plan_milestone` call. Do not bury this inside the meeting prose.
Required fields:
- `purpose`: why this research exists for this milestone
- `consumer`: who uses the research and what decision it informs
- `contract`: what must be answered before requirements or slices are locked
- `failureBoundary`: what happens if research is weak, stale, contradictory, or not applicable
- `evidence`: concrete sources/comparables inspected and what evidence was accepted
- `nonGoals`: what this research intentionally does not decide or copy
- `invariants`: rules that remain true regardless of competitor findings
- `assumptions`: assumptions made where research could not fully resolve uncertainty
- `category`: product/workflow category
- `targetJob`: target user job-to-be-done
- `comparables`: 3-5 competitors, comparable OSS tools, or production teams with specific findings
- `tableStakes`: expected category behaviors users will assume
- `differentiators`: differentiators this roadmap should preserve or create
- `antiPatterns`: failure modes or competitor patterns not to copy
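For reference, the required fields above can be sketched as a plain payload object. This is a hypothetical illustration only — every value is made up, and the completeness check is not the real SF validator:

```javascript
// Illustrative productResearch payload for an sf_plan_milestone call.
// Field names follow the required list above; all values are examples.
const productResearch = {
  purpose: "Ground the roadmap in category norms before slices lock",
  consumer: "Roadmap planner; informs slice ordering and scope cuts",
  contract: "Table stakes and anti-patterns named before requirements lock",
  failureBoundary: "If research is stale or contradictory, fall back to invariants",
  evidence: "Inspected three comparable OSS planners; accepted docs and READMEs",
  nonGoals: "Does not decide pricing; does not copy competitor UI verbatim",
  invariants: "DB-backed planning stays canonical regardless of findings",
  assumptions: "Assumes current category norms hold for this release cycle",
  category: "agentic planning tooling",
  targetJob: "ship a reviewed milestone plan without manual file edits",
  comparables: ["tool-a", "tool-b", "tool-c"],
  tableStakes: ["resumable runs", "plan review before execution"],
  differentiators: ["SQLite-canonical state"],
  antiPatterns: ["markdown files as a competing source of truth"],
};

// Cheap completeness check mirroring the required-fields rule.
const required = [
  "purpose", "consumer", "contract", "failureBoundary", "evidence",
  "nonGoals", "invariants", "assumptions", "category", "targetJob",
  "comparables", "tableStakes", "differentiators", "antiPatterns",
];
const missing = required.filter((key) => !(key in productResearch));
console.log(missing); // an empty array means every required field is present
```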
## Requirement Mapping Rules
- **BUILD_PLAN reference required (D015).** Every new milestone must cite which BUILD_PLAN.md tier/item it implements. Include "BUILD_PLAN.md Tier X.Y" or "BUILD_PLAN §Tier X" in the vision text. If the milestone is not from BUILD_PLAN (e.g. a bugfix milestone), explicitly state "Not from BUILD_PLAN — <reason>" in the vision. Existing milestones without this reference are grandfathered; new ones must have it.

View file

@ -81,12 +81,15 @@ Then:
- **Threat Surface:** Identify abuse scenarios, data exposure risks, and input trust boundaries. Required when the slice handles user input, authentication, authorization, or sensitive data. Omit entirely for internal refactoring or simple changes.
- **Requirement Impact:** List which existing requirements this slice touches, what must be re-verified after shipping, and which prior decisions should be reconsidered. Omit entirely if no existing requirements are affected.
- For each task in a non-trivial slice, fill Failure Modes (Q5), Load Profile (Q6), and Negative Tests (Q7) in the task plan when the task has external dependencies, shared resources, or non-trivial input handling. Omit for simple tasks.
- If the slice changes how SF is driven, observed, integrated, or automated, fill Interface Axes and keep surface, protocol, output format, run control, and permission profile separate.
6. Decompose the slice into tasks, each fitting one context window. Each task needs:
- a concrete, action-oriented title
- a purpose-first description that starts with why this task exists and which part of the slice contract it closes
- the inline task entry fields defined in the plan.md template (Why / Files / Do / Verify / Done when)
- a matching task plan file with description, steps, must-haves, verification, inputs, and expected output
- **Inputs and Expected Output must list concrete backtick-wrapped file paths** (e.g. `` `src/types.ts` ``). These are machine-parsed to derive task dependencies — vague prose without paths breaks parallel execution. Every task must have at least one output file path.
- Observability Impact section **only if the task touches runtime boundaries, async flows, or error paths** — omit it otherwise
- Interface Impact section **only if the task changes a surface, protocol, output format, run-control mode, or permission profile** — omit it otherwise
- Swarm guidance when relevant: if a task can safely split into 2-3 independent execution shards, say so in the task plan's Steps or Description with explicit file/directory ownership per shard. If the work touches shared interfaces, lockfiles, migrations, generated artifacts, or sequence-dependent code, state that it should execute single-agent.
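The path-parsing contract above is why vague prose breaks parallel execution: dependencies are derived by matching one task's Expected Output paths against another's Inputs. A minimal sketch, assuming hypothetical field and function names (this is not the real SF parser):

```javascript
// Match backtick-wrapped file paths like `src/types.ts`.
const BACKTICK_PATH_RE = /`([^`\s]+\.[a-z0-9]+)`/gi;

function extractPaths(sectionText) {
  return [...sectionText.matchAll(BACKTICK_PATH_RE)].map((m) => m[1]);
}

// A task depends on another task when one of its input paths appears
// among the other task's expected output paths.
function deriveDependencies(tasks) {
  const deps = {};
  for (const task of tasks) {
    const inputs = extractPaths(task.inputs);
    deps[task.id] = tasks
      .filter((other) => other.id !== task.id)
      .filter((other) =>
        extractPaths(other.expectedOutput).some((p) => inputs.includes(p)),
      )
      .map((other) => other.id);
  }
  return deps;
}

const tasks = [
  { id: "T01", inputs: "", expectedOutput: "Writes `src/types.ts`" },
  { id: "T02", inputs: "Reads `src/types.ts`", expectedOutput: "Writes `src/db.ts`" },
];
console.log(deriveDependencies(tasks)); // { T01: [], T02: [ 'T01' ] }
```

A task whose Inputs say only "the types from earlier" would parse to zero paths here, so no dependency edge would be created and the tasks could be scheduled in the wrong order.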
7. **Run adversarial review before persisting the plan.** Record all three lenses in the `adversarialReview` payload you send to `sf_plan_slice`. Each role has a purpose and depth contract — a review that agrees without raising specific objections is a rubber stamp, not a review.
- **Partner:** strongest case for why this plan is sufficient. *Must cite specific evidence from the code you explored — file paths, function names, test coverage gaps, or prior slice learnings. Not just "the plan looks good."*

View file

@ -13,6 +13,11 @@ function normalizeStringArray(value) {
if (!Array.isArray(value)) return [];
return value.filter((item) => typeof item === "string");
}
function normalizeObject(value) {
return value && typeof value === "object" && !Array.isArray(value)
? value
: null;
}
function normalizeMilestone(milestoneRow) {
return {
@ -39,6 +44,9 @@ function normalizeMilestone(milestoneRow) {
milestoneRow.boundary_map_markdown ??
"",
),
productResearch: normalizeObject(
milestoneRow.productResearch ?? milestoneRow.product_research,
),
};
}

View file

@ -31,7 +31,7 @@ export function validateContent(unitType, artifactPath) {
}
}
const LEAKED_JSON_FIELD_RE =
/(^|[,{]\s*)"(?:successCriteria|proofLevel|integrationClosure|observabilityImpact|planningMeeting|visionMeeting|tasks|updatedTasks|removedTaskIds|verify|expectedOutput)"\s*:|",\s*"[A-Za-z][A-Za-z0-9_]*"\s*:/m;
/(^|[,{]\s*)"(?:successCriteria|proofLevel|integrationClosure|observabilityImpact|planningMeeting|visionMeeting|productResearch|tasks|updatedTasks|removedTaskIds|verify|expectedOutput)"\s*:|",\s*"[A-Za-z][A-Za-z0-9_]*"\s*:/m;
const MARKDOWNLINT_CONFIG = {
default: true,
// Generated plans often contain long commands/tables and repeated headings

View file

@ -6,7 +6,7 @@
 * buckets. The result is structured and side-effect-free; Phase C wires
 * the report into the scaffold sync pipeline; Phase B is data-plane only.
*/
import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { SCAFFOLD_FILES } from "./agentic-docs-scaffold.js";
import {
@ -52,42 +52,6 @@ function emptyCounts() {
};
}
const LEGACY_GSD_MANAGED_MARKER_RE = /<!--\s*\/?GSD:/i;
/**
* Return true when a marker-less scaffold file still contains old managed GSD
* section markers.
*
* Purpose: let scaffold sync identify generated legacy guidance in projects
* that predate ADR-021's `sf-doc` marker without treating arbitrary prose as
* safe to rewrite.
*
* Consumer: detectScaffoldDrift and migrateLegacyGsdScaffold.
*/
export function hasLegacyGsdManagedMarkers(body) {
return LEGACY_GSD_MANAGED_MARKER_RE.test(body);
}
/**
* Rewrite legacy GSD-generated scaffold text to SF naming.
*
* Purpose: preserve repo-local custom prose while removing obsolete product
* names from sections that carry old managed GSD markers.
*
* Consumer: migrateLegacyGsdScaffold during startup and `/sf scaffold sync`.
*/
export function rewriteLegacyGsdManagedBody(body) {
return body
.replace(/<!--\s*GSD:/g, "<!-- SF:")
.replace(/<!--\s*\/GSD:/g, "<!-- /SF:")
.replace(/\.gsd\//g, ".sf/")
.replace(/\.gsd\b/g, ".sf")
.replace(/\/gsd-([a-z0-9-]+)/gi, "/sf $1")
.replace(/\bgsd\s+headless\s+auto\b/gi, "sf headless auto")
.replace(/\bgsd\s+auto\b/gi, "sf autonomous")
.replace(/\bGSD\b/g, "SF")
.replace(/\bgsd\b/g, "sf");
}
/**
* Classify every `SCAFFOLD_FILES` entry against its on-disk state.
*
@ -165,19 +129,6 @@ export function detectScaffoldDrift(basePath) {
body = "";
}
}
if (hasLegacyGsdManagedMarkers(body)) {
items.push({
path: file.path,
template: file.path,
bucket: "upgradable",
currentVersion: "legacy-gsd",
shipVersion,
hashDrifted: false,
legacyMigration: "gsd-managed-markers",
});
counts.upgradable += 1;
continue;
}
items.push({
path: file.path,
template: file.path,
@ -384,62 +335,3 @@ export function migrateLegacyScaffold(basePath) {
}
return { migrated, skipped };
}
/**
* Upgrade marker-less scaffold files that still carry legacy GSD managed
* section markers.
*
* Purpose: make `/sf scaffold sync` responsible for stale generated SF/GSD
* guidance without overwriting repo-specific content around those sections.
*
* Consumer: ensureAgenticDocsScaffold before normal missing-file and pending
* version refresh handling.
*/
export function migrateLegacyGsdScaffold(basePath) {
const shipVersion = process.env.SF_VERSION || "0.0.0";
const migrated = [];
const skipped = [];
const appliedAt = new Date().toISOString();
for (const file of SCAFFOLD_FILES) {
if (SKIP_MARKER_PATHS.has(file.path)) continue;
const target = join(basePath, file.path);
if (!existsSync(target)) continue;
let body;
let markerPresent = false;
try {
const extracted = extractMarker(target);
markerPresent = extracted.marker !== null;
body = extracted.body;
} catch (err) {
logWarning("scaffold", "failed to read file during GSD migration", {
file: file.path,
error: err.message,
});
skipped.push(file.path);
continue;
}
if (markerPresent) continue;
if (!hasLegacyGsdManagedMarkers(body)) continue;
const upgraded = rewriteLegacyGsdManagedBody(body);
try {
writeFileSync(target, upgraded, "utf-8");
stampScaffoldFile(target, file.path, shipVersion, "completed");
recordScaffoldApply(basePath, {
path: file.path,
template: file.path,
version: shipVersion,
appliedAt,
stateAtApply: "completed",
contentHash: bodyHash(upgraded),
});
migrated.push(file.path);
} catch (err) {
logWarning("scaffold", "failed to upgrade legacy GSD scaffold file", {
file: file.path,
error: err.message,
});
skipped.push(file.path);
}
}
return { migrated, skipped };
}

View file

@ -78,7 +78,7 @@ function openRawDb(path) {
loadProvider();
return new DatabaseSync(path);
}
const SCHEMA_VERSION = 41;
const SCHEMA_VERSION = 42;
function indexExists(db, name) {
return !!db
.prepare(
@ -373,6 +373,7 @@ function ensureSpecSchemaTables(db) {
requirement_coverage TEXT DEFAULT '',
boundary_map_markdown TEXT DEFAULT '',
vision_meeting_json TEXT DEFAULT '',
product_research_json TEXT DEFAULT '',
spec_version INTEGER NOT NULL DEFAULT 1,
created_at TEXT NOT NULL,
PRIMARY KEY (id),
@ -653,6 +654,7 @@ function initSchema(db, fileBacked) {
requirement_coverage TEXT NOT NULL DEFAULT '',
boundary_map_markdown TEXT NOT NULL DEFAULT '',
vision_meeting_json TEXT NOT NULL DEFAULT '',
product_research_json TEXT NOT NULL DEFAULT '',
sequence INTEGER DEFAULT 0
)
`);
@ -680,8 +682,8 @@ function initSchema(db, fileBacked) {
planning_meeting_json TEXT NOT NULL DEFAULT '',
sequence INTEGER DEFAULT 0, -- Ordering hint: tools may set this to control execution order
replan_triggered_at TEXT DEFAULT NULL,
is_sketch INTEGER NOT NULL DEFAULT 0, -- gsd-2 ADR-011: 1 = slice is a sketch awaiting refine-slice
sketch_scope TEXT NOT NULL DEFAULT '', -- gsd-2 ADR-011: 2-3 sentence scope hint from plan-milestone
is_sketch INTEGER NOT NULL DEFAULT 0, -- SF ADR-011: 1 = slice is a sketch awaiting refine-slice
sketch_scope TEXT NOT NULL DEFAULT '', -- SF ADR-011: 2-3 sentence scope hint from plan-milestone
PRIMARY KEY (milestone_id, id),
FOREIGN KEY (milestone_id) REFERENCES milestones(id)
)
@ -715,10 +717,10 @@ function initSchema(db, fileBacked) {
created_at TEXT NOT NULL DEFAULT '',
verification_status TEXT NOT NULL DEFAULT '',
sequence INTEGER DEFAULT 0, -- Ordering hint: tools may set this to control execution order
escalation_pending INTEGER NOT NULL DEFAULT 0, -- ADR-011 P2 (gsd-2): pause-on-escalation flag
escalation_awaiting_review INTEGER NOT NULL DEFAULT 0, -- ADR-011 P2 (gsd-2): continueWithDefault=true marker (no pause)
escalation_override_applied INTEGER NOT NULL DEFAULT 0, -- gsd-2 ADR-011 P2: 1 once carry-forward injected into a downstream prompt
escalation_artifact_path TEXT DEFAULT NULL, -- ADR-011 P2 (gsd-2): path to T##-ESCALATION.json
escalation_pending INTEGER NOT NULL DEFAULT 0, -- ADR-011 P2 (SF): pause-on-escalation flag
escalation_awaiting_review INTEGER NOT NULL DEFAULT 0, -- ADR-011 P2 (SF): continueWithDefault=true marker (no pause)
escalation_override_applied INTEGER NOT NULL DEFAULT 0, -- SF ADR-011 P2: 1 once carry-forward injected into a downstream prompt
escalation_artifact_path TEXT DEFAULT NULL, -- ADR-011 P2 (SF): path to T##-ESCALATION.json
PRIMARY KEY (milestone_id, slice_id, id),
FOREIGN KEY (milestone_id, slice_id) REFERENCES slices(milestone_id, id)
)
@ -1009,6 +1011,49 @@ function columnExists(db, table, column) {
function ensureColumn(db, table, column, ddl) {
if (!columnExists(db, table, column)) db.exec(ddl);
}
function hasPlanningPayload(planning = {}) {
return (
Boolean(planning.vision) ||
(planning.successCriteria?.length ?? 0) > 0 ||
(planning.keyRisks?.length ?? 0) > 0 ||
(planning.proofStrategy?.length ?? 0) > 0 ||
Boolean(planning.verificationContract) ||
Boolean(planning.verificationIntegration) ||
Boolean(planning.verificationOperational) ||
Boolean(planning.verificationUat) ||
(planning.definitionOfDone?.length ?? 0) > 0 ||
Boolean(planning.requirementCoverage) ||
Boolean(planning.boundaryMapMarkdown) ||
Boolean(planning.visionMeeting) ||
Boolean(planning.productResearch)
);
}
function parseJsonOrFallback(raw, fallback) {
if (typeof raw !== "string" || raw.trim().length === 0) return fallback;
try {
return JSON.parse(raw);
} catch {
return fallback;
}
}
function isEmptyMilestoneSpec(row) {
if (!row) return true;
return (
(row["vision"] ?? "") === "" &&
parseJsonOrFallback(row["success_criteria"], []).length === 0 &&
parseJsonOrFallback(row["key_risks"], []).length === 0 &&
parseJsonOrFallback(row["proof_strategy"], []).length === 0 &&
(row["verification_contract"] ?? "") === "" &&
(row["verification_integration"] ?? "") === "" &&
(row["verification_operational"] ?? "") === "" &&
(row["verification_uat"] ?? "") === "" &&
parseJsonOrFallback(row["definition_of_done"], []).length === 0 &&
(row["requirement_coverage"] ?? "") === "" &&
(row["boundary_map_markdown"] ?? "") === "" &&
(row["vision_meeting_json"] ?? "") === "" &&
(row["product_research_json"] ?? "") === ""
);
}
function ensureTaskCreatedAtColumn(db) {
ensureColumn(
db,
@ -1058,13 +1103,13 @@ function populateSpecTablesFromExisting(db) {
INSERT OR IGNORE INTO milestone_specs (
id, vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, product_research_json,
spec_version, created_at
)
SELECT
id, vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, '',
1, COALESCE(created_at, ?)
FROM milestones
WHERE id NOT IN (SELECT id FROM milestone_specs)
@ -1843,7 +1888,7 @@ function migrateSchema(db) {
});
}
if (currentVersion < 22) {
// gsd-2 ADR-011: progressive planning. is_sketch=1 means the slice is a 2-3
// SF ADR-011: progressive planning. is_sketch=1 means the slice is a 2-3
// sentence sketch awaiting refine-slice expansion; refine fills in the
// real plan and clears the flag. sketch_scope holds the milestone
// planner's stored scope hint that refine treats as a hard boundary.
@ -1867,7 +1912,7 @@ function migrateSchema(db) {
});
}
if (currentVersion < 23) {
// ADR-011 Phase 2 (gsd-2 ADR): mid-execution escalation. escalation_pending=1
// ADR-011 Phase 2 (SF ADR): mid-execution escalation. escalation_pending=1
// marks a task that paused for a user decision; escalation_artifact_path
// points to the T##-ESCALATION.json file containing options + recommendation.
// State derivation will emit phase='escalating-task' when any task in the
@ -1900,7 +1945,7 @@ function migrateSchema(db) {
});
}
if (currentVersion < 24) {
// ADR-011 P2 (gsd-2 ADR): the third escalation flag for the
// ADR-011 P2 (SF ADR): the third escalation flag for the
// continueWithDefault=true case — an artifact is recorded for human
// review later, but the loop is NOT paused. Mutually exclusive with
// escalation_pending (the writer flips one or the other).
@ -1918,7 +1963,7 @@ function migrateSchema(db) {
});
}
if (currentVersion < 25) {
// gsd-2 ADR-011 P2 carry-forward: when an escalation is resolved, the user's
// SF ADR-011 P2 carry-forward: when an escalation is resolved, the user's
// choice should be visible to the next execute-task agent in the same
// slice. escalation_override_applied=0 marks "resolved but not yet
// injected into a downstream prompt"; the prompt builder calls
@ -2150,6 +2195,26 @@ function migrateSchema(db) {
":applied_at": new Date().toISOString(),
});
}
if (currentVersion < 42) {
ensureColumn(
db,
"milestones",
"product_research_json",
`ALTER TABLE milestones ADD COLUMN product_research_json TEXT NOT NULL DEFAULT ''`,
);
ensureColumn(
db,
"milestone_specs",
"product_research_json",
`ALTER TABLE milestone_specs ADD COLUMN product_research_json TEXT DEFAULT ''`,
);
db.prepare(
"INSERT INTO schema_version (version, applied_at) VALUES (:version, :applied_at)",
).run({
":version": 42,
":applied_at": new Date().toISOString(),
});
}
db.exec("COMMIT");
} catch (err) {
db.exec("ROLLBACK");
@ -2556,12 +2621,12 @@ export function insertMilestone(m) {
id, title, status, depends_on, created_at,
vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, sequence
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, product_research_json, sequence
) VALUES (
:id, :title, :status, :depends_on, :created_at,
:vision, :success_criteria, :key_risks, :proof_strategy,
:verification_contract, :verification_integration, :verification_operational, :verification_uat,
:definition_of_done, :requirement_coverage, :boundary_map_markdown, :vision_meeting_json, :sequence
:definition_of_done, :requirement_coverage, :boundary_map_markdown, :vision_meeting_json, :product_research_json, :sequence
)`)
.run({
":id": m.id,
@ -2585,41 +2650,75 @@ export function insertMilestone(m) {
":vision_meeting_json": m.planning?.visionMeeting
? JSON.stringify(m.planning.visionMeeting)
: "",
":product_research_json": m.planning?.productResearch
? JSON.stringify(m.planning.productResearch)
: "",
":sequence": m.sequence ?? 0,
});
insertMilestoneSpecIfAbsent(m.id, m.planning ?? {});
if (hasPlanningPayload(m.planning)) {
insertMilestoneSpecIfAbsent(m.id, m.planning ?? {});
}
}
function insertMilestoneSpecIfAbsent(milestoneId, planning = {}) {
if (!hasPlanningPayload(planning)) return;
const existing = currentDb
.prepare("SELECT * FROM milestone_specs WHERE id = ?")
.get(milestoneId);
if (existing && !isEmptyMilestoneSpec(existing)) return;
const params = {
":id": milestoneId,
":vision": planning.vision ?? "",
":success_criteria": JSON.stringify(planning.successCriteria ?? []),
":key_risks": JSON.stringify(planning.keyRisks ?? []),
":proof_strategy": JSON.stringify(planning.proofStrategy ?? []),
":verification_contract": planning.verificationContract ?? "",
":verification_integration": planning.verificationIntegration ?? "",
":verification_operational": planning.verificationOperational ?? "",
":verification_uat": planning.verificationUat ?? "",
":definition_of_done": JSON.stringify(planning.definitionOfDone ?? []),
":requirement_coverage": planning.requirementCoverage ?? "",
":boundary_map_markdown": planning.boundaryMapMarkdown ?? "",
":vision_meeting_json": planning.visionMeeting
? JSON.stringify(planning.visionMeeting)
: "",
":product_research_json": planning.productResearch
? JSON.stringify(planning.productResearch)
: "",
":created_at": new Date().toISOString(),
};
if (existing) {
currentDb
.prepare(`UPDATE milestone_specs SET
vision = :vision,
success_criteria = :success_criteria,
key_risks = :key_risks,
proof_strategy = :proof_strategy,
verification_contract = :verification_contract,
verification_integration = :verification_integration,
verification_operational = :verification_operational,
verification_uat = :verification_uat,
definition_of_done = :definition_of_done,
requirement_coverage = :requirement_coverage,
boundary_map_markdown = :boundary_map_markdown,
vision_meeting_json = :vision_meeting_json,
product_research_json = :product_research_json
WHERE id = :id`)
.run(params);
return;
}
currentDb
.prepare(`INSERT OR IGNORE INTO milestone_specs (
id, vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, product_research_json,
spec_version, created_at
) VALUES (
:id, :vision, :success_criteria, :key_risks, :proof_strategy,
:verification_contract, :verification_integration, :verification_operational, :verification_uat,
:definition_of_done, :requirement_coverage, :boundary_map_markdown, :vision_meeting_json,
:definition_of_done, :requirement_coverage, :boundary_map_markdown, :vision_meeting_json, :product_research_json,
1, :created_at
)`)
.run({
":id": milestoneId,
":vision": planning.vision ?? "",
":success_criteria": JSON.stringify(planning.successCriteria ?? []),
":key_risks": JSON.stringify(planning.keyRisks ?? []),
":proof_strategy": JSON.stringify(planning.proofStrategy ?? []),
":verification_contract": planning.verificationContract ?? "",
":verification_integration": planning.verificationIntegration ?? "",
":verification_operational": planning.verificationOperational ?? "",
":verification_uat": planning.verificationUat ?? "",
":definition_of_done": JSON.stringify(planning.definitionOfDone ?? []),
":requirement_coverage": planning.requirementCoverage ?? "",
":boundary_map_markdown": planning.boundaryMapMarkdown ?? "",
":vision_meeting_json": planning.visionMeeting
? JSON.stringify(planning.visionMeeting)
: "",
":created_at": new Date().toISOString(),
});
.run(params);
}
export function upsertMilestonePlanning(milestoneId, planning) {
if (!currentDb) throw new SFError(SF_STALE_STATE, "sf-db: No database open");
@ -2639,7 +2738,8 @@ export function upsertMilestonePlanning(milestoneId, planning) {
definition_of_done = COALESCE(:definition_of_done, definition_of_done),
requirement_coverage = COALESCE(:requirement_coverage, requirement_coverage),
boundary_map_markdown = COALESCE(:boundary_map_markdown, boundary_map_markdown),
vision_meeting_json = COALESCE(:vision_meeting_json, vision_meeting_json)
vision_meeting_json = COALESCE(:vision_meeting_json, vision_meeting_json),
product_research_json = COALESCE(:product_research_json, product_research_json)
WHERE id = :id`)
.run({
":id": milestoneId,
@ -2667,6 +2767,9 @@ export function upsertMilestonePlanning(milestoneId, planning) {
":vision_meeting_json": planning.visionMeeting
? JSON.stringify(planning.visionMeeting)
: null,
":product_research_json": planning.productResearch
? JSON.stringify(planning.productResearch)
: null,
});
}
export function insertSlice(s) {
@ -2779,14 +2882,14 @@ function insertSliceSpecIfAbsent(milestoneId, sliceId, planning = {}) {
});
}
/**
* gsd-2 ADR-011: clear the is_sketch flag after refine-slice fills in the full plan.
* SF ADR-011: clear the is_sketch flag after refine-slice fills in the full plan.
 * Idempotent; safe to call on already-refined slices.
*/
export function clearSliceSketch(milestoneId, sliceId) {
setSliceSketchFlag(milestoneId, sliceId, false);
}
/**
 * gsd-2 ADR-011: generalized sketch-flag setter; flips true or false.
 * SF ADR-011: generalized sketch-flag setter; flips true or false.
* Idempotent. Use this instead of clearSliceSketch when you also need to
* mark a slice as a sketch (e.g., a re-plan flow that wants to revert to
* sketch-then-refine).
@ -2804,7 +2907,7 @@ export function setSliceSketchFlag(milestoneId, sliceId, isSketch) {
});
}
/**
* gsd-2 ADR-011 auto-heal: reconcile stale is_sketch=1 rows whose PLAN file already
* SF ADR-011 auto-heal: reconcile stale is_sketch=1 rows whose PLAN file already
* exists on disk. The caller passes a predicate that uses the canonical path
 * resolver so path logic stays in one place. Safe to call repeatedly; only
 * flips rows that meet the predicate.
@ -2823,7 +2926,7 @@ export function autoHealSketchFlags(milestoneId, hasPlanFile) {
}
}
/**
* gsd-2 ADR-011 P2: list tasks across a milestone that have an
* SF ADR-011 P2: list tasks across a milestone that have an
* escalation artifact path. By default returns only ACTIVE escalations
* (pending OR awaiting_review); pass includeResolved=true to also return
* resolved-but-still-recorded entries (audit trail).
@ -2980,7 +3083,7 @@ export function updateTaskStatus(
":id": taskId,
});
}
/** gsd-2 ADR-011 P2: set pause-on-escalation state on a task. The two flags are
/** SF ADR-011 P2: set pause-on-escalation state on a task. The two flags are
 * mutually exclusive: pending=1 forces awaiting_review=0. */
export function setTaskEscalationPending(
milestoneId,
@ -3002,7 +3105,7 @@ export function setTaskEscalationPending(
":tid": taskId,
});
}
/** gsd-2 ADR-011 P2: continueWithDefault=true: marker artifact exists but no pause.
/** SF ADR-011 P2: continueWithDefault=true: marker artifact exists but no pause.
* Mutually exclusive with escalation_pending. */
export function setTaskEscalationAwaitingReview(
milestoneId,
@ -3024,7 +3127,7 @@ export function setTaskEscalationAwaitingReview(
":tid": taskId,
});
}
/** gsd-2 ADR-011 P2: clear both escalation flags (called when an escalation is
/** SF ADR-011 P2: clear both escalation flags (called when an escalation is
* resolved or its artifact is removed). Leaves escalation_artifact_path so
* the resolution audit trail survives. */
export function clearTaskEscalationFlags(milestoneId, sliceId, taskId) {
@ -3036,7 +3139,7 @@ export function clearTaskEscalationFlags(milestoneId, sliceId, taskId) {
WHERE milestone_id = :mid AND slice_id = :sid AND id = :tid`)
.run({ ":mid": milestoneId, ":sid": sliceId, ":tid": taskId });
}
/** gsd-2 ADR-011 P2 carry-forward: find a task in this slice that has a resolved
/** SF ADR-011 P2 carry-forward: find a task in this slice that has a resolved
* escalation override that has NOT yet been injected into a downstream
* prompt. Returns the first match by sequence (lowest first), or null when
* no carry-forward is pending.
@ -3063,7 +3166,7 @@ export function findUnappliedEscalationOverride(milestoneId, sliceId) {
if (!row || !row.escalation_artifact_path) return null;
return { taskId: row.id, artifactPath: row.escalation_artifact_path };
}
/** gsd-2 ADR-011 P2 carry-forward: atomically claim the override for injection.
/** SF ADR-011 P2 carry-forward: atomically claim the override for injection.
 * Returns true when this caller successfully flipped 0→1 (race winner) or
* false when another caller claimed it first (race loser). Use this to
* guarantee the override is injected exactly once. */
@ -3475,6 +3578,14 @@ function parseVisionMeeting(raw) {
return null;
}
}
function parseProductResearch(raw) {
if (typeof raw !== "string" || raw.trim().length === 0) return null;
try {
return JSON.parse(raw);
} catch {
return null;
}
}
function rowToMilestone(row) {
return {
id: row["id"],
@ -3495,6 +3606,7 @@ function rowToMilestone(row) {
requirement_coverage: row["requirement_coverage"] ?? "",
boundary_map_markdown: row["boundary_map_markdown"] ?? "",
vision_meeting: parseVisionMeeting(row["vision_meeting_json"]),
product_research: parseProductResearch(row["product_research_json"]),
sequence: row["sequence"] ?? 0,
};
}
@ -3861,6 +3973,12 @@ export function reconcileWorktreeDb(mainDbPath, worktreeDbPath) {
try {
const wtInfo = adapter.prepare("PRAGMA wt.table_info('decisions')").all();
const hasMadeBy = wtInfo.some((col) => col["name"] === "made_by");
const wtMilestoneInfo = adapter
.prepare("PRAGMA wt.table_info('milestones')")
.all();
const hasProductResearch = wtMilestoneInfo.some(
(col) => col["name"] === "product_research_json",
);
const decConf = adapter
.prepare(
`SELECT m.id FROM decisions m INNER JOIN wt.decisions w ON m.id = w.id WHERE m.decision != w.decision OR m.choice != w.choice OR m.rationale != w.rationale OR ${hasMadeBy ? "m.made_by != w.made_by" : "'agent' != 'agent'"} OR m.superseded_by IS NOT w.superseded_by`,
@ -3934,14 +4052,14 @@ export function reconcileWorktreeDb(mainDbPath, worktreeDbPath) {
id, title, status, depends_on, created_at, completed_at,
vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, product_research_json
)
SELECT id, title, status, depends_on, created_at, completed_at,
vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json
FROM wt.milestones
`)
SELECT id, title, status, depends_on, created_at, completed_at,
vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, ${hasProductResearch ? "product_research_json" : "''"}
FROM wt.milestones
`)
.run(),
);
// Merge slices — preserve worktree progress but never downgrade completed status (#2558).
@ -5895,8 +6013,8 @@ export function restoreManifest(manifest) {
db.prepare(`INSERT INTO milestones (id, title, status, depends_on, created_at, completed_at,
vision, success_criteria, key_risks, proof_strategy,
verification_contract, verification_integration, verification_operational, verification_uat,
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`);
definition_of_done, requirement_coverage, boundary_map_markdown, vision_meeting_json, product_research_json)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`);
for (const m of manifest.milestones) {
msStmt.run(
m.id,
@ -5917,6 +6035,7 @@ export function restoreManifest(manifest) {
m.requirement_coverage,
m.boundary_map_markdown,
m.vision_meeting ? JSON.stringify(m.vision_meeting) : "",
m.product_research ? JSON.stringify(m.product_research) : "",
);
}
// Restore slices

View file

@ -7,8 +7,6 @@
 * For the user's home directory, use os.homedir() directly; it handles
* platform-specific env lookup (USERPROFILE on Windows, HOME on POSIX)
* with appropriate fallbacks.
*
* @see https://github.com/gsd-build/gsd-2/issues/5015
*/
import { homedir } from "node:os";
import { join, resolve } from "node:path";

View file

@ -24,7 +24,7 @@ sf headless [flags] [command] [args...]
- `--answers <path>` — pre-supply answers and secrets from JSON file
- `--events <types>` — filter JSONL output to specific event types (comma-separated, implies `--json`)
**Exit codes:** 0=complete, 1=error/timeout, 2=blocked
**Exit codes:** 0=complete, 1=error/timeout, 10=blocked, 11=cancelled, 12=reload
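A parent process can branch on these codes. A minimal sketch of the mapping only (the helper name is illustrative, not part of the SF API; the invocation itself is omitted):

```javascript
// Hypothetical helper: map documented `sf headless` exit codes to outcomes.
const EXIT_OUTCOMES = {
  0: "complete",
  1: "error/timeout",
  10: "blocked",
  11: "cancelled",
  12: "reload",
};

function describeExit(code) {
  // Unknown codes fall through rather than throwing, so logging never fails.
  return EXIT_OUTCOMES[code] ?? `unknown (${code})`;
}

console.log(describeExit(10)); // "blocked"
```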
## Core Workflows

View file

@ -66,7 +66,7 @@ Both `--answers` and `--supervised` can be active simultaneously. Priority order
## Without Answer Injection
Headless mode has built-in auto-responders:
The machine surface has built-in auto-responders:
- **select** → picks first option
- **confirm** → auto-confirms
- **input** → empty string
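The defaults above amount to a single dispatch. A sketch, assuming a prompt object shape that is illustrative only (not SF's actual internal type):

```javascript
// Illustrative sketch of the built-in auto-responder defaults.
function autoRespond(prompt) {
  switch (prompt.type) {
    case "select":
      return prompt.options[0]; // picks first option
    case "confirm":
      return true; // auto-confirms
    case "input":
      return ""; // empty string
    default:
      throw new Error(`no auto-responder for prompt type: ${prompt.type}`);
  }
}

console.log(autoRespond({ type: "select", options: ["yes", "no"] })); // "yes"
```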

View file

@ -0,0 +1,247 @@
/**
* spec-projections.js docs/specs exports from SF working state.
*
* Purpose: export human-readable docs/specs files from the same SF working
* model agents use in `.sf`, so git history captures the contract without
* making docs a second source of truth.
*
* Consumer: commands-plan.js for `/sf plan specs generate|diff|check`.
*/
import { existsSync } from "node:fs";
import { join } from "node:path";
import { DatabaseSync } from "node:sqlite";
const SPEC_GENERATOR_VERSION = "1";
function present(path) {
return existsSync(path) ? "present" : "missing";
}
function countTable(db, table) {
try {
return (
db.prepare(`SELECT COUNT(*) AS count FROM ${table}`).get()?.count ?? 0
);
} catch {
return "missing";
}
}
function readDbSummary(basePath) {
const dbPath = join(basePath, ".sf", "sf.db");
if (!existsSync(dbPath)) {
return ["- Database state: missing"].join("\n");
}
let db;
try {
db = new DatabaseSync(dbPath);
const version =
db.prepare("SELECT MAX(version) AS version FROM schema_version").get()
?.version ?? "unknown";
const milestones = countTable(db, "milestones");
const slices = countTable(db, "slices");
const tasks = countTable(db, "tasks");
const milestoneSpecs = countTable(db, "milestone_specs");
const sliceSpecs = countTable(db, "slice_specs");
const taskSpecs = countTable(db, "task_specs");
return [
`- Database schema version: ${version}`,
`- DB planning rows: milestones=${milestones}, slices=${slices}, tasks=${tasks}`,
`- DB spec rows: milestone_specs=${milestoneSpecs}, slice_specs=${sliceSpecs}, task_specs=${taskSpecs}`,
].join("\n");
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
return `- Database state: unreadable (${message})`;
} finally {
db?.close();
}
}
/**
* Build the source input block shared by generated spec exports.
*
* Purpose: make every generated spec disclose which `.sf` working model,
* database, and source inputs it was projected from so reviewers can audit
* drift.
*
* Consumer: generateOperatingModelSpec.
*/
export function buildWorkingModelInputs(basePath) {
const sfDir = join(basePath, ".sf");
return [
"## Working Model Inputs",
"",
`- Generator: \`sf-spec-projections@${SPEC_GENERATOR_VERSION}\``,
`- Database: \`.sf/sf.db\` (${present(join(sfDir, "sf.db"))})`,
`- Guidance: \`.sf/PRINCIPLES.md\` (${present(join(sfDir, "PRINCIPLES.md"))})`,
`- Guidance: \`.sf/TASTE.md\` (${present(join(sfDir, "TASTE.md"))})`,
`- Guidance: \`.sf/ANTI-GOALS.md\` (${present(join(sfDir, "ANTI-GOALS.md"))})`,
`- Optional knowledge: \`.sf/KNOWLEDGE.md\` (${present(join(sfDir, "KNOWLEDGE.md"))})`,
`- Optional preferences: \`.sf/PREFERENCES.md\` (${present(join(sfDir, "PREFERENCES.md"))})`,
readDbSummary(basePath),
"- Source roots analyzed as implementation evidence: `src/resources/extensions/sf/`, `src/headless*.ts`, `src/cli.ts`, `src/help-text.ts`, `web/`, `vscode-extension/`, `packages/`",
"",
"This file is a human export for review, navigation, and git history. Generated docs are allowed to change because Git keeps the human-facing history. If SF needs operational history or future-use knowledge, store it in `.sf`/DB-backed state instead of relying on this export.",
"",
].join("\n");
}
/**
* Generate the SF operating model human export.
*
* Purpose: provide one deterministic human-readable export of SF's terminology,
* state, and source-placement contract from the repo-local working model.
*
* Consumer: `/sf plan specs generate` and `/sf plan specs check`.
*/
export function generateOperatingModelSpec(basePath) {
return [
"# SF Operating Model",
"",
buildWorkingModelInputs(basePath).trimEnd(),
"",
"SF has one workflow engine and one database-first working model. TUI, CLI, web, editor integrations, and non-interactive automation must all drive the same flow:",
"",
"```text",
"intent -> structured state/evidence -> UOK/policy gates -> execution -> journal/evidence -> projected status",
"```",
"",
"The names below are separate axes. Do not use one as a synonym for another.",
"",
"## Flow",
"",
"The flow is the product behavior: how SF captures intent, plans work, applies policy, executes tasks, records evidence, and reports status. Flow behavior must not fork by UI surface. If a TUI run and a non-interactive run receive the same state, run control, and permission profile, they should follow the same control model.",
"",
"## Surface",
"",
"A surface is where a person or program drives or observes the flow.",
"",
"- **TUI surface** — interactive terminal UI.",
"- **CLI surface** — explicit commands and single-shot prompt entrypoints.",
"- **Web surface** — browser UI around the same state and flow.",
"- **Editor surface** — IDE/editor integration, usually through an adapter.",
"- **Machine surface** — non-interactive runner for CI, scripts, schedulers, and parent processes.",
"",
'`sf headless` is the current command name for the machine surface. It means "without the TUI"; it does not mean "JSON", "autonomous", or a separate flow.',
"",
"## Protocol",
"",
"A protocol is how a surface or adapter talks to SF or to another agent process.",
"",
"- **RPC** — SF's current child-process control channel.",
"- **stdio JSON-RPC** — process transport used by existing RPC-style adapters.",
"- **ACP** — editor/client protocol adapter for agent/editor interoperability.",
"- **HTTP/RPC** — possible daemon/web integration protocol.",
"- **Wire** — a low-level internal message layer, if introduced, below surfaces and adapters.",
"",
"Protocols are adapters around the flow. They should translate messages, not invent policy or planning semantics.",
"",
"## Output Format",
"",
"An output format is only the encoding of a response or event stream.",
"",
"- `text` — human-readable progress and results.",
"- `json` — one machine-readable result object.",
"- `stream-json` / JSONL — event stream for parent processes and monitors.",
"",
"JSON is not a surface, run control, or permission profile. It is one output format that the machine surface can emit.",
"",
"## Run Control",
"",
"Run control describes how far SF continues through the flow before stopping for the operator.",
"",
"- **manual** — user approves each consequential step.",
"- **assisted** — SF proposes and executes bounded steps, with human approval for important or uncertain actions.",
"- **autonomous** — SF continues through the flow until policy, evidence, budget, or completion stops it.",
"",
"`auto` is acceptable only as shorthand in commands, flags, and UI labels. The concept name is **autonomous**.",
"",
"## Permission Profile",
"",
"A permission profile describes what SF is allowed to touch when a run-control mode asks it to act.",
"",
"- **restricted** — read-mostly or planning-artifact-only permissions.",
"- **normal** — default workspace permissions for ordinary project work.",
"- **trusted** — wider local permissions for a trusted operator/workspace.",
"- **unrestricted** — explicit danger profile; use for logs, policy records, or deliberate bypass flows, not as friendly default product language.",
"",
"Run control and permission profile are independent. For example, `autonomous + restricted` can keep going with narrow permissions, while `manual + trusted` still asks before each consequential step but can perform broader approved actions.",
"",
"## Naming Rules",
"",
"- Say **flow** for the shared planning/execution engine.",
"- Say **surface** for TUI, CLI, web, editor, or machine entrypoints.",
"- Say **protocol** for ACP, RPC, stdio JSON-RPC, HTTP, or wire messages.",
"- Say **output format** for `text`, `json`, and `stream-json`.",
"- Say **run control** for `manual`, `assisted`, and `autonomous`.",
"- Say **permission profile** for `restricted`, `normal`, `trusted`, and `unrestricted`.",
"- Use **headless** only for the current `sf headless` command and implementation path. Product docs should explain it as the machine surface.",
"",
"## Working State Contract",
"",
"SF working state is database-first. An initialized SF repo has `.sf/sf.db`, and runtime tools use it as the canonical structured store for planning hierarchy, ordering, gates, ledgers, schedules, and validation-sensitive state.",
"",
"Markdown under `.sf/` has two roles:",
"",
"- working guidance and knowledge that the runtime loads, such as `PRINCIPLES.md`, `TASTE.md`, `ANTI-GOALS.md`, `KNOWLEDGE.md`, and `PREFERENCES.md`;",
"- human-readable projections from DB-owned records, such as rendered decisions, requirements, roadmap, plan, summary, and state files.",
"",
"Markdown under `docs/specs/` is a human export for review, navigation, and git history. Generated docs can change; Git records that human-facing history. If SF needs its own operational history, it should store that in `.sf`/DB-backed state. Plans should record any surface, protocol, output-format, run-control, or permission-profile impact explicitly when a milestone changes integration behavior.",
"",
"## Source Placement",
"",
"SF source placement follows the same axis model. New code should extend the owning axis instead of creating parallel trees.",
"",
"### Core Flow",
"",
"- `src/resources/extensions/sf/` owns the SF workflow extension: planning tools, UOK/runtime state, `/sf` commands, prompts, templates, doctors, schedule, and DB-backed state.",
"- `src/resources/extensions/` owns bundled extension packages loaded into the runtime.",
"- `src/resources/agents/`, `src/resources/skills/`, and `src/resources/workflows/` own bundled runtime resources, not independent product flows.",
"",
"### Surfaces",
"",
"- `src/cli.ts` and `src/help-text.ts` own CLI/session entrypoint behavior and command help.",
"- `src/headless*.ts` owns the existing `sf headless` machine-surface command path. Keep the command name; describe it as the machine surface in product language.",
"- `web/` owns the browser surface.",
"- `vscode-extension/` owns the editor surface.",
"- `packages/pi-tui/` owns reusable TUI primitives and terminal UI components.",
"",
"### Protocols And Adapters",
"",
"- `packages/rpc-client/` owns reusable RPC client protocol code.",
"- RPC child-process orchestration stays in the machine-surface path unless promoted into a reusable protocol package.",
"- ACP, stdio JSON-RPC, HTTP, and future wire layers are protocol/adapters. They should translate messages to the same SF flow, not fork planning semantics.",
"",
"### Workspace Packages",
"",
"- `packages/pi-agent-core/` owns reusable agent-core primitives.",
"- `packages/pi-ai/` owns provider/model integration.",
"- `packages/pi-coding-agent/` owns reusable coding-agent substrate inherited from Pi.",
"- `packages/daemon/` owns daemonized background service code.",
"- `packages/native/` and `rust-engine/` own native/Rust performance paths.",
"",
"### State And Projections",
"",
"- `.sf/sf.db` is the canonical structured runtime state store for initialized SF repos. Treat a missing or unreadable DB as bootstrap/recovery, not a normal alternate source of truth.",
"- `.sf/DECISIONS.md`, `.sf/REQUIREMENTS.md`, milestone roadmaps, and similar files are rendered working projections when database-backed tools own the data. They are useful to humans and agents but must not compete with DB rows.",
"- `.sf/PRINCIPLES.md`, `.sf/TASTE.md`, `.sf/ANTI-GOALS.md`, `.sf/KNOWLEDGE.md`, and `.sf/PREFERENCES.md` are repo-local working guidance files when present.",
"- Generated `.sf/` runtime files are evidence, projections, or import/recovery artifacts.",
"- Durable human-facing exports belong in `docs/specs/`, `docs/adr/`, or `docs/plans/`. They are reviewable projections and git-history artifacts, not a second planning database.",
"",
"### Placement Rules",
"",
"- Do not create a second implementation because a feature appears in another surface. Add an adapter to the same flow.",
"- Do not name output encodings as surfaces. JSON belongs to output formats.",
"- Do not name permission expansion as run control. `autonomous` means the loop continues; `trusted` or `unrestricted` means the permission profile widened.",
"- Do not route human questions because of `headless`. Questions come from run-control and permission-policy gates; the surface only determines delivery.",
"",
].join("\n");
}
export const PROMOTED_SPEC_PROJECTIONS = [
{
path: "docs/specs/sf-operating-model.md",
generate: generateOperatingModelSpec,
},
];

View file

@ -1815,7 +1815,7 @@ export async function _deriveStateImpl(basePath) {
};
}
}
// ── Mid-execution escalation (ADR-011 P2 — gsd-2 ADR) ────────────────
// ── Mid-execution escalation (ADR-011 P2 — SF ADR) ────────────────
// Pause the loop if any task in the active slice has escalation_pending=1
// and an unresolved escalation artifact. The user must run /sf escalate
// resolve before auto-mode will continue. Falls through (returns null

View file

@ -32,13 +32,13 @@ phases:
skip_reassess:
reassess_after_slice:
skip_slice_research:
# gsd-2 ADR-011 P1: progressive planning. When true, the milestone planner
# SF ADR-011 P1: progressive planning. When true, the milestone planner
# can mark slices as sketches (is_sketch=1, sketch_scope=<2-3 sentence
# hint>) and refine-slice expands each one just-in-time using prior slice
# summaries as context. When false (default), all slices get full plans up
# front. (Note: SF's local ADR-011 is "Swarm Chat" — unrelated.)
progressive_planning:
# gsd-2 ADR-011 P2: mid-execution escalation. When true, sf_task_complete honors
# SF ADR-011 P2: mid-execution escalation. When true, sf_task_complete honors
# an optional escalation: { question, options, recommendation, ... } payload.
# The agent's choice carries forward as a hard constraint into the next
# executor. See escalation_auto_accept for whether auto-mode pauses or
@ -47,7 +47,7 @@ phases:
# When true (default), an escalation in auto-mode auto-accepts the agent's
# recommendation and continues — auto-mode is autonomous. The user can
# review/override later via `/sf escalate list --all`. Set false to keep
# gsd-2's pause-and-ask behavior (loop halts until user runs
# SF's pause-and-ask behavior (loop halts until user runs
# `/sf escalate resolve`).
escalation_auto_accept:
# Deep-mode planning gate (top-level, not under phases). When set to "deep",

View file

@ -3,6 +3,10 @@
**Goal:** {{goal}}
**Demo:** {{demo}}
<!-- State the slice purpose before implementation detail: the user/system
capability this slice makes true, why it belongs in this milestone, and
the proof level claimed by the demo. -->
## Must-Haves
- {{mustHave}}
@ -32,6 +36,22 @@
- Real runtime required: {{yes/no}}
- Human/UAT required: {{yes/no}}
## Interface Axes
<!-- Omit this section entirely when the slice does not change drive/observe/integration behavior.
Keep the axes separate:
- surface: TUI, CLI, web, editor, machine
- protocol: ACP, RPC, stdio JSON-RPC, HTTP, wire
- output format: text, json, stream-json
- run control: manual, assisted, autonomous
- permission profile: restricted, normal, trusted, unrestricted -->
- Surface impact: {{surface impact or none}}
- Protocol impact: {{protocol impact or none}}
- Output-format impact: {{output-format impact or none}}
- Run-control impact: {{run-control impact or none}}
- Permission-profile impact: {{permission-profile impact or none}}
## Verification
<!-- Define what "done" looks like BEFORE detailing tasks.
@ -99,6 +119,7 @@
- "Improve UI"
Each task should usually include:
- Purpose: why this task exists and what part of the slice contract it closes
- Why: why this task exists / what part of the slice it closes
- Files: the main files likely touched
- Do: concrete implementation steps and important constraints
@ -110,12 +131,14 @@
-->
- [ ] **T01: {{taskTitle}}** `est:{{estimate}}`
- Purpose: {{purposeThisTaskServes}}
- Why: {{whyThisTaskExists}}
- Files: `{{filePath}}`, `{{filePath}}`
- Do: {{specificImplementationStepsAndConstraints}}
- Verify: {{testCommandOrRuntimeCheck}}
- Done when: {{measurableAcceptanceCondition}}
- [ ] **T02: {{taskTitle}}** `est:{{estimate}}`
- Purpose: {{purposeThisTaskServes}}
- Why: {{whyThisTaskExists}}
- Files: `{{filePath}}`, `{{filePath}}`
- Do: {{specificImplementationStepsAndConstraints}}

View file

@ -2,6 +2,9 @@
**Vision:** {{vision}}
<!-- State the milestone purpose before implementation detail: who benefits,
what capability changes, and what proof makes the milestone worth doing. -->
## Success Criteria
<!-- Write success criteria as observable truths, not implementation tasks.
@ -19,6 +22,35 @@
- {{criterion}}
- {{criterion}}
## Product / Competitor Research
<!-- Required for product-facing, workflow-facing, developer-experience, or
market-positioning work. Keep it concise and evidence-based. For purely
internal mechanical fixes, write "Not applicable — <reason>".
This section is backed by the structured productResearch payload. It uses
the same purpose-shaped contract as SF planning: purpose, consumer,
contract, failure boundary, evidence, non-goals, invariants, assumptions. -->
### Purpose Contract
- Purpose: {{whyThisResearchExists}}
- Consumer: {{whoUsesThisResearchAndForWhatDecision}}
- Contract: {{whatMustBeAnsweredBeforeRequirementsOrSlicesAreLocked}}
- Failure boundary: {{whatHappensIfResearchIsWeakOrNotApplicable}}
- Evidence: {{sourcesAndComparableEvidenceAccepted}}
- Non-goals: {{whatThisResearchDoesNotDecideOrCopy}}
- Invariants: {{rulesThatRemainTrueRegardlessOfCompetitorFindings}}
- Assumptions: {{assumptionsMadeWhereResearchCouldNotResolveUncertainty}}
### Category Findings
- Category / job: {{category}} — {{targetUserJob}}
- Comparable: {{competitorOrComparable}} — {{specificFinding}}
- Table stake: {{expectedBehavior}}
- Differentiator: {{whatThisRoadmapShouldDoBetter}}
- Do not copy: {{antiPatternOrFailureMode}}
## Key Risks / Unknowns
<!-- List the real risks and uncertainties that shape how slices are ordered.
@ -44,6 +76,20 @@
- Operational verification: {{service lifecycle / restart / reconnect / supervision / deploy-install behavior, or none}}
- UAT / human verification: {{what needs real human judgment, or none}}
## Interface Axes
<!-- Include concrete entries only when this milestone changes how SF is driven,
observed, integrated, or automated. Keep the axes separate:
surface = TUI/CLI/web/editor/machine; protocol = ACP/RPC/stdio/HTTP/wire;
output format = text/json/stream-json; run control = manual/assisted/autonomous;
permission profile = restricted/normal/trusted/unrestricted. -->
- Surfaces affected: {{TUI, CLI, web, editor, machine, or none}}
- Protocols affected: {{ACP, RPC, stdio JSON-RPC, HTTP, wire, or none}}
- Output formats affected: {{text, json, stream-json, or none}}
- Run-control modes affected: {{manual, assisted, autonomous, or none}}
- Permission profiles affected: {{restricted, normal, trusted, unrestricted, or none}}
## Milestone Definition of Done
This milestone is complete only when all are true:

View file

@ -15,6 +15,8 @@ skills_used:
## Description
Purpose: {{purposeThisTaskServes}}
{{description}}
## Failure Modes
@ -79,6 +81,21 @@ skills_used:
- How a future agent inspects this: {{command, endpoint, file, UI state}}
- Failure state exposed: {{what becomes visible on failure}}
## Interface Impact
<!-- OMIT THIS SECTION ENTIRELY for tasks that do not change how SF is driven,
observed, integrated, or automated.
Keep axes separate: surface (TUI/CLI/web/editor/machine), protocol
(ACP/RPC/stdio/HTTP/wire), output format (text/json/stream-json), run control
(manual/assisted/autonomous), permission profile
(restricted/normal/trusted/unrestricted). -->
- Surface: {{surface changed or none}}
- Protocol: {{protocol changed or none}}
- Output format: {{output format changed or none}}
- Run control: {{run control changed or none}}
- Permission profile: {{permission profile changed or none}}
## Inputs
<!-- Every input MUST be a backtick-wrapped file path. These paths are machine-parsed to

View file

@ -14,8 +14,7 @@ import {
SCAFFOLD_FILES,
} from "../agentic-docs-scaffold.js";
import { checkScaffoldFreshness } from "../doctor-runtime-checks.js";
import { detectScaffoldDrift } from "../scaffold-drift.js";
import { extractMarker, stampScaffoldFile } from "../scaffold-versioning.js";
import { stampScaffoldFile } from "../scaffold-versioning.js";
const tmpRoots = [];
@ -48,59 +47,6 @@ test("ensureAgenticDocsScaffold_removes_owned_root_harness_and_writes_sf_harness
assert.equal(existsSync(join(root, ".sf/harness/specs/bootstrap.md")), true);
});
test("detectScaffoldDrift_when_unmarked_gsd_managed_section_marks_upgradable", () => {
const root = makeProject();
const target = join(root, "AGENTS.md");
writeFileSync(
target,
[
"# Local Agents",
"",
"<!-- GSD:project-start source:PROJECT.md -->",
"Run `gsd headless auto` and keep `.gsd/` state current.",
"<!-- /GSD:project-end -->",
"",
].join("\n"),
"utf-8",
);
const item = detectScaffoldDrift(root).items.find(
(candidate) => candidate.path === "AGENTS.md",
);
assert.equal(item.bucket, "upgradable");
assert.equal(item.legacyMigration, "gsd-managed-markers");
});
test("ensureAgenticDocsScaffold_rewrites_and_stamps_legacy_gsd_managed_section", () => {
const root = makeProject();
const target = join(root, "AGENTS.md");
writeFileSync(
target,
[
"# Local Agents",
"",
"Keep this repo note.",
"<!-- GSD:project-start source:PROJECT.md -->",
"Run `gsd headless auto` and keep `.gsd/` state current.",
"<!-- /GSD:project-end -->",
"",
].join("\n"),
"utf-8",
);
ensureAgenticDocsScaffold(root);
const { marker, body } = extractMarker(target);
assert.equal(marker?.state, "completed");
assert.equal(marker?.template, "AGENTS.md");
assert.match(body, /Keep this repo note\./);
assert.match(body, /<!-- SF:project-start source:PROJECT\.md -->/);
assert.match(body, /sf headless auto/);
assert.match(body, /\.sf\//);
assert.doesNotMatch(body, /GSD|gsd|\.gsd/);
});
test("ensureAgenticDocsScaffold_removes_unmarked_generated_root_harness", () => {
const root = makeProject();
const target = join(root, "harness/specs/bootstrap.md");

View file

@ -0,0 +1,29 @@
import assert from "node:assert/strict";
import { test } from "vitest";
import { handleAutoCommand } from "../commands/handlers/auto.js";
function makeCtx() {
const notifications = [];
return {
ctx: {
ui: {
notify(message, level = "info") {
notifications.push({ message, level });
},
},
},
notifications,
};
}
test("handleAutoCommand_when_autonomous_full_requested_reports_removed_flag", async () => {
const { ctx, notifications } = makeCtx();
const handled = await handleAutoCommand("autonomous --full", ctx, {});
assert.equal(handled, true);
assert.equal(notifications.length, 1);
assert.equal(notifications[0].level, "warning");
assert.match(notifications[0].message, /was removed/);
assert.match(notifications[0].message, /Use `\/sf autonomous`/);
});

View file

@ -59,7 +59,7 @@ test("backlog_add_when_db_available_persists_without_work_queue_markdown", async
assert.equal(messages.at(-1).level, "success");
});
test("backlog_list_when_legacy_work_queue_exists_imports_once_to_db", async () => {
test("backlog_list_when_work_queue_markdown_exists_ignores_it_and_uses_db", async () => {
const project = makeProject();
const sfDir = join(project, ".sf");
mkdirSync(sfDir, { recursive: true });
@ -80,10 +80,8 @@ test("backlog_list_when_legacy_work_queue_exists_imports_once_to_db", async () =
}
const items = listBacklogItems();
assert.equal(items.length, 1);
assert.equal(items[0].id, "999.7");
assert.equal(items[0].source, "legacy-work-queue");
assert.match(messages.at(-1).message, /999\.7/);
assert.equal(items.length, 0);
assert.match(messages.at(-1).message, /Backlog is empty/);
});
test("backlog_add_when_switching_projects_uses_current_project_db", async () => {


@ -1,6 +1,5 @@
import assert from "node:assert/strict";
import {
existsSync,
mkdirSync,
mkdtempSync,
readFileSync,
@ -84,32 +83,6 @@ describe("doctor .sf form lint", () => {
]);
});
test("runSFDoctor_fix_rewrites_legacy_gsd_scaffold_text_in_sf_guidance", async () => {
const project = makeProject();
writeFileSync(
join(project, ".sf", "AGENTS.md"),
"# GSD guidance\n\nGSD owns this scaffold.\n",
"utf-8",
);
const report = await runSFDoctor(project, {
fix: true,
fixLevel: "all",
scope: "project",
});
assert.ok(
report.fixesApplied.includes(
"rewrote legacy GSD naming in .sf/AGENTS.md",
),
);
assert.equal(existsSync(join(project, ".sf", "AGENTS.md")), true);
assert.equal(
readFileSync(join(project, ".sf", "AGENTS.md"), "utf-8"),
"# SF guidance\n\nSF owns this scaffold.\n",
);
});
test("runSFDoctor_fix_repairs_json_object_written_as_jsonl", async () => {
const project = makeProject();
const target = join(project, ".sf", "self-feedback-resolved.jsonl");


@ -17,6 +17,7 @@ describe("execution policy profiles", () => {
assert.equal(decision.allowed, true);
assert.equal(decision.profile, "plan");
assert.equal(decision.permissionProfile, "restricted");
});
test("classifyExecutionPolicyCall_plan_blocks_source_write", () => {
@ -44,10 +45,26 @@ describe("execution policy profiles", () => {
);
assert.equal(decision.allowed, true);
assert.equal(decision.permissionProfile, "normal");
assert.equal(decision.risk, "destructive");
assert.deepEqual(decision.destructiveLabels, ["hard reset"]);
});
test("resolveExecutionPolicyProfile_when_trusted_returns_trusted_permission_profile", () => {
const profile = resolveExecutionPolicyProfile("trusted");
const decision = classifyExecutionPolicyCall(
"trusted",
"bash",
"git status",
);
assert.equal(profile.id, "trusted");
assert.equal(profile.permissionProfile, "trusted");
assert.equal(decision.profile, "trusted");
assert.equal(decision.permissionProfile, "trusted");
assert.equal(decision.allowed, true);
});
test("extractExecutionPolicyInput_uses_command_or_path_without_full_payload", () => {
assert.equal(
extractExecutionPolicyInput("bash", {
@ -83,5 +100,6 @@ describe("execution policy profiles", () => {
assert.equal(entry.data.input, "src/app.ts");
assert.equal(entry.data.decision.allowed, false);
assert.equal(entry.data.decision.profile, "plan");
assert.equal(entry.data.decision.permissionProfile, "restricted");
});
});


@ -45,6 +45,50 @@ describe("roadmap JSON projection", () => {
planning: {
vision: "Keep dispatch off rendered Markdown.",
successCriteria: ["JSON projection exists."],
visionMeeting: {
trigger: "Need category research before locking the roadmap.",
pm: "The roadmap must make planning state dependable.",
userAdvocate:
"Users need dispatch to stay stable after rendering changes.",
customerPanel:
"Maintainers value deterministic state; operators value readable projections.",
business: "Broken dispatch blocks adoption.",
researcher:
"Comparable systems: Spec Kit keeps specs as durable workflow artifacts; Forge should keep DB state authoritative while rendering readable projections.",
deliveryLead: "One projection slice is enough.",
partner: "DB-origin projection is the right direction.",
combatant:
"Rendered markdown could drift from DB if not regenerated.",
architect:
"The DB projection is the stable contract; markdown is a view.",
moderator:
"Proceed with DB-origin projection and explicit rendering.",
weightedSynthesis:
"Keep dispatch on JSON and render markdown for humans.",
confidenceByArea: "High for projection, medium for migration.",
recommendedRoute: "planning",
},
productResearch: {
purpose: "Decide how roadmap projections compare to spec systems.",
consumer: "Milestone planners deciding projection requirements.",
contract:
"Identify what comparable spec systems store durably before locking slices.",
failureBoundary:
"If findings are weak, keep projection scope internal and avoid product claims.",
evidence:
"Spec Kit was inspected as a comparable spec workflow artifact system.",
nonGoals: "Do not copy Spec Kit's project layout or command surface.",
invariants:
"Forge keeps DB state authoritative and renders markdown as a projection.",
assumptions:
"Spec Kit behavior is representative enough for this projection decision.",
category: "agent planning systems",
targetJob: "keep roadmap state machine-readable and human-readable",
comparables: ["Spec Kit keeps specs as durable workflow artifacts."],
tableStakes: ["Rendered plans remain readable to humans."],
differentiators: ["DB-origin projection stays authoritative."],
antiPatterns: ["Do not make markdown the dispatch source of truth."],
},
},
});
insertSlice({
@ -61,6 +105,14 @@ describe("roadmap JSON projection", () => {
const result = await renderRoadmapFromDb(project, "M451");
assert.equal(existsSync(result.roadmapPath), true);
const roadmap = readFileSync(result.roadmapPath, "utf-8");
assert.match(roadmap, /## Product \/ Competitor Research/);
assert.match(roadmap, /### Purpose Contract/);
assert.match(roadmap, /Purpose: Decide how roadmap projections/);
assert.match(
roadmap,
/Comparable: Spec Kit keeps specs as durable workflow artifacts/,
);
assert.equal(existsSync(result.roadmapJsonPath), true);
const json = JSON.parse(readFileSync(result.roadmapJsonPath, "utf-8"));
assert.equal(json.schemaVersion, 1);
@ -68,6 +120,10 @@ describe("roadmap JSON projection", () => {
assert.equal(typeof json.generatedAt, "string");
assert.equal(json.sourceDbPath, join(project, ".sf", "sf.db"));
assert.equal(json.milestone.id, "M451");
assert.equal(
json.milestone.productResearch.purpose,
"Decide how roadmap projections compare to spec systems.",
);
assert.deepEqual(
json.slices.map((slice) => [
slice.id,


@ -18,9 +18,11 @@ import {
getScheduleEntries,
insertGateRun,
insertJudgment,
insertMilestone,
insertRetrievalEvidence,
insertScheduleEntry,
openDatabase,
reconcileWorktreeDb,
} from "../sf-db.js";
const tmpDirs = [];
@ -202,10 +204,20 @@ test("openDatabase_migrates_v27_tasks_without_created_at_through_spec_backfill",
const columns = db.prepare("PRAGMA table_info(tasks)").all();
assert.ok(columns.some((row) => row.name === "created_at"));
const milestoneColumns = db.prepare("PRAGMA table_info(milestones)").all();
assert.ok(
milestoneColumns.some((row) => row.name === "product_research_json"),
);
const milestoneSpecColumns = db
.prepare("PRAGMA table_info(milestone_specs)")
.all();
assert.ok(
milestoneSpecColumns.some((row) => row.name === "product_research_json"),
);
const version = db
.prepare("SELECT MAX(version) AS version FROM schema_version")
.get();
assert.equal(version.version, 41);
assert.equal(version.version, 42);
const taskSpec = db
.prepare(
"SELECT milestone_id, slice_id, task_id, verify FROM task_specs WHERE task_id = 'T01'",
@ -264,6 +276,44 @@ test("openDatabase_when_fresh_db_supports_gate_run_micro_usd", () => {
assert.equal(row.cost_micro_usd, 123_456);
});
test("reconcileWorktreeDb_when_worktree_lacks_product_research_column_merges_milestones", () => {
const dir = mkdtempSync(join(tmpdir(), "sf-worktree-v41-"));
tmpDirs.push(dir);
const mainDbPath = join(dir, "main.db");
const worktreeDbPath = join(dir, "worktree.db");
assert.equal(openDatabase(mainDbPath), true);
closeDatabase();
assert.equal(openDatabase(worktreeDbPath), true);
insertMilestone({
id: "M041",
title: "Old worktree milestone",
status: "active",
});
closeDatabase();
const wt = new DatabaseSync(worktreeDbPath);
wt.exec("ALTER TABLE milestones DROP COLUMN product_research_json");
wt.close();
assert.equal(openDatabase(mainDbPath), true);
const merged = reconcileWorktreeDb(mainDbPath, worktreeDbPath);
assert.equal(merged.conflicts.length, 0);
assert.equal(merged.milestones, 1);
const row = getDatabase()
.prepare(
"SELECT id, title, product_research_json FROM milestones WHERE id = 'M041'",
)
.get();
assert.deepEqual(row, {
id: "M041",
title: "Old worktree milestone",
product_research_json: "",
});
});
test("openDatabase_migrates_v35_gate_cost_usd_to_micro_usd", () => {
const dbPath = makeLegacyV35GateRunsDb();


@ -28,6 +28,22 @@ test("specTables_when_live_planning_written_persist_write_once_intent", () => {
planning: {
vision: "Initial milestone vision",
successCriteria: ["initial criterion"],
productResearch: {
purpose: "Validate product research spec persistence.",
consumer: "Spec table tests.",
contract: "Initial intent must be preserved in milestone_specs.",
failureBoundary: "Fail if a replan overwrites original spec intent.",
evidence: "Unit test fixture.",
nonGoals: "Do not validate rendered roadmap prose here.",
invariants: "Spec rows are insert-once intent records.",
assumptions: "The in-memory DB uses current schema.",
category: "test harness",
targetJob: "prove spec persistence",
comparables: ["Spec Kit durable specs"],
tableStakes: ["Initial spec data survives replans"],
differentiators: ["DB spec table remains authoritative"],
antiPatterns: ["Do not overwrite original intent silently"],
},
},
});
insertSlice({
@ -54,6 +70,22 @@ test("specTables_when_live_planning_written_persist_write_once_intent", () => {
upsertMilestonePlanning("M001", {
vision: "Changed milestone vision",
successCriteria: ["changed criterion"],
productResearch: {
purpose: "Changed product research.",
consumer: "Changed consumer.",
contract: "Changed contract.",
failureBoundary: "Changed failure boundary.",
evidence: "Changed evidence.",
nonGoals: "Changed non-goals.",
invariants: "Changed invariants.",
assumptions: "Changed assumptions.",
category: "changed category",
targetJob: "changed job",
comparables: ["changed comparable"],
tableStakes: ["changed table stake"],
differentiators: ["changed differentiator"],
antiPatterns: ["changed anti-pattern"],
},
});
upsertSlicePlanning("M001", "S01", {
goal: "Changed slice goal",
@ -70,6 +102,10 @@ test("specTables_when_live_planning_written_persist_write_once_intent", () => {
assert.deepEqual(JSON.parse(milestoneSpec.success_criteria), [
"initial criterion",
]);
assert.equal(
JSON.parse(milestoneSpec.product_research_json).purpose,
"Validate product research spec persistence.",
);
assert.equal(milestoneSpec.spec_version, 1);
const sliceSpec = getSliceSpec("M001", "S01");
@ -83,3 +119,47 @@ test("specTables_when_live_planning_written_persist_write_once_intent", () => {
assert.deepEqual(JSON.parse(taskSpec.expected_output), ["initial output"]);
assert.equal(taskSpec.spec_version, 1);
});
test("specTables_when_shell_milestone_created_before_planning_captures_first_real_plan", () => {
openDatabase(":memory:");
insertMilestone({
id: "M002",
title: "Shell first",
status: "active",
});
upsertMilestonePlanning("M002", {
vision: "First real planning payload",
successCriteria: ["first criterion"],
productResearch: {
purpose: "Capture first real product research.",
consumer: "Milestone planners.",
contract: "Shell creation must not freeze an empty spec row.",
failureBoundary: "Fail if milestone_specs stays empty.",
evidence: "Unit test fixture.",
nonGoals: "Do not update non-empty immutable spec rows.",
invariants: "First real planning payload is the spec intent.",
assumptions: "Shell milestones can exist before planning.",
category: "planning runtime",
targetJob: "preserve spec intent",
comparables: ["Spec Kit first generated spec"],
tableStakes: ["Specs are not accidentally empty."],
differentiators: ["DB shell rows can be safely filled later."],
antiPatterns: ["INSERT OR IGNORE preserving empty placeholders"],
},
});
upsertMilestonePlanning("M002", {
vision: "Changed planning payload",
successCriteria: ["changed criterion"],
});
const milestoneSpec = getMilestoneSpec("M002");
assert.equal(milestoneSpec.vision, "First real planning payload");
assert.deepEqual(JSON.parse(milestoneSpec.success_criteria), [
"first criterion",
]);
assert.equal(
JSON.parse(milestoneSpec.product_research_json).purpose,
"Capture first real product research.",
);
});


@ -86,6 +86,54 @@ function validateProofStrategy(value) {
};
});
}
const PRODUCT_RESEARCH_TEXT_FIELDS = [
"purpose",
"consumer",
"contract",
"failureBoundary",
"evidence",
"nonGoals",
"invariants",
"assumptions",
"category",
"targetJob",
];
const PRODUCT_RESEARCH_ARRAY_FIELDS = [
"comparables",
"tableStakes",
"differentiators",
"antiPatterns",
];
function validateProductResearch(value) {
if (!value || typeof value !== "object" || Array.isArray(value)) {
throw new Error("productResearch must be an object");
}
const normalized = {};
for (const field of PRODUCT_RESEARCH_TEXT_FIELDS) {
if (!isNonEmptyString(value[field])) {
throw new Error(`productResearch.${field} must be a non-empty string`);
}
normalized[field] = normalizePlanningText(
value[field],
`productResearch.${field}`,
);
}
for (const field of PRODUCT_RESEARCH_ARRAY_FIELDS) {
const items = value[field];
if (
!Array.isArray(items) ||
items.some((item) => !isNonEmptyString(item))
) {
throw new Error(
`productResearch.${field} must be an array of non-empty strings`,
);
}
normalized[field] = items.map((item, index) =>
normalizePlanningText(item, `productResearch.${field}[${index}]`),
);
}
return normalized;
}
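Read on its own, the validator above is all-or-nothing: every text field and every array field must be present and non-empty, or the whole payload is rejected. A self-contained sketch of that contract (the trimmed field lists and the inline `trim()` normalization are simplified stand-ins for the repo's helpers, not its exports):

```javascript
// Simplified field lists; the real validator covers ten text fields
// and four array fields.
const TEXT_FIELDS = ["purpose", "consumer", "contract", "failureBoundary"];
const ARRAY_FIELDS = ["comparables", "antiPatterns"];

const isNonEmptyString = (v) => typeof v === "string" && v.trim().length > 0;

function validateSketch(value) {
  if (!value || typeof value !== "object" || Array.isArray(value)) {
    throw new Error("productResearch must be an object");
  }
  const normalized = {};
  for (const field of TEXT_FIELDS) {
    if (!isNonEmptyString(value[field])) {
      throw new Error(`productResearch.${field} must be a non-empty string`);
    }
    normalized[field] = value[field].trim();
  }
  for (const field of ARRAY_FIELDS) {
    const items = value[field];
    if (!Array.isArray(items) || items.some((i) => !isNonEmptyString(i))) {
      throw new Error(
        `productResearch.${field} must be an array of non-empty strings`,
      );
    }
    normalized[field] = items.map((i) => i.trim());
  }
  return normalized;
}

// A partial payload fails loudly instead of persisting a half-filled spec.
let failed = false;
try {
  validateSketch({ purpose: "x", consumer: "y" });
} catch {
  failed = true;
}
console.log(failed); // true
```

Because rejection happens before persistence, a partial `productResearch` object can never reach `milestone_specs` as a frozen spec row.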
function validateSlices(value) {
if (!Array.isArray(value) || value.length === 0) {
throw new Error("slices must be a non-empty array");
@ -108,7 +156,7 @@ function validateSlices(value) {
const observabilityImpact = obj.observabilityImpact;
const isSketchRaw = obj.isSketch;
const sketchScopeRaw = obj.sketchScope;
// gsd-2 ADR-011: preserve 3-valued isSketch semantics (true/false/absent).
// SF ADR-011: preserve 3-valued isSketch semantics (true/false/absent).
// Absent must round-trip as undefined so DB upsert ON CONFLICT preserves
// any existing is_sketch row state rather than silently overwriting.
const isSketch =
@ -134,7 +182,7 @@ function validateSlices(value) {
throw new Error(`slices[${index}].demo must be a non-empty string`);
if (!isNonEmptyString(goal))
throw new Error(`slices[${index}].goal must be a non-empty string`);
// gsd-2 ADR-011: sketch slices defer the heavyweight planning fields to
// SF ADR-011: sketch slices defer the heavyweight planning fields to
// refine-slice. Non-sketch slices must populate them up front.
if (isSketch === true) {
if (!isNonEmptyString(sketchScopeRaw)) {
@ -309,6 +357,9 @@ function validateParams(params) {
params.boundaryMapMarkdown ?? "Not provided.",
"boundaryMapMarkdown",
),
productResearch: params.productResearch
? validateProductResearch(params.productResearch)
: undefined,
visionMeeting: hasStructuredVisionAlignmentMeeting(params.visionMeeting)
? normalizeVisionMeeting(params.visionMeeting)
: undefined,
@ -386,6 +437,7 @@ export async function handlePlanMilestone(rawParams, basePath) {
definitionOfDone: params.definitionOfDone,
requirementCoverage: params.requirementCoverage,
boundaryMapMarkdown: params.boundaryMapMarkdown,
productResearch: params.productResearch,
visionMeeting: params.visionMeeting,
});
for (let i = 0; i < slices.length; i++) {
@ -411,7 +463,7 @@ export async function handlePlanMilestone(rawParams, basePath) {
isSketch: slice.isSketch,
sketchScope: slice.sketchScope,
});
// gsd-2 ADR-011: sketches defer planning fields to refine-slice — only
// SF ADR-011: sketches defer planning fields to refine-slice — only
// upsert when we actually have content to write.
if (slice.isSketch !== true) {
upsertSlicePlanning(params.milestoneId, slice.sliceId, {
@ -432,6 +484,7 @@ export async function handlePlanMilestone(rawParams, basePath) {
proofStrategy: params.proofStrategy ?? "",
verificationContract: params.verificationContract ?? "",
boundaryMapMarkdown: params.boundaryMapMarkdown ?? "",
productResearch: params.productResearch ?? null,
slices:
slices.map((s) => ({ sliceId: s.sliceId, title: s.title })) ?? [],
});


@ -265,7 +265,7 @@ export async function handlePlanSlice(rawParams, basePath) {
adversarialReview: params.adversarialReview,
planningMeeting: params.planningMeeting,
});
// gsd-2 ADR-011: when sf_plan_slice runs against a sketch slice (refine-slice
// SF ADR-011: when sf_plan_slice runs against a sketch slice (refine-slice
// produced a full plan from the sketch_scope hint), clear the is_sketch
// flag atomically with the plan write so the next dispatch cycle no
// longer routes to refine. Idempotent — no-op for non-sketches.


@ -211,7 +211,7 @@ export async function executeTaskComplete(params, basePath = process.cwd()) {
isError: true,
};
}
// gsd-2 ADR-011 P2: optional escalation payload. Only honored when the
// SF ADR-011 P2: optional escalation payload. Only honored when the
// phases.mid_execution_escalation preference is true. When false (default),
// any escalation field on the params is silently ignored — keeps the
// payload backwards-compatible for callers that always send it.
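The comment above describes a preference-gated field: honored only when `phases.mid_execution_escalation` is true, silently dropped otherwise. A hypothetical sketch of that gating (the preference path matches the comment; the function and field names are invented for illustration):

```javascript
// Default-off gate: an escalation field from a caller that always sends it
// is dropped rather than rejected, keeping the payload backwards-compatible.
function resolveEscalation(params, prefs) {
  const enabled = prefs?.phases?.mid_execution_escalation === true;
  if (!enabled) return undefined;
  return params.escalation;
}

console.log(resolveEscalation({ escalation: { reason: "blocked" } }, {}));
// undefined — the preference defaults to false
console.log(
  resolveEscalation(
    { escalation: { reason: "blocked" } },
    { phases: { mid_execution_escalation: true } },
  ).reason,
);
// blocked
```

Dropping rather than erroring is the key design choice: older callers that unconditionally attach the field keep working when the preference is off.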


@ -111,7 +111,7 @@ export const KNOWN_UNIT_TYPES = [
"run-uat",
"gate-evaluate",
"rewrite-docs",
// gsd-2 ADR-011 deep planning gate (project-scoped, before any milestone work)
// SF ADR-011 deep planning gate (project-scoped, before any milestone work)
"discuss-project",
"discuss-requirements",
"research-project",
@ -451,7 +451,7 @@ export const UNIT_MANIFESTS = {
},
maxSystemPromptChars: COMMON_BUDGET_MEDIUM,
},
// ─── Project-scoped (deep planning gate, gsd-2 ADR-011) ───────────────────
// ─── Project-scoped (deep planning gate, SF ADR-011) ───────────────────
"discuss-project": {
skills: { mode: "all" },
knowledge: "full",


@ -6,7 +6,7 @@ export function isNonEmptyString(value) {
return typeof value === "string" && value.trim().length > 0;
}
const LEAKED_JSON_FIELD_RE =
/(^|[,{]\s*)"(?:successCriteria|proofLevel|integrationClosure|observabilityImpact|planningMeeting|visionMeeting|tasks|updatedTasks|removedTaskIds|verify|expectedOutput|oneLiner|narrative|verification|uatContent|keyFiles|keyDecisions|deviations|knownIssues|knownLimitations|followUps|operationalReadiness)"\s*:|",\s*"[A-Za-z][A-Za-z0-9_]*"\s*:/;
/(^|[,{]\s*)"(?:successCriteria|proofLevel|integrationClosure|observabilityImpact|planningMeeting|visionMeeting|productResearch|tasks|updatedTasks|removedTaskIds|verify|expectedOutput|oneLiner|narrative|verification|uatContent|keyFiles|keyDecisions|deviations|knownIssues|knownLimitations|followUps|operationalReadiness)"\s*:|",\s*"[A-Za-z][A-Za-z0-9_]*"\s*:/;
/**
* Normalize freeform model-authored markdown fields before persistence.
* Tool schemas prove the outer type, but models can still pass JSON-ish debris


@ -124,6 +124,11 @@ export function snapshotState() {
r["vision_meeting_json"].trim().length > 0
? JSON.parse(r["vision_meeting_json"])
: null,
product_research:
typeof r["product_research_json"] === "string" &&
r["product_research_json"].trim().length > 0
? JSON.parse(r["product_research_json"])
: null,
}));
const rawSlices = db
.prepare("SELECT * FROM slices ORDER BY milestone_id, sequence, id")


@ -310,6 +310,10 @@ export function renderRoadmapContent(milestoneRow, sliceRows) {
lines.push("## Vision");
lines.push(milestoneRow.vision || milestoneRow.title || "TBD");
lines.push("");
if (milestoneRow.product_research) {
lines.push(...renderProductResearchLines(milestoneRow.product_research));
lines.push("");
}
if (milestoneRow.vision_meeting) {
lines.push("## Vision Alignment Meeting");
lines.push("");
@ -378,6 +382,39 @@ export function renderRoadmapContent(milestoneRow, sliceRows) {
lines.push("");
return lines.join("\n");
}
function renderProductResearchLines(research) {
const lines = [
"## Product / Competitor Research",
"",
"### Purpose Contract",
"",
`- Purpose: ${research.purpose}`,
`- Consumer: ${research.consumer}`,
`- Contract: ${research.contract}`,
`- Failure boundary: ${research.failureBoundary}`,
`- Evidence: ${research.evidence}`,
`- Non-goals: ${research.nonGoals}`,
`- Invariants: ${research.invariants}`,
`- Assumptions: ${research.assumptions}`,
"",
"### Category Findings",
"",
`- Category / job: ${research.category} / ${research.targetJob}`,
];
for (const item of research.comparables ?? []) {
lines.push(`- Comparable: ${item}`);
}
for (const item of research.tableStakes ?? []) {
lines.push(`- Table stake: ${item}`);
}
for (const item of research.differentiators ?? []) {
lines.push(`- Differentiator: ${item}`);
}
for (const item of research.antiPatterns ?? []) {
lines.push(`- Do not copy: ${item}`);
}
return lines;
}
/**
* Render ROADMAP.md projection to disk for a specific milestone.
* Queries DB via helper functions, renders content, writes via atomicWriteSync.


@ -1,6 +1,6 @@
/**
 * Tests for autonomous routing verify that `autonomous` is routed to
* headless mode so it doesn't fall through to the interactive TUI, which hangs
* machine surface so it doesn't fall through to the interactive TUI, which hangs
* when stdin/stdout are piped.
*
* Regression test for #2732.
@ -68,7 +68,7 @@ test("cli.ts routes `autonomous` to headless runner", () => {
/messages\[0\]\s*===\s*['"]autonomous['"][\s\S]*?runHeadless/;
assert.ok(
autoBlockRegex.test(cliSource),
"`autonomous` subcommand handler must invoke runHeadless to delegate to headless mode",
"`autonomous` subcommand handler must invoke runHeadless to delegate to machine surface",
);
});


@ -32,7 +32,7 @@ const EXPLICIT_SUBCOMMANDS = new Set([
/**
* Detect whether the current subcommand should be auto-redirected
* to headless mode when stdout is not a TTY.
* to machine surface when stdout is not a TTY.
*
* Returns true when the subcommand is "autonomous" and stdout is piped.
*/


@ -108,6 +108,8 @@ function parseHeadlessArgs(argv: string[]): HeadlessOptions {
options.resumeSession = args[++i];
} else if (arg === "--bare") {
options.bare = true;
} else if (commandSeen) {
options.commandArgs.push(arg);
}
} else if (!commandSeen) {
if (arg === "autonomous") {
@ -276,6 +278,19 @@ test("no --resume means undefined", () => {
assert.equal(opts.resumeSession, undefined);
});
test("unknown command flags after subcommand are preserved for slash command handlers", () => {
const opts = parseHeadlessArgs([
"node",
"sf",
"headless",
"autonomous",
"--full",
]);
assert.equal(opts.command, "autonomous");
assert.deepEqual(opts.commandArgs, ["--full"]);
});
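The parser change and test above encode one rule: once the subcommand has been seen, unrecognized flags are appended to `commandArgs` instead of being swallowed, so slash-command handlers can report on them (as the `/sf auto` removed-flag warning does). A minimal standalone sketch of that rule (names mirror the test, not the real `parseHeadlessArgs`):

```javascript
// After the subcommand, every unknown flag is preserved for the handler.
function parseSketch(argv) {
  const opts = { command: undefined, commandArgs: [] };
  let commandSeen = false;
  for (const arg of argv.slice(2)) {
    if (arg === "headless") continue; // mode selector, not a subcommand
    if (!commandSeen) {
      opts.command = arg;
      commandSeen = true;
    } else {
      opts.commandArgs.push(arg);
    }
  }
  return opts;
}

const opts = parseSketch(["node", "sf", "headless", "autonomous", "--full"]);
console.log(opts.command, opts.commandArgs); // autonomous [ '--full' ]
```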
// ─── Exit code constants ───────────────────────────────────────────────────
test("EXIT_SUCCESS is 0", () => {


@ -1,10 +1,11 @@
import assert from "node:assert/strict";
import { mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { existsSync, mkdirSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { afterEach, test } from "vitest";
import {
bootstrapProject,
buildAutoBootstrapContext,
hasMilestones,
hasProjectMilestones,
@ -21,9 +22,14 @@ afterEach(() => {
test("buildAutoBootstrapContext includes purpose docs and source inventory", () => {
const root = mkdtempSync(join(tmpdir(), "sf-headless-bootstrap-"));
mkdirSync(join(root, ".sf"), { recursive: true });
mkdirSync(join(root, "docs"), { recursive: true });
mkdirSync(join(root, "src"), { recursive: true });
writeFileSync(
join(root, ".sf", "PROJECT.md"),
"# Project\n\nSF owns state.\n",
);
writeFileSync(join(root, "VISION.md"), "# Vision\n\nBuild the runner.\n");
writeFileSync(join(root, "TODO.md"), "# TODO\n\nWire ACE.\n");
writeFileSync(join(root, "docs", "architecture.md"), "# Architecture\n");
@ -32,10 +38,14 @@ test("buildAutoBootstrapContext includes purpose docs and source inventory", ()
const context = buildAutoBootstrapContext(root);
assert.match(context, /Autonomous Repo Bootstrap/);
assert.match(context, /SF working specs first/);
assert.match(context, /\.sf working model/);
assert.match(context, /product category and representative competitors/);
assert.match(context, /purpose, vision, architecture/);
assert.match(context, /ACE spec-first TDD/);
assert.match(context, /explorer-style subagents/);
assert.match(context, /harness-engineering principles/);
assert.match(context, /## \.sf\/PROJECT\.md/);
assert.match(context, /## VISION\.md/);
assert.match(context, /## TODO\.md/);
assert.match(context, /## docs\/architecture\.md/);
@ -43,6 +53,63 @@ test("buildAutoBootstrapContext includes purpose docs and source inventory", ()
assert.match(context, /src\/main\.ts/);
});
test("buildAutoBootstrapContext prioritizes SF specs over orientation and legacy spec docs", () => {
const root = mkdtempSync(join(tmpdir(), "sf-headless-spec-priority-"));
mkdirSync(join(root, ".sf"), { recursive: true });
mkdirSync(join(root, "docs", "specs"), { recursive: true });
writeFileSync(
join(root, ".sf", "PROJECT.md"),
"# Project\n\nCanonical SF project spec.\n",
);
writeFileSync(join(root, "BASE_SPEC.md"), "# Base Spec\n\nImported seed.\n");
writeFileSync(
join(root, "docs", "specs", "README.md"),
"# Specs\n\nImported durable specs.\n",
);
writeFileSync(join(root, "README.md"), "# Readme\n\nOrientation.\n");
const context = buildAutoBootstrapContext(root);
const projectIndex = context.indexOf("## .sf/PROJECT.md");
const baseSpecIndex = context.indexOf("## BASE_SPEC.md");
const docsSpecIndex = context.indexOf("## docs/specs/README.md");
const readmeIndex = context.indexOf("## README.md");
assert.notEqual(projectIndex, -1);
assert.notEqual(baseSpecIndex, -1);
assert.notEqual(docsSpecIndex, -1);
assert.notEqual(readmeIndex, -1);
assert.ok(projectIndex < baseSpecIndex);
assert.ok(projectIndex < readmeIndex);
assert.ok(readmeIndex < baseSpecIndex);
assert.ok(readmeIndex < docsSpecIndex);
});
test("bootstrapProject_keeps_old_state_dirs_separate_and_does_not_auto_add_serena_mcp", () => {
const root = mkdtempSync(join(tmpdir(), "sf-headless-no-compat-"));
const home = mkdtempSync(join(tmpdir(), "sf-headless-home-"));
const oldHome = process.env.HOME;
process.env.HOME = home;
mkdirSync(join(root, ".gsd"), { recursive: true });
writeFileSync(join(root, ".gsd", "state.md"), "# old state\n");
try {
bootstrapProject(root);
} finally {
if (oldHome === undefined) {
delete process.env.HOME;
} else {
process.env.HOME = oldHome;
}
}
assert.equal(existsSync(join(root, ".gsd", "state.md")), true);
assert.equal(existsSync(join(root, ".sf", "milestones")), true);
assert.equal(existsSync(join(root, ".sf", "mcp.json")), false);
assert.equal(existsSync(join(home, ".serena", "serena_config.yml")), false);
});
test("hasMilestones only reports true when milestone directories exist", () => {
const root = mkdtempSync(join(tmpdir(), "sf-headless-milestones-"));


@ -1,6 +1,6 @@
/**
* Regression test for #3547: discuss and plan must be classified as
* multi-turn commands in headless mode.
* multi-turn commands in machine surface.
*/
import assert from "node:assert/strict";
