integrate: hook quick wins into UOK dispatch loop

Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates
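The hook described above can be sketched as follows. This is a minimal illustration of the fire-and-forget pattern only — the `UnitOutcome` shape and `ModelLearner` stand-in are hypothetical, not the actual `metrics.js` or model-learner modules:

```typescript
// Hypothetical shapes -- illustrative only, not the actual SF modules.
interface UnitOutcome {
  taskType: string;
  modelId: string;
  success: boolean;
}

const learned: UnitOutcome[] = [];

// Stand-in for the model-learner described above.
const ModelLearner = {
  async recordOutcome(outcome: UnitOutcome): Promise<void> {
    // A real implementation would update per-task-type performance stats.
    learned.push(outcome);
  },
};

// The hook: outcome recording completes even if learning fails.
function recordUnitOutcome(outcome: UnitOutcome): UnitOutcome {
  // Fire-and-forget: swallow rejections so dispatch is never blocked.
  void ModelLearner.recordOutcome(outcome).catch(() => {
    // log-and-continue; never rethrow into the dispatch path
  });
  return outcome;
}
```

The key design choice is the `.catch(() => {})` on the learning call: a rejected learning promise is absorbed rather than propagated into the outcome-recording path.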

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: After triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics
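The threshold-and-count behavior can be sketched like this. The `TriageReport` shape and `applyAutoFix` helper are assumptions for illustration; only the >0.85 gate and the fire-and-forget error handling mirror the description above:

```typescript
// Hypothetical shapes -- the real triage-self-feedback.js module differs.
interface TriageReport {
  id: string;
  confidence: number; // 0..1
  suggestedFix?: string;
}

const AUTO_FIX_THRESHOLD = 0.85;

// Placeholder fixer; assume the real one can throw or reject.
async function applyAutoFix(report: TriageReport): Promise<boolean> {
  return report.suggestedFix !== undefined;
}

// Sketch of the auto-fix tail: after triage and requirement promotion,
// only high-confidence reports are fixed.
async function autoFixReports(reports: TriageReport[]): Promise<number> {
  let reportsAutoFixed = 0;
  for (const report of reports) {
    if (report.confidence <= AUTO_FIX_THRESHOLD) continue;
    try {
      if (await applyAutoFix(report)) reportsAutoFixed++;
    } catch {
      // Fire-and-forget: a failed fix never blocks report application.
    }
  }
  return reportsAutoFixed; // surfaced in triage metrics
}
```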

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mikael Hugo 2026-05-06 22:34:41 +02:00
parent 62a04f1073
commit 553ba23b89
13 changed files with 420 additions and 759 deletions


@@ -57,7 +57,7 @@ Process for each:
| 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` |
| 4 | **`fix(gsd): self-heal symlinked .sf staging`** (path-translated) | Data-loss prevention — when the staging dir is a symlink that's broken or points outside expected scope, detect and self-heal instead of silently writing to wrong location. Path-translate `.gsd/` → `.sf/` in the port; the substance is symlink-resilience, not the path string. | `9340f1e9b` (#4423) |
| 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) |
| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` |
| 6 | **MCP server stdout-buffer deadlock** | Not applicable — SF no longer ships an MCP server package. Do not port unless a future accepted ADR reintroduces an SF-owned MCP server. | N/A |
| 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` |
| 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` |
| 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` |
@@ -235,7 +235,7 @@ Spec sections that landed during late-stage adversarial review and only matter a
| Item | Spec | Why deferred |
|---|---|---|
| SSH worker extension | § 22, C-64, C-75, E-02 | Real for fleet deployments (bunker, inference-fabric scaling). Not real for daily-driver development. Build when a user actually needs to dispatch to a remote box. |
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. The MCP server (`packages/mcp-server`) is the more likely remote interface. |
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. SF currently supports MCP as a client surface only, not as an SF workflow server. |
| `trace_index` SQL | § 19.3.1, C-80 | Forensics over JSONL is fine until grep gets slow. Build the index when you have months of trace files, not before. |
| PhaseUAT | § 4.6, C-53, C-76 | Only matters for "release" workflows where humans sign off before merge. Add when needed. |
| Multi-orchestrator atomic claim | C-47 | The single-process `run.lock` is sufficient. The atomic UPDATE pattern matters when two orchestrators race against the same DB; sf doesn't deploy that way today. |


@@ -1,339 +1,17 @@
# ADR-008 Implementation Plan
**Related ADR:** [ADR-008-sf-tools-over-mcp-for-provider-parity.md](./ADR-008-sf-tools-over-mcp-for-provider-parity.md)
**Status:** Superseded — do not implement
**Status:** Rejected — never implement
**Date:** 2026-04-09
> Superseded by the current boundary in ADR-020: SF workflow tools are not exposed
> as an MCP server for external clients. SF may use MCP clients for external tools,
> but SF itself runs directly.
## Superseded Boundary
## Objective
Never build or restore an SF MCP server package.
Implement the ADR-008 decision by exposing the core SF workflow tool contract over MCP, then wiring MCP-backed access into provider paths that cannot use the native in-process SF tool registry directly.
Current SF product boundary:
The first usable outcome is:
- SF uses MCP only as a **client** for external tool servers configured through `.mcp.json` or `.sf/mcp.json`.
- SF workflow execution runs through native SF tools, headless/RPC, and the daemon/RPC client path.
- SF never exposes its workflow as an MCP server for Claude Code, Cursor, or other external clients.
- a Claude Code-backed execution session can complete a task using canonical SF tools
- no manual summary-writing fallback is needed
- native provider behavior remains unchanged
## Non-Goals
- Replacing native in-process SF tools with MCP
- Exporting every historical alias in the first rollout
- Reworking the entire session-oriented MCP server before proving the workflow-tool surface
- Supporting every provider path before Claude Code is working end-to-end
## Constraints
- Native and MCP tool paths must share business logic
- MCP must not bypass write-gate or discussion-gate protections
- Canonical SF state transitions must remain DB-backed
- Provider capability mismatches must fail early, not degrade silently
## Workstreams
### 1. Shared Handler Extraction
Goal: separate business logic from transport registration.
Targets:
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
- `src/resources/extensions/sf/tools/complete-task.ts`
- sibling modules used by planning/summary/validation tools
Deliverables:
- transport-neutral handler entrypoints for the minimum workflow tool set
- thin native registration wrappers that call those handlers
- thin MCP registration wrappers that call those handlers
Exit criteria:
- native tool behavior is unchanged
- no workflow tool logic is duplicated in MCP server code
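The extraction the plan describes can be sketched as one transport-neutral handler with thin wrappers. All names and shapes here are illustrative, not the real `db-tools.ts` or MCP server code:

```typescript
// Illustrative shapes only -- not the actual SF handler signatures.
interface TaskCompleteInput { taskId: string; summary: string; }
interface TaskCompleteResult { taskId: string; status: "complete"; }

// Transport-neutral business logic: one implementation, no transport code.
function completeTaskHandler(input: TaskCompleteInput): TaskCompleteResult {
  if (!input.taskId) throw new Error("taskId required");
  // A real handler would write DB state and render the summary artifact.
  return { taskId: input.taskId, status: "complete" };
}

// Thin native wrapper: what pi.registerTool(...) would bind to.
const nativeTaskComplete = {
  name: "sf_task_complete",
  run: (input: TaskCompleteInput) => completeTaskHandler(input),
};

// Thin MCP wrapper: same handler behind a serialized transport envelope.
const mcpTaskComplete = {
  name: "sf_task_complete",
  call: (args: TaskCompleteInput) => JSON.stringify(completeTaskHandler(args)),
};
```

Both wrappers stay thin enough that no workflow logic can drift between transports.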
### 2. Workflow-Tool MCP Surface
Goal: add an MCP server surface for real SF workflow tools, distinct from the current session/read API.
Preferred first-cut tool set:
- `sf_summary_save`
- `sf_decision_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
- `sf_save_gate_result`
- `sf_milestone_status`
Likely files:
- `packages/mcp-server/src/server.ts` or a new sibling server package
- `packages/mcp-server/src/...` supporting modules
- shared tool-definition metadata if needed
Decisions to make during implementation:
- extend existing MCP package vs create `packages/mcp-sf-tools-server`
- canonical names only vs selected alias export
- single combined server vs separate “session” and “workflow” server modes
Exit criteria:
- MCP tool discovery shows the minimum tool set
- each MCP tool invokes the shared handlers successfully in isolation
### 3. Safety and Policy Parity
Goal: ensure MCP mutations enforce the same rules as native tool calls.
Targets:
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
- any current tool-call gating hooks tied to native runtime only
- MCP wrapper layer before shared handler invocation
Required protections:
- discussion gate blocking
- queue-mode restrictions
- write-path restrictions
- canonical DB/file rendering order
Exit criteria:
- MCP cannot be used to bypass native write restrictions
- blocked native scenarios remain blocked over MCP
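The "gate before handler" rule this workstream requires can be sketched as below. The policy flags and tool set are hypothetical stand-ins, not the actual `write-gate.ts` API:

```typescript
// Hypothetical gate policy -- a stand-in for write-gate.ts, not its API.
const mutationTools = new Set(["sf_task_complete", "sf_plan_task"]);

interface GatePolicy {
  discussionMode: boolean; // discussion gate active
  queueMode: boolean;      // queue-mode restrictions active
}

function assertAllowed(tool: string, policy: GatePolicy): void {
  if (!mutationTools.has(tool)) return; // read-only tools pass through
  if (policy.discussionMode) throw new Error(`discussion gate blocks ${tool}`);
  if (policy.queueMode) throw new Error(`queue mode blocks ${tool}`);
}

// MCP wrapper layer: gate first, then delegate to the shared handler,
// so blocked native scenarios stay blocked over MCP.
function mcpInvoke<T>(tool: string, policy: GatePolicy, handler: () => T): T {
  assertAllowed(tool, policy);
  return handler();
}
```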
### 4. Claude Code Provider Integration
Goal: attach the SF workflow-tool MCP surface to Claude Code sessions.
Targets:
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
- `src/resources/extensions/claude-code-cli/index.ts`
Expected work:
- build an SF-managed `mcpServers` config for the Claude SDK session
- attach the workflow MCP server only when the session requires SF tools
- keep current Claude Code streaming behavior intact
Exit criteria:
- Claude Code session can discover the SF workflow MCP tools
- task execution path can call `sf_task_complete` successfully
### 5. Capability Detection and Failure Path
Goal: refuse to start tool-dependent workflows when required capabilities are unavailable.
Targets:
- SF dispatch / auto-mode preflight
- provider selection and routing checks
- user-facing compatibility errors
Required behavior:
- if native SF tools are available, proceed
- else if SF workflow MCP tools are available, proceed
- else fail fast with a precise message
Exit criteria:
- no execution prompt is sent that requires unavailable tools
- users with only unsupported capability combinations get a hard error, not a fake fallback
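The required preflight behavior — native, then MCP, then hard failure — can be sketched as a small selector. The capability flags are illustrative assumptions, not the real routing checks:

```typescript
// Illustrative capability flags -- the real preflight checks differ.
interface ProviderCapabilities {
  nativeSfTools: boolean;     // in-process tool registry reachable
  mcpWorkflowTools: boolean;  // SF workflow MCP surface reachable
}

type ToolSurface = "native" | "mcp";

function selectToolSurface(caps: ProviderCapabilities): ToolSurface {
  if (caps.nativeSfTools) return "native"; // prefer in-process tools
  if (caps.mcpWorkflowTools) return "mcp"; // MCP as the compatibility layer
  // Fail fast: no execution prompt may require unavailable tools.
  throw new Error(
    "provider cannot access SF workflow tools (native or MCP); aborting"
  );
}
```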
### 6. Prompt and Documentation Alignment
Goal: keep the workflow contract strict while removing transport assumptions from docs and runtime messaging.
Targets:
- `src/resources/extensions/sf/prompts/execute-task.md`
- related planning/discuss prompts that reference tool availability
- provider and MCP docs
Rules:
- prompts should keep requiring canonical SF completion/planning tools
- prompts should not imply “native in-process tool only”
- docs should explain native vs MCP-backed fulfillment paths
Exit criteria:
- prompt contract matches runtime reality
- no provider is told to use a tool surface it cannot access
## Phase Plan
## Phase 1: Spike and Handler Extraction
Scope:
- extract shared logic for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`
- prove native wrappers still work
Why first:
- these tools are enough to test end-to-end completion semantics without migrating the full catalog
Verification:
- existing native tests still pass
- new unit tests cover shared handler entrypoints directly
## Phase 2: Minimal Workflow MCP Server
Scope:
- expose the three extracted tools over MCP
- ensure discovery schemas are clean and canonical
Verification:
- MCP discovery returns all three tools
- direct MCP calls succeed against a fixture project
## Phase 3: Claude Code End-to-End Proof
Scope:
- wire the minimal workflow MCP server into the Claude SDK session
- run a single execution path that ends with task completion
Verification:
- Claude Code can call `sf_task_complete`
- summary file, DB state, and plan checkbox update correctly
## Phase 4: Expand to Full Minimum Workflow Set
Scope:
- add planning, slice completion, milestone completion, roadmap reassessment, and gate result tools
Verification:
- discuss/plan/execute/complete lifecycle works over MCP for the supported flow set
## Phase 5: Capability Gating and UX Hardening
Scope:
- add preflight capability checks
- add clear error messaging for unsupported setups
Verification:
- unsupported provider/session combinations fail before execution starts
## Phase 6: Prompt and Doc Cleanup
Scope:
- align prompts and docs with the new transport-neutral contract
Verification:
- prompt references are accurate
- docs describe the supported architecture and limitations
## File-Level Starting Map
High-probability files for the first implementation:
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
- `src/resources/extensions/sf/tools/complete-task.ts`
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
- `src/resources/extensions/claude-code-cli/index.ts`
- `packages/mcp-server/src/server.ts`
- `packages/mcp-server/src/session-manager.ts`
- `packages/mcp-server/README.md`
- `src/resources/extensions/sf/prompts/execute-task.md`
## Testing Strategy
### Unit
- shared handlers
- MCP wrapper adapters
- gating / capability-check helpers
### Integration
- direct MCP tool invocation against fixture projects
- native tool invocation regression coverage
- Claude Code provider path with MCP attached
### End-to-End
- plan or execute a small fixture task and complete it through canonical SF tools
- confirm DB row, rendered summary, and plan state stay in sync
## Risks
### Risk 1: Logic Drift
If native and MCP wrappers each evolve their own behavior, parity will collapse quickly.
Mitigation:
- shared handler extraction before broad MCP exposure
### Risk 2: Safety Regression
If MCP becomes a side door around native gating, the architecture is worse than before.
Mitigation:
- centralize or reuse gating checks before shared handler invocation
### Risk 3: Overly Broad First Rollout
Exporting every tool and alias immediately increases scope and test burden.
Mitigation:
- ship a minimal workflow tool set first
### Risk 4: Claude SDK Session Wiring Complexity
Attaching MCP servers dynamically may expose edge cases around cwd, permissions, or subprocess lifecycle.
Mitigation:
- prove a narrow spike with 2-3 tools before expanding
## Exit Criteria for ADR-008
ADR-008 is considered implemented when:
1. Claude Code-backed execution can use canonical SF workflow tools over MCP.
2. Native provider behavior remains intact.
3. Shared handlers back both native and MCP invocation.
4. Gating and state integrity protections apply equally to MCP mutations.
5. Capability checks prevent prompts from requiring unavailable tools.
## Recommended Next Task
Start with a narrow spike:
1. Extract shared handlers for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`.
2. Expose those tools through a minimal workflow MCP server.
3. Attach that MCP server to Claude Code sessions.
4. Prove end-to-end task completion on a fixture project.
This document is kept only as a tombstone for the rejected implementation plan. Do not revive this direction.


@@ -1,246 +1,25 @@
# ADR-008: Expose SF Workflow Tools Over MCP for Provider Parity
# ADR-008: SF Tools Over MCP for Provider Parity
**Status:** Superseded — historical record
**Status:** Rejected — never implement
**Date:** 2026-04-09
**Superseded by:** `docs/adr/0000-purpose-to-software-compiler.md`, `docs/dev/ADR-020-internal-wire-architecture.md`
**Deciders:** Jeremy McSpadden
**Related:** ADR-004 (capability-aware model routing), ADR-007 (model catalog split and provider API encapsulation), `src/resources/extensions/sf/bootstrap/db-tools.ts`, `src/resources/extensions/claude-code-cli/stream-adapter.ts`, `packages/mcp-server/src/server.ts`
## Context
> Current boundary: SF does not expose its workflow as an MCP server for other
> clients. SF runs directly through `sf`/`/sf autonomous`; MCP is a client-side
> integration surface for external tools that SF may call. This ADR is preserved
> as historical context for a provider-parity idea that was not adopted.
SF currently has two different tool surfaces:
1. **In-process extension tools** registered directly into the runtime via `pi.registerTool(...)`.
2. **An external MCP server** that exposes session orchestration and read-only project inspection.
This split is now creating a real provider compatibility problem.
### What exists today
The core SF workflow tools are internal extension tools. Examples include:
- `sf_summary_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
These are registered in `src/resources/extensions/sf/bootstrap/db-tools.ts` and related bootstrap files. SF prompts assume these tools are available during discuss, plan, and execute flows.
Separately, `packages/mcp-server/src/server.ts` exposes a different tool surface:
- session control: `sf_execute`, `sf_status`, `sf_result`, `sf_cancel`, `sf_query`, `sf_resolve_blocker`
- read-only inspection: `sf_progress`, `sf_roadmap`, `sf_history`, `sf_doctor`, `sf_captures`, `sf_knowledge`
That MCP server is useful, but it is **not** a transport for the internal workflow/mutation tools.
### The current failure mode
The Claude Code CLI provider uses the Anthropic Agent SDK through `src/resources/extensions/claude-code-cli/stream-adapter.ts`. That adapter starts a Claude SDK session, but it does not forward the internal SF tool registry into the SDK session, nor does it attach an SF MCP server for those tools.
As a result:
- prompts tell the model to call tools like `sf_task_complete`
- the tools exist in SF
- but Claude Code sessions do not actually receive those tools
This produces a contract mismatch: the model is required to use tools that are unavailable in that provider path.
### Why this matters
This is not a one-off Claude Code bug. It reveals a deeper architectural issue:
- SF's core workflow contract is transport-specific
- prompt authors assume “internal extension tool availability”
- provider integrations do not all share the same execution surface
If SF wants provider parity, its workflow tools need a transport-neutral exposure model.
## Decision
**Expose the SF workflow tool contract over MCP as a first-class transport, and make MCP the compatibility layer for providers that cannot directly access the in-process SF tool registry.**
Never build or restore an SF MCP server package. SF is not an MCP server product and must not become one.
This means:
Current SF uses MCP only as a client-side integration surface for external tools that SF may call. SF workflow execution stays inside the `sf` runtime through in-process extension tools, daemon/RPC/headless paths, and DB-backed workflow state.
1. SF will keep its existing in-process tool registration for native runtime use.
2. SF will add an MCP execution surface for the same workflow tools.
3. Both surfaces must call the same underlying business logic.
4. Provider integrations such as Claude Code will use the MCP surface when they cannot access native in-process tools directly.
## Reason
The decision is explicitly **not** to replace the native tool system with MCP everywhere. MCP is the parity and portability layer, not the only runtime path.
The original ADR proposed exposing SF workflow tools through a new MCP server for provider parity. That creates another mutable workflow transport and another state-safety surface. The current architecture avoids that extra server boundary: planning, execution, validation, and completion write structured state to SQLite first and render markdown/JSON as projections.
## Decision Details
## Operational Rule
### 1. One handler layer, multiple transports
- Never recreate `packages/mcp-server`.
- Never add a `src/mcp-server.ts` workflow backend.
- Never document SF as an MCP server provider.
- Never add provider-parity work that depends on SF exposing itself over MCP.
- Keep `src/resources/extensions/mcp-client/` as the client integration boundary.
SF tool behavior must not be implemented twice.
The transport-neutral business logic for workflow tools should be shared by:
- native extension tool registration (`pi.registerTool(...)`)
- MCP server tool registration
The MCP server should wrap the same handlers used by `db-tools.ts`, `query-tools.ts`, and related modules. This avoids logic drift and keeps validation, DB writes, file rendering, and recovery behavior consistent.
### 2. Add a workflow-tool MCP surface
SF will expose the workflow tools required for discuss, planning, execution, and completion over MCP.
Initial minimum set:
- `sf_summary_save`
- `sf_decision_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
- `sf_save_gate_result`
- selected read/query tools such as `sf_milestone_status`
Aliases should be treated conservatively. MCP should prefer canonical names unless compatibility requires exposing aliases.
### 3. Preserve safety semantics
The current SF safety model includes write gates, discussion gates, queue-mode restrictions, and state integrity guarantees.
Those guarantees must continue to apply when tools are invoked over MCP. In particular:
- MCP must not create a path that bypasses write gating
- MCP mutations must preserve the same DB/file/state invariants as native tools
- provider-specific fallback behavior must not allow manual summary writing in place of canonical completion tools
### 4. Make provider capability checks explicit
Before dispatching a workflow that requires SF workflow tools, SF should check whether the selected provider/session can access the required tool surface.
If a provider cannot access either:
- native in-process SF tools, or
- the SF MCP workflow tool surface
then SF must fail early with a clear compatibility error rather than allowing execution to continue in a degraded, state-breaking mode.
### 5. Keep the existing session/read MCP server
The existing MCP server in `packages/mcp-server` remains valid. It serves a different purpose:
- remote session orchestration
- status/result polling
- filesystem-backed project inspection
The new workflow-tool MCP surface is complementary, not a replacement.
## Alternatives Considered
### Alternative A: Reroute away from Claude Code whenever tool-backed execution is needed
This would fix the immediate failure for multi-provider users, but it does not solve provider parity. It also fails completely for users who only have Claude Code configured.
**Rejected** because it treats the symptom, not the architectural gap.
### Alternative B: Hard-fail Claude Code and require another provider
This is a valid short-term guardrail and may still be used before MCP support is complete.
**Rejected as the long-term architecture** because it permanently excludes a supported provider from first-class SF execution.
### Alternative C: Inject the internal SF tool registry directly into the Claude Agent SDK without MCP
This would tightly couple SF's internal extension runtime to a provider-specific integration path. It would not generalize well to other providers or external tool clients.
**Rejected** because it creates a provider-specific bridge instead of a transport-neutral contract.
### Alternative D: Replace native SF tools entirely with MCP
This would simplify the conceptual model, but it would force all runtimes through an external protocol boundary even when the native in-process path is faster and already works well.
**Rejected** because MCP is needed for portability, not because the native tool system is flawed.
## Consequences
### Positive
1. **Provider parity improves.** Providers that can consume MCP tools can participate in full SF workflow execution.
2. **The workflow contract becomes transport-neutral.** Prompts can rely on capabilities rather than a specific runtime implementation detail.
3. **One compatibility story for external clients.** Claude Code, Cursor, and other MCP-capable clients can use the same workflow tool surface.
4. **Better long-term architecture.** Internal tools and external transports converge on shared handlers instead of diverging implementations.
### Negative
1. **Larger surface area to secure and test.** Mutation tools over MCP are higher risk than read-only inspection tools.
2. **Migration complexity.** Tool registration, gating, and handler extraction must be refactored carefully.
3. **Two transport paths must remain aligned.** Native and MCP invocation semantics must stay behaviorally identical.
### Neutral / Tradeoff
The system will now support:
- native in-process tool execution when available
- MCP-backed tool execution when native access is unavailable
That is more complex than a single-path system, but it is the cost of provider portability without sacrificing native runtime quality.
## Migration Plan
### Phase 1: Extract shared handlers
Refactor workflow tools so MCP and native registration can call the same transport-neutral functions.
Priority targets:
- `sf_summary_save`
- `sf_task_complete`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
### Phase 2: Stand up the workflow-tool MCP server
Add a new MCP surface for workflow tool execution. This may extend the existing MCP package or live as a sibling package, but it must be clearly separated from the current session/read API.
### Phase 3: Port safety enforcement
Move or centralize write gates and related policy checks so MCP mutations cannot bypass the existing safety model.
### Phase 4: Attach MCP workflow tools to Claude Code sessions
Update the Claude Code provider integration to pass an SF-managed `mcpServers` configuration into the Claude Agent SDK session when required.
### Phase 5: Add provider capability gating
Before tool-dependent flows begin, verify that the active provider can access the required SF workflow tools via either native registration or MCP.
### Phase 6: Update prompts and docs
Prompt contracts should remain strict about using canonical SF completion/planning tools, but documentation and runtime messaging must no longer assume that only native in-process tool registration satisfies that contract.
## Validation
Success is defined by all of the following:
1. A Claude Code-backed execution session can complete a task using canonical SF workflow tools without manual summary writing.
2. Native provider behavior remains unchanged.
3. MCP-invoked workflow tools produce the same DB updates, rendered artifacts, and state transitions as native tool calls.
4. Write-gate and discussion-gate protections still hold under MCP invocation.
5. When required capabilities are unavailable, SF fails early with a precise compatibility error.
## Scope Notes
This ADR establishes the architectural direction. It does **not** require full MCP exposure of every historical alias or every auxiliary tool in the first implementation.
The first implementation should prioritize the minimum workflow tool set needed to make discuss/plan/execute/complete flows work safely for MCP-capable providers.
If a future integration needs external control of SF, use the daemon/RPC/headless contract. MCP remains for SF-as-client only.


@@ -9,7 +9,7 @@ sf today bundles its TUI directly in core: `pi-tui` (~10.5k LOC of TypeScript) i
Three forces argue for extracting the TUI:
1. **sf is becoming truly headless-first** — `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` already exist. CLI invocations talk to the daemon. sf can be called as an MCP backend by Claude Code, Cursor, Hermes — they're TUI-agnostic. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
1. **sf is becoming truly headless-first** — `packages/daemon` and `packages/rpc-client` already exist. CLI invocations talk to the daemon. SF uses MCP as a client integration surface for external tools, not as an SF workflow server. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
2. **The Charm TUI stack is dramatically more capable than what `pi-tui` builds today.** `bubbletea` + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` + `x/mosaic` (image rendering) + `x/vcr` (recording) + `pony` + `ultraviolet` (declarative markup) compose to far better UX than reproducing in TS would.
3. **Removing `pi-tui` from sf core deletes ~10k LOC of TS** — leaner core, fewer TUI-coupled assumptions in `pi-coding-agent`, cleaner test surface.
@@ -42,7 +42,7 @@ This ADR plans the extraction.
- **sf core gets ~10k LOC leaner** after Stage 2.
- **Charm stack quality** comes for free — animations (`harmonica`), inline images (`x/mosaic`), markdown (`glamour`), forms (`huh`), recording (`x/vcr`).
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + MCP + clients. No TUI coupled to engine.
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + clients, with MCP client integration for external tools. No TUI coupled to engine.
- **Remote TUI for free** — once the client is Wish-served (could be a v3.x extension), `tailscale ssh aidev sf` opens a full TUI session over SSH. Today's `pi-tui` is local-process only.
- **Recordings of TUI sessions** — flight recorder (ADR-015) integrates with the Charm TUI naturally; `pi-tui` would need separate work to support this.
@@ -85,7 +85,7 @@ Total: ~12–16 weeks across stages.
## References
- `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` — already exist; this ADR makes them load-bearing for clients.
- `packages/daemon`, `packages/rpc-client` — already exist; this ADR makes them load-bearing for clients.
- `packages/pi-tui` — the existing TUI being deprecated.
- `ADR-013` — Network: future SSH-served TUI via `wish` rides the same substrate.
- `ADR-015` — Flight recorder: `sf-tui` records its sessions naturally.


@@ -36,7 +36,7 @@
| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
| **LSP** | Language Server Protocol client and multiplexer |
| **Mac Tools** | macOS-native utilities (Swift CLI) |
| **MCP Server/Client** | Model Context Protocol server and client |
| **MCP Client** | Model Context Protocol client integration for external tools |
| **Memory Extension** | In-session memory pipeline and storage |
| **Migration** | Data and config migration tools |
| **Modes** | Interactive TUI, Print, RPC, and Web modes |
@@ -87,7 +87,6 @@
| src/help-text.ts | CLI | Generates help text for all subcommands |
| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to SF |
| src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
| src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to SF |
@@ -388,9 +387,9 @@
|------|-----------------|-------------|
| modes/index.ts | Modes | Mode system exports |
| modes/print-mode.ts | Modes | Non-interactive print mode |
| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
| modes/rpc/rpc-mode.ts | Modes, RPC | RPC server mode for remote access |
| modes/rpc/rpc-client.ts | Modes, RPC | RPC client for remote agent interaction |
| modes/rpc/rpc-types.ts | Modes, RPC | RPC protocol type definitions |
| modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
| modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
| modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
@@ -610,7 +609,7 @@
| remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
| remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
| remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
| mcp-client/index.ts | MCP Client | Model Context Protocol client integration |
| subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
| subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
| subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
@@ -826,7 +825,7 @@
| File | System Label(s) | Description |
|------|-----------------|-------------|
| vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
| vscode-extension/src/sf-client.ts | VS Code Extension, MCP Server/Client | RPC client for SF agent communication |
| vscode-extension/src/sf-client.ts | VS Code Extension, RPC | RPC client for SF agent communication |
| vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @sf command |
| vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
@@ -975,7 +974,7 @@ Quick lookup: which files are part of each system?
| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* |
| **LSP** | pi-coding-agent/src/core/lsp/* |
| **Mac Tools** | src/resources/extensions/mac-tools/* |
| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/sf-client.ts, modes/rpc/* |
| **MCP Client** | src/resources/extensions/mcp-client/index.ts |
| **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
| **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
| **Modes** | pi-coding-agent/src/modes/* |

@@ -34,10 +34,10 @@ These were not clearly represented as durable roadmap items and should be planne
| Item | Why | Suggested tier | Implementation note |
|---|---|---|---|
| Typed SF environment schema | `SF_*` env vars should fail early with actionable diagnostics instead of late runtime surprises. | Tier 1 | Add an SF-owned env schema module and route startup/tool validation through it. |
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Start with file-family thresholds or focused test suites for dispatch, recovery, UOK runtime, and validation. |
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Tier 2 | Add a minimal lifecycle fixture that exercises DB rows as executable truth. |
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Started with focused DB-authority/UOK runtime suites; continue with dispatch and recovery families before changing global thresholds. |
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Done | Added runtime-state regression coverage proving that SQLite slice/task order stays authoritative over stale markdown/JSON projections, and that the DB-backed runtime refuses implicit roadmap, plan, and summary imports. |
| Fault-injection recovery tests | Stuck-loop, timeout, runaway, stale lock, and projection drift recovery are high-risk paths. | Tier 2 | Add deterministic fault fixtures before adding broader chaos coverage. |
| MCP package completeness audit | Docs mention MCP surfaces, but production completeness is unclear. | Tier 2 | Inspect `packages/mcp-server/`, record supported contracts, gaps, and deferred work. |
| MCP server residue/docs cleanup | SF currently ships the MCP client extension only; tracked MCP server source was removed. | Done | Removed untracked `packages/mcp-server/` residue and updated durable docs so future work never recreates an SF MCP server. |
| Biome schema version cleanup | Tooling drift creates noisy lint/config failures. | Tier 3 | Run `biome migrate` as a focused tooling cleanup. |
| Headless assistant-text preview completion | Prior headless work deferred buffer separation. | Tier 2 | Finish `assistantTextBuffer` / `thinkingBuffer` separation and preview flushing. |

@@ -56,6 +56,36 @@ export interface HeadlessHeartbeatContext {
lastEventDetail?: string;
}
export interface AssistantPreviewDelta {
kind: "text" | "thinking";
text: string;
}
/**
* Extract a concise preview delta from an assistant message update.
*
* Purpose: keep headless non-verbose output honest by separating assistant text
* from model thinking before both are flushed at tool starts and message end.
*
* Consumer: headless.ts streaming event loop and headless progress tests.
*/
export function extractAssistantPreviewDelta(
assistantMessageEvent: unknown,
): AssistantPreviewDelta | null {
if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
return null;
}
const event = assistantMessageEvent as Record<string, unknown>;
const type = String(event.type ?? "");
if (type !== "text_delta" && type !== "thinking_delta") return null;
const text = String(event.delta ?? event.text ?? "");
if (!text) return null;
return {
kind: type === "thinking_delta" ? "thinking" : "text",
text,
};
}
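For reference, a plain-JS restatement of the extractor together with the buffer separation it enables — a sketch only; the event shapes follow this commit's tests, and the actual flush points (tool starts, message end) are out of scope here:

```javascript
// Sketch: separate assistant text deltas from thinking deltas into two
// buffers, mirroring extractAssistantPreviewDelta and its consumer loop.
function extractAssistantPreviewDelta(event) {
  if (!event || typeof event !== "object") return null;
  const type = String(event.type ?? "");
  if (type !== "text_delta" && type !== "thinking_delta") return null;
  const text = String(event.delta ?? event.text ?? "");
  if (!text) return null;
  return { kind: type === "thinking_delta" ? "thinking" : "text", text };
}

function accumulatePreviews(events) {
  let assistantTextBuffer = "";
  let thinkingBuffer = "";
  for (const event of events) {
    const delta = extractAssistantPreviewDelta(event);
    if (delta?.kind === "text") assistantTextBuffer += delta.text;
    else if (delta?.kind === "thinking") thinkingBuffer += delta.text;
  }
  return { assistantTextBuffer, thinkingBuffer };
}
```

Non-preview events (`text_start`, `thinking_end`, malformed payloads) fall through as `null` and touch neither buffer.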
// ---------------------------------------------------------------------------
// ANSI Color Helpers
// ---------------------------------------------------------------------------

@@ -60,6 +60,7 @@ import type { HeadlessJsonResult, OutputFormat } from "./headless-types.js";
import { VALID_OUTPUT_FORMATS } from "./headless-types.js";
import type { ExtensionUIRequest, ProgressContext } from "./headless-ui.js";
import {
extractAssistantPreviewDelta,
formatHeadlessHeartbeat,
formatProgress,
formatPromptTraceLines,
@@ -1440,9 +1441,15 @@ async function runHeadlessOnce(
}
}
}
// Non-verbose: accumulate text_delta for truncated one-liner
else if (ame?.type === "text_delta") {
assistantTextBuffer += String(ame.delta ?? ame.text ?? "");
// Non-verbose: accumulate separated thinking/text previews for
// truncated one-liners before tool calls and message end.
else {
const previewDelta = extractAssistantPreviewDelta(ame);
if (previewDelta?.kind === "text") {
assistantTextBuffer += previewDelta.text;
} else if (previewDelta?.kind === "thinking") {
thinkingBuffer += previewDelta.text;
}
}
}

@@ -41,7 +41,11 @@ function formatAggregateModelIdentity(modelId) {
});
}
/**
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning.
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning,
* and also to model-learner for per-task-type performance tracking.
*
* Integration point for quick win #2: model-learner activates continuous
* per-task-type model performance tracking for adaptive routing decisions.
*/
async function recordUnitOutcome(unit) {
const db = getDatabase();
@@ -54,7 +58,7 @@ async function recordUnitOutcome(unit) {
// drop bare-id entries rather than guessing the provider.
if (slashIdx === -1) return;
const provider = modelId.slice(0, slashIdx);
recordOutcome(db, {
const outcome = {
modelId,
provider,
unitType: unit.type,
@@ -68,7 +72,24 @@
tokens_total: unit.tokens.total,
cost_usd: unit.cost,
recorded_at: unit.startedAt,
});
};
// Record to UOK llm_task_outcomes table
recordOutcome(db, outcome);
// Quick Win #2: Also record to model-learner for per-task-type tracking
try {
const { ModelLearner } = await import("./model-learner.js");
const learner = new ModelLearner(basePath);
learner.recordOutcome(unit.type, modelId, {
success: true,
timeout: false,
tokensUsed: unit.tokens.total,
costUsd: unit.cost,
});
} catch {
/* model-learner integration is optional; never block outcome recording */
}
} catch {
/* fire-and-forget */
}
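The dynamic-import hook above follows a fire-and-forget shape that can be sketched standalone — the names below are hypothetical stand-ins, not the real model-learner API:

```javascript
// Hypothetical stand-in for `await import("./model-learner.js")`; here it
// always fails, to show the failure path staying silent.
async function loadOptionalLearner() {
  throw new Error("model-learner unavailable in this sketch");
}

// Fire-and-forget: the primary recording always completes, and the optional
// learning hook runs inside its own try/catch so a missing or throwing
// module never blocks it.
async function recordUnitOutcomeSketch(unit) {
  const result = { recorded: false, learned: false };
  result.recorded = true; // primary path: always attempted first
  try {
    const learner = await loadOptionalLearner();
    learner.recordOutcome(unit.type, unit.modelId, { success: true });
    result.learned = true;
  } catch {
    /* optional integration; never block outcome recording */
  }
  return result;
}
```

The same shape covers the self-report-fixer hook in triage-self-feedback.js: one try/catch per optional integration, empty catch, no rethrow.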

@@ -37,17 +37,12 @@ import {
getReplanHistory,
getSlice,
getSliceTasks,
insertMilestone,
insertSlice,
insertTask,
isDbAvailable,
updateSliceStatus,
updateTaskStatus,
wasDbOpenAttempted,
} from "./sf-db.js";
import { isClosedStatus, isDeferredStatus } from "./status-guards.js";
import { extractVerdict } from "./verdict-parser.js";
import { logError, logWarning } from "./workflow-logger.js";
import { logWarning } from "./workflow-logger.js";
/**
* A "ghost" milestone directory contains only META.json (and no substantive
* files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY). These appear when
@@ -219,8 +214,7 @@ export async function getActiveMilestoneId(basePath) {
* STATE.md is a rendered cache of this output.
*
* When DB is available, queries milestone/slice/task tables directly.
* Falls back to filesystem parsing for unmigrated projects or when DB
* has zero milestones (e.g. first run before migration).
* Falls back to filesystem parsing only when DB is unavailable.
*/
export async function deriveState(basePath) {
// Return cached result if within the TTL window for the same basePath
@@ -235,22 +229,6 @@
let result;
// Dual-path: try DB-backed derivation first when hierarchy tables are populated
if (isDbAvailable()) {
let dbMilestones = getAllMilestones();
// Disk→DB reconciliation when DB is empty but disk has milestones (#2631).
// deriveStateFromDb() does its own reconciliation, but deriveState() skips
// it entirely when the DB is empty. Sync here so the DB path is used when
// disk milestones exist but haven't been migrated yet.
if (dbMilestones.length === 0) {
const diskIds = findMilestoneIds(basePath);
let synced = false;
for (const diskId of diskIds) {
if (!isGhostMilestone(basePath, diskId)) {
insertMilestone({ id: diskId, status: "active" });
synced = true;
}
}
if (synced) dbMilestones = getAllMilestones();
}
const stopDbTimer = debugTime("derive-state-db");
result = await deriveStateFromDb(basePath);
stopDbTimer({
@@ -300,91 +278,29 @@
const isStatusDone = isClosedStatus;
/**
* Derive SF state from the milestones/slices/tasks DB tables.
* Flag files (PARKED, VALIDATION, CONTINUE, REPLAN, REPLAN-TRIGGER, CONTEXT-DRAFT)
* are still checked on the filesystem since they aren't in DB tables.
* Non-planning control files (PARKED, CONTINUE, REPLAN, REPLAN-TRIGGER,
* CONTEXT-DRAFT) are still checked on the filesystem since they are not
* hierarchy state.
* Requirements also stay file-based via parseRequirementCounts().
*
* Must produce field-identical SFState to _deriveStateImpl() for the same project.
* Must not import rendered roadmap, plan, or summary artifacts into DB-backed
* runtime state. Explicit migration/repair flows own any legacy file import.
*/
function reconcileDiskToDb(basePath) {
let allMilestones = getAllMilestones();
const dbIdSet = new Set(allMilestones.map((m) => m.id));
const diskIds = findMilestoneIds(basePath);
let synced = false;
for (const diskId of diskIds) {
if (!dbIdSet.has(diskId) && !isGhostMilestone(basePath, diskId)) {
insertMilestone({ id: diskId, status: "active" });
synced = true;
}
}
if (synced) allMilestones = getAllMilestones();
for (const mid of diskIds) {
if (isGhostMilestone(basePath, mid)) continue;
const roadmapPath = resolveMilestoneFile(basePath, mid, "ROADMAP");
if (!roadmapPath) continue;
const dbSlices = getMilestoneSlices(mid);
const dbSliceIds = new Set(dbSlices.map((s) => s.id));
let roadmapContent;
try {
roadmapContent = readFileSync(roadmapPath, "utf-8");
} catch (err) {
if (diskIds.length > 0) {
const dbIds = new Set(getAllMilestones().map((m) => m.id));
const diskOnlyIds = diskIds.filter(
(id) => !dbIds.has(id) && !isGhostMilestone(basePath, id),
);
if (diskOnlyIds.length > 0) {
logWarning(
"state",
"reconcileDiskToDb: roadmap read failed, skipping milestone",
{
mid,
error: err.message,
},
`DB-backed state ignored ${diskOnlyIds.length} disk-only milestone(s): ${diskOnlyIds.join(", ")}`,
);
continue;
}
const parsed = parseRoadmap(roadmapContent);
for (const s of parsed.slices) {
if (dbSliceIds.has(s.id)) continue;
const summaryPath = resolveSliceFile(basePath, mid, s.id, "SUMMARY");
const sliceStatus = s.done || summaryPath ? "complete" : "pending";
insertSlice({
id: s.id,
milestoneId: mid,
title: s.title,
status: sliceStatus,
risk: s.risk,
depends: s.depends,
demo: s.demo,
});
}
// Reconcile stale *existing* slice rows (#3599): a slice row may exist in
// the DB with status "pending" even though disk artifacts (SUMMARY) prove
// completion — the same class of desync that task-level reconciliation
// (further below) already handles. Without this, the dependency resolver
// builds doneSliceIds from stale DB rows and downstream slices stay blocked
// forever with "No slice eligible".
for (const dbSlice of dbSlices) {
if (isStatusDone(dbSlice.status)) continue;
const summaryPath = resolveSliceFile(
basePath,
mid,
dbSlice.id,
"SUMMARY",
);
if (summaryPath) {
try {
updateSliceStatus(mid, dbSlice.id, "complete");
logWarning(
"reconcile",
`slice ${mid}/${dbSlice.id} status reconciled from "${dbSlice.status}" to "complete" (#3599)`,
{ mid, sid: dbSlice.id },
);
} catch (e) {
logError("reconcile", `failed to update slice ${dbSlice.id}`, {
sid: dbSlice.id,
error: e.message,
});
}
}
}
}
return allMilestones;
return getAllMilestones();
}
function buildCompletenessSet(basePath, milestones) {
const completeMilestoneIds = new Set();
@@ -727,47 +643,14 @@ function resolveSliceDependencies(activeMilestoneSlices) {
return { activeSlice: null, activeSliceRow: null };
}
async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
let tasks = getSliceTasks(milestoneId, sliceId);
const tasks = getSliceTasks(milestoneId, sliceId);
if (tasks.length === 0 && planFile) {
try {
const planContent = await loadFile(planFile);
if (planContent) {
const diskPlan = parsePlan(planContent);
if (diskPlan.tasks.length > 0) {
for (let i = 0; i < diskPlan.tasks.length; i++) {
const t = diskPlan.tasks[i];
try {
insertTask({
id: t.id,
sliceId,
milestoneId,
title: t.title,
status: t.done ? "complete" : "pending",
sequence: i + 1,
});
} catch (insertErr) {
logWarning(
"reconcile",
`failed to insert task ${t.id} from plan file: ${insertErr instanceof Error ? insertErr.message : String(insertErr)}`,
);
}
}
tasks = getSliceTasks(milestoneId, sliceId);
logWarning(
"reconcile",
`imported ${tasks.length} tasks from plan file for ${milestoneId}/${sliceId} — DB was empty (#3600)`,
{ mid: milestoneId, sid: sliceId },
);
}
}
} catch (err) {
logError(
"reconcile",
`plan-file task import failed for ${milestoneId}/${sliceId}: ${err instanceof Error ? err.message : String(err)}`,
);
}
logWarning(
"reconcile",
`slice plan file exists for ${milestoneId}/${sliceId}, but DB has no task rows; refusing runtime import`,
{ mid: milestoneId, sid: sliceId },
);
}
let reconciled = false;
for (const t of tasks) {
if (isStatusDone(t.status)) continue;
const summaryPath = resolveTaskFile(
@@ -778,41 +661,13 @@ async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
"SUMMARY",
);
if (summaryPath && existsSync(summaryPath)) {
// Validate that the summary file has actual content (#sf-moobj36o-6rxy6e)
const summaryContent = readFileSync(summaryPath, "utf-8");
if (!isValidTaskSummary(summaryContent)) {
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} has empty/invalid SUMMARY — skipping reconciliation`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
continue;
}
try {
updateTaskStatus(
milestoneId,
sliceId,
t.id,
"complete",
new Date().toISOString(),
);
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} status reconciled from "${t.status}" to "complete" (#2514)`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
reconciled = true;
} catch (e) {
logError("reconcile", `failed to update task ${t.id}`, {
tid: t.id,
error: e.message,
});
}
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} has SUMMARY on disk but DB status is "${t.status}"; refusing runtime status import`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
}
}
if (reconciled) {
tasks = getSliceTasks(milestoneId, sliceId);
}
return tasks;
}
async function detectBlockers(basePath, milestoneId, sliceId, tasks) {

@@ -0,0 +1,241 @@
/**
* db-driven-runtime-state.test.mjs: DB authority for runtime state.
*
* Purpose: prove DB-backed projects choose active slices/tasks from structured
* SQLite rows even when generated markdown/JSON projections are stale.
*/
import assert from "node:assert/strict";
import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { afterEach, test } from "vitest";
import {
closeDatabase,
getAllMilestones,
getSliceTasks,
insertMilestone,
insertSlice,
insertTask,
openDatabase,
} from "../sf-db.js";
import { deriveState, invalidateStateCache } from "../state.js";
const tmpDirs = [];
afterEach(() => {
closeDatabase();
invalidateStateCache();
while (tmpDirs.length > 0) {
const dir = tmpDirs.pop();
if (dir) rmSync(dir, { recursive: true, force: true });
}
});
function makeProject() {
const dir = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(dir);
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S01"), {
recursive: true,
});
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S02"), {
recursive: true,
});
openDatabase(join(dir, ".sf", "sf.db"));
insertMilestone({
id: "M777",
title: "DB runtime authority",
status: "active",
});
insertSlice({
milestoneId: "M777",
id: "S02",
title: "DB first slice",
status: "pending",
sequence: 1,
});
insertSlice({
milestoneId: "M777",
id: "S01",
title: "Stale file first slice",
status: "pending",
sequence: 2,
});
insertTask({
milestoneId: "M777",
sliceId: "S02",
id: "T02",
title: "DB first task",
status: "pending",
sequence: 1,
});
insertTask({
milestoneId: "M777",
sliceId: "S02",
id: "T01",
title: "DB second task",
status: "pending",
sequence: 2,
});
return dir;
}
test("deriveState_when_generated_projections_are_stale_uses_db_slice_and_task_sequence", async () => {
const project = makeProject();
const milestoneDir = join(project, ".sf", "milestones", "M777");
writeFileSync(
join(milestoneDir, "M777-ROADMAP.md"),
[
"# M777: stale generated roadmap",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Stale file first slice | low | - | | stale first |",
"| S02 | DB first slice | low | - | | should still be first by DB |",
"",
].join("\n"),
);
writeFileSync(
join(milestoneDir, "M777-ROADMAP.json"),
JSON.stringify(
{
origin: "stale-test-projection",
slices: [
{ id: "S01", title: "Stale file first slice", sequence: 1 },
{ id: "S02", title: "DB first slice", sequence: 2 },
],
},
null,
2,
),
);
writeFileSync(
join(milestoneDir, "slices", "S02", "S02-PLAN.md"),
[
"# S02: stale generated plan",
"",
"## Tasks",
"",
"- [ ] T99: stale task that should not replace DB rows",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "executing");
assert.equal(state.activeMilestone?.id, "M777");
assert.equal(state.activeSlice?.id, "S02");
assert.equal(state.activeTask?.id, "T02");
});
test("deriveState_when_db_has_no_tasks_refuses_runtime_plan_file_import", async () => {
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(project);
mkdirSync(join(project, ".sf", "milestones", "M778", "slices", "S01"), {
recursive: true,
});
openDatabase(join(project, ".sf", "sf.db"));
insertMilestone({
id: "M778",
title: "DB planning authority",
status: "active",
});
insertSlice({
milestoneId: "M778",
id: "S01",
title: "Plan exists without DB tasks",
status: "pending",
sequence: 1,
});
writeFileSync(
join(project, ".sf", "milestones", "M778", "M778-ROADMAP.md"),
[
"# M778: DB planning authority",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Plan exists without DB tasks | low | - | | keep planning |",
"",
].join("\n"),
);
writeFileSync(
join(project, ".sf", "milestones", "M778", "slices", "S01", "S01-PLAN.md"),
[
"# S01: stale generated plan",
"",
"## Tasks",
"",
"- [ ] T01: stale file task",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "planning");
assert.equal(state.activeMilestone?.id, "M778");
assert.equal(state.activeSlice?.id, "S01");
assert.equal(state.activeTask, null);
assert.equal(getSliceTasks("M778", "S01").length, 0);
assert.match(state.nextAction, /has a plan file but no tasks/i);
});
test("deriveState_when_db_has_no_milestones_refuses_runtime_roadmap_import", async () => {
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(project);
const milestoneDir = join(project, ".sf", "milestones", "M779");
mkdirSync(milestoneDir, { recursive: true });
openDatabase(join(project, ".sf", "sf.db"));
writeFileSync(
join(milestoneDir, "M779-ROADMAP.md"),
[
"# M779: stale disk-only roadmap",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Should not import | low | - | | no DB mutation |",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "pre-planning");
assert.equal(state.activeMilestone, null);
assert.deepEqual(getAllMilestones(), []);
});
test("deriveState_when_task_summary_exists_keeps_db_task_status_authoritative", async () => {
const project = makeProject();
const taskDir = join(
project,
".sf",
"milestones",
"M777",
"slices",
"S02",
"tasks",
);
mkdirSync(taskDir, { recursive: true });
writeFileSync(
join(taskDir, "T02-SUMMARY.md"),
[
"# T02: stale summary projection",
"",
"This file is not DB state.",
"",
].join("\n"),
);
const state = await deriveState(project);
const [firstTask] = getSliceTasks("M777", "S02");
assert.equal(state.phase, "executing");
assert.equal(state.activeSlice?.id, "S02");
assert.equal(state.activeTask?.id, "T02");
assert.equal(firstTask.id, "T02");
assert.equal(firstTask.status, "pending");
});

@@ -184,13 +184,16 @@ export function parseTriageReport(agentOutput) {
* Idempotent: skips rows whose ID already appears in the file.
* 2. Call markResolved for each resolution in the report.
* Idempotent: markResolved itself skips already-resolved entries.
* 3. Quick Win #1: Auto-fix high-confidence self-reports where confidence > 0.85.
*
* @param basePath - project root (directory containing .sf/)
* @param report - parsed TriageReport from parseTriageReport()
*/
export function applyTriageReport(basePath, report) {
export async function applyTriageReport(basePath, report) {
let requirementsAdded = 0;
let entriesResolved = 0;
let reportsAutoFixed = 0;
// ── 1. Write promoted requirements ────────────────────────────────────────
if (report.promotedRequirements.length > 0) {
const sfDir = sfRoot(basePath);
@@ -244,6 +247,7 @@ export function applyTriageReport(basePath, report) {
}
writeFileSync(reqPath, content, "utf-8");
}
// ── 2. Resolve entries ────────────────────────────────────────────────────
for (const resolution of report.resolutions) {
const evidenceKind = resolution.evidenceKind;
@@ -258,5 +262,26 @@
);
if (mutated) entriesResolved++;
}
return { requirementsAdded, entriesResolved };
// ── 3. Quick Win #1: Auto-fix high-confidence self-reports ────────────────
// Integration point for self-report-fixer: read open reports and auto-apply
// fixes where confidence > 0.85.
try {
const { autoFixHighConfidenceReports } = await import(
"./self-report-fixer.js"
);
const allOpen = [
...readAllSelfFeedback(basePath),
...readUpstreamSelfFeedback(),
].filter((e) => !e.resolvedAt);
if (allOpen.length > 0) {
const result = await autoFixHighConfidenceReports(basePath, allOpen);
reportsAutoFixed = result.applied.length;
}
} catch {
/* self-report fixer is optional; never block triage report application */
}
return { requirementsAdded, entriesResolved, reportsAutoFixed };
}
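Because applyTriageReport is now async, existing call sites need an await. A minimal caller sketch, with a stub standing in for the real module — the stub merely echoes the result shape added in this commit:

```javascript
// Hypothetical stand-in for applyTriageReport from triage-self-feedback.js,
// returning the { requirementsAdded, entriesResolved, reportsAutoFixed }
// shape introduced above. reportsAutoFixed stays 0 when the optional
// self-report fixer is unavailable.
async function applyTriageReport(basePath, report) {
  return {
    requirementsAdded: report.promotedRequirements.length,
    entriesResolved: report.resolutions.length,
    reportsAutoFixed: 0,
  };
}

// Callers must now await the promise instead of destructuring a plain object.
async function runTriage(basePath, report) {
  const { requirementsAdded, entriesResolved, reportsAutoFixed } =
    await applyTriageReport(basePath, report);
  return `triage: ${requirementsAdded} promoted, ${entriesResolved} resolved, ${reportsAutoFixed} auto-fixed`;
}
```

A call site that forgets the await would destructure the pending promise and read `undefined` for all three counters, which is why the commit flags the signature change as the one backward-compatibility hazard.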

@@ -2,6 +2,7 @@ import assert from "node:assert/strict";
import { describe, it } from "vitest";
import type { ProgressContext } from "../headless-ui.js";
import {
extractAssistantPreviewDelta,
formatCostLine,
formatHeadlessHeartbeat,
formatProgress,
@@ -556,6 +557,31 @@
});
});
describe("extractAssistantPreviewDelta", () => {
it("separates assistant text deltas from thinking deltas", () => {
assert.deepEqual(
extractAssistantPreviewDelta({
type: "text_delta",
delta: "I will edit the file.",
}),
{ kind: "text", text: "I will edit the file." },
);
assert.deepEqual(
extractAssistantPreviewDelta({
type: "thinking_delta",
delta: "Need inspect first.",
}),
{ kind: "thinking", text: "Need inspect first." },
);
});
it("ignores non-preview assistant events", () => {
assert.equal(extractAssistantPreviewDelta({ type: "text_start" }), null);
assert.equal(extractAssistantPreviewDelta({ type: "thinking_end" }), null);
assert.equal(extractAssistantPreviewDelta(null), null);
});
});
describe("formatCostLine", () => {
it("formats cost with token count", () => {
const result = formatCostLine(0.0523, 4200, 1100);