integrate: hook quick wins into UOK dispatch loop

Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates
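The hook described above can be sketched as follows. This is a minimal illustration of the fire-and-forget pattern only — the `UnitOutcome` shape and `ModelLearner` stand-in are hypothetical, not the actual `metrics.js` or model-learner modules:

```typescript
// Hypothetical shapes -- illustrative only, not the actual SF modules.
interface UnitOutcome {
  taskType: string;
  modelId: string;
  success: boolean;
}

const learned: UnitOutcome[] = [];

// Stand-in for the model-learner described above.
const ModelLearner = {
  async recordOutcome(outcome: UnitOutcome): Promise<void> {
    // A real implementation would update per-task-type performance stats.
    learned.push(outcome);
  },
};

// The hook: outcome recording completes even if learning fails.
function recordUnitOutcome(outcome: UnitOutcome): UnitOutcome {
  // Fire-and-forget: swallow rejections so dispatch is never blocked.
  void ModelLearner.recordOutcome(outcome).catch(() => {
    // log-and-continue; never rethrow into the dispatch path
  });
  return outcome;
}
```

The key design choice is the `.catch(() => {})` on the learning call: a rejected learning promise is absorbed rather than propagated into the outcome-recording path.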

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: After triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics
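The threshold-and-count behavior can be sketched like this. The `TriageReport` shape and `applyAutoFix` helper are assumptions for illustration; only the >0.85 gate and the fire-and-forget error handling mirror the description above:

```typescript
// Hypothetical shapes -- the real triage-self-feedback.js module differs.
interface TriageReport {
  id: string;
  confidence: number; // 0..1
  suggestedFix?: string;
}

const AUTO_FIX_THRESHOLD = 0.85;

// Placeholder fixer; assume the real one can throw or reject.
async function applyAutoFix(report: TriageReport): Promise<boolean> {
  return report.suggestedFix !== undefined;
}

// Sketch of the auto-fix tail: after triage and requirement promotion,
// only high-confidence reports are fixed.
async function autoFixReports(reports: TriageReport[]): Promise<number> {
  let reportsAutoFixed = 0;
  for (const report of reports) {
    if (report.confidence <= AUTO_FIX_THRESHOLD) continue;
    try {
      if (await applyAutoFix(report)) reportsAutoFixed++;
    } catch {
      // Fire-and-forget: a failed fix never blocks report application.
    }
  }
  return reportsAutoFixed; // surfaced in triage metrics
}
```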

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mikael Hugo 2026-05-06 22:34:41 +02:00
parent 62a04f1073
commit 553ba23b89
13 changed files with 420 additions and 759 deletions


@@ -57,7 +57,7 @@ Process for each:
| 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` |
| 4 | **`fix(gsd): self-heal symlinked .sf staging`** (path-translated) | Data-loss prevention — when the staging dir is a symlink that's broken or points outside expected scope, detect and self-heal instead of silently writing to wrong location. Path-translate `.gsd/` → `.sf/` in the port; the substance is symlink-resilience, not the path string. | `9340f1e9b` (#4423) |
| 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) |
| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` |
| 6 | **MCP server stdout-buffer deadlock** | Not applicable — SF no longer ships an MCP server package. Do not port unless a future accepted ADR reintroduces an SF-owned MCP server. | N/A |
| 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` |
| 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` |
| 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` |
@@ -235,7 +235,7 @@ Spec sections that landed during late-stage adversarial review and only matter a
| Item | Spec | Why deferred |
|---|---|---|
| SSH worker extension | § 22, C-64, C-75, E-02 | Real for fleet deployments (bunker, inference-fabric scaling). Not real for daily-driver development. Build when a user actually needs to dispatch to a remote box. |
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. The MCP server (`packages/mcp-server`) is the more likely remote interface. |
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. SF currently supports MCP as a client surface only, not as an SF workflow server. |
| `trace_index` SQL | § 19.3.1, C-80 | Forensics over JSONL is fine until grep gets slow. Build the index when you have months of trace files, not before. |
| PhaseUAT | § 4.6, C-53, C-76 | Only matters for "release" workflows where humans sign off before merge. Add when needed. |
| Multi-orchestrator atomic claim | C-47 | The single-process `run.lock` is sufficient. The atomic UPDATE pattern matters when two orchestrators race against the same DB; sf doesn't deploy that way today. |


@@ -1,339 +1,17 @@
# ADR-008 Implementation Plan
**Related ADR:** [ADR-008-sf-tools-over-mcp-for-provider-parity.md](./ADR-008-sf-tools-over-mcp-for-provider-parity.md)
**Status:** Superseded — do not implement
**Status:** Rejected — never implement
**Date:** 2026-04-09
> Superseded by the current boundary in ADR-020: SF workflow tools are not exposed
> as an MCP server for external clients. SF may use MCP clients for external tools,
> but SF itself runs directly.
## Superseded Boundary
## Objective
Never build or restore an SF MCP server package.
Implement the ADR-008 decision by exposing the core SF workflow tool contract over MCP, then wiring MCP-backed access into provider paths that cannot use the native in-process SF tool registry directly.
Current SF product boundary:
The first usable outcome is:
- SF uses MCP only as a **client** for external tool servers configured through `.mcp.json` or `.sf/mcp.json`.
- SF workflow execution runs through native SF tools, headless/RPC, and the daemon/RPC client path.
- SF never exposes its workflow as an MCP server for Claude Code, Cursor, or other external clients.
- a Claude Code-backed execution session can complete a task using canonical SF tools
- no manual summary-writing fallback is needed
- native provider behavior remains unchanged
## Non-Goals
- Replacing native in-process SF tools with MCP
- Exporting every historical alias in the first rollout
- Reworking the entire session-oriented MCP server before proving the workflow-tool surface
- Supporting every provider path before Claude Code is working end-to-end
## Constraints
- Native and MCP tool paths must share business logic
- MCP must not bypass write-gate or discussion-gate protections
- Canonical SF state transitions must remain DB-backed
- Provider capability mismatches must fail early, not degrade silently
## Workstreams
### 1. Shared Handler Extraction
Goal: separate business logic from transport registration.
Targets:
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
- `src/resources/extensions/sf/tools/complete-task.ts`
- sibling modules used by planning/summary/validation tools
Deliverables:
- transport-neutral handler entrypoints for the minimum workflow tool set
- thin native registration wrappers that call those handlers
- thin MCP registration wrappers that call those handlers
Exit criteria:
- native tool behavior is unchanged
- no workflow tool logic is duplicated in MCP server code
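The extraction the plan describes can be sketched as one transport-neutral handler with thin wrappers. All names and shapes here are illustrative, not the real `db-tools.ts` or MCP server code:

```typescript
// Illustrative shapes only -- not the actual SF handler signatures.
interface TaskCompleteInput { taskId: string; summary: string; }
interface TaskCompleteResult { taskId: string; status: "complete"; }

// Transport-neutral business logic: one implementation, no transport code.
function completeTaskHandler(input: TaskCompleteInput): TaskCompleteResult {
  if (!input.taskId) throw new Error("taskId required");
  // A real handler would write DB state and render the summary artifact.
  return { taskId: input.taskId, status: "complete" };
}

// Thin native wrapper: what pi.registerTool(...) would bind to.
const nativeTaskComplete = {
  name: "sf_task_complete",
  run: (input: TaskCompleteInput) => completeTaskHandler(input),
};

// Thin MCP wrapper: same handler behind a serialized transport envelope.
const mcpTaskComplete = {
  name: "sf_task_complete",
  call: (args: TaskCompleteInput) => JSON.stringify(completeTaskHandler(args)),
};
```

Both wrappers stay thin enough that no workflow logic can drift between transports.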
### 2. Workflow-Tool MCP Surface
Goal: add an MCP server surface for real SF workflow tools, distinct from the current session/read API.
Preferred first-cut tool set:
- `sf_summary_save`
- `sf_decision_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
- `sf_save_gate_result`
- `sf_milestone_status`
Likely files:
- `packages/mcp-server/src/server.ts` or a new sibling server package
- `packages/mcp-server/src/...` supporting modules
- shared tool-definition metadata if needed
Decisions to make during implementation:
- extend existing MCP package vs create `packages/mcp-sf-tools-server`
- canonical names only vs selected alias export
- single combined server vs separate “session” and “workflow” server modes
Exit criteria:
- MCP tool discovery shows the minimum tool set
- each MCP tool invokes the shared handlers successfully in isolation
### 3. Safety and Policy Parity
Goal: ensure MCP mutations enforce the same rules as native tool calls.
Targets:
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
- any current tool-call gating hooks tied to native runtime only
- MCP wrapper layer before shared handler invocation
Required protections:
- discussion gate blocking
- queue-mode restrictions
- write-path restrictions
- canonical DB/file rendering order
Exit criteria:
- MCP cannot be used to bypass native write restrictions
- blocked native scenarios remain blocked over MCP
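The "gate before handler" rule this workstream requires can be sketched as below. The policy flags and tool set are hypothetical stand-ins, not the actual `write-gate.ts` API:

```typescript
// Hypothetical gate policy -- a stand-in for write-gate.ts, not its API.
const mutationTools = new Set(["sf_task_complete", "sf_plan_task"]);

interface GatePolicy {
  discussionMode: boolean; // discussion gate active
  queueMode: boolean;      // queue-mode restrictions active
}

function assertAllowed(tool: string, policy: GatePolicy): void {
  if (!mutationTools.has(tool)) return; // read-only tools pass through
  if (policy.discussionMode) throw new Error(`discussion gate blocks ${tool}`);
  if (policy.queueMode) throw new Error(`queue mode blocks ${tool}`);
}

// MCP wrapper layer: gate first, then delegate to the shared handler,
// so blocked native scenarios stay blocked over MCP.
function mcpInvoke<T>(tool: string, policy: GatePolicy, handler: () => T): T {
  assertAllowed(tool, policy);
  return handler();
}
```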
### 4. Claude Code Provider Integration
Goal: attach the SF workflow-tool MCP surface to Claude Code sessions.
Targets:
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
- `src/resources/extensions/claude-code-cli/index.ts`
Expected work:
- build an SF-managed `mcpServers` config for the Claude SDK session
- attach the workflow MCP server only when the session requires SF tools
- keep current Claude Code streaming behavior intact
Exit criteria:
- Claude Code session can discover the SF workflow MCP tools
- task execution path can call `sf_task_complete` successfully
### 5. Capability Detection and Failure Path
Goal: refuse to start tool-dependent workflows when required capabilities are unavailable.
Targets:
- SF dispatch / auto-mode preflight
- provider selection and routing checks
- user-facing compatibility errors
Required behavior:
- if native SF tools are available, proceed
- else if SF workflow MCP tools are available, proceed
- else fail fast with a precise message
Exit criteria:
- no execution prompt is sent that requires unavailable tools
- users with only unsupported capability combinations get a hard error, not a fake fallback
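The required preflight behavior — native, then MCP, then hard failure — can be sketched as a small selector. The capability flags are illustrative assumptions, not the real routing checks:

```typescript
// Illustrative capability flags -- the real preflight checks differ.
interface ProviderCapabilities {
  nativeSfTools: boolean;     // in-process tool registry reachable
  mcpWorkflowTools: boolean;  // SF workflow MCP surface reachable
}

type ToolSurface = "native" | "mcp";

function selectToolSurface(caps: ProviderCapabilities): ToolSurface {
  if (caps.nativeSfTools) return "native"; // prefer in-process tools
  if (caps.mcpWorkflowTools) return "mcp"; // MCP as the compatibility layer
  // Fail fast: no execution prompt may require unavailable tools.
  throw new Error(
    "provider cannot access SF workflow tools (native or MCP); aborting"
  );
}
```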
### 6. Prompt and Documentation Alignment
Goal: keep the workflow contract strict while removing transport assumptions from docs and runtime messaging.
Targets:
- `src/resources/extensions/sf/prompts/execute-task.md`
- related planning/discuss prompts that reference tool availability
- provider and MCP docs
Rules:
- prompts should keep requiring canonical SF completion/planning tools
- prompts should not imply “native in-process tool only”
- docs should explain native vs MCP-backed fulfillment paths
Exit criteria:
- prompt contract matches runtime reality
- no provider is told to use a tool surface it cannot access
## Phase Plan
## Phase 1: Spike and Handler Extraction
Scope:
- extract shared logic for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`
- prove native wrappers still work
Why first:
- these tools are enough to test end-to-end completion semantics without migrating the full catalog
Verification:
- existing native tests still pass
- new unit tests cover shared handler entrypoints directly
## Phase 2: Minimal Workflow MCP Server
Scope:
- expose the three extracted tools over MCP
- ensure discovery schemas are clean and canonical
Verification:
- MCP discovery returns all three tools
- direct MCP calls succeed against a fixture project
## Phase 3: Claude Code End-to-End Proof
Scope:
- wire the minimal workflow MCP server into the Claude SDK session
- run a single execution path that ends with task completion
Verification:
- Claude Code can call `sf_task_complete`
- summary file, DB state, and plan checkbox update correctly
## Phase 4: Expand to Full Minimum Workflow Set
Scope:
- add planning, slice completion, milestone completion, roadmap reassessment, and gate result tools
Verification:
- discuss/plan/execute/complete lifecycle works over MCP for the supported flow set
## Phase 5: Capability Gating and UX Hardening
Scope:
- add preflight capability checks
- add clear error messaging for unsupported setups
Verification:
- unsupported provider/session combinations fail before execution starts
## Phase 6: Prompt and Doc Cleanup
Scope:
- align prompts and docs with the new transport-neutral contract
Verification:
- prompt references are accurate
- docs describe the supported architecture and limitations
## File-Level Starting Map
High-probability files for the first implementation:
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
- `src/resources/extensions/sf/tools/complete-task.ts`
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
- `src/resources/extensions/claude-code-cli/index.ts`
- `packages/mcp-server/src/server.ts`
- `packages/mcp-server/src/session-manager.ts`
- `packages/mcp-server/README.md`
- `src/resources/extensions/sf/prompts/execute-task.md`
## Testing Strategy
### Unit
- shared handlers
- MCP wrapper adapters
- gating / capability-check helpers
### Integration
- direct MCP tool invocation against fixture projects
- native tool invocation regression coverage
- Claude Code provider path with MCP attached
### End-to-End
- plan or execute a small fixture task and complete it through canonical SF tools
- confirm DB row, rendered summary, and plan state stay in sync
## Risks
### Risk 1: Logic Drift
If native and MCP wrappers each evolve their own behavior, parity will collapse quickly.
Mitigation:
- shared handler extraction before broad MCP exposure
### Risk 2: Safety Regression
If MCP becomes a side door around native gating, the architecture is worse than before.
Mitigation:
- centralize or reuse gating checks before shared handler invocation
### Risk 3: Overly Broad First Rollout
Exporting every tool and alias immediately increases scope and test burden.
Mitigation:
- ship a minimal workflow tool set first
### Risk 4: Claude SDK Session Wiring Complexity
Attaching MCP servers dynamically may expose edge cases around cwd, permissions, or subprocess lifecycle.
Mitigation:
- prove a narrow spike with 2-3 tools before expanding
## Exit Criteria for ADR-008
ADR-008 is considered implemented when:
1. Claude Code-backed execution can use canonical SF workflow tools over MCP.
2. Native provider behavior remains intact.
3. Shared handlers back both native and MCP invocation.
4. Gating and state integrity protections apply equally to MCP mutations.
5. Capability checks prevent prompts from requiring unavailable tools.
## Recommended Next Task
Start with a narrow spike:
1. Extract shared handlers for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`.
2. Expose those tools through a minimal workflow MCP server.
3. Attach that MCP server to Claude Code sessions.
4. Prove end-to-end task completion on a fixture project.
This document is kept only as a tombstone for the rejected implementation plan. Do not revive this direction.


@@ -1,246 +1,25 @@
# ADR-008: Expose SF Workflow Tools Over MCP for Provider Parity
# ADR-008: SF Tools Over MCP for Provider Parity
**Status:** Superseded — historical record
**Status:** Rejected — never implement
**Date:** 2026-04-09
**Superseded by:** `docs/adr/0000-purpose-to-software-compiler.md`, `docs/dev/ADR-020-internal-wire-architecture.md`
**Deciders:** Jeremy McSpadden
**Related:** ADR-004 (capability-aware model routing), ADR-007 (model catalog split and provider API encapsulation), `src/resources/extensions/sf/bootstrap/db-tools.ts`, `src/resources/extensions/claude-code-cli/stream-adapter.ts`, `packages/mcp-server/src/server.ts`
## Context
> Current boundary: SF does not expose its workflow as an MCP server for other
> clients. SF runs directly through `sf`/`/sf autonomous`; MCP is a client-side
> integration surface for external tools that SF may call. This ADR is preserved
> as historical context for a provider-parity idea that was not adopted.
SF currently has two different tool surfaces:
1. **In-process extension tools** registered directly into the runtime via `pi.registerTool(...)`.
2. **An external MCP server** that exposes session orchestration and read-only project inspection.
This split is now creating a real provider compatibility problem.
### What exists today
The core SF workflow tools are internal extension tools. Examples include:
- `sf_summary_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
These are registered in `src/resources/extensions/sf/bootstrap/db-tools.ts` and related bootstrap files. SF prompts assume these tools are available during discuss, plan, and execute flows.
Separately, `packages/mcp-server/src/server.ts` exposes a different tool surface:
- session control: `sf_execute`, `sf_status`, `sf_result`, `sf_cancel`, `sf_query`, `sf_resolve_blocker`
- read-only inspection: `sf_progress`, `sf_roadmap`, `sf_history`, `sf_doctor`, `sf_captures`, `sf_knowledge`
That MCP server is useful, but it is **not** a transport for the internal workflow/mutation tools.
### The current failure mode
The Claude Code CLI provider uses the Anthropic Agent SDK through `src/resources/extensions/claude-code-cli/stream-adapter.ts`. That adapter starts a Claude SDK session, but it does not forward the internal SF tool registry into the SDK session, nor does it attach an SF MCP server for those tools.
As a result:
- prompts tell the model to call tools like `sf_task_complete`
- the tools exist in SF
- but Claude Code sessions do not actually receive those tools
This produces a contract mismatch: the model is required to use tools that are unavailable in that provider path.
### Why this matters
This is not a one-off Claude Code bug. It reveals a deeper architectural issue:
- SF's core workflow contract is transport-specific
- prompt authors assume “internal extension tool availability”
- provider integrations do not all share the same execution surface
If SF wants provider parity, its workflow tools need a transport-neutral exposure model.
## Decision
**Expose the SF workflow tool contract over MCP as a first-class transport, and make MCP the compatibility layer for providers that cannot directly access the in-process SF tool registry.**
Never build or restore an SF MCP server package. SF is not an MCP server product and must not become one.
This means:
Current SF uses MCP only as a client-side integration surface for external tools that SF may call. SF workflow execution stays inside the `sf` runtime through in-process extension tools, daemon/RPC/headless paths, and DB-backed workflow state.
1. SF will keep its existing in-process tool registration for native runtime use.
2. SF will add an MCP execution surface for the same workflow tools.
3. Both surfaces must call the same underlying business logic.
4. Provider integrations such as Claude Code will use the MCP surface when they cannot access native in-process tools directly.
## Reason
The decision is explicitly **not** to replace the native tool system with MCP everywhere. MCP is the parity and portability layer, not the only runtime path.
The original ADR proposed exposing SF workflow tools through a new MCP server for provider parity. That creates another mutable workflow transport and another state-safety surface. The current architecture avoids that extra server boundary: planning, execution, validation, and completion write structured state to SQLite first and render markdown/JSON as projections.
## Decision Details
## Operational Rule
### 1. One handler layer, multiple transports
- Never recreate `packages/mcp-server`.
- Never add a `src/mcp-server.ts` workflow backend.
- Never document SF as an MCP server provider.
- Never add provider-parity work that depends on SF exposing itself over MCP.
- Keep `src/resources/extensions/mcp-client/` as the client integration boundary.
SF tool behavior must not be implemented twice.
The transport-neutral business logic for workflow tools should be shared by:
- native extension tool registration (`pi.registerTool(...)`)
- MCP server tool registration
The MCP server should wrap the same handlers used by `db-tools.ts`, `query-tools.ts`, and related modules. This avoids logic drift and keeps validation, DB writes, file rendering, and recovery behavior consistent.
### 2. Add a workflow-tool MCP surface
SF will expose the workflow tools required for discuss, planning, execution, and completion over MCP.
Initial minimum set:
- `sf_summary_save`
- `sf_decision_save`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
- `sf_task_complete`
- `sf_slice_complete`
- `sf_complete_milestone`
- `sf_validate_milestone`
- `sf_replan_slice`
- `sf_reassess_roadmap`
- `sf_save_gate_result`
- selected read/query tools such as `sf_milestone_status`
Aliases should be treated conservatively. MCP should prefer canonical names unless compatibility requires exposing aliases.
### 3. Preserve safety semantics
The current SF safety model includes write gates, discussion gates, queue-mode restrictions, and state integrity guarantees.
Those guarantees must continue to apply when tools are invoked over MCP. In particular:
- MCP must not create a path that bypasses write gating
- MCP mutations must preserve the same DB/file/state invariants as native tools
- provider-specific fallback behavior must not allow manual summary writing in place of canonical completion tools
### 4. Make provider capability checks explicit
Before dispatching a workflow that requires SF workflow tools, SF should check whether the selected provider/session can access the required tool surface.
If a provider cannot access either:
- native in-process SF tools, or
- the SF MCP workflow tool surface
then SF must fail early with a clear compatibility error rather than allowing execution to continue in a degraded, state-breaking mode.
### 5. Keep the existing session/read MCP server
The existing MCP server in `packages/mcp-server` remains valid. It serves a different purpose:
- remote session orchestration
- status/result polling
- filesystem-backed project inspection
The new workflow-tool MCP surface is complementary, not a replacement.
## Alternatives Considered
### Alternative A: Reroute away from Claude Code whenever tool-backed execution is needed
This would fix the immediate failure for multi-provider users, but it does not solve provider parity. It also fails completely for users who only have Claude Code configured.
**Rejected** because it treats the symptom, not the architectural gap.
### Alternative B: Hard-fail Claude Code and require another provider
This is a valid short-term guardrail and may still be used before MCP support is complete.
**Rejected as the long-term architecture** because it permanently excludes a supported provider from first-class SF execution.
### Alternative C: Inject the internal SF tool registry directly into the Claude Agent SDK without MCP
This would tightly couple SF's internal extension runtime to a provider-specific integration path. It would not generalize well to other providers or external tool clients.
**Rejected** because it creates a provider-specific bridge instead of a transport-neutral contract.
### Alternative D: Replace native SF tools entirely with MCP
This would simplify the conceptual model, but it would force all runtimes through an external protocol boundary even when the native in-process path is faster and already works well.
**Rejected** because MCP is needed for portability, not because the native tool system is flawed.
## Consequences
### Positive
1. **Provider parity improves.** Providers that can consume MCP tools can participate in full SF workflow execution.
2. **The workflow contract becomes transport-neutral.** Prompts can rely on capabilities rather than a specific runtime implementation detail.
3. **One compatibility story for external clients.** Claude Code, Cursor, and other MCP-capable clients can use the same workflow tool surface.
4. **Better long-term architecture.** Internal tools and external transports converge on shared handlers instead of diverging implementations.
### Negative
1. **Larger surface area to secure and test.** Mutation tools over MCP are higher risk than read-only inspection tools.
2. **Migration complexity.** Tool registration, gating, and handler extraction must be refactored carefully.
3. **Two transport paths must remain aligned.** Native and MCP invocation semantics must stay behaviorally identical.
### Neutral / Tradeoff
The system will now support:
- native in-process tool execution when available
- MCP-backed tool execution when native access is unavailable
That is more complex than a single-path system, but it is the cost of provider portability without sacrificing native runtime quality.
## Migration Plan
### Phase 1: Extract shared handlers
Refactor workflow tools so MCP and native registration can call the same transport-neutral functions.
Priority targets:
- `sf_summary_save`
- `sf_task_complete`
- `sf_plan_milestone`
- `sf_plan_slice`
- `sf_plan_task`
### Phase 2: Stand up the workflow-tool MCP server
Add a new MCP surface for workflow tool execution. This may extend the existing MCP package or live as a sibling package, but it must be clearly separated from the current session/read API.
### Phase 3: Port safety enforcement
Move or centralize write gates and related policy checks so MCP mutations cannot bypass the existing safety model.
### Phase 4: Attach MCP workflow tools to Claude Code sessions
Update the Claude Code provider integration to pass an SF-managed `mcpServers` configuration into the Claude Agent SDK session when required.
### Phase 5: Add provider capability gating
Before tool-dependent flows begin, verify that the active provider can access the required SF workflow tools via either native registration or MCP.
### Phase 6: Update prompts and docs
Prompt contracts should remain strict about using canonical SF completion/planning tools, but documentation and runtime messaging must no longer assume that only native in-process tool registration satisfies that contract.
## Validation
Success is defined by all of the following:
1. A Claude Code-backed execution session can complete a task using canonical SF workflow tools without manual summary writing.
2. Native provider behavior remains unchanged.
3. MCP-invoked workflow tools produce the same DB updates, rendered artifacts, and state transitions as native tool calls.
4. Write-gate and discussion-gate protections still hold under MCP invocation.
5. When required capabilities are unavailable, SF fails early with a precise compatibility error.
## Scope Notes
This ADR establishes the architectural direction. It does **not** require full MCP exposure of every historical alias or every auxiliary tool in the first implementation.
The first implementation should prioritize the minimum workflow tool set needed to make discuss/plan/execute/complete flows work safely for MCP-capable providers.
If a future integration needs external control of SF, use the daemon/RPC/headless contract. MCP remains for SF-as-client only.


@@ -9,7 +9,7 @@ sf today bundles its TUI directly in core: `pi-tui` (~10.5k LOC of TypeScript) i
Three forces argue for extracting the TUI:
1. **sf is becoming truly headless-first** — `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` already exist. CLI invocations talk to the daemon. sf can be called as an MCP backend by Claude Code, Cursor, Hermes — they're TUI-agnostic. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
1. **sf is becoming truly headless-first** — `packages/daemon` and `packages/rpc-client` already exist. CLI invocations talk to the daemon. SF uses MCP as a client integration surface for external tools, not as an SF workflow server. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
2. **The Charm TUI stack is dramatically more capable than what `pi-tui` builds today.** `bubbletea` + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` + `x/mosaic` (image rendering) + `x/vcr` (recording) + `pony` + `ultraviolet` (declarative markup) compose to far better UX than reproducing in TS would.
3. **Removing `pi-tui` from sf core deletes ~10k LOC of TS** — leaner core, fewer TUI-coupled assumptions in `pi-coding-agent`, cleaner test surface.
@@ -42,7 +42,7 @@ This ADR plans the extraction.
- **sf core gets ~10k LOC leaner** after Stage 2.
- **Charm stack quality** comes for free — animations (`harmonica`), inline images (`x/mosaic`), markdown (`glamour`), forms (`huh`), recording (`x/vcr`).
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + MCP + clients. No TUI coupled to engine.
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + clients, with MCP client integration for external tools. No TUI coupled to engine.
- **Remote TUI for free** — once the client is Wish-served (could be a v3.x extension), `tailscale ssh aidev sf` opens a full TUI session over SSH. Today's `pi-tui` is local-process only.
- **Recordings of TUI sessions** — flight recorder (ADR-015) integrates with the Charm TUI naturally; `pi-tui` would need separate work to support this.
@@ -85,7 +85,7 @@ Total: ~12–16 weeks across stages.
## References
- `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` — already exist; this ADR makes them load-bearing for clients.
- `packages/daemon`, `packages/rpc-client` — already exist; this ADR makes them load-bearing for clients.
- `packages/pi-tui` — the existing TUI being deprecated.
- `ADR-013` — Network: future SSH-served TUI via `wish` rides the same substrate.
- `ADR-015` — Flight recorder: `sf-tui` records its sessions naturally.


@@ -36,7 +36,7 @@
| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
| **LSP** | Language Server Protocol client and multiplexer |
| **Mac Tools** | macOS-native utilities (Swift CLI) |
| **MCP Server/Client** | Model Context Protocol server and client |
| **MCP Client** | Model Context Protocol client integration for external tools |
| **Memory Extension** | In-session memory pipeline and storage |
| **Migration** | Data and config migration tools |
| **Modes** | Interactive TUI, Print, RPC, and Web modes |
@@ -87,7 +87,6 @@
| src/help-text.ts | CLI | Generates help text for all subcommands |
| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to SF |
| src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
| src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to SF |
@@ -388,9 +387,9 @@
|------|-----------------|-------------|
| modes/index.ts | Modes | Mode system exports |
| modes/print-mode.ts | Modes | Non-interactive print mode |
| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
| modes/rpc/rpc-mode.ts | Modes, RPC | RPC server mode for remote access |
| modes/rpc/rpc-client.ts | Modes, RPC | RPC client for remote agent interaction |
| modes/rpc/rpc-types.ts | Modes, RPC | RPC protocol type definitions |
| modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
| modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
| modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
@@ -610,7 +609,7 @@
| remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
| remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
| remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
| mcp-client/index.ts | MCP Client | Model Context Protocol client integration |
| subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
| subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
| subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
@@ -826,7 +825,7 @@
| File | System Label(s) | Description |
|------|-----------------|-------------|
| vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
| vscode-extension/src/sf-client.ts | VS Code Extension, MCP Server/Client | RPC client for SF agent communication |
| vscode-extension/src/sf-client.ts | VS Code Extension, RPC | RPC client for SF agent communication |
| vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @sf command |
| vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
@@ -975,7 +974,7 @@ Quick lookup: which files are part of each system?
| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* |
| **LSP** | pi-coding-agent/src/core/lsp/* |
| **Mac Tools** | src/resources/extensions/mac-tools/* |
| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/sf-client.ts, modes/rpc/* |
| **MCP Client** | src/resources/extensions/mcp-client/index.ts |
| **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
| **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
| **Modes** | pi-coding-agent/src/modes/* |

@@ -34,10 +34,10 @@ These were not clearly represented as durable roadmap items and should be planne
| Item | Why | Suggested tier | Implementation note |
|---|---|---|---|
| Typed SF environment schema | `SF_*` env vars should fail early with actionable diagnostics instead of late runtime surprises. | Tier 1 | Add an SF-owned env schema module and route startup/tool validation through it. |
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Start with file-family thresholds or focused test suites for dispatch, recovery, UOK runtime, and validation. |
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Tier 2 | Add a minimal lifecycle fixture that exercises DB rows as executable truth. |
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Started with focused DB-authority/UOK runtime suites; continue with dispatch and recovery families before changing global thresholds. |
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Done | Added runtime-state regression coverage proving that SQLite slice/task order stays authoritative over stale markdown/JSON projections, and that the DB-backed runtime refuses implicit roadmap, plan, and summary imports. |
| Fault-injection recovery tests | Stuck-loop, timeout, runaway, stale lock, and projection drift recovery are high-risk paths. | Tier 2 | Add deterministic fault fixtures before adding broader chaos coverage. |
| MCP package completeness audit | Docs mention MCP surfaces, but production completeness is unclear. | Tier 2 | Inspect `packages/mcp-server/`, record supported contracts, gaps, and deferred work. |
| MCP server residue/docs cleanup | SF currently ships the MCP client extension only; tracked MCP server source was removed. | Done | Removed untracked `packages/mcp-server/` residue and updated durable docs so future work never recreates an SF MCP server. |
| Biome schema version cleanup | Tooling drift creates noisy lint/config failures. | Tier 3 | Run `biome migrate` as a focused tooling cleanup. |
| Headless assistant-text preview completion | Prior headless work deferred buffer separation. | Tier 2 | Finish `assistantTextBuffer` / `thinkingBuffer` separation and preview flushing. |

@@ -56,6 +56,36 @@ export interface HeadlessHeartbeatContext {
lastEventDetail?: string;
}
export interface AssistantPreviewDelta {
kind: "text" | "thinking";
text: string;
}
/**
* Extract a concise preview delta from an assistant message update.
*
* Purpose: keep headless non-verbose output honest by separating assistant text
* from model thinking before both are flushed at tool starts and message end.
*
* Consumer: headless.ts streaming event loop and headless progress tests.
*/
export function extractAssistantPreviewDelta(
assistantMessageEvent: unknown,
): AssistantPreviewDelta | null {
if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
return null;
}
const event = assistantMessageEvent as Record<string, unknown>;
const type = String(event.type ?? "");
if (type !== "text_delta" && type !== "thinking_delta") return null;
const text = String(event.delta ?? event.text ?? "");
if (!text) return null;
return {
kind: type === "thinking_delta" ? "thinking" : "text",
text,
};
}
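For reference, a plain-JS restatement of the extractor together with the buffer separation it enables — a sketch only; the event shapes follow this commit's tests, and the actual flush points (tool starts, message end) are out of scope here:

```javascript
// Sketch: separate assistant text deltas from thinking deltas into two
// buffers, mirroring extractAssistantPreviewDelta and its consumer loop.
function extractAssistantPreviewDelta(event) {
  if (!event || typeof event !== "object") return null;
  const type = String(event.type ?? "");
  if (type !== "text_delta" && type !== "thinking_delta") return null;
  const text = String(event.delta ?? event.text ?? "");
  if (!text) return null;
  return { kind: type === "thinking_delta" ? "thinking" : "text", text };
}

function accumulatePreviews(events) {
  let assistantTextBuffer = "";
  let thinkingBuffer = "";
  for (const event of events) {
    const delta = extractAssistantPreviewDelta(event);
    if (delta?.kind === "text") assistantTextBuffer += delta.text;
    else if (delta?.kind === "thinking") thinkingBuffer += delta.text;
  }
  return { assistantTextBuffer, thinkingBuffer };
}
```

Non-preview events (`text_start`, `thinking_end`, malformed payloads) fall through as `null` and touch neither buffer.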
// ---------------------------------------------------------------------------
// ANSI Color Helpers
// ---------------------------------------------------------------------------

@@ -60,6 +60,7 @@ import type { HeadlessJsonResult, OutputFormat } from "./headless-types.js";
import { VALID_OUTPUT_FORMATS } from "./headless-types.js";
import type { ExtensionUIRequest, ProgressContext } from "./headless-ui.js";
import {
extractAssistantPreviewDelta,
formatHeadlessHeartbeat,
formatProgress,
formatPromptTraceLines,
@@ -1440,9 +1441,15 @@ async function runHeadlessOnce(
}
}
}
// Non-verbose: accumulate text_delta for truncated one-liner
else if (ame?.type === "text_delta") {
assistantTextBuffer += String(ame.delta ?? ame.text ?? "");
// Non-verbose: accumulate separated thinking/text previews for
// truncated one-liners before tool calls and message end.
else {
const previewDelta = extractAssistantPreviewDelta(ame);
if (previewDelta?.kind === "text") {
assistantTextBuffer += previewDelta.text;
} else if (previewDelta?.kind === "thinking") {
thinkingBuffer += previewDelta.text;
}
}
}

@@ -41,7 +41,11 @@ function formatAggregateModelIdentity(modelId) {
});
}
/**
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning.
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning,
* and also to model-learner for per-task-type performance tracking.
*
* Integration point for quick win #2: model-learner activates continuous
* per-task-type model performance tracking for adaptive routing decisions.
*/
async function recordUnitOutcome(unit) {
const db = getDatabase();
@@ -54,7 +58,7 @@ async function recordUnitOutcome(unit) {
// drop bare-id entries rather than guessing the provider.
if (slashIdx === -1) return;
const provider = modelId.slice(0, slashIdx);
recordOutcome(db, {
const outcome = {
modelId,
provider,
unitType: unit.type,
@@ -68,7 +72,24 @@
tokens_total: unit.tokens.total,
cost_usd: unit.cost,
recorded_at: unit.startedAt,
});
};
// Record to UOK llm_task_outcomes table
recordOutcome(db, outcome);
// Quick Win #2: Also record to model-learner for per-task-type tracking
try {
const { ModelLearner } = await import("./model-learner.js");
const learner = new ModelLearner(basePath);
learner.recordOutcome(unit.type, modelId, {
success: true,
timeout: false,
tokensUsed: unit.tokens.total,
costUsd: unit.cost,
});
} catch {
/* model-learner integration is optional; never block outcome recording */
}
} catch {
/* fire-and-forget */
}
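The dynamic-import hook above follows a fire-and-forget shape that can be sketched standalone — the names below are hypothetical stand-ins, not the real model-learner API:

```javascript
// Hypothetical stand-in for `await import("./model-learner.js")`; here it
// always fails, to show the failure path staying silent.
async function loadOptionalLearner() {
  throw new Error("model-learner unavailable in this sketch");
}

// Fire-and-forget: the primary recording always completes, and the optional
// learning hook runs inside its own try/catch so a missing or throwing
// module never blocks it.
async function recordUnitOutcomeSketch(unit) {
  const result = { recorded: false, learned: false };
  result.recorded = true; // primary path: always attempted first
  try {
    const learner = await loadOptionalLearner();
    learner.recordOutcome(unit.type, unit.modelId, { success: true });
    result.learned = true;
  } catch {
    /* optional integration; never block outcome recording */
  }
  return result;
}
```

The same shape covers the self-report-fixer hook in triage-self-feedback.js: one try/catch per optional integration, empty catch, no rethrow.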

@@ -37,17 +37,12 @@ import {
getReplanHistory,
getSlice,
getSliceTasks,
insertMilestone,
insertSlice,
insertTask,
isDbAvailable,
updateSliceStatus,
updateTaskStatus,
wasDbOpenAttempted,
} from "./sf-db.js";
import { isClosedStatus, isDeferredStatus } from "./status-guards.js";
import { extractVerdict } from "./verdict-parser.js";
import { logError, logWarning } from "./workflow-logger.js";
import { logWarning } from "./workflow-logger.js";
/**
* A "ghost" milestone directory contains only META.json (and no substantive
* files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY). These appear when
@@ -219,8 +214,7 @@ export async function getActiveMilestoneId(basePath) {
* STATE.md is a rendered cache of this output.
*
* When DB is available, queries milestone/slice/task tables directly.
* Falls back to filesystem parsing for unmigrated projects or when DB
* has zero milestones (e.g. first run before migration).
* Falls back to filesystem parsing only when DB is unavailable.
*/
export async function deriveState(basePath) {
// Return cached result if within the TTL window for the same basePath
@@ -235,22 +229,6 @@
let result;
// Dual-path: try DB-backed derivation first when hierarchy tables are populated
if (isDbAvailable()) {
let dbMilestones = getAllMilestones();
// Disk→DB reconciliation when DB is empty but disk has milestones (#2631).
// deriveStateFromDb() does its own reconciliation, but deriveState() skips
// it entirely when the DB is empty. Sync here so the DB path is used when
// disk milestones exist but haven't been migrated yet.
if (dbMilestones.length === 0) {
const diskIds = findMilestoneIds(basePath);
let synced = false;
for (const diskId of diskIds) {
if (!isGhostMilestone(basePath, diskId)) {
insertMilestone({ id: diskId, status: "active" });
synced = true;
}
}
if (synced) dbMilestones = getAllMilestones();
}
const stopDbTimer = debugTime("derive-state-db");
result = await deriveStateFromDb(basePath);
stopDbTimer({
@@ -300,91 +278,29 @@
const isStatusDone = isClosedStatus;
/**
* Derive SF state from the milestones/slices/tasks DB tables.
* Flag files (PARKED, VALIDATION, CONTINUE, REPLAN, REPLAN-TRIGGER, CONTEXT-DRAFT)
* are still checked on the filesystem since they aren't in DB tables.
* Non-planning control files (PARKED, CONTINUE, REPLAN, REPLAN-TRIGGER,
* CONTEXT-DRAFT) are still checked on the filesystem since they are not
* hierarchy state.
* Requirements also stay file-based via parseRequirementCounts().
*
* Must produce field-identical SFState to _deriveStateImpl() for the same project.
* Must not import rendered roadmap, plan, or summary artifacts into DB-backed
* runtime state. Explicit migration/repair flows own any legacy file import.
*/
function reconcileDiskToDb(basePath) {
let allMilestones = getAllMilestones();
const dbIdSet = new Set(allMilestones.map((m) => m.id));
const diskIds = findMilestoneIds(basePath);
let synced = false;
for (const diskId of diskIds) {
if (!dbIdSet.has(diskId) && !isGhostMilestone(basePath, diskId)) {
insertMilestone({ id: diskId, status: "active" });
synced = true;
}
}
if (synced) allMilestones = getAllMilestones();
for (const mid of diskIds) {
if (isGhostMilestone(basePath, mid)) continue;
const roadmapPath = resolveMilestoneFile(basePath, mid, "ROADMAP");
if (!roadmapPath) continue;
const dbSlices = getMilestoneSlices(mid);
const dbSliceIds = new Set(dbSlices.map((s) => s.id));
let roadmapContent;
try {
roadmapContent = readFileSync(roadmapPath, "utf-8");
} catch (err) {
if (diskIds.length > 0) {
const dbIds = new Set(getAllMilestones().map((m) => m.id));
const diskOnlyIds = diskIds.filter(
(id) => !dbIds.has(id) && !isGhostMilestone(basePath, id),
);
if (diskOnlyIds.length > 0) {
logWarning(
"state",
"reconcileDiskToDb: roadmap read failed, skipping milestone",
{
mid,
error: err.message,
},
`DB-backed state ignored ${diskOnlyIds.length} disk-only milestone(s): ${diskOnlyIds.join(", ")}`,
);
continue;
}
const parsed = parseRoadmap(roadmapContent);
for (const s of parsed.slices) {
if (dbSliceIds.has(s.id)) continue;
const summaryPath = resolveSliceFile(basePath, mid, s.id, "SUMMARY");
const sliceStatus = s.done || summaryPath ? "complete" : "pending";
insertSlice({
id: s.id,
milestoneId: mid,
title: s.title,
status: sliceStatus,
risk: s.risk,
depends: s.depends,
demo: s.demo,
});
}
// Reconcile stale *existing* slice rows (#3599): a slice row may exist in
// the DB with status "pending" even though disk artifacts (SUMMARY) prove
// completion — the same class of desync that task-level reconciliation
// (further below) already handles. Without this, the dependency resolver
// builds doneSliceIds from stale DB rows and downstream slices stay blocked
// forever with "No slice eligible".
for (const dbSlice of dbSlices) {
if (isStatusDone(dbSlice.status)) continue;
const summaryPath = resolveSliceFile(
basePath,
mid,
dbSlice.id,
"SUMMARY",
);
if (summaryPath) {
try {
updateSliceStatus(mid, dbSlice.id, "complete");
logWarning(
"reconcile",
`slice ${mid}/${dbSlice.id} status reconciled from "${dbSlice.status}" to "complete" (#3599)`,
{ mid, sid: dbSlice.id },
);
} catch (e) {
logError("reconcile", `failed to update slice ${dbSlice.id}`, {
sid: dbSlice.id,
error: e.message,
});
}
}
}
}
return allMilestones;
return getAllMilestones();
}
function buildCompletenessSet(basePath, milestones) {
const completeMilestoneIds = new Set();
@@ -727,47 +643,14 @@ function resolveSliceDependencies(activeMilestoneSlices) {
return { activeSlice: null, activeSliceRow: null };
}
async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
let tasks = getSliceTasks(milestoneId, sliceId);
const tasks = getSliceTasks(milestoneId, sliceId);
if (tasks.length === 0 && planFile) {
try {
const planContent = await loadFile(planFile);
if (planContent) {
const diskPlan = parsePlan(planContent);
if (diskPlan.tasks.length > 0) {
for (let i = 0; i < diskPlan.tasks.length; i++) {
const t = diskPlan.tasks[i];
try {
insertTask({
id: t.id,
sliceId,
milestoneId,
title: t.title,
status: t.done ? "complete" : "pending",
sequence: i + 1,
});
} catch (insertErr) {
logWarning(
"reconcile",
`failed to insert task ${t.id} from plan file: ${insertErr instanceof Error ? insertErr.message : String(insertErr)}`,
);
}
}
tasks = getSliceTasks(milestoneId, sliceId);
logWarning(
"reconcile",
`imported ${tasks.length} tasks from plan file for ${milestoneId}/${sliceId} — DB was empty (#3600)`,
{ mid: milestoneId, sid: sliceId },
);
}
}
} catch (err) {
logError(
"reconcile",
`plan-file task import failed for ${milestoneId}/${sliceId}: ${err instanceof Error ? err.message : String(err)}`,
);
}
logWarning(
"reconcile",
`slice plan file exists for ${milestoneId}/${sliceId}, but DB has no task rows; refusing runtime import`,
{ mid: milestoneId, sid: sliceId },
);
}
let reconciled = false;
for (const t of tasks) {
if (isStatusDone(t.status)) continue;
const summaryPath = resolveTaskFile(
@@ -778,41 +661,13 @@ async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
"SUMMARY",
);
if (summaryPath && existsSync(summaryPath)) {
// Validate that the summary file has actual content (#sf-moobj36o-6rxy6e)
const summaryContent = readFileSync(summaryPath, "utf-8");
if (!isValidTaskSummary(summaryContent)) {
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} has empty/invalid SUMMARY — skipping reconciliation`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
continue;
}
try {
updateTaskStatus(
milestoneId,
sliceId,
t.id,
"complete",
new Date().toISOString(),
);
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} status reconciled from "${t.status}" to "complete" (#2514)`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
reconciled = true;
} catch (e) {
logError("reconcile", `failed to update task ${t.id}`, {
tid: t.id,
error: e.message,
});
}
logWarning(
"reconcile",
`task ${milestoneId}/${sliceId}/${t.id} has SUMMARY on disk but DB status is "${t.status}"; refusing runtime status import`,
{ mid: milestoneId, sid: sliceId, tid: t.id },
);
}
}
if (reconciled) {
tasks = getSliceTasks(milestoneId, sliceId);
}
return tasks;
}
async function detectBlockers(basePath, milestoneId, sliceId, tasks) {

@@ -0,0 +1,241 @@
/**
* db-driven-runtime-state.test.mjs: DB authority for runtime state.
*
* Purpose: prove DB-backed projects choose active slices/tasks from structured
* SQLite rows even when generated markdown/JSON projections are stale.
*/
import assert from "node:assert/strict";
import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { afterEach, test } from "vitest";
import {
closeDatabase,
getAllMilestones,
getSliceTasks,
insertMilestone,
insertSlice,
insertTask,
openDatabase,
} from "../sf-db.js";
import { deriveState, invalidateStateCache } from "../state.js";
const tmpDirs = [];
afterEach(() => {
closeDatabase();
invalidateStateCache();
while (tmpDirs.length > 0) {
const dir = tmpDirs.pop();
if (dir) rmSync(dir, { recursive: true, force: true });
}
});
function makeProject() {
const dir = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(dir);
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S01"), {
recursive: true,
});
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S02"), {
recursive: true,
});
openDatabase(join(dir, ".sf", "sf.db"));
insertMilestone({
id: "M777",
title: "DB runtime authority",
status: "active",
});
insertSlice({
milestoneId: "M777",
id: "S02",
title: "DB first slice",
status: "pending",
sequence: 1,
});
insertSlice({
milestoneId: "M777",
id: "S01",
title: "Stale file first slice",
status: "pending",
sequence: 2,
});
insertTask({
milestoneId: "M777",
sliceId: "S02",
id: "T02",
title: "DB first task",
status: "pending",
sequence: 1,
});
insertTask({
milestoneId: "M777",
sliceId: "S02",
id: "T01",
title: "DB second task",
status: "pending",
sequence: 2,
});
return dir;
}
test("deriveState_when_generated_projections_are_stale_uses_db_slice_and_task_sequence", async () => {
const project = makeProject();
const milestoneDir = join(project, ".sf", "milestones", "M777");
writeFileSync(
join(milestoneDir, "M777-ROADMAP.md"),
[
"# M777: stale generated roadmap",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Stale file first slice | low | - | | stale first |",
"| S02 | DB first slice | low | - | | should still be first by DB |",
"",
].join("\n"),
);
writeFileSync(
join(milestoneDir, "M777-ROADMAP.json"),
JSON.stringify(
{
origin: "stale-test-projection",
slices: [
{ id: "S01", title: "Stale file first slice", sequence: 1 },
{ id: "S02", title: "DB first slice", sequence: 2 },
],
},
null,
2,
),
);
writeFileSync(
join(milestoneDir, "slices", "S02", "S02-PLAN.md"),
[
"# S02: stale generated plan",
"",
"## Tasks",
"",
"- [ ] T99: stale task that should not replace DB rows",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "executing");
assert.equal(state.activeMilestone?.id, "M777");
assert.equal(state.activeSlice?.id, "S02");
assert.equal(state.activeTask?.id, "T02");
});
test("deriveState_when_db_has_no_tasks_refuses_runtime_plan_file_import", async () => {
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(project);
mkdirSync(join(project, ".sf", "milestones", "M778", "slices", "S01"), {
recursive: true,
});
openDatabase(join(project, ".sf", "sf.db"));
insertMilestone({
id: "M778",
title: "DB planning authority",
status: "active",
});
insertSlice({
milestoneId: "M778",
id: "S01",
title: "Plan exists without DB tasks",
status: "pending",
sequence: 1,
});
writeFileSync(
join(project, ".sf", "milestones", "M778", "M778-ROADMAP.md"),
[
"# M778: DB planning authority",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Plan exists without DB tasks | low | - | | keep planning |",
"",
].join("\n"),
);
writeFileSync(
join(project, ".sf", "milestones", "M778", "slices", "S01", "S01-PLAN.md"),
[
"# S01: stale generated plan",
"",
"## Tasks",
"",
"- [ ] T01: stale file task",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "planning");
assert.equal(state.activeMilestone?.id, "M778");
assert.equal(state.activeSlice?.id, "S01");
assert.equal(state.activeTask, null);
assert.equal(getSliceTasks("M778", "S01").length, 0);
assert.match(state.nextAction, /has a plan file but no tasks/i);
});
test("deriveState_when_db_has_no_milestones_refuses_runtime_roadmap_import", async () => {
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
tmpDirs.push(project);
const milestoneDir = join(project, ".sf", "milestones", "M779");
mkdirSync(milestoneDir, { recursive: true });
openDatabase(join(project, ".sf", "sf.db"));
writeFileSync(
join(milestoneDir, "M779-ROADMAP.md"),
[
"# M779: stale disk-only roadmap",
"",
"## Slice Overview",
"| ID | Slice | Risk | Depends | Done | After this |",
"|----|-------|------|---------|------|------------|",
"| S01 | Should not import | low | - | | no DB mutation |",
"",
].join("\n"),
);
const state = await deriveState(project);
assert.equal(state.phase, "pre-planning");
assert.equal(state.activeMilestone, null);
assert.deepEqual(getAllMilestones(), []);
});
test("deriveState_when_task_summary_exists_keeps_db_task_status_authoritative", async () => {
const project = makeProject();
const taskDir = join(
project,
".sf",
"milestones",
"M777",
"slices",
"S02",
"tasks",
);
mkdirSync(taskDir, { recursive: true });
writeFileSync(
join(taskDir, "T02-SUMMARY.md"),
[
"# T02: stale summary projection",
"",
"This file is not DB state.",
"",
].join("\n"),
);
const state = await deriveState(project);
const [firstTask] = getSliceTasks("M777", "S02");
assert.equal(state.phase, "executing");
assert.equal(state.activeSlice?.id, "S02");
assert.equal(state.activeTask?.id, "T02");
assert.equal(firstTask.id, "T02");
assert.equal(firstTask.status, "pending");
});

@@ -184,13 +184,16 @@ export function parseTriageReport(agentOutput) {
* Idempotent: skips rows whose ID already appears in the file.
* 2. Call markResolved for each resolution in the report.
* Idempotent: markResolved itself skips already-resolved entries.
* 3. Quick Win #1: Auto-fix high-confidence self-reports where confidence > 0.85.
*
* @param basePath - project root (directory containing .sf/)
* @param report - parsed TriageReport from parseTriageReport()
*/
export function applyTriageReport(basePath, report) {
export async function applyTriageReport(basePath, report) {
let requirementsAdded = 0;
let entriesResolved = 0;
let reportsAutoFixed = 0;
// ── 1. Write promoted requirements ────────────────────────────────────────
if (report.promotedRequirements.length > 0) {
const sfDir = sfRoot(basePath);
@@ -244,6 +247,7 @@ export function applyTriageReport(basePath, report) {
}
writeFileSync(reqPath, content, "utf-8");
}
// ── 2. Resolve entries ────────────────────────────────────────────────────
for (const resolution of report.resolutions) {
const evidenceKind = resolution.evidenceKind;
@@ -258,5 +262,26 @@
);
if (mutated) entriesResolved++;
}
return { requirementsAdded, entriesResolved };
// ── 3. Quick Win #1: Auto-fix high-confidence self-reports ────────────────
// Integration point for self-report-fixer: read open reports and auto-apply
// fixes where confidence > 0.85.
try {
const { autoFixHighConfidenceReports } = await import(
"./self-report-fixer.js"
);
const allOpen = [
...readAllSelfFeedback(basePath),
...readUpstreamSelfFeedback(),
].filter((e) => !e.resolvedAt);
if (allOpen.length > 0) {
const result = await autoFixHighConfidenceReports(basePath, allOpen);
reportsAutoFixed = result.applied.length;
}
} catch {
/* self-report fixer is optional; never block triage report application */
}
return { requirementsAdded, entriesResolved, reportsAutoFixed };
}
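Because applyTriageReport is now async, existing call sites need an await. A minimal caller sketch, with a stub standing in for the real module — the stub merely echoes the result shape added in this commit:

```javascript
// Hypothetical stand-in for applyTriageReport from triage-self-feedback.js,
// returning the { requirementsAdded, entriesResolved, reportsAutoFixed }
// shape introduced above. reportsAutoFixed stays 0 when the optional
// self-report fixer is unavailable.
async function applyTriageReport(basePath, report) {
  return {
    requirementsAdded: report.promotedRequirements.length,
    entriesResolved: report.resolutions.length,
    reportsAutoFixed: 0,
  };
}

// Callers must now await the promise instead of destructuring a plain object.
async function runTriage(basePath, report) {
  const { requirementsAdded, entriesResolved, reportsAutoFixed } =
    await applyTriageReport(basePath, report);
  return `triage: ${requirementsAdded} promoted, ${entriesResolved} resolved, ${reportsAutoFixed} auto-fixed`;
}
```

A call site that forgets the await would destructure the pending promise and read `undefined` for all three counters, which is why the commit flags the signature change as the one backward-compatibility hazard.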

@@ -2,6 +2,7 @@ import assert from "node:assert/strict";
import { describe, it } from "vitest";
import type { ProgressContext } from "../headless-ui.js";
import {
extractAssistantPreviewDelta,
formatCostLine,
formatHeadlessHeartbeat,
formatProgress,
@@ -556,6 +557,31 @@
});
});
describe("extractAssistantPreviewDelta", () => {
it("separates assistant text deltas from thinking deltas", () => {
assert.deepEqual(
extractAssistantPreviewDelta({
type: "text_delta",
delta: "I will edit the file.",
}),
{ kind: "text", text: "I will edit the file." },
);
assert.deepEqual(
extractAssistantPreviewDelta({
type: "thinking_delta",
delta: "Need inspect first.",
}),
{ kind: "thinking", text: "Need inspect first." },
);
});
it("ignores non-preview assistant events", () => {
assert.equal(extractAssistantPreviewDelta({ type: "text_start" }), null);
assert.equal(extractAssistantPreviewDelta({ type: "thinking_end" }), null);
assert.equal(extractAssistantPreviewDelta(null), null);
});
});
describe("formatCostLine", () => {
it("formats cost with token count", () => {
const result = formatCostLine(0.0523, 4200, 1100);