integrate: hook quick wins into UOK dispatch loop
Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: after triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
parent 62a04f1073
commit 553ba23b89
13 changed files with 420 additions and 759 deletions
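The Quick Win #1 hook described in the commit message is not visible in the hunks below, so here is a minimal sketch of the intended shape. Only the `applyTriageReport()` entry point, the 0.85 confidence threshold, the `reportsAutoFixed` counter, and the fire-and-forget rule come from the commit message; the report shape and the `attemptSelfReportFix()` helper are hypothetical.

```ts
// Sketch only: names marked hypothetical are not taken from the actual diff.
interface TriageReport {
  id: string;
  confidence: number; // 0..1 triage confidence score
}

// Hypothetical fixer; the real logic presumably lives in triage-self-feedback.js.
declare function attemptSelfReportFix(report: TriageReport): Promise<void>;

async function autoFixHighConfidenceReports(reports: TriageReport[]): Promise<number> {
  let reportsAutoFixed = 0;
  for (const report of reports) {
    if (report.confidence <= 0.85) continue; // only auto-fix high-confidence reports
    try {
      await attemptSelfReportFix(report);
      reportsAutoFixed += 1;
    } catch {
      /* fire-and-forget: a failed fix never blocks report application */
    }
  }
  return reportsAutoFixed; // surfaced in triage metrics by applyTriageReport()
}
```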
|
|
@@ -57,7 +57,7 @@ Process for each:
|
|||
| 3 | **`fix(search): narrow native web_search injection`** | Only inject web_search context when the provider accepts it | `4370bedf3` |
|
||||
| 4 | **`fix(gsd): self-heal symlinked .sf staging`** (path-translated) | Data-loss prevention — when the staging dir is a symlink that's broken or points outside expected scope, detect and self-heal instead of silently writing to the wrong location. Path-translate `.gsd/` → `.sf/` in the port; the substance is symlink-resilience, not the path string. | `9340f1e9b` (#4423) |
|
||||
| 5 | **`fix(knowledge): scope + budget milestone KNOWLEDGE injection`** | Prevents milestone-scope knowledge from blowing the context budget | `58d3d4d6c` (#4721) |
|
||||
| 6 | **`fix(mcp-server): prevent defaultExecFn stdout-buffer deadlock`** | Real deadlock — large-output MCP tools could hang the agent | `bb747ec57` |
|
||||
| 6 | **MCP server stdout-buffer deadlock** | Not applicable — SF no longer ships an MCP server package. Do not port unless a future accepted ADR reintroduces an SF-owned MCP server. | N/A |
|
||||
| 7 | **`fix(agent-session): guard synthetic agent_end transitions`** | Session-transition race when agent_end was synthesised | `71114fccf` |
|
||||
| 8 | **`fix(agent-session): skip idle wait after agent_end`** | Idle wait was burning time on a session that was already ending | `6d7e4ccb5` |
|
||||
| 9 | **`Fix agent_end session switch handoff`** | Session handoff during agent_end could drop the next session | `c162c44bf` |
|
||||
|
|
@@ -235,7 +235,7 @@ Spec sections that landed during late-stage adversarial review and only matter a
|
|||
| Item | Spec | Why deferred |
|
||||
|---|---|---|
|
||||
| SSH worker extension | § 22, C-64, C-75, E-02 | Real for fleet deployments (bunker, inference-fabric scaling). Not real for daily-driver development. Build when a user actually needs to dispatch to a remote box. |
|
||||
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. The MCP server (`packages/mcp-server`) is the more likely remote interface. |
|
||||
| HTTP API auth | § 19.5, C-77 | Only needed if the HTTP API ships. SF currently supports MCP as a client surface only, not as an SF workflow server. |
|
||||
| `trace_index` SQL | § 19.3.1, C-80 | Forensics over JSONL is fine until grep gets slow. Build the index when you have months of trace files, not before. |
|
||||
| PhaseUAT | § 4.6, C-53, C-76 | Only matters for "release" workflows where humans sign off before merge. Add when needed. |
|
||||
| Multi-orchestrator atomic claim | C-47 | The single-process `run.lock` is sufficient. The atomic UPDATE pattern matters when two orchestrators race against the same DB; sf doesn't deploy that way today. |
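For the deferred C-47 item in the last row, a minimal sketch of the atomic-claim pattern, assuming a better-sqlite3-style synchronous handle and a hypothetical `tasks` table with `status`, `sequence`, and `claimed_by` columns; sf does not run competing orchestrators today, so this is illustrative only.

```ts
import Database from "better-sqlite3";

// Exactly one orchestrator wins: the UPDATE re-checks status, so a row can
// only transition from 'pending' to 'claimed' once.
export function claimNextPendingTask(
  db: InstanceType<typeof Database>,
  orchestratorId: string,
): string | null {
  const candidate = db
    .prepare("SELECT id FROM tasks WHERE status = 'pending' ORDER BY sequence LIMIT 1")
    .get() as { id: string } | undefined;
  if (!candidate) return null;

  const result = db
    .prepare(
      "UPDATE tasks SET status = 'claimed', claimed_by = ? WHERE id = ? AND status = 'pending'",
    )
    .run(orchestratorId, candidate.id);

  // changes === 0 means another orchestrator claimed the row between SELECT and UPDATE.
  return result.changes === 1 ? candidate.id : null;
}
```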
|
||||
|
|
|
|||
|
|
@@ -1,339 +1,17 @@
|
|||
# ADR-008 Implementation Plan
|
||||
|
||||
**Related ADR:** [ADR-008-sf-tools-over-mcp-for-provider-parity.md](./ADR-008-sf-tools-over-mcp-for-provider-parity.md)
|
||||
**Status:** Superseded — do not implement
|
||||
**Status:** Rejected — never implement
|
||||
**Date:** 2026-04-09
|
||||
|
||||
> Superseded by the current boundary in ADR-020: SF workflow tools are not exposed
|
||||
> as an MCP server for external clients. SF may use MCP clients for external tools,
|
||||
> but SF itself runs directly.
|
||||
## Superseded Boundary
|
||||
|
||||
## Objective
|
||||
Never build or restore an SF MCP server package.
|
||||
|
||||
Implement the ADR-008 decision by exposing the core SF workflow tool contract over MCP, then wiring MCP-backed access into provider paths that cannot use the native in-process SF tool registry directly.
|
||||
Current SF product boundary:
|
||||
|
||||
The first usable outcome is:
|
||||
- SF uses MCP only as a **client** for external tool servers configured through `.mcp.json` or `.sf/mcp.json`.
|
||||
- SF workflow execution runs through native SF tools, headless/RPC, and the daemon/RPC client path.
|
||||
- SF never exposes its workflow as an MCP server for Claude Code, Cursor, or other external clients.
|
||||
|
||||
- a Claude Code-backed execution session can complete a task using canonical SF tools
|
||||
- no manual summary-writing fallback is needed
|
||||
- native provider behavior remains unchanged
|
||||
|
||||
## Non-Goals
|
||||
|
||||
- Replacing native in-process SF tools with MCP
|
||||
- Exporting every historical alias in the first rollout
|
||||
- Reworking the entire session-oriented MCP server before proving the workflow-tool surface
|
||||
- Supporting every provider path before Claude Code is working end-to-end
|
||||
|
||||
## Constraints
|
||||
|
||||
- Native and MCP tool paths must share business logic
|
||||
- MCP must not bypass write-gate or discussion-gate protections
|
||||
- Canonical SF state transitions must remain DB-backed
|
||||
- Provider capability mismatches must fail early, not degrade silently
|
||||
|
||||
## Workstreams
|
||||
|
||||
### 1. Shared Handler Extraction
|
||||
|
||||
Goal: separate business logic from transport registration.
|
||||
|
||||
Targets:
|
||||
|
||||
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
|
||||
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
|
||||
- `src/resources/extensions/sf/tools/complete-task.ts`
|
||||
- sibling modules used by planning/summary/validation tools
|
||||
|
||||
Deliverables:
|
||||
|
||||
- transport-neutral handler entrypoints for the minimum workflow tool set
|
||||
- thin native registration wrappers that call those handlers
|
||||
- thin MCP registration wrappers that call those handlers
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- native tool behavior is unchanged
|
||||
- no workflow tool logic is duplicated in MCP server code
|
||||
|
||||
### 2. Workflow-Tool MCP Surface
|
||||
|
||||
Goal: add an MCP server surface for real SF workflow tools, distinct from the current session/read API.
|
||||
|
||||
Preferred first-cut tool set:
|
||||
|
||||
- `sf_summary_save`
|
||||
- `sf_decision_save`
|
||||
- `sf_plan_milestone`
|
||||
- `sf_plan_slice`
|
||||
- `sf_plan_task`
|
||||
- `sf_task_complete`
|
||||
- `sf_slice_complete`
|
||||
- `sf_complete_milestone`
|
||||
- `sf_validate_milestone`
|
||||
- `sf_replan_slice`
|
||||
- `sf_reassess_roadmap`
|
||||
- `sf_save_gate_result`
|
||||
- `sf_milestone_status`
|
||||
|
||||
Likely files:
|
||||
|
||||
- `packages/mcp-server/src/server.ts` or a new sibling server package
|
||||
- `packages/mcp-server/src/...` supporting modules
|
||||
- shared tool-definition metadata if needed
|
||||
|
||||
Decisions to make during implementation:
|
||||
|
||||
- extend existing MCP package vs create `packages/mcp-sf-tools-server`
|
||||
- canonical names only vs selected alias export
|
||||
- single combined server vs separate “session” and “workflow” server modes
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- MCP tool discovery shows the minimum tool set
|
||||
- each MCP tool invokes the shared handlers successfully in isolation
|
||||
|
||||
### 3. Safety and Policy Parity
|
||||
|
||||
Goal: ensure MCP mutations enforce the same rules as native tool calls.
|
||||
|
||||
Targets:
|
||||
|
||||
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
|
||||
- any current tool-call gating hooks tied to native runtime only
|
||||
- MCP wrapper layer before shared handler invocation
|
||||
|
||||
Required protections:
|
||||
|
||||
- discussion gate blocking
|
||||
- queue-mode restrictions
|
||||
- write-path restrictions
|
||||
- canonical DB/file rendering order
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- MCP cannot be used to bypass native write restrictions
|
||||
- blocked native scenarios remain blocked over MCP
|
||||
|
||||
### 4. Claude Code Provider Integration
|
||||
|
||||
Goal: attach the SF workflow-tool MCP surface to Claude Code sessions.
|
||||
|
||||
Targets:
|
||||
|
||||
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
|
||||
- `src/resources/extensions/claude-code-cli/index.ts`
|
||||
|
||||
Expected work:
|
||||
|
||||
- build an SF-managed `mcpServers` config for the Claude SDK session
|
||||
- attach the workflow MCP server only when the session requires SF tools
|
||||
- keep current Claude Code streaming behavior intact
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- Claude Code session can discover the SF workflow MCP tools
|
||||
- task execution path can call `sf_task_complete` successfully
|
||||
|
||||
### 5. Capability Detection and Failure Path
|
||||
|
||||
Goal: refuse to start tool-dependent workflows when required capabilities are unavailable.
|
||||
|
||||
Targets:
|
||||
|
||||
- SF dispatch / auto-mode preflight
|
||||
- provider selection and routing checks
|
||||
- user-facing compatibility errors
|
||||
|
||||
Required behavior:
|
||||
|
||||
- if native SF tools are available, proceed
|
||||
- else if SF workflow MCP tools are available, proceed
|
||||
- else fail fast with a precise message
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- no execution prompt is sent that requires unavailable tools
|
||||
- users with only unsupported capability combinations get a hard error, not a fake fallback
|
||||
|
||||
### 6. Prompt and Documentation Alignment
|
||||
|
||||
Goal: keep the workflow contract strict while removing transport assumptions from docs and runtime messaging.
|
||||
|
||||
Targets:
|
||||
|
||||
- `src/resources/extensions/sf/prompts/execute-task.md`
|
||||
- related planning/discuss prompts that reference tool availability
|
||||
- provider and MCP docs
|
||||
|
||||
Rules:
|
||||
|
||||
- prompts should keep requiring canonical SF completion/planning tools
|
||||
- prompts should not imply “native in-process tool only”
|
||||
- docs should explain native vs MCP-backed fulfillment paths
|
||||
|
||||
Exit criteria:
|
||||
|
||||
- prompt contract matches runtime reality
|
||||
- no provider is told to use a tool surface it cannot access
|
||||
|
||||
## Phase Plan
|
||||
|
||||
## Phase 1: Spike and Handler Extraction
|
||||
|
||||
Scope:
|
||||
|
||||
- extract shared logic for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`
|
||||
- prove native wrappers still work
|
||||
|
||||
Why first:
|
||||
|
||||
- these tools are enough to test end-to-end completion semantics without migrating the full catalog
|
||||
|
||||
Verification:
|
||||
|
||||
- existing native tests still pass
|
||||
- new unit tests cover shared handler entrypoints directly
|
||||
|
||||
## Phase 2: Minimal Workflow MCP Server
|
||||
|
||||
Scope:
|
||||
|
||||
- expose the three extracted tools over MCP
|
||||
- ensure discovery schemas are clean and canonical
|
||||
|
||||
Verification:
|
||||
|
||||
- MCP discovery returns all three tools
|
||||
- direct MCP calls succeed against a fixture project
|
||||
|
||||
## Phase 3: Claude Code End-to-End Proof
|
||||
|
||||
Scope:
|
||||
|
||||
- wire the minimal workflow MCP server into the Claude SDK session
|
||||
- run a single execution path that ends with task completion
|
||||
|
||||
Verification:
|
||||
|
||||
- Claude Code can call `sf_task_complete`
|
||||
- summary file, DB state, and plan checkbox update correctly
|
||||
|
||||
## Phase 4: Expand to Full Minimum Workflow Set
|
||||
|
||||
Scope:
|
||||
|
||||
- add planning, slice completion, milestone completion, roadmap reassessment, and gate result tools
|
||||
|
||||
Verification:
|
||||
|
||||
- discuss/plan/execute/complete lifecycle works over MCP for the supported flow set
|
||||
|
||||
## Phase 5: Capability Gating and UX Hardening
|
||||
|
||||
Scope:
|
||||
|
||||
- add preflight capability checks
|
||||
- add clear error messaging for unsupported setups
|
||||
|
||||
Verification:
|
||||
|
||||
- unsupported provider/session combinations fail before execution starts
|
||||
|
||||
## Phase 6: Prompt and Doc Cleanup
|
||||
|
||||
Scope:
|
||||
|
||||
- align prompts and docs with the new transport-neutral contract
|
||||
|
||||
Verification:
|
||||
|
||||
- prompt references are accurate
|
||||
- docs describe the supported architecture and limitations
|
||||
|
||||
## File-Level Starting Map
|
||||
|
||||
High-probability files for the first implementation:
|
||||
|
||||
- `src/resources/extensions/sf/bootstrap/db-tools.ts`
|
||||
- `src/resources/extensions/sf/bootstrap/query-tools.ts`
|
||||
- `src/resources/extensions/sf/bootstrap/write-gate.ts`
|
||||
- `src/resources/extensions/sf/tools/complete-task.ts`
|
||||
- `src/resources/extensions/claude-code-cli/stream-adapter.ts`
|
||||
- `src/resources/extensions/claude-code-cli/index.ts`
|
||||
- `packages/mcp-server/src/server.ts`
|
||||
- `packages/mcp-server/src/session-manager.ts`
|
||||
- `packages/mcp-server/README.md`
|
||||
- `src/resources/extensions/sf/prompts/execute-task.md`
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
### Unit
|
||||
|
||||
- shared handlers
|
||||
- MCP wrapper adapters
|
||||
- gating / capability-check helpers
|
||||
|
||||
### Integration
|
||||
|
||||
- direct MCP tool invocation against fixture projects
|
||||
- native tool invocation regression coverage
|
||||
- Claude Code provider path with MCP attached
|
||||
|
||||
### End-to-End
|
||||
|
||||
- plan or execute a small fixture task and complete it through canonical SF tools
|
||||
- confirm DB row, rendered summary, and plan state stay in sync
|
||||
|
||||
## Risks
|
||||
|
||||
### Risk 1: Logic Drift
|
||||
|
||||
If native and MCP wrappers each evolve their own behavior, parity will collapse quickly.
|
||||
|
||||
Mitigation:
|
||||
|
||||
- shared handler extraction before broad MCP exposure
|
||||
|
||||
### Risk 2: Safety Regression
|
||||
|
||||
If MCP becomes a side door around native gating, the architecture is worse than before.
|
||||
|
||||
Mitigation:
|
||||
|
||||
- centralize or reuse gating checks before shared handler invocation
|
||||
|
||||
### Risk 3: Overly Broad First Rollout
|
||||
|
||||
Exporting every tool and alias immediately increases scope and test burden.
|
||||
|
||||
Mitigation:
|
||||
|
||||
- ship a minimal workflow tool set first
|
||||
|
||||
### Risk 4: Claude SDK Session Wiring Complexity
|
||||
|
||||
Attaching MCP servers dynamically may expose edge cases around cwd, permissions, or subprocess lifecycle.
|
||||
|
||||
Mitigation:
|
||||
|
||||
- prove a narrow spike with 2-3 tools before expanding
|
||||
|
||||
## Exit Criteria for ADR-008
|
||||
|
||||
ADR-008 is considered implemented when:
|
||||
|
||||
1. Claude Code-backed execution can use canonical SF workflow tools over MCP.
|
||||
2. Native provider behavior remains intact.
|
||||
3. Shared handlers back both native and MCP invocation.
|
||||
4. Gating and state integrity protections apply equally to MCP mutations.
|
||||
5. Capability checks prevent prompts from requiring unavailable tools.
|
||||
|
||||
## Recommended Next Task
|
||||
|
||||
Start with a narrow spike:
|
||||
|
||||
1. Extract shared handlers for `sf_summary_save`, `sf_task_complete`, and `sf_milestone_status`.
|
||||
2. Expose those tools through a minimal workflow MCP server.
|
||||
3. Attach that MCP server to Claude Code sessions.
|
||||
4. Prove end-to-end task completion on a fixture project.
|
||||
This document is kept only as a tombstone for the rejected implementation plan. Do not revive this direction.
|
||||
|
|
|
|||
|
|
@@ -1,246 +1,25 @@
|
|||
# ADR-008: Expose SF Workflow Tools Over MCP for Provider Parity
|
||||
# ADR-008: SF Tools Over MCP for Provider Parity
|
||||
|
||||
**Status:** Superseded — historical record
|
||||
**Status:** Rejected — never implement
|
||||
**Date:** 2026-04-09
|
||||
**Superseded by:** `docs/adr/0000-purpose-to-software-compiler.md`, `docs/dev/ADR-020-internal-wire-architecture.md`
|
||||
**Deciders:** Jeremy McSpadden
|
||||
**Related:** ADR-004 (capability-aware model routing), ADR-007 (model catalog split and provider API encapsulation), `src/resources/extensions/sf/bootstrap/db-tools.ts`, `src/resources/extensions/claude-code-cli/stream-adapter.ts`, `packages/mcp-server/src/server.ts`
|
||||
|
||||
## Context
|
||||
|
||||
> Current boundary: SF does not expose its workflow as an MCP server for other
|
||||
> clients. SF runs directly through `sf`/`/sf autonomous`; MCP is a client-side
|
||||
> integration surface for external tools that SF may call. This ADR is preserved
|
||||
> as historical context for a provider-parity idea that was not adopted.
|
||||
|
||||
SF currently has two different tool surfaces:
|
||||
|
||||
1. **In-process extension tools** registered directly into the runtime via `pi.registerTool(...)`.
|
||||
2. **An external MCP server** that exposes session orchestration and read-only project inspection.
|
||||
|
||||
This split is now creating a real provider compatibility problem.
|
||||
|
||||
### What exists today
|
||||
|
||||
The core SF workflow tools are internal extension tools. Examples include:
|
||||
|
||||
- `sf_summary_save`
|
||||
- `sf_plan_milestone`
|
||||
- `sf_plan_slice`
|
||||
- `sf_plan_task`
|
||||
- `sf_task_complete`
|
||||
- `sf_slice_complete`
|
||||
- `sf_complete_milestone`
|
||||
- `sf_validate_milestone`
|
||||
- `sf_replan_slice`
|
||||
- `sf_reassess_roadmap`
|
||||
|
||||
These are registered in `src/resources/extensions/sf/bootstrap/db-tools.ts` and related bootstrap files. SF prompts assume these tools are available during discuss, plan, and execute flows.
|
||||
|
||||
Separately, `packages/mcp-server/src/server.ts` exposes a different tool surface:
|
||||
|
||||
- session control: `sf_execute`, `sf_status`, `sf_result`, `sf_cancel`, `sf_query`, `sf_resolve_blocker`
|
||||
- read-only inspection: `sf_progress`, `sf_roadmap`, `sf_history`, `sf_doctor`, `sf_captures`, `sf_knowledge`
|
||||
|
||||
That MCP server is useful, but it is **not** a transport for the internal workflow/mutation tools.
|
||||
|
||||
### The current failure mode
|
||||
|
||||
The Claude Code CLI provider uses the Anthropic Agent SDK through `src/resources/extensions/claude-code-cli/stream-adapter.ts`. That adapter starts a Claude SDK session, but it does not forward the internal SF tool registry into the SDK session, nor does it attach a SF MCP server for those tools.
|
||||
|
||||
As a result:
|
||||
|
||||
- prompts tell the model to call tools like `sf_task_complete`
|
||||
- the tools exist in SF
|
||||
- but Claude Code sessions do not actually receive those tools
|
||||
|
||||
This produces a contract mismatch: the model is required to use tools that are unavailable in that provider path.
|
||||
|
||||
### Why this matters
|
||||
|
||||
This is not a one-off Claude Code bug. It reveals a deeper architectural issue:
|
||||
|
||||
- SF’s core workflow contract is transport-specific
|
||||
- prompt authors assume “internal extension tool availability”
|
||||
- provider integrations do not all share the same execution surface
|
||||
|
||||
If SF wants provider parity, its workflow tools need a transport-neutral exposure model.
|
||||
|
||||
## Decision
|
||||
|
||||
**Expose the SF workflow tool contract over MCP as a first-class transport, and make MCP the compatibility layer for providers that cannot directly access the in-process SF tool registry.**
|
||||
Never build or restore an SF MCP server package. SF is not an MCP server product and must not become one.
|
||||
|
||||
This means:
|
||||
Current SF uses MCP only as a client-side integration surface for external tools that SF may call. SF workflow execution stays inside the `sf` runtime through in-process extension tools, daemon/RPC/headless paths, and DB-backed workflow state.
|
||||
|
||||
1. SF will keep its existing in-process tool registration for native runtime use.
|
||||
2. SF will add an MCP execution surface for the same workflow tools.
|
||||
3. Both surfaces must call the same underlying business logic.
|
||||
4. Provider integrations such as Claude Code will use the MCP surface when they cannot access native in-process tools directly.
|
||||
## Reason
|
||||
|
||||
The decision is explicitly **not** to replace the native tool system with MCP everywhere. MCP is the parity and portability layer, not the only runtime path.
|
||||
The original ADR proposed exposing SF workflow tools through a new MCP server for provider parity. That creates another mutable workflow transport and another state-safety surface. The current architecture avoids that extra server boundary: planning, execution, validation, and completion write structured state to SQLite first and render markdown/JSON as projections.
|
||||
|
||||
## Decision Details
|
||||
## Operational Rule
|
||||
|
||||
### 1. One handler layer, multiple transports
|
||||
- Never recreate `packages/mcp-server`.
|
||||
- Never add a `src/mcp-server.ts` workflow backend.
|
||||
- Never document SF as an MCP server provider.
|
||||
- Never add provider-parity work that depends on SF exposing itself over MCP.
|
||||
- Keep `src/resources/extensions/mcp-client/` as the client integration boundary.
|
||||
|
||||
SF tool behavior must not be implemented twice.
|
||||
|
||||
The transport-neutral business logic for workflow tools should be shared by:
|
||||
|
||||
- native extension tool registration (`pi.registerTool(...)`)
|
||||
- MCP server tool registration
|
||||
|
||||
The MCP server should wrap the same handlers used by `db-tools.ts`, `query-tools.ts`, and related modules. This avoids logic drift and keeps validation, DB writes, file rendering, and recovery behavior consistent.
|
||||
|
||||
### 2. Add a workflow-tool MCP surface
|
||||
|
||||
SF will expose the workflow tools required for discuss, planning, execution, and completion over MCP.
|
||||
|
||||
Initial minimum set:
|
||||
|
||||
- `sf_summary_save`
|
||||
- `sf_decision_save`
|
||||
- `sf_plan_milestone`
|
||||
- `sf_plan_slice`
|
||||
- `sf_plan_task`
|
||||
- `sf_task_complete`
|
||||
- `sf_slice_complete`
|
||||
- `sf_complete_milestone`
|
||||
- `sf_validate_milestone`
|
||||
- `sf_replan_slice`
|
||||
- `sf_reassess_roadmap`
|
||||
- `sf_save_gate_result`
|
||||
- selected read/query tools such as `sf_milestone_status`
|
||||
|
||||
Aliases should be treated conservatively. MCP should prefer canonical names unless compatibility requires exposing aliases.
|
||||
|
||||
### 3. Preserve safety semantics
|
||||
|
||||
The current SF safety model includes write gates, discussion gates, queue-mode restrictions, and state integrity guarantees.
|
||||
|
||||
Those guarantees must continue to apply when tools are invoked over MCP. In particular:
|
||||
|
||||
- MCP must not create a path that bypasses write gating
|
||||
- MCP mutations must preserve the same DB/file/state invariants as native tools
|
||||
- provider-specific fallback behavior must not allow manual summary writing in place of canonical completion tools
|
||||
|
||||
### 4. Make provider capability checks explicit
|
||||
|
||||
Before dispatching a workflow that requires SF workflow tools, SF should check whether the selected provider/session can access the required tool surface.
|
||||
|
||||
If a provider cannot access either:
|
||||
|
||||
- native in-process SF tools, or
|
||||
- the SF MCP workflow tool surface
|
||||
|
||||
then SF must fail early with a clear compatibility error rather than allowing execution to continue in a degraded, state-breaking mode.
|
||||
|
||||
### 5. Keep the existing session/read MCP server
|
||||
|
||||
The existing MCP server in `packages/mcp-server` remains valid. It serves a different purpose:
|
||||
|
||||
- remote session orchestration
|
||||
- status/result polling
|
||||
- filesystem-backed project inspection
|
||||
|
||||
The new workflow-tool MCP surface is complementary, not a replacement.
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### Alternative A: Reroute away from Claude Code whenever tool-backed execution is needed
|
||||
|
||||
This would fix the immediate failure for multi-provider users, but it does not solve provider parity. It also fails completely for users who only have Claude Code configured.
|
||||
|
||||
**Rejected** because it treats the symptom, not the architectural gap.
|
||||
|
||||
### Alternative B: Hard-fail Claude Code and require another provider
|
||||
|
||||
This is a valid short-term guardrail and may still be used before MCP support is complete.
|
||||
|
||||
**Rejected as the long-term architecture** because it permanently excludes a supported provider from first-class SF execution.
|
||||
|
||||
### Alternative C: Inject the internal SF tool registry directly into the Claude Agent SDK without MCP
|
||||
|
||||
This would tightly couple SF’s internal extension runtime to a provider-specific integration path. It would not generalize well to other providers or external tool clients.
|
||||
|
||||
**Rejected** because it creates a provider-specific bridge instead of a transport-neutral contract.
|
||||
|
||||
### Alternative D: Replace native SF tools entirely with MCP
|
||||
|
||||
This would simplify the conceptual model, but it would force all runtimes through an external protocol boundary even when the native in-process path is faster and already works well.
|
||||
|
||||
**Rejected** because MCP is needed for portability, not because the native tool system is flawed.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Provider parity improves.** Providers that can consume MCP tools can participate in full SF workflow execution.
|
||||
2. **The workflow contract becomes transport-neutral.** Prompts can rely on capabilities rather than a specific runtime implementation detail.
|
||||
3. **One compatibility story for external clients.** Claude Code, Cursor, and other MCP-capable clients can use the same workflow tool surface.
|
||||
4. **Better long-term architecture.** Internal tools and external transports converge on shared handlers instead of diverging implementations.
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Larger surface area to secure and test.** Mutation tools over MCP are higher risk than read-only inspection tools.
|
||||
2. **Migration complexity.** Tool registration, gating, and handler extraction must be refactored carefully.
|
||||
3. **Two transport paths must remain aligned.** Native and MCP invocation semantics must stay behaviorally identical.
|
||||
|
||||
### Neutral / Tradeoff
|
||||
|
||||
The system will now support:
|
||||
|
||||
- native in-process tool execution when available
|
||||
- MCP-backed tool execution when native access is unavailable
|
||||
|
||||
That is more complex than a single-path system, but it is the cost of provider portability without sacrificing native runtime quality.
|
||||
|
||||
## Migration Plan
|
||||
|
||||
### Phase 1: Extract shared handlers
|
||||
|
||||
Refactor workflow tools so MCP and native registration can call the same transport-neutral functions.
|
||||
|
||||
Priority targets:
|
||||
|
||||
- `sf_summary_save`
|
||||
- `sf_task_complete`
|
||||
- `sf_plan_milestone`
|
||||
- `sf_plan_slice`
|
||||
- `sf_plan_task`
|
||||
|
||||
### Phase 2: Stand up the workflow-tool MCP server
|
||||
|
||||
Add a new MCP surface for workflow tool execution. This may extend the existing MCP package or live as a sibling package, but it must be clearly separated from the current session/read API.
|
||||
|
||||
### Phase 3: Port safety enforcement
|
||||
|
||||
Move or centralize write gates and related policy checks so MCP mutations cannot bypass the existing safety model.
|
||||
|
||||
### Phase 4: Attach MCP workflow tools to Claude Code sessions
|
||||
|
||||
Update the Claude Code provider integration to pass an SF-managed `mcpServers` configuration into the Claude Agent SDK session when required.
|
||||
|
||||
### Phase 5: Add provider capability gating
|
||||
|
||||
Before tool-dependent flows begin, verify that the active provider can access the required SF workflow tools via either native registration or MCP.
|
||||
|
||||
### Phase 6: Update prompts and docs
|
||||
|
||||
Prompt contracts should remain strict about using canonical SF completion/planning tools, but documentation and runtime messaging must no longer assume that only native in-process tool registration satisfies that contract.
|
||||
|
||||
## Validation
|
||||
|
||||
Success is defined by all of the following:
|
||||
|
||||
1. A Claude Code-backed execution session can complete a task using canonical SF workflow tools without manual summary writing.
|
||||
2. Native provider behavior remains unchanged.
|
||||
3. MCP-invoked workflow tools produce the same DB updates, rendered artifacts, and state transitions as native tool calls.
|
||||
4. Write-gate and discussion-gate protections still hold under MCP invocation.
|
||||
5. When required capabilities are unavailable, SF fails early with a precise compatibility error.
|
||||
|
||||
## Scope Notes
|
||||
|
||||
This ADR establishes the architectural direction. It does **not** require full MCP exposure of every historical alias or every auxiliary tool in the first implementation.
|
||||
|
||||
The first implementation should prioritize the minimum workflow tool set needed to make discuss/plan/execute/complete flows work safely for MCP-capable providers.
|
||||
If a future integration needs external control of SF, use the daemon/RPC/headless contract. MCP remains for SF-as-client only.
|
||||
|
|
|
|||
|
|
@@ -9,7 +9,7 @@ sf today bundles its TUI directly in core: `pi-tui` (~10.5k LOC of TypeScript) i
|
|||
|
||||
Three forces argue for extracting the TUI:
|
||||
|
||||
1. **sf is becoming truly headless-first** — `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` already exist. CLI invocations talk to the daemon. sf can be called as an MCP backend by Claude Code, Cursor, Hermes — they're TUI-agnostic. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
|
||||
1. **sf is becoming truly headless-first** — `packages/daemon` and `packages/rpc-client` already exist. CLI invocations talk to the daemon. SF uses MCP as a client integration surface for external tools, not as an SF workflow server. The user-facing TUI is *one client*; it shouldn't be *baked into the engine*.
|
||||
2. **The Charm TUI stack is dramatically more capable than what `pi-tui` builds today.** `bubbletea` + `bubbles` + `lipgloss` + `glamour` + `huh` + `harmonica` + `x/mosaic` (image rendering) + `x/vcr` (recording) + `pony` + `ultraviolet` (declarative markup) compose to far better UX than reproducing in TS would.
|
||||
3. **Removing `pi-tui` from sf core deletes ~10k LOC of TS** — leaner core, fewer TUI-coupled assumptions in `pi-coding-agent`, cleaner test surface.
|
||||
|
||||
|
|
@@ -42,7 +42,7 @@ This ADR plans the extraction.
|
|||
|
||||
- **sf core gets ~10k LOC leaner** after Stage 2.
|
||||
- **Charm stack quality** comes for free — animations (`harmonica`), inline images (`x/mosaic`), markdown (`glamour`), forms (`huh`), recording (`x/vcr`).
|
||||
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + MCP + clients. No TUI coupled to engine.
|
||||
- **Headless / API-first architecture** is cleanly visible: daemon + RPC + clients, with MCP client integration for external tools. No TUI coupled to engine.
|
||||
- **Remote TUI for free** — once the client is Wish-served (could be a v3.x extension), `tailscale ssh aidev sf` opens a full TUI session over SSH. Today's `pi-tui` is local-process only.
|
||||
- **Recordings of TUI sessions** — flight recorder (ADR-015) integrates with the Charm TUI naturally; `pi-tui` would need separate work to support this.
|
||||
|
||||
|
|
@@ -85,7 +85,7 @@ Total: ~12–16 weeks across stages.
|
|||
|
||||
## References
|
||||
|
||||
- `packages/daemon`, `packages/rpc-client`, `packages/mcp-server` — already exist; this ADR makes them load-bearing for clients.
|
||||
- `packages/daemon`, `packages/rpc-client` — already exist; this ADR makes them load-bearing for clients.
|
||||
- `packages/pi-tui` — the existing TUI being deprecated.
|
||||
- `ADR-013` — Network: future SSH-served TUI via `wish` rides the same substrate.
|
||||
- `ADR-015` — Flight recorder: `sf-tui` records its sessions naturally.
|
||||
|
|
|
|||
|
|
@@ -36,7 +36,7 @@
|
|||
| **Loader / Bootstrap** | Startup initialization, extension sync, tool bootstrap |
|
||||
| **LSP** | Language Server Protocol client and multiplexer |
|
||||
| **Mac Tools** | macOS-native utilities (Swift CLI) |
|
||||
| **MCP Server/Client** | Model Context Protocol server and client |
|
||||
| **MCP Client** | Model Context Protocol client integration for external tools |
|
||||
| **Memory Extension** | In-session memory pipeline and storage |
|
||||
| **Migration** | Data and config migration tools |
|
||||
| **Modes** | Interactive TUI, Print, RPC, and Web modes |
|
||||
|
|
@@ -87,7 +87,6 @@
|
|||
| src/help-text.ts | CLI | Generates help text for all subcommands |
|
||||
| src/loader.ts | Loader/Bootstrap | Fast-path startup, extension discovery/validation, env setup |
|
||||
| src/logo.ts | CLI | ASCII logo rendering for welcome screen and loader |
|
||||
| src/mcp-server.ts | MCP Server/Client | Native MCP server over stdin/stdout for external AI clients |
|
||||
| src/models-resolver.ts | Config, Auth/OAuth | Resolves models.json with fallback from Pi to SF |
|
||||
| src/onboarding.ts | Onboarding | First-run wizard — LLM auth, OAuth, API keys, tool setup |
|
||||
| src/pi-migration.ts | Config, Auth/OAuth | Migrates provider credentials from Pi auth.json to SF |
|
||||
|
|
@@ -388,9 +387,9 @@
|
|||
|------|-----------------|-------------|
|
||||
| modes/index.ts | Modes | Mode system exports |
|
||||
| modes/print-mode.ts | Modes | Non-interactive print mode |
|
||||
| modes/rpc/rpc-mode.ts | Modes, MCP Server/Client | RPC server mode for remote access |
|
||||
| modes/rpc/rpc-client.ts | Modes, MCP Server/Client | RPC client for remote agent interaction |
|
||||
| modes/rpc/rpc-types.ts | Modes, MCP Server/Client | RPC protocol type definitions |
|
||||
| modes/rpc/rpc-mode.ts | Modes, RPC | RPC server mode for remote access |
|
||||
| modes/rpc/rpc-client.ts | Modes, RPC | RPC client for remote agent interaction |
|
||||
| modes/rpc/rpc-types.ts | Modes, RPC | RPC protocol type definitions |
|
||||
| modes/rpc/jsonl.ts | Modes | JSONL serialization for RPC |
|
||||
| modes/rpc/remote-terminal.ts | Modes | Remote terminal output handling |
|
||||
| modes/shared/command-context-actions.ts | Modes, Commands | Shared command context utilities |
|
||||
|
|
@@ -610,7 +609,7 @@
|
|||
| remote-questions/slack-adapter.ts | Remote Questions | Slack messaging adapter |
|
||||
| remote-questions/discord-adapter.ts | Remote Questions | Discord messaging adapter |
|
||||
| remote-questions/telegram-adapter.ts | Remote Questions | Telegram messaging adapter |
|
||||
| mcp-client/index.ts | MCP Server/Client | Model Context Protocol client integration |
|
||||
| mcp-client/index.ts | MCP Client | Model Context Protocol client integration |
|
||||
| subagent/index.ts | Subagent, Agent Core | Parallel/serial subagent delegation extension |
|
||||
| subagent/agents.ts | Subagent, Agent Core | Agent registry and discovery |
|
||||
| subagent/isolation.ts | Subagent | Execution isolation and sandboxing |
|
||||
|
|
@@ -826,7 +825,7 @@
|
|||
| File | System Label(s) | Description |
|
||||
|------|-----------------|-------------|
|
||||
| vscode-extension/src/extension.ts | VS Code Extension | Extension activation, client management, command registration |
|
||||
| vscode-extension/src/sf-client.ts | VS Code Extension, MCP Server/Client | RPC client for SF agent communication |
|
||||
| vscode-extension/src/sf-client.ts | VS Code Extension, RPC | RPC client for SF agent communication |
|
||||
| vscode-extension/src/chat-participant.ts | VS Code Extension | Chat participant for @sf command |
|
||||
| vscode-extension/src/sidebar.ts | VS Code Extension | Sidebar webview provider with status display |
|
||||
|
||||
|
|
@@ -975,7 +974,7 @@ Quick lookup: which files are part of each system?
|
|||
| **Loader / Bootstrap** | src/loader.ts, src/resource-loader.ts, src/tool-bootstrap.ts, src/bundled-resource-path.ts, sf/bootstrap/* |
|
||||
| **LSP** | pi-coding-agent/src/core/lsp/* |
|
||||
| **Mac Tools** | src/resources/extensions/mac-tools/* |
|
||||
| **MCP Server/Client** | src/mcp-server.ts, src/resources/extensions/mcp-client/index.ts, vscode-extension/src/sf-client.ts, modes/rpc/* |
|
||||
| **MCP Client** | src/resources/extensions/mcp-client/index.ts |
|
||||
| **Memory Extension** | pi-coding-agent/src/resources/extensions/memory/* |
|
||||
| **Migration** | sf/migrate/*, src/pi-migration.ts, pi-coding-agent/src/migrations.ts, scripts/recover-*.sh |
|
||||
| **Modes** | pi-coding-agent/src/modes/* |
|
||||
|
|
|
|||
|
|
@@ -34,10 +34,10 @@ These were not clearly represented as durable roadmap items and should be planne
|
|||
| Item | Why | Suggested tier | Implementation note |
|
||||
|---|---|---|---|
|
||||
| Typed SF environment schema | `SF_*` env vars should fail early with actionable diagnostics instead of late runtime surprises. | Tier 1 | Add an SF-owned env schema module and route startup/tool validation through it (a rough sketch follows this table). |
|
||||
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Start with file-family thresholds or focused test suites for dispatch, recovery, UOK runtime, and validation. |
|
||||
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Tier 2 | Add a minimal lifecycle fixture that exercises DB rows as executable truth. |
|
||||
| Autonomous-path coverage ratchet | Global coverage thresholds are too broad; autonomous/recovery paths need higher targeted confidence. | Tier 2 | Started with focused DB-authority/UOK runtime suites; continue with dispatch and recovery families before changing global thresholds. |
|
||||
| End-to-end milestone lifecycle tests | DB-only runtime state needs integration proof across plan, execute, validate, and complete. | Done | Added runtime-state regression coverage proving SQLite slice/task order stays authoritative over stale markdown/JSON projections, and DB-backed runtime refuses implicit roadmap, plan, and summary imports. |
|
||||
| Fault-injection recovery tests | Stuck-loop, timeout, runaway, stale lock, and projection drift recovery are high-risk paths. | Tier 2 | Add deterministic fault fixtures before adding broader chaos coverage. |
|
||||
| MCP package completeness audit | Docs mention MCP surfaces, but production completeness is unclear. | Tier 2 | Inspect `packages/mcp-server/`, record supported contracts, gaps, and deferred work. |
|
||||
| MCP server residue/docs cleanup | SF currently ships the MCP client extension only; tracked MCP server source was removed. | Done | Removed untracked `packages/mcp-server/` residue and updated durable docs so future work never recreates an SF MCP server. |
|
||||
| Biome schema version cleanup | Tooling drift creates noisy lint/config failures. | Tier 3 | Run `biome migrate` as a focused tooling cleanup. |
|
||||
| Headless assistant-text preview completion | Prior headless work deferred buffer separation. | Tier 2 | Finish `assistantTextBuffer` / `thinkingBuffer` separation and preview flushing. |
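The typed-environment item above could take roughly this shape. Everything below is a hypothetical placeholder: the `SF_DB_PATH` and `SF_MAX_PARALLEL_UNITS` names and their checks are illustrations, not the real `SF_*` schema.

```ts
interface EnvVarSpec {
  name: string;
  required?: boolean;
  validate?: (value: string) => string | null; // returns an error message, or null when valid
}

// Hypothetical entries; the real module would enumerate the actual SF_* variables.
const SF_ENV_SCHEMA: EnvVarSpec[] = [
  { name: "SF_DB_PATH" },
  {
    name: "SF_MAX_PARALLEL_UNITS",
    validate: (v) => (/^[1-9]\d*$/.test(v) ? null : "must be a positive integer"),
  },
];

export function validateSfEnv(env: NodeJS.ProcessEnv = process.env): string[] {
  const errors: string[] = [];
  for (const spec of SF_ENV_SCHEMA) {
    const value = env[spec.name];
    if (value === undefined) {
      if (spec.required) errors.push(`${spec.name} is required but not set`);
      continue;
    }
    const problem = spec.validate?.(value);
    if (problem) errors.push(`${spec.name}: ${problem}`);
  }
  return errors; // startup should fail fast with these diagnostics when non-empty
}
```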
|
||||
|
||||
|
|
|
|||
|
|
@@ -56,6 +56,36 @@ export interface HeadlessHeartbeatContext {
|
|||
lastEventDetail?: string;
|
||||
}
|
||||
|
||||
export interface AssistantPreviewDelta {
|
||||
kind: "text" | "thinking";
|
||||
text: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Extract a concise preview delta from an assistant message update.
|
||||
*
|
||||
* Purpose: keep headless non-verbose output honest by separating assistant text
|
||||
* from model thinking before both are flushed at tool starts and message end.
|
||||
*
|
||||
* Consumer: headless.ts streaming event loop and headless progress tests.
|
||||
*/
|
||||
export function extractAssistantPreviewDelta(
|
||||
assistantMessageEvent: unknown,
|
||||
): AssistantPreviewDelta | null {
|
||||
if (!assistantMessageEvent || typeof assistantMessageEvent !== "object") {
|
||||
return null;
|
||||
}
|
||||
const event = assistantMessageEvent as Record<string, unknown>;
|
||||
const type = String(event.type ?? "");
|
||||
if (type !== "text_delta" && type !== "thinking_delta") return null;
|
||||
const text = String(event.delta ?? event.text ?? "");
|
||||
if (!text) return null;
|
||||
return {
|
||||
kind: type === "thinking_delta" ? "thinking" : "text",
|
||||
text,
|
||||
};
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// ANSI Color Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
|
|
|||
|
|
@@ -60,6 +60,7 @@ import type { HeadlessJsonResult, OutputFormat } from "./headless-types.js";
|
|||
import { VALID_OUTPUT_FORMATS } from "./headless-types.js";
|
||||
import type { ExtensionUIRequest, ProgressContext } from "./headless-ui.js";
|
||||
import {
|
||||
extractAssistantPreviewDelta,
|
||||
formatHeadlessHeartbeat,
|
||||
formatProgress,
|
||||
formatPromptTraceLines,
|
||||
|
|
@@ -1440,9 +1441,15 @@ async function runHeadlessOnce(
|
|||
}
|
||||
}
|
||||
}
|
||||
// Non-verbose: accumulate text_delta for truncated one-liner
|
||||
else if (ame?.type === "text_delta") {
|
||||
assistantTextBuffer += String(ame.delta ?? ame.text ?? "");
|
||||
// Non-verbose: accumulate separated thinking/text previews for
|
||||
// truncated one-liners before tool calls and message end.
|
||||
else {
|
||||
const previewDelta = extractAssistantPreviewDelta(ame);
|
||||
if (previewDelta?.kind === "text") {
|
||||
assistantTextBuffer += previewDelta.text;
|
||||
} else if (previewDelta?.kind === "thinking") {
|
||||
thinkingBuffer += previewDelta.text;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@@ -41,7 +41,11 @@ function formatAggregateModelIdentity(modelId) {
|
|||
});
|
||||
}
|
||||
/**
|
||||
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning.
|
||||
* Record a unit outcome to the llm_task_outcomes table for Bayesian learning,
|
||||
* and also to model-learner for per-task-type performance tracking.
|
||||
*
|
||||
* Integration point for quick win #2: model-learner activates continuous
|
||||
* per-task-type model performance tracking for adaptive routing decisions.
|
||||
*/
|
||||
async function recordUnitOutcome(unit) {
|
||||
const db = getDatabase();
|
||||
|
|
@@ -54,7 +58,7 @@ async function recordUnitOutcome(unit) {
|
|||
// drop bare-id entries rather than guessing the provider.
|
||||
if (slashIdx === -1) return;
|
||||
const provider = modelId.slice(0, slashIdx);
|
||||
recordOutcome(db, {
|
||||
const outcome = {
|
||||
modelId,
|
||||
provider,
|
||||
unitType: unit.type,
|
||||
|
|
@@ -68,7 +72,24 @@ async function recordUnitOutcome(unit) {
|
|||
tokens_total: unit.tokens.total,
|
||||
cost_usd: unit.cost,
|
||||
recorded_at: unit.startedAt,
|
||||
});
|
||||
};
|
||||
|
||||
// Record to UOK llm_task_outcomes table
|
||||
recordOutcome(db, outcome);
|
||||
|
||||
// Quick Win #2: Also record to model-learner for per-task-type tracking
|
||||
try {
|
||||
const { ModelLearner } = await import("./model-learner.js");
|
||||
const learner = new ModelLearner(basePath);
|
||||
learner.recordOutcome(unit.type, modelId, {
|
||||
success: true,
|
||||
timeout: false,
|
||||
tokensUsed: unit.tokens.total,
|
||||
costUsd: unit.cost,
|
||||
});
|
||||
} catch {
|
||||
/* model-learner integration is optional; never block outcome recording */
|
||||
}
|
||||
} catch {
|
||||
/* fire-and-forget */
|
||||
}
|
||||
|
|
|
|||
|
|
@@ -37,17 +37,12 @@ import {
|
|||
getReplanHistory,
|
||||
getSlice,
|
||||
getSliceTasks,
|
||||
insertMilestone,
|
||||
insertSlice,
|
||||
insertTask,
|
||||
isDbAvailable,
|
||||
updateSliceStatus,
|
||||
updateTaskStatus,
|
||||
wasDbOpenAttempted,
|
||||
} from "./sf-db.js";
|
||||
import { isClosedStatus, isDeferredStatus } from "./status-guards.js";
|
||||
import { extractVerdict } from "./verdict-parser.js";
|
||||
import { logError, logWarning } from "./workflow-logger.js";
|
||||
import { logWarning } from "./workflow-logger.js";
|
||||
/**
|
||||
* A "ghost" milestone directory contains only META.json (and no substantive
|
||||
* files like CONTEXT, CONTEXT-DRAFT, ROADMAP, or SUMMARY). These appear when
|
||||
|
|
@@ -219,8 +214,7 @@ export async function getActiveMilestoneId(basePath) {
|
|||
* STATE.md is a rendered cache of this output.
|
||||
*
|
||||
* When DB is available, queries milestone/slice/task tables directly.
|
||||
* Falls back to filesystem parsing for unmigrated projects or when DB
|
||||
* has zero milestones (e.g. first run before migration).
|
||||
* Falls back to filesystem parsing only when DB is unavailable.
|
||||
*/
|
||||
export async function deriveState(basePath) {
|
||||
// Return cached result if within the TTL window for the same basePath
|
||||
|
|
@@ -235,22 +229,6 @@ export async function deriveState(basePath) {
|
|||
let result;
|
||||
// Dual-path: try DB-backed derivation first when hierarchy tables are populated
|
||||
if (isDbAvailable()) {
|
||||
let dbMilestones = getAllMilestones();
|
||||
// Disk→DB reconciliation when DB is empty but disk has milestones (#2631).
|
||||
// deriveStateFromDb() does its own reconciliation, but deriveState() skips
|
||||
// it entirely when the DB is empty. Sync here so the DB path is used when
|
||||
// disk milestones exist but haven't been migrated yet.
|
||||
if (dbMilestones.length === 0) {
|
||||
const diskIds = findMilestoneIds(basePath);
|
||||
let synced = false;
|
||||
for (const diskId of diskIds) {
|
||||
if (!isGhostMilestone(basePath, diskId)) {
|
||||
insertMilestone({ id: diskId, status: "active" });
|
||||
synced = true;
|
||||
}
|
||||
}
|
||||
if (synced) dbMilestones = getAllMilestones();
|
||||
}
|
||||
const stopDbTimer = debugTime("derive-state-db");
|
||||
result = await deriveStateFromDb(basePath);
|
||||
stopDbTimer({
|
||||
|
|
@@ -300,91 +278,29 @@ function extractContextTitle(content, fallback) {
|
|||
const isStatusDone = isClosedStatus;
|
||||
/**
|
||||
* Derive SF state from the milestones/slices/tasks DB tables.
|
||||
* Flag files (PARKED, VALIDATION, CONTINUE, REPLAN, REPLAN-TRIGGER, CONTEXT-DRAFT)
|
||||
* are still checked on the filesystem since they aren't in DB tables.
|
||||
* Non-planning control files (PARKED, CONTINUE, REPLAN, REPLAN-TRIGGER,
|
||||
* CONTEXT-DRAFT) are still checked on the filesystem since they are not
|
||||
* hierarchy state.
|
||||
* Requirements also stay file-based via parseRequirementCounts().
|
||||
*
|
||||
* Must produce field-identical SFState to _deriveStateImpl() for the same project.
|
||||
* Must not import rendered roadmap, plan, or summary artifacts into DB-backed
|
||||
* runtime state. Explicit migration/repair flows own any legacy file import.
|
||||
*/
|
||||
function reconcileDiskToDb(basePath) {
|
||||
let allMilestones = getAllMilestones();
|
||||
const dbIdSet = new Set(allMilestones.map((m) => m.id));
|
||||
const diskIds = findMilestoneIds(basePath);
|
||||
let synced = false;
|
||||
for (const diskId of diskIds) {
|
||||
if (!dbIdSet.has(diskId) && !isGhostMilestone(basePath, diskId)) {
|
||||
insertMilestone({ id: diskId, status: "active" });
|
||||
synced = true;
|
||||
}
|
||||
}
|
||||
if (synced) allMilestones = getAllMilestones();
|
||||
for (const mid of diskIds) {
|
||||
if (isGhostMilestone(basePath, mid)) continue;
|
||||
const roadmapPath = resolveMilestoneFile(basePath, mid, "ROADMAP");
|
||||
if (!roadmapPath) continue;
|
||||
const dbSlices = getMilestoneSlices(mid);
|
||||
const dbSliceIds = new Set(dbSlices.map((s) => s.id));
|
||||
let roadmapContent;
|
||||
try {
|
||||
roadmapContent = readFileSync(roadmapPath, "utf-8");
|
||||
} catch (err) {
|
||||
if (diskIds.length > 0) {
|
||||
const dbIds = new Set(getAllMilestones().map((m) => m.id));
|
||||
const diskOnlyIds = diskIds.filter(
|
||||
(id) => !dbIds.has(id) && !isGhostMilestone(basePath, id),
|
||||
);
|
||||
if (diskOnlyIds.length > 0) {
|
||||
logWarning(
|
||||
"state",
|
||||
"reconcileDiskToDb: roadmap read failed, skipping milestone",
|
||||
{
|
||||
mid,
|
||||
error: err.message,
|
||||
},
|
||||
`DB-backed state ignored ${diskOnlyIds.length} disk-only milestone(s): ${diskOnlyIds.join(", ")}`,
|
||||
);
|
||||
continue;
|
||||
}
|
||||
const parsed = parseRoadmap(roadmapContent);
|
||||
for (const s of parsed.slices) {
|
||||
if (dbSliceIds.has(s.id)) continue;
|
||||
const summaryPath = resolveSliceFile(basePath, mid, s.id, "SUMMARY");
|
||||
const sliceStatus = s.done || summaryPath ? "complete" : "pending";
|
||||
insertSlice({
|
||||
id: s.id,
|
||||
milestoneId: mid,
|
||||
title: s.title,
|
||||
status: sliceStatus,
|
||||
risk: s.risk,
|
||||
depends: s.depends,
|
||||
demo: s.demo,
|
||||
});
|
||||
}
|
||||
// Reconcile stale *existing* slice rows (#3599): a slice row may exist in
|
||||
// the DB with status "pending" even though disk artifacts (SUMMARY) prove
|
||||
// completion — the same class of desync that task-level reconciliation
|
||||
// (further below) already handles. Without this, the dependency resolver
|
||||
// builds doneSliceIds from stale DB rows and downstream slices stay blocked
|
||||
// forever with "No slice eligible".
|
||||
for (const dbSlice of dbSlices) {
|
||||
if (isStatusDone(dbSlice.status)) continue;
|
||||
const summaryPath = resolveSliceFile(
|
||||
basePath,
|
||||
mid,
|
||||
dbSlice.id,
|
||||
"SUMMARY",
|
||||
);
|
||||
if (summaryPath) {
|
||||
try {
|
||||
updateSliceStatus(mid, dbSlice.id, "complete");
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`slice ${mid}/${dbSlice.id} status reconciled from "${dbSlice.status}" to "complete" (#3599)`,
|
||||
{ mid, sid: dbSlice.id },
|
||||
);
|
||||
} catch (e) {
|
||||
logError("reconcile", `failed to update slice ${dbSlice.id}`, {
|
||||
sid: dbSlice.id,
|
||||
error: e.message,
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return allMilestones;
|
||||
return getAllMilestones();
|
||||
}
|
||||
function buildCompletenessSet(basePath, milestones) {
|
||||
const completeMilestoneIds = new Set();
|
||||
|
|
@@ -727,47 +643,14 @@ function resolveSliceDependencies(activeMilestoneSlices) {
|
|||
return { activeSlice: null, activeSliceRow: null };
|
||||
}
|
||||
async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
|
||||
let tasks = getSliceTasks(milestoneId, sliceId);
|
||||
const tasks = getSliceTasks(milestoneId, sliceId);
|
||||
if (tasks.length === 0 && planFile) {
|
||||
try {
|
||||
const planContent = await loadFile(planFile);
|
||||
if (planContent) {
|
||||
const diskPlan = parsePlan(planContent);
|
||||
if (diskPlan.tasks.length > 0) {
|
||||
for (let i = 0; i < diskPlan.tasks.length; i++) {
|
||||
const t = diskPlan.tasks[i];
|
||||
try {
|
||||
insertTask({
|
||||
id: t.id,
|
||||
sliceId,
|
||||
milestoneId,
|
||||
title: t.title,
|
||||
status: t.done ? "complete" : "pending",
|
||||
sequence: i + 1,
|
||||
});
|
||||
} catch (insertErr) {
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`failed to insert task ${t.id} from plan file: ${insertErr instanceof Error ? insertErr.message : String(insertErr)}`,
|
||||
);
|
||||
}
|
||||
}
|
||||
tasks = getSliceTasks(milestoneId, sliceId);
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`imported ${tasks.length} tasks from plan file for ${milestoneId}/${sliceId} — DB was empty (#3600)`,
|
||||
{ mid: milestoneId, sid: sliceId },
|
||||
);
|
||||
}
|
||||
}
|
||||
} catch (err) {
|
||||
logError(
|
||||
"reconcile",
|
||||
`plan-file task import failed for ${milestoneId}/${sliceId}: ${err instanceof Error ? err.message : String(err)}`,
|
||||
);
|
||||
}
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`slice plan file exists for ${milestoneId}/${sliceId}, but DB has no task rows; refusing runtime import`,
|
||||
{ mid: milestoneId, sid: sliceId },
|
||||
);
|
||||
}
|
||||
let reconciled = false;
|
||||
for (const t of tasks) {
|
||||
if (isStatusDone(t.status)) continue;
|
||||
const summaryPath = resolveTaskFile(
|
||||
|
|
@@ -778,41 +661,13 @@ async function reconcileSliceTasks(basePath, milestoneId, sliceId, planFile) {
|
|||
"SUMMARY",
|
||||
);
|
||||
if (summaryPath && existsSync(summaryPath)) {
|
||||
// Validate that the summary file has actual content (#sf-moobj36o-6rxy6e)
|
||||
const summaryContent = readFileSync(summaryPath, "utf-8");
|
||||
if (!isValidTaskSummary(summaryContent)) {
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`task ${milestoneId}/${sliceId}/${t.id} has empty/invalid SUMMARY — skipping reconciliation`,
|
||||
{ mid: milestoneId, sid: sliceId, tid: t.id },
|
||||
);
|
||||
continue;
|
||||
}
|
||||
try {
|
||||
updateTaskStatus(
|
||||
milestoneId,
|
||||
sliceId,
|
||||
t.id,
|
||||
"complete",
|
||||
new Date().toISOString(),
|
||||
);
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`task ${milestoneId}/${sliceId}/${t.id} status reconciled from "${t.status}" to "complete" (#2514)`,
|
||||
{ mid: milestoneId, sid: sliceId, tid: t.id },
|
||||
);
|
||||
reconciled = true;
|
||||
} catch (e) {
|
||||
logError("reconcile", `failed to update task ${t.id}`, {
|
||||
tid: t.id,
|
||||
error: e.message,
|
||||
});
|
||||
}
|
||||
logWarning(
|
||||
"reconcile",
|
||||
`task ${milestoneId}/${sliceId}/${t.id} has SUMMARY on disk but DB status is "${t.status}"; refusing runtime status import`,
|
||||
{ mid: milestoneId, sid: sliceId, tid: t.id },
|
||||
);
|
||||
}
|
||||
}
|
||||
if (reconciled) {
|
||||
tasks = getSliceTasks(milestoneId, sliceId);
|
||||
}
|
||||
return tasks;
|
||||
}
|
||||
async function detectBlockers(basePath, milestoneId, sliceId, tasks) {
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,241 @@
|
|||
/**
|
||||
* db-driven-runtime-state.test.mjs — DB authority for runtime state.
|
||||
*
|
||||
* Purpose: prove DB-backed projects choose active slices/tasks from structured
|
||||
* SQLite rows even when generated markdown/JSON projections are stale.
|
||||
*/
|
||||
import assert from "node:assert/strict";
|
||||
import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
|
||||
import { tmpdir } from "node:os";
|
||||
import { join } from "node:path";
|
||||
import { afterEach, test } from "vitest";
|
||||
import {
|
||||
closeDatabase,
|
||||
getAllMilestones,
|
||||
getSliceTasks,
|
||||
insertMilestone,
|
||||
insertSlice,
|
||||
insertTask,
|
||||
openDatabase,
|
||||
} from "../sf-db.js";
|
||||
import { deriveState, invalidateStateCache } from "../state.js";
|
||||
|
||||
const tmpDirs = [];
|
||||
|
||||
afterEach(() => {
|
||||
closeDatabase();
|
||||
invalidateStateCache();
|
||||
while (tmpDirs.length > 0) {
|
||||
const dir = tmpDirs.pop();
|
||||
if (dir) rmSync(dir, { recursive: true, force: true });
|
||||
}
|
||||
});
|
||||
|
||||
function makeProject() {
|
||||
const dir = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
|
||||
tmpDirs.push(dir);
|
||||
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S01"), {
|
||||
recursive: true,
|
||||
});
|
||||
mkdirSync(join(dir, ".sf", "milestones", "M777", "slices", "S02"), {
|
||||
recursive: true,
|
||||
});
|
||||
openDatabase(join(dir, ".sf", "sf.db"));
|
||||
insertMilestone({
|
||||
id: "M777",
|
||||
title: "DB runtime authority",
|
||||
status: "active",
|
||||
});
|
||||
insertSlice({
|
||||
milestoneId: "M777",
|
||||
id: "S02",
|
||||
title: "DB first slice",
|
||||
status: "pending",
|
||||
sequence: 1,
|
||||
});
|
||||
insertSlice({
|
||||
milestoneId: "M777",
|
||||
id: "S01",
|
||||
title: "Stale file first slice",
|
||||
status: "pending",
|
||||
sequence: 2,
|
||||
});
|
||||
insertTask({
|
||||
milestoneId: "M777",
|
||||
sliceId: "S02",
|
||||
id: "T02",
|
||||
title: "DB first task",
|
||||
status: "pending",
|
||||
sequence: 1,
|
||||
});
|
||||
insertTask({
|
||||
milestoneId: "M777",
|
||||
sliceId: "S02",
|
||||
id: "T01",
|
||||
title: "DB second task",
|
||||
status: "pending",
|
||||
sequence: 2,
|
||||
});
|
||||
return dir;
|
||||
}
|
||||
|
||||
test("deriveState_when_generated_projections_are_stale_uses_db_slice_and_task_sequence", async () => {
|
||||
const project = makeProject();
|
||||
const milestoneDir = join(project, ".sf", "milestones", "M777");
|
||||
writeFileSync(
|
||||
join(milestoneDir, "M777-ROADMAP.md"),
|
||||
[
|
||||
"# M777: stale generated roadmap",
|
||||
"",
|
||||
"## Slice Overview",
|
||||
"| ID | Slice | Risk | Depends | Done | After this |",
|
||||
"|----|-------|------|---------|------|------------|",
|
||||
"| S01 | Stale file first slice | low | - | | stale first |",
|
||||
"| S02 | DB first slice | low | - | | should still be first by DB |",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
writeFileSync(
|
||||
join(milestoneDir, "M777-ROADMAP.json"),
|
||||
JSON.stringify(
|
||||
{
|
||||
origin: "stale-test-projection",
|
||||
slices: [
|
||||
{ id: "S01", title: "Stale file first slice", sequence: 1 },
|
||||
{ id: "S02", title: "DB first slice", sequence: 2 },
|
||||
],
|
||||
},
|
||||
null,
|
||||
2,
|
||||
),
|
||||
);
|
||||
writeFileSync(
|
||||
join(milestoneDir, "slices", "S02", "S02-PLAN.md"),
|
||||
[
|
||||
"# S02: stale generated plan",
|
||||
"",
|
||||
"## Tasks",
|
||||
"",
|
||||
"- [ ] T99: stale task that should not replace DB rows",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
|
||||
const state = await deriveState(project);
|
||||
|
||||
assert.equal(state.phase, "executing");
|
||||
assert.equal(state.activeMilestone?.id, "M777");
|
||||
assert.equal(state.activeSlice?.id, "S02");
|
||||
assert.equal(state.activeTask?.id, "T02");
|
||||
});
|
||||
|
||||
test("deriveState_when_db_has_no_tasks_refuses_runtime_plan_file_import", async () => {
|
||||
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
|
||||
tmpDirs.push(project);
|
||||
mkdirSync(join(project, ".sf", "milestones", "M778", "slices", "S01"), {
|
||||
recursive: true,
|
||||
});
|
||||
openDatabase(join(project, ".sf", "sf.db"));
|
||||
insertMilestone({
|
||||
id: "M778",
|
||||
title: "DB planning authority",
|
||||
status: "active",
|
||||
});
|
||||
insertSlice({
|
||||
milestoneId: "M778",
|
||||
id: "S01",
|
||||
title: "Plan exists without DB tasks",
|
||||
status: "pending",
|
||||
sequence: 1,
|
||||
});
|
||||
writeFileSync(
|
||||
join(project, ".sf", "milestones", "M778", "M778-ROADMAP.md"),
|
||||
[
|
||||
"# M778: DB planning authority",
|
||||
"",
|
||||
"## Slice Overview",
|
||||
"| ID | Slice | Risk | Depends | Done | After this |",
|
||||
"|----|-------|------|---------|------|------------|",
|
||||
"| S01 | Plan exists without DB tasks | low | - | | keep planning |",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
writeFileSync(
|
||||
join(project, ".sf", "milestones", "M778", "slices", "S01", "S01-PLAN.md"),
|
||||
[
|
||||
"# S01: stale generated plan",
|
||||
"",
|
||||
"## Tasks",
|
||||
"",
|
||||
"- [ ] T01: stale file task",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
|
||||
const state = await deriveState(project);
|
||||
|
||||
assert.equal(state.phase, "planning");
|
||||
assert.equal(state.activeMilestone?.id, "M778");
|
||||
assert.equal(state.activeSlice?.id, "S01");
|
||||
assert.equal(state.activeTask, null);
|
||||
assert.equal(getSliceTasks("M778", "S01").length, 0);
|
||||
assert.match(state.nextAction, /has a plan file but no tasks/i);
|
||||
});
|
||||
|
||||
test("deriveState_when_db_has_no_milestones_refuses_runtime_roadmap_import", async () => {
|
||||
const project = mkdtempSync(join(tmpdir(), "sf-db-runtime-state-"));
|
||||
tmpDirs.push(project);
|
||||
const milestoneDir = join(project, ".sf", "milestones", "M779");
|
||||
mkdirSync(milestoneDir, { recursive: true });
|
||||
openDatabase(join(project, ".sf", "sf.db"));
|
||||
writeFileSync(
|
||||
join(milestoneDir, "M779-ROADMAP.md"),
|
||||
[
|
||||
"# M779: stale disk-only roadmap",
|
||||
"",
|
||||
"## Slice Overview",
|
||||
"| ID | Slice | Risk | Depends | Done | After this |",
|
||||
"|----|-------|------|---------|------|------------|",
|
||||
"| S01 | Should not import | low | - | | no DB mutation |",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
|
||||
const state = await deriveState(project);
|
||||
|
||||
assert.equal(state.phase, "pre-planning");
|
||||
assert.equal(state.activeMilestone, null);
|
||||
assert.deepEqual(getAllMilestones(), []);
|
||||
});
|
||||
|
||||
test("deriveState_when_task_summary_exists_keeps_db_task_status_authoritative", async () => {
|
||||
const project = makeProject();
|
||||
const taskDir = join(
|
||||
project,
|
||||
".sf",
|
||||
"milestones",
|
||||
"M777",
|
||||
"slices",
|
||||
"S02",
|
||||
"tasks",
|
||||
);
|
||||
mkdirSync(taskDir, { recursive: true });
|
||||
writeFileSync(
|
||||
join(taskDir, "T02-SUMMARY.md"),
|
||||
[
|
||||
"# T02: stale summary projection",
|
||||
"",
|
||||
"This file is not DB state.",
|
||||
"",
|
||||
].join("\n"),
|
||||
);
|
||||
|
||||
const state = await deriveState(project);
|
||||
const [firstTask] = getSliceTasks("M777", "S02");
|
||||
|
||||
assert.equal(state.phase, "executing");
|
||||
assert.equal(state.activeSlice?.id, "S02");
|
||||
assert.equal(state.activeTask?.id, "T02");
|
||||
assert.equal(firstTask.id, "T02");
|
||||
assert.equal(firstTask.status, "pending");
|
||||
});
|
||||
|
|
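For readers skimming the assertions above, the derived state these tests rely on has roughly the following shape. Only the fields actually asserted on are shown; anything beyond them is an assumption about `deriveState`'s return value, not part of the contract pinned down here:

```js
// Shape implied by the assertions above. Only phase, activeMilestone,
// activeSlice, activeTask and nextAction are exercised by the tests; other
// fields on the real state object are not covered here.
const exampleState = {
  phase: "executing", // also seen: "planning", "pre-planning"
  activeMilestone: { id: "M777" }, // null when the DB has no milestones
  activeSlice: { id: "S02" },
  activeTask: { id: "T02" }, // null when the DB has no task rows
  nextAction: "free-form guidance string", // matched by regex in one test
};
```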
@@ -184,13 +184,16 @@ export function parseTriageReport(agentOutput) {
 *    Idempotent: skips rows whose ID already appears in the file.
 * 2. Call markResolved for each resolution in the report.
 *    Idempotent: markResolved itself skips already-resolved entries.
 * 3. Quick Win #1: Auto-fix high-confidence self-reports where confidence > 0.85.
 *
 * @param basePath - project root (directory containing .sf/)
 * @param report - parsed TriageReport from parseTriageReport()
 */
export function applyTriageReport(basePath, report) {
export async function applyTriageReport(basePath, report) {
  let requirementsAdded = 0;
  let entriesResolved = 0;
  let reportsAutoFixed = 0;

  // ── 1. Write promoted requirements ────────────────────────────────────────
  if (report.promotedRequirements.length > 0) {
    const sfDir = sfRoot(basePath);

@@ -244,6 +247,7 @@ export function applyTriageReport(basePath, report) {
    }
    writeFileSync(reqPath, content, "utf-8");
  }

  // ── 2. Resolve entries ────────────────────────────────────────────────────
  for (const resolution of report.resolutions) {
    const evidenceKind = resolution.evidenceKind;

@@ -258,5 +262,26 @@ export function applyTriageReport(basePath, report) {
    );
    if (mutated) entriesResolved++;
  }
  return { requirementsAdded, entriesResolved };

  // ── 3. Quick Win #1: Auto-fix high-confidence self-reports ────────────────
  // Integration point for self-report-fixer: read open reports and auto-apply
  // fixes where confidence > 0.85.
  try {
    const { autoFixHighConfidenceReports } = await import(
      "./self-report-fixer.js"
    );
    const allOpen = [
      ...readAllSelfFeedback(basePath),
      ...readUpstreamSelfFeedback(),
    ].filter((e) => !e.resolvedAt);

    if (allOpen.length > 0) {
      const result = await autoFixHighConfidenceReports(basePath, allOpen);
      reportsAutoFixed = result.applied.length;
    }
  } catch {
    /* self-report fixer is optional; never block triage report application */
  }

  return { requirementsAdded, entriesResolved, reportsAutoFixed };
}
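Because `applyTriageReport` is now async and returns `reportsAutoFixed` alongside the existing counters, call sites have to await it. A minimal sketch of a hypothetical caller follows; the surrounding function and variable names are invented for illustration, only `parseTriageReport` and `applyTriageReport` come from the diff above:

```js
// Hypothetical call site, assuming the signatures shown in the hunks above.
// The real dispatch code wires these counters into its own triage metrics.
async function handleTriageOutput(basePath, agentOutput) {
  const report = parseTriageReport(agentOutput);
  const { requirementsAdded, entriesResolved, reportsAutoFixed } =
    await applyTriageReport(basePath, report);
  // Auto-fixing is fire-and-forget inside applyTriageReport, so a zero count
  // here never means the report itself failed to apply.
  return { requirementsAdded, entriesResolved, reportsAutoFixed };
}
```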
@@ -2,6 +2,7 @@ import assert from "node:assert/strict";
import { describe, it } from "vitest";
import type { ProgressContext } from "../headless-ui.js";
import {
  extractAssistantPreviewDelta,
  formatCostLine,
  formatHeadlessHeartbeat,
  formatProgress,

@@ -556,6 +557,31 @@ describe("formatThinkingLine", () => {
  });
});

describe("extractAssistantPreviewDelta", () => {
  it("separates assistant text deltas from thinking deltas", () => {
    assert.deepEqual(
      extractAssistantPreviewDelta({
        type: "text_delta",
        delta: "I will edit the file.",
      }),
      { kind: "text", text: "I will edit the file." },
    );
    assert.deepEqual(
      extractAssistantPreviewDelta({
        type: "thinking_delta",
        delta: "Need inspect first.",
      }),
      { kind: "thinking", text: "Need inspect first." },
    );
  });

  it("ignores non-preview assistant events", () => {
    assert.equal(extractAssistantPreviewDelta({ type: "text_start" }), null);
    assert.equal(extractAssistantPreviewDelta({ type: "thinking_end" }), null);
    assert.equal(extractAssistantPreviewDelta(null), null);
  });
});

describe("formatCostLine", () => {
  it("formats cost with token count", () => {
    const result = formatCostLine(0.0523, 4200, 1100);
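These tests pin down the contract of `extractAssistantPreviewDelta`: text and thinking deltas come back tagged by kind, everything else maps to `null`. A sketch of how a headless renderer might consume that contract is below; the two routing functions named here are placeholders, not real exports of `headless-ui.js`:

```js
// Consumer sketch, assuming only the behaviour the tests above assert.
// appendAssistantText / appendThinkingText are illustrative names.
function routePreviewEvent(event) {
  const delta = extractAssistantPreviewDelta(event);
  if (delta === null) return; // text_start, thinking_end, etc. are ignored
  if (delta.kind === "text") appendAssistantText(delta.text);
  else if (delta.kind === "thinking") appendThinkingText(delta.text);
}
```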