2026-05-06 00:38:36 +02:00


ADR-001: Branchless Worktree Architecture

Historical note: this ADR predates the current DB-backed planning-state direction. Where it says markdown is truth and DB is cache, that statement is superseded by docs/adr/0000-purpose-to-software-compiler.md and docs/adr/0001-promote-only-sf-state.md: structured .sf state is authoritative at runtime, and markdown is a projection when structured state exists.

Status: Accepted (partial drift)
Date: 2026-03-15
Revised: 2026-05-02 (partial drift documented; code migration incomplete)
Deciders: Lex Christopherson
Advisors: Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)

Context

SF uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a worktree per milestone with slice branches inside each worktree. Each slice (S01, S02, ...) gets its own branch (sf/M001/S01) within the worktree, which merges back to the milestone branch (milestone/M001) via --no-ff when the slice completes. The milestone branch squash-merges to main when the milestone completes.

This architecture replaced a previous "branch-per-slice" model that had severe .sf/ merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.

Problems

1. Planning artifact invisibility (loop detection failures)

When research-slice or plan-slice dispatches, the agent writes artifacts (e.g., S02-RESEARCH.md) on a slice branch. After the agent completes, handleAgentEnd switches back to the milestone branch for the next dispatch. The artifact is on the slice branch, not the milestone branch. verifyExpectedArtifact() checks the milestone branch, cannot find the file, and increments the loop counter; every retry produces the same result. After 3 retries → hard stop. After 6 lifetime dispatches → permanent stop. This burns budget and blocks progress.

Documented in the auto-stop architecture doc as "The Branch-Switching Problem."
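To make the failure mode concrete, here is a toy TypeScript model of it. It is not SF code. Branches are modelled as in-memory sets of file paths, and simulateUnit(), the branch names, and the retry limit of 3 are illustrative assumptions loosely based on the description above. The only variable is whether the agent writes to the same branch that verification later inspects.

```typescript
// Toy model (not SF's implementation): each branch is the set of files committed on it.
type Outcome = "verified" | "retry" | "hard-stop";

function simulateUnit(
  writeBranch: string,  // branch the agent commits the artifact to
  verifyBranch: string, // branch checked out when verification runs
  artifact: string,
  maxRetries = 3,       // assumed retry limit before a hard stop
): Outcome[] {
  const files = new Map<string, Set<string>>();
  files.set(writeBranch, new Set<string>());
  files.set(verifyBranch, new Set<string>()); // same entry if both names are equal

  const outcomes: Outcome[] = [];
  let failures = 0;
  for (;;) {
    files.get(writeBranch)!.add(artifact);           // agent writes the artifact
    if (files.get(verifyBranch)!.has(artifact)) {    // verifyExpectedArtifact()
      outcomes.push("verified");
      return outcomes;
    }
    failures += 1;                                    // loop counter increments
    if (failures > maxRetries) {
      outcomes.push("hard-stop");
      return outcomes;
    }
    outcomes.push("retry");
  }
}

// Slice-branch model: write on sf/M001/S02, verify on milestone/M001.
const sliceModel = simulateUnit("sf/M001/S02", "milestone/M001", "S02-RESEARCH.md");
// sliceModel is ["retry", "retry", "retry", "hard-stop"]

// Branchless model: write and verify on the same branch.
const branchless = simulateUnit("milestone/M001", "milestone/M001", "S02-RESEARCH.md");
// branchless is ["verified"]
```

No retry can succeed in the first case, because nothing ever copies the artifact onto the branch being checked.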

2. .sf/ state clobbering across branches

.sf/ is gitignored (line 52 of .gitignore: .sf/). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in .sf/milestones/ but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single .sf/ directory on disk. Branch A's M001 roadmap overwrites Branch B's M001 roadmap. SF reads corrupted state, shows wrong milestone as complete, or enters infinite dispatch loops.

The codebase has a contradictory workaround: smartStage() (git-service.ts:304-352) force-adds SF_DURABLE_PATHS (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the .gitignore. This means .sf/milestones/ IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.

3. Merge/conflict code complexity

The current slice branch model requires:

  • mergeSliceToMilestone() — 98 lines, --no-ff merge with withMergeHeal wrapper
  • mergeSliceToMain() — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
  • git-self-heal.ts — 198 lines, 3 recovery functions for merge failures
  • fix-merge dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
  • smartStage() — 49 lines of runtime exclusion during staging
  • Conflict categorization — 80 lines classifying .sf/ vs runtime vs code conflicts

Total: ~582 lines of merge/branch/conflict code across 3 files, plus the fix-merge prompt template and dispatch logic. This code exists solely because of slice branches.

4. Dual isolation modes

Branch-mode (git-service.ts:mergeSliceToMain) and worktree-mode (auto-worktree.ts:mergeSliceToMilestone) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.

5. Bug history

  • v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
  • v2.13.1: Windows hotfix for multi-line commit messages in mergeSliceToMilestone
  • 15+ separate bug fixes for .sf/ merge conflicts in the pre-M003 era
  • Persistent user complaints about loop detection failures and state corruption

Decision

Eliminate slice branches entirely. All work within a milestone worktree commits sequentially on a single branch (milestone/<MID>). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.

Track .sf/ planning artifacts in git. Gitignore only runtime/ephemeral state.

The Architecture

```
main ──────────────────────────────────────────── main
  │                                                 ↑
  └─ worktree (milestone/M001)                      │
       │                                            │
       commit: feat(M001): context + roadmap        │
       commit: feat(M001/S01): research             │
       commit: feat(M001/S01): plan                 │
       commit: feat(M001/S01/T01): impl             │
       commit: feat(M001/S01/T02): impl             │
       commit: feat(M001/S01): summary + UAT        │
       commit: feat(M001/S02): research             │
       commit: ...                                  │
       commit: feat(M001): milestone complete       │
       │                                            │
       └──────────── squash merge ──────────────────┘
```
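Here is a minimal sketch of how the commit subjects in the diagram could be generated. It assumes a hypothetical commitSubject() helper and Scope shape, neither taken from SF's code. The point is that milestone, slice, and task identity lives in the conventional-commit scope instead of in branch names.

```typescript
// Hypothetical scope shape: milestone is required; a task only makes sense inside a slice.
interface Scope {
  milestone: string; // e.g. "M001"
  slice?: string;    // e.g. "S01"
  task?: string;     // e.g. "T01"
}

// Builds subjects such as "feat(M001/S01/T01): impl".
function commitSubject(scope: Scope, summary: string): string {
  if (scope.task !== undefined && scope.slice === undefined) {
    throw new Error("a task scope requires a slice");
  }
  const parts: string[] = [scope.milestone];
  if (scope.slice !== undefined) parts.push(scope.slice);
  if (scope.task !== undefined) parts.push(scope.task);
  return `feat(${parts.join("/")}): ${summary}`;
}

const roadmapSubject = commitSubject({ milestone: "M001" }, "context + roadmap");
// "feat(M001): context + roadmap"
const taskSubject = commitSubject({ milestone: "M001", slice: "S01", task: "T02" }, "impl");
// "feat(M001/S01/T02): impl"
```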

Git Primitives Used

| Primitive | Purpose |
|---|---|
| Worktrees | One per active milestone. Filesystem isolation. |
| Commits | Granular sequential history of every action. |
| Squash merge | Clean single commit on main per milestone. |
| Branches | Only main and milestone/<MID>. Nothing else. |

Git Primitives NOT Used

| Primitive | Why Not |
|---|---|
| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
| --no-ff merges | No branches to merge within a worktree. |
| Branch switching | Never happens. All work on one branch. |
| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |

.sf/ Tracking Model

Tracked in git (travels with the branch):

```
.sf/milestones/         — roadmaps, plans, summaries, research, contexts, task plans/summaries
.sf/PROJECT.md          — project overview
.sf/DECISIONS.md        — architectural decision register
.sf/REQUIREMENTS.md     — requirements register
.sf/QUEUE.md            — work queue
```

Gitignored (ephemeral, runtime, infrastructure):

```
.sf/runtime/            — dispatch records, timeout tracking
.sf/activity/           — JSONL session dumps
.sf/worktrees/          — git worktree working directories
.sf/auto.lock           — crash detection sentinel
.sf/metrics.json        — token/cost accumulator
.sf/completed-units.json — dispatch idempotency tracker
.sf/STATE.md            — derived state cache (rebuilt by deriveState())
.sf/sf.db              — SQLite cache (rebuilt from tracked markdown by importers)
.sf/DISCUSSION-MANIFEST.json — discussion phase tracking
.sf/milestones/**/*-CONTINUE.md — interrupted-work markers
.sf/milestones/**/continue.md   — legacy continue markers
```

.gitignore Update

Replace the current blanket .sf/ ignore with explicit runtime-only ignores:

```gitignore
# ── SF: Runtime / Ephemeral ─────────────────────────────────
.sf/auto.lock
.sf/completed-units.json
.sf/STATE.md
.sf/metrics.json
.sf/sf.db
.sf/activity/
.sf/runtime/
.sf/worktrees/
.sf/DISCUSSION-MANIFEST.json
.sf/milestones/**/*-CONTINUE.md
.sf/milestones/**/continue.md
```

Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in .gitignore and are tracked normally.
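As a rough illustration of which paths land on which side of that split, the sketch below hand-codes the ignore list as a TypeScript predicate. isRuntimePath() and its constants are hypothetical. In practice, asking git directly with git check-ignore avoids maintaining a second copy of the rules.

```typescript
// Hypothetical mirror of the runtime-only .gitignore above.
const RUNTIME_FILES = new Set<string>([
  ".sf/auto.lock",
  ".sf/completed-units.json",
  ".sf/STATE.md",
  ".sf/metrics.json",
  ".sf/sf.db",
  ".sf/DISCUSSION-MANIFEST.json",
]);
const RUNTIME_DIRS = [".sf/activity/", ".sf/runtime/", ".sf/worktrees/"];
// .sf/milestones/**/*-CONTINUE.md and .sf/milestones/**/continue.md:
// "**/" spans zero or more directories, "*" never crosses a "/".
const CONTINUE_MARKER =
  /^\.sf\/milestones\/(?:.+\/)?(?:[^/]*-CONTINUE\.md|continue\.md)$/;

// True for paths that should stay out of git; false for tracked planning artifacts.
function isRuntimePath(path: string): boolean {
  if (RUNTIME_FILES.has(path)) return true;
  if (RUNTIME_DIRS.some((dir) => path.startsWith(dir))) return true;
  return CONTINUE_MARKER.test(path);
}

const checks: Array<[string, boolean]> = [
  [".sf/runtime/dispatch.json", true],
  [".sf/sf.db", true],
  [".sf/milestones/M001/slices/S01/S01-CONTINUE.md", true],
  [".sf/milestones/M001/continue.md", true],
  [".sf/milestones/M001/M001-ROADMAP.md", false], // tracked planning artifact
  [".sf/DECISIONS.md", false],                     // tracked register
];
```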

Consequences

Code Deletion

| File | Lines Deleted | What's Removed |
|---|---|---|
| auto-worktree.ts | ~246 | mergeSliceToMilestone(), shouldUseWorktreeIsolation(), getMergeToMainMode(), slice merge guards (shouldUseWorktreeIsolation() is still present, at auto.ts:357; see Drift section) |
| git-service.ts | ~250 | mergeSliceToMain(), conflict resolution, runtime stripping post-merge, ensureSliceBranch(), switchToMain() |
| git-self-heal.ts | ~86 | abortAndReset(), withMergeHeal() (merge-specific recovery). withMergeHeal() is deleted; the file and abortAndReset() are still present (see Drift section). |
| auto.ts | ~150 | Merge dispatch guards, fix-merge dispatch path, branch-mode routing (partially still present; see Drift section) |
| worktree.ts | ~40 | getSliceBranchName(), ensureSliceBranch(), mergeSliceToMain() delegates (getSliceBranchName() still present; see Drift section) |
| Test files | ~11 files | auto-worktree-merge.test.ts, auto-worktree-milestone-merge.test.ts, merge-related test cases |
| Total | ~770+ lines | |

Verified 2026-05-02: mergeSliceToMilestone and the original mergeSliceToMain (git-service.ts) are deleted. Several other items listed above are still present — see Drift section below. Current authoritative milestone merge path: src/resources/extensions/sf/auto-worktree.ts:1616 (mergeMilestoneToMain). A separate newer function mergeSliceToMain at src/resources/extensions/sf/slice-cadence.ts:92 was added post-ADR for the slice-cadence collapse feature (#4765) and is unrelated to the deleted branch-era function.

What mergeMilestoneToMain() Becomes

The function simplifies dramatically:

  1. Auto-commit any dirty state in worktree
  2. chdir back to main repo root
  3. git checkout main
  4. git merge --squash milestone/<MID>
  5. git commit with milestone summary
  6. Remove worktree + delete branch

No conflict categorization. No runtime file stripping. No .sf/ special handling. Planning artifacts merge cleanly because they're in .sf/milestones/M001/ which doesn't exist on main until this merge.
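The six steps can be sketched as a pure function that returns git invocations as argv arrays instead of executing them. That keeps the sequence easy to inspect and test. planMilestoneMerge(), its input shape, and the auto-commit message are assumptions for illustration. git -C stands in for the chdir in step 2.

```typescript
// Hypothetical input; field names are illustrative, not SF's API.
interface MergePlanInput {
  mid: string;           // milestone id, e.g. "M001"
  repoRoot: string;      // main repository root
  worktreePath: string;  // milestone worktree directory
  summary: string;       // milestone summary for the squash commit
  worktreeDirty: boolean;
}

function planMilestoneMerge(input: MergePlanInput): string[][] {
  const branch = `milestone/${input.mid}`;
  const wt = ["git", "-C", input.worktreePath];
  const root = ["git", "-C", input.repoRoot];
  const cmds: string[][] = [];
  if (input.worktreeDirty) {
    // 1. Auto-commit dirty worktree state (message format is an assumption).
    cmds.push([...wt, "add", "-A"]);
    cmds.push([...wt, "commit", "-m", `chore(${input.mid}): auto-commit before merge`]);
  }
  // 2-3. Work from the main repo root with main checked out.
  cmds.push([...root, "checkout", "main"]);
  // 4. Stage the whole milestone as one change set.
  cmds.push([...root, "merge", "--squash", branch]);
  // 5. Record a single commit on main.
  cmds.push([...root, "commit", "-m", `feat(${input.mid}): ${input.summary}`]);
  // 6. Remove the worktree first (git refuses to delete a branch that is
  //    checked out in a worktree), then force-delete the branch: a squash
  //    merge does not make it an ancestor of main, so `git branch -d` would refuse.
  cmds.push([...root, "worktree", "remove", input.worktreePath]);
  cmds.push([...root, "branch", "-D", branch]);
  return cmds;
}

const cleanPlan = planMilestoneMerge({
  mid: "M001",
  repoRoot: "/repo",
  worktreePath: "/repo/.sf/worktrees/M001",
  summary: "milestone complete",
  worktreeDirty: false,
});
```

Using git -C rather than changing the process working directory keeps the caller's cwd stable. That matters if the same process still holds paths inside the worktree that is about to be removed.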

What smartStage() Becomes

The force-add of SF_DURABLE_PATHS is no longer needed — planning artifacts are not gitignored, so git add -A picks them up naturally. The function reduces to:

  1. git add -A
  2. git reset HEAD -- <runtime paths> (unstage runtime files)

The _runtimeFilesCleanedUp one-time migration logic can also be removed.
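Here is a sketch of the reduced staging step in the same argv-array style. planSmartStage() and RUNTIME_PATHSPECS are hypothetical names. The list mirrors the runtime entries of the .gitignore above, except that the two CONTINUE-marker glob patterns are left out for brevity. Git's :(glob) pathspec magic is one way to express them.

```typescript
// Hypothetical runtime pathspecs; directories match everything beneath them.
const RUNTIME_PATHSPECS = [
  ".sf/auto.lock",
  ".sf/completed-units.json",
  ".sf/STATE.md",
  ".sf/metrics.json",
  ".sf/sf.db",
  ".sf/DISCUSSION-MANIFEST.json",
  ".sf/activity",
  ".sf/runtime",
  ".sf/worktrees",
];

// Two steps: stage everything, then unstage runtime paths.
function planSmartStage(repo: string): string[][] {
  return [
    ["git", "-C", repo, "add", "-A"],
    ["git", "-C", repo, "reset", "-q", "HEAD", "--", ...RUNTIME_PATHSPECS],
  ];
}

const stagePlan = planSmartStage("/repo");
```

The reset matters mainly for runtime files that are already tracked, for example ones force-added before the .gitignore change. git add -A skips ignored untracked files, but it still stages changes to files that are tracked.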

Verified 2026-05-02: smartStage() at src/resources/extensions/sf/git-service.ts:576 is still present in its original form, including _runtimeFilesCleanedUp migration logic and SF_MILESTONE_LOCK parallel-scope logic. The simplification described here has not been performed — see Drift section.

What Happens to handleAgentEnd()

After any unit completes:

  1. Invalidate caches
  2. autoCommitCurrentBranch() — commits on the one and only branch
  3. verifyExpectedArtifact() — file is always on the current branch (no branch switching)
  4. Persist completion key

The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
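The four steps can be written against an injected dependency interface, which makes their ordering explicit and testable. AgentEndDeps, handleAgentEndSketch(), and the signatures are hypothetical. Returning false on a failed verification, and leaving the retry decision to the caller, is also an assumption.

```typescript
// Hypothetical dependencies; names echo the steps above, signatures are assumed.
interface AgentEndDeps {
  invalidateCaches(): void;
  autoCommitCurrentBranch(): void;
  verifyExpectedArtifact(unitId: string): boolean;
  persistCompletionKey(unitId: string): void;
}

function handleAgentEndSketch(unitId: string, deps: AgentEndDeps): boolean {
  deps.invalidateCaches();
  deps.autoCommitCurrentBranch(); // commits on the one and only branch
  // No checkout happens between the commit and this check, so the
  // artifact the agent just wrote is on the branch being inspected.
  if (!deps.verifyExpectedArtifact(unitId)) return false;
  deps.persistCompletionKey(unitId);
  return true;
}

// Recording fake used to observe call order.
function recordingDeps(verifies: boolean, calls: string[]): AgentEndDeps {
  return {
    invalidateCaches: () => { calls.push("invalidate"); },
    autoCommitCurrentBranch: () => { calls.push("commit"); },
    verifyExpectedArtifact: (id) => { calls.push(`verify:${id}`); return verifies; },
    persistCompletionKey: (id) => { calls.push(`persist:${id}`); },
  };
}
```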

What Happens to fix-merge

The fix-merge dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.

Verified 2026-05-02: The fix-merge prompt template has been deleted (no fix-merge.md exists in src/resources/extensions/sf/prompts/). However, MergeConflictError (re-exported from git-service.ts:201) is still present along with a JSDoc comment at git-service.ts:199 that still mentions dispatching a "fix-merge session". No active dispatch-loop code was found — this appears to be residual documentation in the class definition, not active dispatch logic. Treated as "still present (partial)" in the Drift section.

Backwards Compatibility

The shouldUseWorktreeIsolation() three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The git.isolation: "branch" preference is deprecated.

Projects with existing sf/M001/S01 slice branches can still be read by state derivation, but new work never creates slice branches.
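Here is a minimal sketch of the single-mode resolution. It assumes the preference arrives as an optional string and that the deprecation is surfaced as a warning, not an error. resolveIsolation() is a hypothetical name. The sketch describes the behaviour this ADR specifies, not the current shouldUseWorktreeIsolation(), which still defaults to false (see Drift section).

```typescript
// Hypothetical preference type; undefined means "not configured".
type IsolationPref = "worktree" | "branch" | undefined;

// Always resolves to worktree isolation; the legacy value only triggers a warning.
function resolveIsolation(pref: IsolationPref, warn: (msg: string) => void): "worktree" {
  if (pref === "branch") {
    warn('git.isolation: "branch" is deprecated; using worktree isolation');
  }
  return "worktree";
}

const warnings: string[] = [];
const fromLegacy = resolveIsolation("branch", (m) => warnings.push(m));
const fromDefault = resolveIsolation(undefined, (m) => warnings.push(m));
```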

Risks

1. Parallel milestone code conflicts at squash-merge time

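If two milestones modify the same source file, the second squash-merge to main will conflict. Mitigation: rebase the milestone branch onto the latest main before the squash-merge (git rebase main). When main is shared through a remote, use git fetch origin main && git rebase origin/main instead. git fetch updates origin/main, not the local main, so git rebase main would miss the fetched commits. Pre-flight rebasing is standard practice, and this conflict is rare in single-user workflows.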

2. Loss of per-slice git history after squash

Squash merge collapses all commits into one on main. Mitigations:

  • Commit messages tag slices (feat(M001/S01/T01):) — filterable with git log --grep
  • The milestone branch can be preserved (not deleted) if history is needed
  • Alternative: merge --no-ff instead of --squash to keep history on main
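The first mitigation relies on scopes being parseable. Below is a sketch of selecting one slice's commits from a list of subjects, for example the output of git log --format=%s, with sliceCommits() as a hypothetical helper. Matching on the parsed scope is stricter than git log --grep="M001/S01", which matches that string anywhere in the message, including a subject or body that only mentions the slice.

```typescript
// Keeps subjects whose conventional-commit scope names the given milestone and slice.
function sliceCommits(subjects: string[], milestone: string, slice: string): string[] {
  const scopePattern = /^[a-z]+\(([^)]+)\):/; // captures "M001/S01/T01" from "feat(M001/S01/T01): ..."
  return subjects.filter((subject) => {
    const match = scopePattern.exec(subject);
    if (match === null) return false;
    const [mid, sid] = (match[1] ?? "").split("/");
    return mid === milestone && sid === slice;
  });
}

const history = [
  "feat(M001): context + roadmap",
  "feat(M001/S01): research",
  "feat(M001/S01/T01): impl",
  "feat(M001/S02): research, follows up on M001/S01",
  "feat(M001): milestone complete",
];
const s01 = sliceCommits(history, "M001", "S01");
// s01 is ["feat(M001/S01): research", "feat(M001/S01/T01): impl"]
```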

3. SQLite DB desync after git reset

If tracked markdown rolls back via git reset --hard, the gitignored sf.db doesn't. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache, markdown is truth.

4. Disk space with multiple worktrees

Each worktree duplicates the working directory (including node_modules). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.

Alternatives Considered

A. Keep slice branches, fix visibility with immediate mini-merges

After research-slice or plan-slice, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.

Rejected: Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, branch switching.

B. Keep .sf/ gitignored, bootstrap from git history for manual worktrees

When SF detects an empty .sf/ in a worktree, reconstruct state from the branch's git history using git show <commit>:.sf/....

Rejected: Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.

C. Branch-scoped .sf/ directories (.sf/branches/<branch-name>/milestones/...)

Each branch writes to a namespaced subdirectory within .sf/.

Rejected: Adds complexity instead of removing it. Requires renaming/moving on branch creation, doesn't work with standard git tools (git checkout doesn't rename directories).

Validation

This architecture was stress-tested by three independent models:

Gemini 2.5 Pro identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).

GPT-5.4 (Codex) read the full codebase and confirmed the model is sound. Identified that smartStage() already force-adds durable paths (validating the tracked-artifact approach) and that resolveMainWorktreeRoot in PR #487 is architecturally wrong (adopted — PR to be closed).

Codebase analysis confirmed .sf/milestones/ is already partially tracked on main despite the .gitignore, that SF_DURABLE_PATHS exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.

Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"

Codex read the full codebase and raised 4 concerns. Each is addressed:

Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."

Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, deriveState() reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.

Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."

Rebuttal: Valid edge case. If /sf queue edits DECISIONS.md on main while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.

Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."

Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is main diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against main. Pre-flight rebase before squash-merge catches this more directly.

Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."

Response: Accepted in spirit. Commits with conventional tags (feat(M001/S01):, feat(M001/S01/T01):) serve as the slice boundary primitive. git log --grep="M001/S01" isolates a slice's history. git revert targets specific commits. Git tags (sf/M001/S01-complete) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.

Action Items

  1. Close PR #487 (resolveMainWorktreeRoot) — contradicts this architecture
  2. Implement as a SF milestone with phases:
    • Update .gitignore and force-add existing planning artifacts
    • Remove slice branch creation/switching/merging code
    • Simplify mergeMilestoneToMain() and smartStage()
    • Remove fix-merge dispatch unit
    • Remove branch-mode isolation (git.isolation: "branch")
    • Update/delete 11 test files
    • Update README suggested gitignore
    • Migration path for existing projects with slice branches

Drift From Original Decision

Audited 2026-05-02. Items the ADR claims were deleted that are still present in the codebase:

| Item | Status | File:Line | Cleanup Pending |
|---|---|---|---|
| git-self-heal.ts (whole file) | Still present | src/resources/extensions/sf/git-self-heal.ts:1-142 | The file is 142 lines and exports abortAndReset() and formatGitError(); the ADR claimed ~86 lines deleted. Delete the entire file and migrate any callers of abortAndReset() to in-place reset logic. |
| smartStage() | Still present | src/resources/extensions/sf/git-service.ts:576 | Still has _runtimeFilesCleanedUp migration logic, SF_MILESTONE_LOCK parallel-scope exclusions, and 49+ lines of runtime exclusion. Simplify as described in the Consequences section. |
| shouldUseWorktreeIsolation() | Still present | src/resources/extensions/sf/auto.ts:357 | The ADR requires single-mode, worktree-always behavior, but this function still exists and defaults to false (worktree off unless explicitly opted in), so the branch-mode fallback persists. Remove after deprecating git.isolation: "branch". |
| getSliceBranchName() | Still present | src/resources/extensions/sf/worktree.ts:261 | Still used by workspace-index.ts:156 to record historical branch names. Evaluate whether this is still needed or can be removed. |
| MergeConflictError + fix-merge JSDoc | Partially present | src/resources/extensions/sf/git-service.ts:199-225 | The MergeConflictError class and a JSDoc comment referencing "dispatch a fix-merge session" remain. The fix-merge.md prompt template is deleted, and no active dispatch loop was found. Remove the JSDoc reference but keep MergeConflictError, which slice-cadence.ts still uses. |

Items Confirmed Deleted

  • mergeSliceToMilestone() — not found anywhere in the codebase.
  • Original mergeSliceToMain() (branch-era, from git-service.ts) — deleted. A new mergeSliceToMain() exists at src/resources/extensions/sf/slice-cadence.ts:92 but was added post-ADR for the slice-cadence collapse feature (#4765) and is architecturally consistent with the branchless model.
  • fix-merge.md prompt template — deleted (no file at src/resources/extensions/sf/prompts/fix-merge.md).
  • Conflict categorization (~80 lines) — not found.
  • withMergeHeal() — not found.
  • ensureSliceBranch() / switchToMain() / getMergeToMainMode() — not found.

Current Authoritative Merge Path

The current milestone→main merge implementation is mergeMilestoneToMain() at src/resources/extensions/sf/auto-worktree.ts:1616. It performs squash-merge after auto-committing dirty worktree state, reconciling the worktree DB, and running a pre-flight rebase. It does not use slice branches, withMergeHeal, or conflict categorization.