singularity-forge/docs/dev/PRD-branchless-worktree-architecture.md
ace-pm b29c12d5e5 refactor(native): rename gsd_parser.rs to forge_parser.rs
Final rebrand: rename remaining Rust source file to complete the gsd → forge
transition. All parser references already use forge_parser after earlier commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:58:21 +02:00

18 KiB

PRD: Branchless Worktree Architecture

Author: Lex Christopherson Date: 2026-03-15 ADR: ADR-001-branchless-worktree-architecture.md Priority: Critical — blocks reliable auto-mode operation


Problem Statement

SF's auto-mode is unreliable. Users experience:

  1. Infinite loop detection failures — the agent writes planning artifacts on slice branches that become invisible after branch switching, causing verifyExpectedArtifact() to fail repeatedly. Auto-mode burns budget retrying the same unit 3-6 times before hard-stopping. This is the #1 user complaint.

  2. State corruption across branches.sf/ planning artifacts (roadmaps, plans, decisions) are gitignored but branch-specific. Multiple branches sharing a single .sf/ directory clobber each other's state. Users see wrong milestones marked complete, wrong roadmaps loaded, and auto-mode starting from the wrong phase.

  3. Excessive complexity — 770+ lines of merge, conflict resolution, branch switching, and self-healing code exist solely to manage slice branches inside worktrees. This code has required 15+ bug fixes across versions and remains the primary source of auto-mode failures.

These problems are architectural. They cannot be fixed by patching individual symptoms.

Vision

Auto-mode uses git worktrees for isolation and sequential commits for history. No branch switching. No merge conflicts within a worktree. Planning artifacts are tracked in git and travel with the branch. The git layer is so simple it can't break.

Success Criteria

Criterion Measurement
Zero loop detection failures from branch visibility No verifyExpectedArtifact() failures caused by branch mismatch in 50 consecutive auto-mode runs
Zero .sf/ state corruption Manual worktrees created via git worktree add have correct .sf/ state without any SF-specific initialization
Code deletion Net removal of ≥500 lines of merge/conflict/branch-switching code
Test simplification Removal or simplification of ≥6 merge-specific test files
Backwards compatibility Existing projects with sf/M001/S01 slice branches continue to work (read-only; new work uses new model)
No new git primitives The implementation uses only: worktrees, commits, squash-merge. No new branch types, merge strategies, or conflict resolution.

Non-Goals

  • Parallel slice execution within a single worktree (if needed later, use separate worktrees)
  • Changing how milestones relate to main (squash-merge stays)
  • Modifying the dispatch unit types or state machine (except removing fix-merge)
  • Changing the worktree-manager.ts manual worktree API (/worktree command)

Current Architecture

Branch Model (M003, v2.13.0)

main
  └─ milestone/M001 (worktree at .sf/worktrees/M001/)
       ├─ sf/M001/S01 (slice branch — code + .sf/ artifacts)
       │   └── merge --no-ff → milestone/M001
       ├─ sf/M001/S02
       │   └── merge --no-ff → milestone/M001
       └── squash merge → main

Data Flow

Agent writes file → on slice branch → handleAgentEnd → auto-commit on slice branch
→ switch to milestone branch → verifyExpectedArtifact → FILE NOT FOUND (it's on slice branch)
→ loop counter++ → retry → same result → HARD STOP

Code Involved

File Lines Purpose
auto-worktree.ts 512 Worktree lifecycle + slice→milestone merge
git-service.ts 915 Branch creation, switching, merge with conflict resolution
git-self-heal.ts 198 Merge failure recovery
auto.ts ~150 lines Merge dispatch guards, fix-merge routing, branch-mode vs worktree-mode branching
worktree.ts ~40 lines Slice branch delegates
11 test files ~2000 lines Merge/branch/worktree test coverage

.sf/ Tracking (Current — Contradictory)

  • .gitignore line 52: .sf/ — ignores everything
  • smartStage() lines 338-349: force-adds SF_DURABLE_PATHS — tracks milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md
  • Result: .sf/milestones/ is partially tracked on some branches, fully ignored on others. The code fights the config.

Proposed Architecture

Branch Model

main
  └─ milestone/M001 (worktree at .sf/worktrees/M001/)
       │
       commit: feat(M001): context + roadmap
       commit: feat(M001/S01): research
       commit: feat(M001/S01): plan
       commit: feat(M001/S01/T01): implement auth service
       commit: feat(M001/S01/T02): implement auth tests
       commit: feat(M001/S01): summary + UAT
       commit: docs(M001): reassess roadmap after S01
       commit: feat(M001/S02): research
       commit: feat(M001/S02): plan
       commit: ...
       commit: feat(M001): milestone complete
       │
       └── squash merge → main

One branch. Sequential commits. No merges within the worktree.

Data Flow

Agent writes file → on milestone branch → handleAgentEnd → auto-commit on milestone branch
→ verifyExpectedArtifact → FILE FOUND (same branch) → persist completion → next dispatch

.sf/ Tracking (Proposed — Coherent)

Tracked (travels with branch):

.sf/milestones/**/*.md    (except CONTINUE markers)
.sf/milestones/**/*.json  (META.json integration records)
.sf/PROJECT.md
.sf/DECISIONS.md
.sf/REQUIREMENTS.md
.sf/QUEUE.md

Gitignored (ephemeral):

.sf/auto.lock
.sf/completed-units.json
.sf/STATE.md
.sf/metrics.json
.sf/sf.db
.sf/activity/
.sf/runtime/
.sf/worktrees/
.sf/DISCUSSION-MANIFEST.json
.sf/milestones/**/*-CONTINUE.md
.sf/milestones/**/continue.md

Why This Works

Problem How It's Solved
Artifact invisibility after branch switch No branch switching. Artifacts commit on the one branch.
.sf/ state clobbering Artifacts tracked in git. Each branch carries its own .sf/. git worktree add and git checkout give correct state.
Merge conflict complexity No merges within a worktree. Only merge is milestone→main (squash).
Manual worktree initialization Tracked artifacts are checked out with the branch. No SF-specific bootstrap needed.
Dual isolation mode maintenance Single mode: worktree. Branch-mode (git.isolation: "branch") deprecated.

Implementation Plan

Phase 1: .gitignore + Tracking Fix

Goal: Planning artifacts are tracked in git. .gitignore reflects reality.

  1. Update .gitignore:

    • Remove blanket .sf/ ignore
    • Add explicit runtime-only ignores (see proposed list above)
  2. Force-add existing planning artifacts on current branch:

    git add --force .sf/milestones/ .sf/PROJECT.md .sf/DECISIONS.md .sf/REQUIREMENTS.md .sf/QUEUE.md
    
  3. Ensure runtime files are NOT tracked:

    git rm --cached -r .sf/runtime/ .sf/activity/ .sf/STATE.md .sf/metrics.json .sf/completed-units.json .sf/auto.lock
    
  4. Update README suggested .gitignore section

  5. Remove smartStage() force-add of SF_DURABLE_PATHS — no longer needed since .gitignore doesn't block them

Verification: git status shows planning artifacts tracked, runtime files untracked. git worktree add on a new worktree has correct .sf/milestones/ state.

Phase 2: Remove Slice Branch Creation + Switching

Goal: No code creates, switches to, or references slice branches for new work.

  1. Remove ensureSliceBranch() from git-service.ts (lines 485-544)
  2. Remove switchToMain() from git-service.ts (lines 549-563)
  3. Remove getSliceBranchName() from worktree.ts (lines 94-98)
  4. Remove isOnSliceBranch() and getActiveSliceBranch() from worktree.ts
  5. Update auto.ts dispatch paths — remove branch creation before execute-task
  6. Update handleAgentEnd — remove branch-switching logic post-dispatch

Verification: Auto-mode runs a full slice (research → plan → execute → complete) without creating any branches. All commits land on milestone/<MID>.

Phase 3: Remove Slice Merge Code

Goal: All slice→milestone and slice→main merge code is deleted.

  1. Remove mergeSliceToMilestone() from auto-worktree.ts (lines 253-350)
  2. Remove mergeSliceToMain() from git-service.ts (lines 705-893)
  3. Remove merge dispatch guards from auto.ts (lines 1635-1679)
  4. Remove fix-merge dispatch unit type from auto.ts
  5. Remove buildPromptForFixMerge() from auto.ts
  6. Remove withMergeHeal() from git-self-heal.ts (lines 99-136)
  7. Remove abortAndReset() from git-self-heal.ts (lines 37-84) — or simplify to crash-recovery-only
  8. Remove shouldUseWorktreeIsolation() preference resolution — worktree is the only mode
  9. Remove getMergeToMainMode() — milestone merge is the only mode
  10. Deprecate git.isolation: "branch" and git.merge_to_main: "slice" preferences

Verification: git grep mergeSliceToMilestone returns zero results. git grep mergeSliceToMain returns zero results. git grep fix-merge returns zero results (outside of changelog/docs).

Phase 4: Simplify mergeMilestoneToMain()

Goal: Milestone→main merge is clean and minimal.

The function becomes:

  1. Auto-commit any dirty state in worktree
  2. process.chdir(originalBasePath) — back to main repo
  3. git checkout main
  4. git merge --squash milestone/<MID>
  5. Build commit message with milestone summary + slice manifest
  6. git commit
  7. Optional: git push
  8. removeWorktree() + git branch -D milestone/<MID>

No conflict categorization. No runtime file stripping (runtime files are gitignored, not in the merge). No .sf/ special handling.

If squash-merge conflicts (parallel milestone edge case): stop auto-mode with clear error, user resolves manually or SF dispatches a one-time resolution session.

Verification: Complete a full milestone in auto-mode. main receives one squash commit with all code and planning artifacts.

Phase 5: Test Cleanup

Goal: Test suite reflects the simplified architecture.

  1. Delete or rewrite:

    • auto-worktree-merge.test.ts — tests slice→milestone merge (deleted)
    • auto-worktree-milestone-merge.test.ts — rewrite for simplified milestone→main
    • worktree-e2e.test.ts — rewrite for branchless flow
    • worktree-integration.test.ts — rewrite for branchless flow
    • Merge-related test cases in git-service.test.ts
  2. Add new tests:

    • Branchless worktree lifecycle: create → commit → commit → squash-merge → cleanup
    • .sf/ tracking: planning artifacts tracked, runtime files ignored
    • Manual worktree: git worktree add has correct .sf/ state
    • Crash recovery: dirty state on milestone branch, restart, auto-commit, continue
  3. Remove merge-specific doctor checks or simplify:

    • corrupt_merge_state — keep (still relevant for milestone→main)
    • orphaned_auto_worktree — keep
    • stale_milestone_branch — keep
    • tracked_runtime_files — keep

Verification: npm run test passes. No test references mergeSliceToMilestone, mergeSliceToMain, or ensureSliceBranch.

Phase 6: Migration + Backwards Compatibility

Goal: Existing projects with slice branches continue to work.

  1. State derivation (deriveState()) continues to read sf/M001/S01 branch naming for legacy detection
  2. On first run after upgrade:
    • Detect existing slice branches
    • Notify user: "SF no longer creates slice branches. Existing branches are preserved but new work commits directly to the milestone branch."
    • No forced migration — legacy branches are read-only context
  3. Doctor check: legacy_slice_branches — informational, not auto-fix
  4. Update shouldUseWorktreeIsolation() preference handling:
    • git.isolation: "worktree" → default behavior (only option)
    • git.isolation: "branch" → warning, treated as worktree
    • Remove preference UI for isolation mode

Verification: Open a project with existing sf/M001/S01 branches. SF reads state correctly, new work commits on milestone branch without slice branches.

Stress Test Results

Validated by three independent models:

Gemini 2.5 Pro — 6 Attack Vectors

Attack Severity Mitigation
Parallel milestone code conflict at squash-merge Medium git rebase main before squash. Rare in single-user.
SQLite desync after git reset --hard Low DB rebuilt from tracked markdown on startup (M001/S02 importers).
Ghost lock after SIGKILL Low Existing heartbeat lock detection handles this.
Squash merge loses bisect granularity Low Commit messages tag slices. Branch preservable if needed.
Disk space with multiple worktrees Low Single active milestone at a time. Immediate cleanup.
Plan-action atomicity gap (crash between write and commit) Low handleAgentEnd auto-commits. Sequential model simplifies recovery.

GPT-5.4 (Codex) — Codebase-Informed Analysis

  • Confirmed smartStage() force-add already implements tracked-artifact intent
  • Confirmed resolveMainWorktreeRoot (PR #487) contradicts this architecture
  • Confirmed .sf/milestones/ partially tracked on main despite .gitignore
  • Verdict: Model is sound. Removes only accidental complexity.

GPT-5.4 (Codex) — Dissenting Opinion

Codex agreed on tracked artifacts and worktree-per-milestone, but pushed back on removing slice branches, calling it "a redesign, not a simplification." Specific concerns:

Concern Rebuttal
Crash recovery for orphaned slice branches disappears The failure mode (orphaned branch needing merge) is caused by slice branches. Removing branches removes the failure. Sequential commits on one branch need no orphan recovery.
Concurrent edits to shared root docs (DECISIONS.md) from two terminals Standard content conflict at squash-merge time. Not caused by or solved by slice branches.
Continuous integration via slice→milestone merges In sequential single-user work, there's nothing to integrate against within the worktree. Pre-flight rebase before squash-merge is more direct.
Need a replacement slice-boundary primitive Accepted: conventional commit tags (feat(M001/S01):) + optional git tags (sf/M001/S01-complete) serve as boundaries.

Codex's analysis confirms the tracked-artifact approach but recommends treating branchless as a deliberate redesign with explicit replacement primitives, not a casual deletion.

Edge Case: Two Milestones Touching Same Source Files

Scenario: M001 and M002 both modify src/auth.ts. M001 squash-merges first.

Resolution: Before M002 squash-merges, rebase onto updated main:

cd .sf/worktrees/M002
git fetch origin main
git rebase main
# Resolve any conflicts (code-only, never .sf/)
# Then squash-merge

This is standard git workflow. SF can automate the rebase step as a pre-merge check.

Edge Case: Agent Crash Mid-Commit

Scenario: Power loss during git commit on the milestone branch.

Resolution: Git's internal journaling protects the object store. On restart:

  • If commit completed: state is consistent
  • If commit didn't complete: working directory has uncommitted changes, handleAgentEnd auto-commits on next dispatch
  • No branch to be "stuck between" — single branch means no split-brain state

Edge Case: User Edits Main While Worktree Active

Scenario: User makes manual commits on main while M001 worktree is active.

Resolution: Worktree is on milestone/M001 branch, independent of main. Manual main commits don't affect the worktree. At squash-merge time, git merge --squash handles the divergence normally. If there's a conflict, it's resolved once.

Metrics

Before (Current)

Metric Value
Merge/conflict/branch code 770+ lines across 4 files
Merge-related test files 11 files
Branch types 4 (main, milestone/, sf//, worktree/)
Merge strategies 3 (--no-ff, --squash, conflict resolution)
Dispatch unit types with merge logic 2 (complete-slice, fix-merge)
Isolation modes 2 (branch, worktree)
Doctor git checks 4

After (Proposed)

Metric Value
Merge/conflict/branch code ~50 lines (simplified mergeMilestoneToMain only)
Merge-related test files 3-4 files (rewritten)
Branch types 2 (main, milestone/*)
Merge strategies 1 (--squash)
Dispatch unit types with merge logic 0
Isolation modes 1 (worktree)
Doctor git checks 3-4 (simplified)

Net Impact

  • ~720 lines deleted (net, after simplified replacements)
  • ~7 test files deleted or consolidated
  • 2 branch types eliminated
  • 2 merge strategies eliminated
  • 1 dispatch unit type eliminated (fix-merge)
  • 1 isolation mode eliminated (branch)
  • 0 merge conflicts possible within a worktree

Dependencies

  • M001 (Memory Database): The SQLite database (sf.db) must remain gitignored. The M001/S02 importer layer rebuilds it from tracked markdown. This PRD's .gitignore update explicitly ignores sf.db.

  • PR #487: Must be closed. The resolveMainWorktreeRoot approach (sharing .sf/ across worktrees) contradicts tracked-artifact architecture.

Open Questions

  1. Squash vs --no-ff for milestone→main merge? Squash gives clean history on main but loses bisect granularity. --no-ff preserves granular commits but clutters main. Current proposal: squash (matching existing behavior), with option to preserve milestone branch for debugging.

  2. Should worktrees/ move outside .sf/? Having worktrees inside .sf/ creates a nesting-doll pattern (worktree contains .sf/ which is inside .sf/worktrees/). Relocating to .sf-worktrees/ or ~/.sf/worktrees/<repo-hash>/ is cleaner but changes the filesystem layout. Recommendation: defer, address separately if it causes issues.

  3. Pre-flight rebase automation? Before milestone→main squash-merge, should SF automatically git rebase main? Gemini recommends yes. Risk: rebase can fail with conflicts, adding a code path. Recommendation: implement as a doctor check ("milestone branch is behind main by N commits") with manual resolution, automate later if needed.