chore: consolidate docs, remove stale artifacts, and repo hygiene (#2665)
* fix(vscode): add extensionKind and error handler for Remote SSH support
* fix(vscode): reject failed RPC startup
* docs: consolidate docs, remove stale artifacts, and repo hygiene
  - Remove docs-internal/ (duplicate of docs/); update pr-risk-check.mjs path
  - Sync 9 diverged files to latest content (commands, config, troubleshooting, etc.)
  - Fix pi --web → gsd --web naming in docs/README.md
  - Copy FRONTIER-TECHNIQUES.md and ADR-004 to docs/ before removal
  - Delete orphaned PR screenshot folders (pr-876/, pr-1530/) — unreferenced
  - Remove committed pnpm-lock.yaml files (project uses npm)
  - Move PLAN.md → .plans/doctor-cleanup-consolidation.md
  - Move web/left-native-tui-main-session-plan.md → .plans/
  - Delete .DS_Store and vscode-extension/dist/ from disk

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
parent
c9da003151
commit
f8c6ab0c54
186 changed files with 651 additions and 21341 deletions
@ -1,279 +0,0 @@
# ADR-001: Branchless Worktree Architecture

**Status:** Proposed
**Date:** 2026-03-15
**Deciders:** Lex Christopherson
**Advisors:** Claude Opus 4.6, Gemini 2.5 Pro, GPT-5.4 (Codex)

## Context

GSD uses git for isolation during autonomous coding sessions. The current architecture (shipped in M003, v2.13.0) creates a **worktree per milestone** with **slice branches inside each worktree**. Each slice (`S01`, `S02`, ...) gets its own branch (`gsd/M001/S01`) within the worktree, which merges back to the milestone branch (`milestone/M001`) via `--no-ff` when the slice completes. The milestone branch squash-merges to `main` when the milestone completes.

This architecture replaced a previous "branch-per-slice" model that had severe `.gsd/` merge conflicts. M003 solved the merge conflicts but retained slice branches inside worktrees, inheriting complexity that has produced persistent, user-facing failures.
### Problems

**1. Planning artifact invisibility (loop detection failures)**

When `research-slice` or `plan-slice` dispatches, the agent writes artifacts (e.g., `S02-RESEARCH.md`) on a slice branch. After the agent completes, `handleAgentEnd` switches back to the milestone branch for the next dispatch. The artifact is on the slice branch, not the milestone branch. `verifyExpectedArtifact()` checks the milestone branch, can't find the file, increments the loop counter, retries, same result. After 3 retries → hard stop. After 6 lifetime dispatches → permanent stop. This burns budget and blocks progress.

Documented in the auto-stop architecture doc as "The Branch-Switching Problem."

**2. `.gsd/` state clobbering across branches**

`.gsd/` is gitignored (line 52 of `.gitignore`: `.gsd/`). Planning artifacts (roadmaps, plans, summaries, decisions, requirements) live in `.gsd/milestones/` but are invisible to git. When multiple branches or worktrees operate from the same repo, they share a single `.gsd/` directory on disk. Branch A's M001 roadmap overwrites Branch B's M001 roadmap. GSD reads corrupted state, shows the wrong milestone as complete, or enters infinite dispatch loops.

The codebase has a contradictory workaround: `smartStage()` (git-service.ts:304-352) force-adds `GSD_DURABLE_PATHS` (milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md) despite the `.gitignore`. This means `.gsd/milestones/` IS partially tracked on some branches but the gitignore claims otherwise. The code fights the configuration.

**3. Merge/conflict code complexity**

The current slice branch model requires:
- `mergeSliceToMilestone()` — 98 lines, `--no-ff` merge with `withMergeHeal` wrapper
- `mergeSliceToMain()` — 189 lines, squash-merge with conflict detection/categorization/auto-resolution
- `git-self-heal.ts` — 198 lines, 3 recovery functions for merge failures
- `fix-merge` dispatch unit — dedicated LLM session to resolve conflicts the auto-resolver can't handle
- `smartStage()` — 49 lines of runtime exclusion during staging
- Conflict categorization — 80 lines classifying `.gsd/` vs runtime vs code conflicts

Total: **~582 lines** of merge/branch/conflict code across 3 files, plus the `fix-merge` prompt template and dispatch logic. This code exists solely because of slice branches.

**4. Dual isolation modes**

Branch-mode (`git-service.ts:mergeSliceToMain`) and worktree-mode (`auto-worktree.ts:mergeSliceToMilestone`) have parallel implementations with different merge strategies, different conflict handling, and different branch naming. Both paths must be maintained and tested. 11 test files exercise merge/branch/worktree logic.

**5. Bug history**

- v2.11.1: URGENT fix for parse cache staleness causing repeated unit dispatch (directly caused by branch switching invalidation timing)
- v2.13.1: Windows hotfix for multi-line commit messages in `mergeSliceToMilestone`
- 15+ separate bug fixes for `.gsd/` merge conflicts in the pre-M003 era
- Persistent user complaints about loop detection failures and state corruption
## Decision

**Eliminate slice branches entirely.** All work within a milestone worktree commits sequentially on a single branch (`milestone/<MID>`). No branch creation, no branch switching, no slice merges, no conflict resolution within a worktree.

Track `.gsd/` planning artifacts in git. Gitignore only runtime/ephemeral state.

### The Architecture

```
main ──────────────────────────────────────────── main
  │                                                ↑
  └─ worktree (milestone/M001)                     │
       │                                           │
       commit: feat(M001): context + roadmap       │
       commit: feat(M001/S01): research            │
       commit: feat(M001/S01): plan                │
       commit: feat(M001/S01/T01): impl            │
       commit: feat(M001/S01/T02): impl            │
       commit: feat(M001/S01): summary + UAT       │
       commit: feat(M001/S02): research            │
       commit: ...                                 │
       commit: feat(M001): milestone complete      │
       │                                           │
       └──────────── squash merge ─────────────────┘
```
### Git Primitives Used

| Primitive | Purpose |
|-----------|---------|
| **Worktrees** | One per active milestone. Filesystem isolation. |
| **Commits** | Granular sequential history of every action. |
| **Squash merge** | Clean single commit on `main` per milestone. |
| **Branches** | Only `main` and `milestone/<MID>`. Nothing else. |

### Git Primitives NOT Used

| Primitive | Why Not |
|-----------|---------|
| Slice branches | Slices are sequential. Branches add complexity with no rollback benefit. |
| `--no-ff` merges | No branches to merge within a worktree. |
| Branch switching | Never happens. All work on one branch. |
| Conflict resolution | No merges within a worktree means no conflicts within a worktree. |
### `.gsd/` Tracking Model

**Tracked in git (travels with the branch):**
```
.gsd/milestones/     — roadmaps, plans, summaries, research, contexts, task plans/summaries
.gsd/PROJECT.md      — project overview
.gsd/DECISIONS.md    — architectural decision register
.gsd/REQUIREMENTS.md — requirements register
.gsd/QUEUE.md        — work queue
```

**Gitignored (ephemeral, runtime, infrastructure):**
```
.gsd/runtime/                     — dispatch records, timeout tracking
.gsd/activity/                    — JSONL session dumps
.gsd/worktrees/                   — git worktree working directories
.gsd/auto.lock                    — crash detection sentinel
.gsd/metrics.json                 — token/cost accumulator
.gsd/completed-units.json         — dispatch idempotency tracker
.gsd/STATE.md                     — derived state cache (rebuilt by deriveState())
.gsd/gsd.db                       — SQLite cache (rebuilt from tracked markdown by importers)
.gsd/DISCUSSION-MANIFEST.json     — discussion phase tracking
.gsd/milestones/**/*-CONTINUE.md  — interrupted-work markers
.gsd/milestones/**/continue.md    — legacy continue markers
```
### `.gitignore` Update

Replace the current blanket `.gsd/` ignore with explicit runtime-only ignores:

```gitignore
# ── GSD: Runtime / Ephemeral ─────────────────────────────────
.gsd/auto.lock
.gsd/completed-units.json
.gsd/STATE.md
.gsd/metrics.json
.gsd/gsd.db
.gsd/activity/
.gsd/runtime/
.gsd/worktrees/
.gsd/DISCUSSION-MANIFEST.json
.gsd/milestones/**/*-CONTINUE.md
.gsd/milestones/**/continue.md
```

Planning artifacts (milestones/, PROJECT.md, DECISIONS.md, REQUIREMENTS.md, QUEUE.md) are NOT in `.gitignore` and are tracked normally.
## Consequences

### Code Deletion

| File | Lines Deleted | What's Removed |
|------|--------------|----------------|
| `auto-worktree.ts` | ~246 | `mergeSliceToMilestone()`, `shouldUseWorktreeIsolation()`, `getMergeToMainMode()`, slice merge guards |
| `git-service.ts` | ~250 | `mergeSliceToMain()`, conflict resolution, runtime stripping post-merge, `ensureSliceBranch()`, `switchToMain()` |
| `git-self-heal.ts` | ~86 | `abortAndReset()`, `withMergeHeal()` (merge-specific recovery) |
| `auto.ts` | ~150 | Merge dispatch guards, `fix-merge` dispatch path, branch-mode routing |
| `worktree.ts` | ~40 | `getSliceBranchName()`, `ensureSliceBranch()`, `mergeSliceToMain()` delegates |
| **Test files** | ~11 files | `auto-worktree-merge.test.ts`, `auto-worktree-milestone-merge.test.ts`, merge-related test cases |
| **Total** | **~770+ lines** | |
### What `mergeMilestoneToMain()` Becomes

The function simplifies dramatically:
1. Auto-commit any dirty state in worktree
2. `chdir` back to main repo root
3. `git checkout main`
4. `git merge --squash milestone/<MID>`
5. `git commit` with milestone summary
6. Remove worktree + delete branch

No conflict categorization. No runtime file stripping. No `.gsd/` special handling. Planning artifacts merge cleanly because they're in `.gsd/milestones/M001/`, which doesn't exist on `main` until this merge.
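The six steps reduce to a fixed, linear command sequence. A minimal sketch — the function and helper names here are illustrative, not the actual `git-service.ts` API:

```typescript
// Sketch of the simplified mergeMilestoneToMain(): a linear list of git
// invocations with no conflict categorization and no runtime stripping.
// All names are illustrative assumptions.
function squashMergeCommands(mid: string, summary: string): string[] {
  const branch = `milestone/${mid}`;
  return [
    "git add -A && git commit -m 'chore: auto-commit worktree state'", // step 1 (in worktree)
    "git checkout main",                                               // steps 2-3 (from repo root)
    `git merge --squash ${branch}`,                                    // step 4
    `git commit -m ${JSON.stringify(summary)}`,                        // step 5
    `git worktree remove .gsd/worktrees/${mid} && git branch -D ${branch}`, // step 6
  ];
}
```

Because the sequence has no branching logic, the entire merge path is testable as data rather than behavior.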
### What `smartStage()` Becomes

The force-add of `GSD_DURABLE_PATHS` is no longer needed — planning artifacts are not gitignored, so `git add -A` picks them up naturally. The function reduces to:

1. `git add -A`
2. `git reset HEAD -- <runtime paths>` (unstage runtime files)

The `_runtimeFilesCleanedUp` one-time migration logic can also be removed.
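The two-step staging can be sketched as follows. This is an illustration, not the implemented `smartStage()`; the runtime path list mirrors the runtime-only gitignore proposed above:

```typescript
// Sketch of the reduced smartStage(): stage everything, then unstage
// runtime/ephemeral paths. Path list mirrors the proposed gitignore;
// names are illustrative.
const RUNTIME_PATHS = [
  ".gsd/runtime/", ".gsd/activity/", ".gsd/worktrees/",
  ".gsd/auto.lock", ".gsd/metrics.json", ".gsd/gsd.db", ".gsd/STATE.md",
];

function smartStageCommands(): string[] {
  return [
    "git add -A",                                   // planning artifacts staged naturally
    `git reset HEAD -- ${RUNTIME_PATHS.join(" ")}`, // unstage runtime files
  ];
}
```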
### What Happens to `handleAgentEnd()`

After any unit completes:
1. Invalidate caches
2. `autoCommitCurrentBranch()` — commits on the one and only branch
3. `verifyExpectedArtifact()` — file is always on the current branch (no branch switching)
4. Persist completion key

The "Path A fix" (lines 937-953) becomes the only path. No branch mismatch possible.
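The ordering can be sketched as a single straight-line function. This is a sketch of the control flow, not the actual `auto.ts` implementation; the callback names are illustrative:

```typescript
// Sketch of the branchless handleAgentEnd() ordering. With a single
// branch there is no switch between commit and verification, so an
// artifact miss is a real miss, not a branch mismatch. Names illustrative.
function handleAgentEnd(
  invalidateCaches: () => void,
  autoCommitCurrentBranch: () => void,
  artifactExists: (path: string) => boolean,
  persistCompletionKey: () => void,
  expectedArtifact: string,
): void {
  invalidateCaches();
  autoCommitCurrentBranch();
  if (!artifactExists(expectedArtifact)) {
    // This now legitimately feeds loop detection: the artifact was
    // genuinely not produced, not hidden on another branch.
    throw new Error(`expected artifact missing: ${expectedArtifact}`);
  }
  persistCompletionKey();
}
```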
### What Happens to `fix-merge`

The `fix-merge` dispatch unit type is eliminated. Within a worktree, there are no merges that can conflict. The only merge is milestone→main (squash), and if that conflicts (rare, parallel milestone edge case), it's handled as a one-time resolution at milestone completion — not a dispatch loop.

### Backwards Compatibility

The `shouldUseWorktreeIsolation()` three-tier preference resolution is replaced by a single behavior: worktree isolation is always used. The `git.isolation: "branch"` preference is deprecated.

Projects with existing `gsd/M001/S01` slice branches can still be read by state derivation, but new work never creates slice branches.
### Risks

**1. Parallel milestone code conflicts at squash-merge time**

If two milestones modify the same source file, the second squash-merge to `main` will conflict. Mitigation: `git fetch origin main && git rebase main` before squash-merge. This is standard practice and rare in single-user workflows.

**2. Loss of per-slice git history after squash**

Squash merge collapses all commits into one on `main`. Mitigations:
- Commit messages tag slices (`feat(M001/S01/T01):`) — filterable with `git log --grep`
- The milestone branch can be preserved (not deleted) if history is needed
- Alternative: `merge --no-ff` instead of `--squash` to keep history on `main`

**3. SQLite DB desync after `git reset`**

If tracked markdown rolls back via `git reset --hard`, the gitignored `gsd.db` doesn't. Mitigation: the importer layer (M001/S02) rebuilds the DB from markdown on startup. The DB is a cache; markdown is truth.

**4. Disk space with multiple worktrees**

Each worktree duplicates the working directory (including `node_modules`). Mitigation: single active milestone at a time (single-user workflow), immediate cleanup after completion.
## Alternatives Considered

### A. Keep slice branches, fix visibility with immediate mini-merges

After `research-slice` or `plan-slice`, immediately merge the slice branch back to the milestone branch. This fixes the loop detection bug but retains all merge complexity.

**Rejected:** Adds another merge path instead of removing the root cause. Still requires conflict resolution, self-healing, and branch switching.

### B. Keep `.gsd/` gitignored, bootstrap from git history for manual worktrees

When GSD detects an empty `.gsd/` in a worktree, reconstruct state from the branch's git history using `git show <commit>:.gsd/...`.

**Rejected:** Recovery logic, not architecture. Doesn't fix the fundamental problem of branch-agnostic state. Fails when git history has been rewritten.

### C. Branch-scoped `.gsd/` directories (`.gsd/branches/<branch-name>/milestones/...`)

Each branch writes to a namespaced subdirectory within `.gsd/`.

**Rejected:** Adds complexity instead of removing it. Requires renaming/moving on branch creation, and doesn't work with standard git tools (`git checkout` doesn't rename directories).
## Validation

This architecture was stress-tested by three independent models:

**Gemini 2.5 Pro** identified 6 attack vectors. None broke the core model. Recommendations: pre-flight rebase before squash-merge (adopted), heartbeat locks (already exists), DB rebuild on startup (adopted via M001/S02 importers).

**GPT-5.4 (Codex)** read the full codebase and confirmed the model is sound. Identified that `smartStage()` already force-adds durable paths (validating the tracked-artifact approach) and that `resolveMainWorktreeRoot` in PR #487 is architecturally wrong (adopted — PR to be closed).

**Codebase analysis** confirmed that `.gsd/milestones/` is already partially tracked on `main` despite the `.gitignore`, that `GSD_DURABLE_PATHS` exists as a code-level acknowledgment that planning artifacts should be tracked, and that the README already documents the correct runtime-only gitignore pattern.
### Codex (GPT-5.4) Dissent — "No Slice Branches Is a Redesign"

Codex read the full codebase and raised 4 concerns. Each is addressed:

**Concern 1: "Crash after slice done but before integration — today the runtime detects orphaned slice branches and merges them."**

Rebuttal: In the branchless model, there is no integration step to crash between. Slice work is committed directly on the milestone branch. On restart, `deriveState()` reads the branch state as-is. The orphaned-branch recovery path exists solely because of slice branches — removing branches removes the failure mode it recovers from.

**Concern 2: "Concurrent edits to shared root docs (PROJECT.md, DECISIONS.md) from two terminals."**

Rebuttal: Valid edge case. If `/gsd queue` edits `DECISIONS.md` on `main` while auto-mode edits it in a worktree, there's a content conflict at squash-merge time. This is a standard git content conflict — no different from two developers editing the same file. Handled by normal merge resolution. Not caused by or solved by slice branches.

**Concern 3: "Slice→milestone merges provide continuous integration. Removing them pushes conflict discovery to the end."**

Rebuttal: In a single-user sequential workflow, there is nothing to integrate against within a worktree. Each slice builds on the previous one. The only conflict source is `main` diverging (e.g., another milestone merging first), which slice→milestone merges don't catch anyway — they merge within the worktree, not against `main`. Pre-flight rebase before squash-merge catches this more directly.

**Concern 4: "Replace slice branches with another explicit slice-boundary primitive. Don't just delete them."**

Response: Accepted in spirit. Commits with conventional tags (`feat(M001/S01):`, `feat(M001/S01/T01):`) serve as the slice boundary primitive. `git log --grep="M001/S01"` isolates a slice's history. `git revert` targets specific commits. Git tags (`gsd/M001/S01-complete`) can mark slice completion if needed. The boundary primitive is commit metadata, not branches.
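The commit-metadata boundary can be exercised with stock git. A sketch pairing each need with a command — the revert SHA placeholder is intentionally unresolved, and the helper name is illustrative:

```typescript
// Slice-boundary operations expressed as stock git commands, keyed by
// intent. The tag name follows the pattern suggested above; the revert
// SHA is a placeholder the operator fills in.
function sliceBoundaryCommands(mid: string, slice: string): Record<string, string> {
  return {
    // Isolate one slice's history on the milestone branch:
    history: `git log --oneline --grep="${mid}/${slice}"`,
    // Undo a single task without touching the rest of the slice:
    revertTask: "git revert <task-commit-sha>",
    // Optional explicit completion marker:
    markComplete: `git tag gsd/${mid}/${slice}-complete`,
  };
}
```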
## Action Items

1. Close PR #487 (`resolveMainWorktreeRoot`) — contradicts this architecture
2. Implement as a GSD milestone with phases:
   - Update `.gitignore` and force-add existing planning artifacts
   - Remove slice branch creation/switching/merging code
   - Simplify `mergeMilestoneToMain()` and `smartStage()`
   - Remove `fix-merge` dispatch unit
   - Remove branch-mode isolation (`git.isolation: "branch"`)
   - Update/delete 11 test files
   - Update README suggested gitignore
   - Migration path for existing projects with slice branches
@ -1,738 +0,0 @@
# ADR-003: Auto-Mode Pipeline Simplification

**Status:** Proposed
**Date:** 2026-03-18
**Deciders:** Lex Christopherson
**Related:** ADR-001 (branchless worktree architecture), ADR-002 (external state directory)
**Audited by:** Claude Opus 4.6, OpenAI Codex — findings incorporated below.
## Context

GSD auto-mode orchestrates a multi-session pipeline where each "unit" of work runs in a fresh LLM session. The pipeline for a single milestone with N slices and M tasks per slice runs through:

```
research-milestone → plan-milestone →
  (research-slice → plan-slice → execute-task × M → complete-slice → reassess-roadmap) × N →
validate-milestone → complete-milestone
```

The exact session count depends on profile. The "quality" profile runs all phases. The "balanced" profile skips slice research by default. The "budget" profile skips milestone research, slice research, reassessment, and milestone validation. This ADR uses the quality profile as the baseline for analysis — it represents the full pipeline and the worst-case ceremony overhead.

For a typical 4-slice, 3-task milestone under the quality profile:
- 1 research-milestone + 1 plan-milestone
- Per slice: research-slice (skipped for S01) + plan-slice + 3 execute-task + complete-slice + reassess-roadmap (skipped for the last slice, since all slices are done)
- Per-slice total for S01: 0 + 1 + 3 + 1 + 1 = 6
- Per-slice total for S02–S04: 1 + 1 + 3 + 1 + 1 = 7 (S04 skips reassess since it's the last completed slice: 6)
- Slices total: 6 + 7 + 7 + 6 = 26
- Plus: 1 validate-milestone + 1 complete-milestone

**Total: 30 sessions.** Only 12 are task execution. The remaining 18 are pipeline ceremony.

(The "balanced" profile drops slice research for S02–S04: 30 - 3 = 27 sessions. The "budget" profile drops milestone research, all slice research, reassessment, and validation: 30 - 1 - 3 - 3 - 1 = 22 sessions.)
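The per-profile arithmetic can be written out as a worked example. This is an illustration of the counting rules in this section, not runtime code; the skip rules mirror the prose above:

```typescript
// Worked session-count check for an N-slice, M-task milestone.
// Skip rules: S01 never gets slice research; the last slice skips
// reassess; budget drops milestone research, reassess, and validation.
function sessionCount(
  slices: number,
  tasksPerSlice: number,
  profile: "quality" | "balanced" | "budget",
): number {
  const milestoneResearch = profile === "budget" ? 0 : 1;
  const planMilestone = 1;
  const sliceResearch = profile === "quality" ? slices - 1 : 0; // only quality researches slices
  const planSlice = slices;
  const tasks = slices * tasksPerSlice;
  const completeSlice = slices;
  const reassess = profile === "budget" ? 0 : slices - 1; // last slice skipped
  const validate = profile === "budget" ? 0 : 1;
  const completeMilestone = 1;
  return milestoneResearch + planMilestone + sliceResearch + planSlice +
    tasks + completeSlice + reassess + validate + completeMilestone;
}

// sessionCount(4, 3, "quality")  → 30
// sessionCount(4, 3, "balanced") → 27
// sessionCount(4, 3, "budget")   → 22
```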
### The Token Tax

Every fresh session re-ingests static context via prompt inlining. The `auto-prompts.ts` builders (1,099 lines) inline the following files into nearly every unit type:

| File | Inlined Into | Changes After |
|------|-------------|---------------|
| ROADMAP | research-slice, plan-slice, execute-task (excerpt), complete-slice, reassess, validate, complete-milestone | plan-milestone (rare reassess rewrites) |
| DECISIONS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-milestone, validate | Appended occasionally during execution |
| REQUIREMENTS.md | research-milestone, plan-milestone, research-slice, plan-slice, complete-slice, complete-milestone, validate | Updated during complete-slice |
| KNOWLEDGE.md | research-milestone, plan-milestone, research-slice, plan-slice, execute-task, complete-slice, complete-milestone, validate | Appended occasionally during execution |
| PROJECT.md | research-milestone, plan-milestone, complete-milestone, validate | Rarely updated |

The ROADMAP alone is inlined into 7 unit types. It never changes during normal execution. This is a static document being re-tokenized per session at a cost of 5–20K tokens each time.

For the 30-session milestone above (quality profile), context re-ingestion costs approximately:
- ROADMAP: 7 re-inlines × ~10K tokens = 70K tokens
- DECISIONS: 6 re-inlines × ~5K tokens = 30K tokens
- REQUIREMENTS: 8 re-inlines × ~5K tokens = 40K tokens
- KNOWLEDGE: 8 re-inlines × ~3K tokens = 24K tokens
- Templates (research, plan, task-plan, etc.): ~2K per inline × ~10 units = 20K tokens
- Dependency summaries: ~8K per slice plan × 3 non-S01 slices = 24K tokens

**Total context re-ingestion overhead: ~208K tokens per milestone.** This is pure waste — the LLM re-reads documents it already processed in prior sessions, gaining no new information.
### The Lossy Handoff Problem

Each session boundary is a lossy compression step. The research-milestone agent reads the codebase and writes a RESEARCH.md. The plan-milestone agent reads that research and produces a ROADMAP. The research-slice agent reads the ROADMAP and explores the codebase again for its slice scope. The plan-slice agent reads that slice research and produces a PLAN.

This is a game of telephone:

```
Codebase → [researcher reads code] → RESEARCH.md → [planner reads research] → ROADMAP
    ↑ often re-reads the same code
```

The research prompt explicitly says: *"Write for the roadmap planner."* The plan prompt says: *"Trust the research. Don't re-read code."* But planners routinely re-read code because research is a lossy compression — a summary of what one LLM session saw, not the thing itself. The fidelity loss compounds at each handoff.
### The Machinery Tax

The multi-session pipeline requires extensive orchestration machinery to handle edge cases, failures, and recovery:

| File | Lines | Purpose |
|------|-------|---------|
| `auto-recovery.ts` | 591 | Artifact resolution, loop remediation, skip/rerun logic |
| `auto-stuck-detection.ts` | 220 | Dispatch loop detection, lifetime caps, stub recovery |
| `auto-idempotency.ts` | 150 | Skip completed units, phantom loop detection, stale key recovery |
| `session-forensics.ts` | 536 | Post-mortem analysis, crash briefings, deep diagnostics |
| `auto-timeout-recovery.ts` | 262 | Resume after timeout, recovery briefing synthesis |
| `crash-recovery.ts` | 108 | Lock file management, crash detection |
| `auto-post-unit.ts` | 591 | Post-agent processing, verification, commits, state sync |
| `auto-verification.ts` | 229 | Post-task verification enforcement |
| `verification-gate.ts` | 643 | Test/lint/audit gate runner |
| `doctor-proactive.ts` | 292 | Health checks, proactive healing, escalation detection |
| **Total** | **3,622** | **Recovery, verification, and post-processing** |

This is 3,622 lines of code managing the complexity of a 15-rule dispatch table across 13 unit types. Much of this machinery exists because the pipeline has so many sessions that failures, timeouts, and stuck states are statistically likely.
### The Ceremony Sessions

Six of the 13 unit types produce no code. They exist purely to manage the pipeline:

| Unit Type | What It Does | Sessions per Milestone (quality, 4-slice) |
|-----------|-------------|----------------------|
| research-milestone | Reads codebase, writes RESEARCH.md | 1 |
| research-slice | Reads codebase for slice scope, writes slice RESEARCH.md | 3 (skipped for S01) |
| complete-slice | Re-reads ROADMAP + plan + all task summaries, writes slice SUMMARY.md + UAT.md | 4 |
| reassess-roadmap | Re-reads ROADMAP + slice summary, almost always says "roadmap is fine" | 3 (skipped after last slice) |
| validate-milestone | Re-reads ROADMAP + all slice summaries, writes VALIDATION.md | 1 |
| complete-milestone | Re-reads ROADMAP + all slice summaries, writes SUMMARY.md | 1 |

Total: 1 + 3 + 4 + 3 + 1 + 1 = **13 ceremony sessions** (under the quality profile), each consuming 12–37K tokens of prompt context. Under the balanced profile this drops to 9 (no slice research). These sessions burn tokens re-reading documents that other sessions already produced, producing intermediate artifacts that downstream sessions then re-read.
### Root Cause

The pipeline was designed around a paradigm where:
1. LLM context windows are small (32K–100K tokens)
2. Sessions are expensive, so specialize each one
3. Handoffs between specialized agents produce better results than generalist sessions
4. Research → plan → execute is the "correct" decomposition of intellectual work

With 200K+ token context windows and prompt caching, assumptions 1–2 are obsolete. Assumption 3 is demonstrably false — handoffs lose fidelity. Assumption 4 confuses human workflow patterns with LLM-optimal patterns. An LLM with tool access is already researching while it plans. Forcing it to serialize research into a document, then read that document in a new session, is an artificial bottleneck.
## Decision

**Collapse the pipeline from 13 unit types to 5. Merge research into planning. Fold completion into post-unit mechanical processing. Replace LLM-driven validation with mechanical verification aggregation.**

### The Simplified Pipeline

```
plan-milestone → (plan-slice → execute-task × M) × N → done
```

Note: `discuss` is an interactive human-facing session, not an auto-mode unit — it's not counted in session math. It continues to work as-is.

For the same 4-slice, 3-task milestone:
- 1 plan-milestone (S01 plan + task plans produced inline via single-slice fast path if applicable)
- S01: plan-slice skipped (milestone planner already explored) + 3 execute-task = 3
- S02–S04: plan-slice + 3 execute-task = 4 each × 3 slices = 12

**Total: 1 + 3 + 12 = 16 sessions** (down from 30). The 14 eliminated sessions were the highest-waste ones — each re-ingested context for minimal value.
### Unit Type Changes

#### 1. Merge research-milestone INTO plan-milestone

**Current:** Two sessions. Researcher explores the codebase, writes RESEARCH.md. Planner reads RESEARCH.md, writes the ROADMAP.

**New:** One session. The plan-milestone agent explores the codebase directly and produces the ROADMAP. It has full tool access — it can read files, run commands, search code. The "research" happens naturally as part of planning, not as a serialized intermediary.

**What changes:**
- The plan-milestone prompt gains the research-milestone's exploration instructions: "Explore relevant code, check technologies, identify constraints."
- The plan-milestone prompt drops "Trust the research" — there is no research document to trust.
- The RESEARCH.md artifact becomes optional. If the planner wants to capture notes for downstream reference, it can write one. But it's not required, and downstream units don't depend on it.
- Skill discovery instructions move into the plan-milestone prompt.
- The research-milestone template (`prompts/research-milestone.md`) is retained but only used when explicitly dispatched via `/gsd dispatch research`.

**Token savings:** ~1 full session (12–37K tokens of prompt context) + the RESEARCH.md document no longer re-inlined into plan-milestone (~5–15K tokens).

**Quality impact:** Positive. The planner has direct access to the codebase instead of reading a lossy summary. It can verify assumptions in real time instead of trusting a prior session's interpretation.
#### 2. Merge research-slice INTO plan-slice

**Current:** Two sessions per non-S01 slice. Slice researcher explores the codebase for slice scope, writes slice RESEARCH.md. Slice planner reads that research, writes PLAN.md + task plans.

**New:** One session. The plan-slice agent explores the relevant code directly and produces the slice plan with task plans.

**What changes:**
- The plan-slice prompt gains exploration instructions: "Read the relevant code for this slice's scope before decomposing."
- The plan-slice prompt drops "Trust the research" — there is no slice research document.
- Slice RESEARCH.md becomes optional (same as milestone research above).
- The research-slice template is retained for explicit dispatch.
- The `skip_slice_research` preference becomes the default behavior rather than an opt-in.
- The dispatch rule "planning (no research, not S01) → research-slice" is removed.

**Token savings:** ~1 session per non-S01 slice × (N-1) slices. For a 4-slice milestone: 3 sessions × 12–37K tokens = 36–111K tokens.

**Quality impact:** Positive. The planner can read actual code files instead of a summary. It verifies file paths, function signatures, and patterns directly rather than trusting a researcher's notes.
#### 3. Fold complete-slice INTO mechanical post-unit processing

**Current:** After all tasks in a slice complete, `deriveState()` emits the `summarizing` phase, dispatching a separate complete-slice LLM session that re-reads the ROADMAP, slice plan, and ALL task summaries to write a slice SUMMARY.md and UAT.md.

**New:** Slice completion moves to a **post-gate mechanical closeout** in `auto-post-unit.ts`, not into the final executor's prompt. After the last execute-task's verification gate passes:

1. The post-unit processing detects that all tasks in the slice are done (same check `deriveState()` uses to emit `summarizing`).
2. It runs mechanical slice completion: aggregate task summaries into a SUMMARY.md using structured frontmatter, generate a UAT.md from the slice plan's verification section, mark the slice done in the ROADMAP.
3. If the mechanical summary is insufficient (complex slices where structured aggregation loses important narrative), the system detects low quality (e.g., the summary is below a character threshold) and dispatches a standalone complete-slice LLM session as recovery.
|
||||
|
||||
**Why post-gate, not in the executor prompt:**
|
||||
- Codex audit identified that folding completion into execute-task creates a verification-retry ordering problem: if the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure would retry against incorrect derived state (the slice appears complete when it isn't).
|
||||
- Post-gate processing runs after verification succeeds, so state transitions are always consistent.
|
||||
- The executor's context budget is fully available for its actual work.
|
||||
|
||||
**What changes in `deriveState()`:**
|
||||
- The `summarizing` phase still exists in state derivation (all tasks done, slice not marked complete).
|
||||
- The dispatch table no longer maps `summarizing → complete-slice`. Instead, post-unit processing handles the transition synchronously.
|
||||
- If post-unit mechanical completion fails or produces low-quality output, the `summarizing` phase still exists as a dispatch target and the system falls back to dispatching a complete-slice LLM session.
|
||||
|
||||
**What changes:**
|
||||
- `auto-post-unit.ts` gains a `mechanicalSliceCompletion()` function.
|
||||
- The complete-slice dispatch rule is removed from the default path but retained as a fallback.
|
||||
- The complete-slice template is retained for recovery and explicit dispatch.
|
||||
- The `summarizing` phase in `state.ts` is unchanged — it serves as the fallback trigger if mechanical completion doesn't run.
|
||||
|
||||
**Full completion contract preserved:** The mechanical completion writes all three required artifacts (SUMMARY.md, UAT.md, ROADMAP checkbox) — matching the current complete-slice contract. It also handles REQUIREMENTS.md updates and KNOWLEDGE.md/DECISIONS.md appendix that the current complete-slice prompt performs (see Risk 5 below for details).
|
||||
|
||||
**Token savings:** ~1 session per slice × N slices. For a 4-slice milestone: 4 sessions × 12–37K tokens = 48–148K tokens.
|
||||
|
||||
**Quality impact:** For most slices, the mechanical summary is sufficient — it aggregates structured frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) from task summaries. For complex slices with important narrative context, the LLM fallback preserves quality.
|
||||
|
||||
#### 4. Eliminate reassess-roadmap (make opt-in)

**Current:** After every slice completion, a reassess-roadmap session re-reads the ROADMAP and slice summary, then almost always writes "roadmap is fine."

**New:** Reassessment is eliminated by default. The plan-slice agent for the next slice serves as the natural reassessment point — it reads the ROADMAP and prior slice summaries, and can adjust its plan if the ground has shifted.

**What changes:**
- The reassess-roadmap dispatch rule fires only when the `reassess_after_slice` preference is enabled (default: off, was effectively always-on).
- The plan-slice prompt gains a reassessment preamble: "Before planning this slice, verify that the roadmap's assumptions still hold given prior slice summaries. If the remaining roadmap needs adjustment, modify it before proceeding."
- The `checkNeedsReassessment()` function in auto-prompts.ts becomes a preference gate, not a mandatory check.
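
The preference gate can be sketched as follows. This is a minimal illustration, not the project's actual signature: the `AutoPrefs` shape and the `isFinalSlice` parameter are assumptions.

```typescript
// Illustrative sketch: checkNeedsReassessment() reduced to a preference gate.
// The AutoPrefs shape is assumed for illustration, not the project's actual type.
interface AutoPrefs {
  reassess_after_slice?: boolean;
}

function checkNeedsReassessment(prefs: AutoPrefs, isFinalSlice: boolean): boolean {
  // Default off: reassessment fires only when explicitly enabled,
  // and never after the final slice (nothing left to reassess).
  if (!prefs.reassess_after_slice) return false;
  return !isFinalSlice;
}
```

The default-off gate is the whole change: the check itself stays cheap, and the dispatch rule behind it simply never fires unless the user opts in.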

**Token savings:** ~1 session per completed non-final slice × (N-1) slices minus those already skipped. For a 4-slice milestone under quality profile: 3 sessions × 12–37K tokens = 36–111K tokens.

**Quality impact:** Neutral. The reassess prompt says *"Bias strongly toward 'roadmap is fine.'"* — acknowledging that most reassessments produce no change. JIT reassessment during the next plan-slice is more informed (has the next slice's context) and costs zero additional tokens.

#### 5. Replace validate-milestone with mechanical verification

**Current:** An LLM session re-reads the ROADMAP and all slice summaries, checks success criteria against delivery evidence, and writes a VALIDATION.md with a verdict. It also inlines UAT artifacts from slices with `uat_dispatch` enabled.

**New:** The system mechanically aggregates verification results from all tasks and slices. The canonical verification data sources are:

1. **`T##-VERIFY.json`** files (written by `writeVerificationJSON()` in `verification-evidence.ts`) — machine-readable per-task verification results with command, exit code, verdict, duration, and blocking status.
2. **`S##-UAT.md`** files (when `uat_dispatch` is enabled) — human or artifact-driven UAT outcomes.
3. **Task summary frontmatter** `verification_result` field — a human-readable pass/fail string (not structured, used as a secondary signal).

The aggregator reads `T##-VERIFY.json` as the primary source of truth, supplements with UAT artifacts, and produces a deterministic VALIDATION.md.

**What changes:**
- A new `aggregateMilestoneVerification()` function collects `T##-VERIFY.json` files and `S##-UAT.md` files across all slices.
- The function produces a VALIDATION.md with per-task and per-slice pass/fail status, UAT evidence, and an overall verdict.
- The LLM-driven validate-milestone session is removed from the default pipeline.
- The validate-milestone template is retained for explicit dispatch (users who want LLM-driven validation can run `/gsd dispatch validate`).
- The `skip_milestone_validation` preference (which writes a pass-through VALIDATION.md) becomes the default behavior, with the mechanical aggregation replacing it.

```typescript
async function aggregateMilestoneVerification(base: string, mid: string): Promise<ValidationResult> {
  const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
  const checks: EvidenceCheckJSON[] = [];
  const uatResults: { sliceId: string; content: string }[] = [];

  for (const slice of roadmap.slices) {
    // Primary source: T##-VERIFY.json files (machine-readable, written by verification-gate.ts)
    const tDir = resolveTasksDir(base, mid, slice.id);
    if (tDir) {
      const verifyFiles = resolveTaskFiles(tDir, "VERIFY");
      for (const file of verifyFiles) {
        const content = await loadFile(join(tDir, file));
        if (content) {
          const evidence: EvidenceJSON = JSON.parse(content);
          checks.push(...evidence.checks);
        }
      }
    }

    // Secondary source: S##-UAT.md (when uat_dispatch enabled)
    const uatResultFile = resolveSliceFile(base, mid, slice.id, "UAT");
    if (uatResultFile) {
      const uatContent = await loadFile(uatResultFile);
      if (uatContent) uatResults.push({ sliceId: slice.id, content: uatContent });
    }
  }

  const allChecksPassed = checks.every(c => c.verdict === "pass");
  const hasUatFailures = uatResults.some(r => r.content.includes("❌") || r.content.includes("FAIL"));
  const verdict = allChecksPassed && !hasUatFailures ? "pass" : "needs-attention";

  return { verdict, checks, uatResults };
}
```

**Token savings:** 1 session × 12–37K tokens. This session is one of the most context-heavy — it inlines the ROADMAP + all slice summaries + all UAT results.

**Quality impact:** Positive. Mechanical verification is deterministic and complete. LLM validation is subjective and can miss things. The verification gate and UAT system already do the hard work — the validate session was a redundant re-check. The `T##-VERIFY.json` artifacts are the canonical machine-readable source, not task summary frontmatter.

#### 6. Replace complete-milestone with mechanical completion

**Current:** An LLM session re-reads the ROADMAP and all slice summaries to write a SUMMARY.md.

**New:** The system produces a milestone summary mechanically by aggregating slice summaries. The summary includes: milestone title, success criteria with pass/fail status, slice completion dates, key decisions made, and patterns established (all extracted from structured frontmatter in slice summaries).

**What changes:**
- A new `generateMilestoneSummary()` function reads all slice SUMMARY.md files, extracts frontmatter fields, and produces a structured milestone SUMMARY.md.
- The complete-milestone dispatch rule is replaced with a synchronous post-processing step after the validation artifact is written.
- The complete-milestone template is retained for explicit dispatch.

**What changes in `deriveState()`:**
- The `validating-milestone` and `completing-milestone` phases still exist in state derivation.
- When mechanical validation + completion runs synchronously in post-unit processing, these phases are transient — `deriveState()` emits them, but the mechanical processing writes the VALIDATION.md and SUMMARY.md artifacts before the next dispatch cycle, so the phases resolve immediately.
- If mechanical processing fails, the phases remain as dispatch targets and the system falls back to dispatching LLM sessions for validation and/or completion.

**Token savings:** 1 session × 12–37K tokens.

**Quality impact:** Neutral. Milestone summaries are archival — they capture what happened rather than drive decisions. Mechanical aggregation of structured frontmatter is more reliable than an LLM re-interpreting task summaries.

### Dispatch Table Changes

**Current: 15 rules.**

```
1. rewrite-docs (override gate)
2. summarizing → complete-slice
3. run-uat (post-completion)
4. reassess-roadmap (post-completion)
5. needs-discussion → stop
6. pre-planning (no context) → stop
7. pre-planning (no research) → research-milestone
8. pre-planning (has research) → plan-milestone
9. planning (no research, not S01) → research-slice
10. planning → plan-slice
11. replanning-slice → replan-slice
12. executing → execute-task (recovery)
13. executing → execute-task
14. validating-milestone → validate-milestone
15. completing-milestone → complete-milestone
```

**New: 12 rules.**

```
1. rewrite-docs (override gate) [unchanged]
2. summarizing → complete-slice [FALLBACK ONLY — fires when mechanical completion didn't run]
3. run-uat (post-completion) [unchanged, preference-gated]
4. needs-discussion → stop [unchanged]
5. pre-planning (no context) → stop [unchanged]
6. pre-planning → plan-milestone [rules 7+8 merged — research folded in]
7. planning → plan-slice [rules 9+10 merged — research folded in]
8. replanning-slice → replan-slice [unchanged]
9. executing → execute-task (recovery) [unchanged]
10. executing → execute-task [unchanged]
11. validating-milestone → validate-milestone [FALLBACK ONLY — fires when mechanical validation didn't run]
12. completing-milestone → complete-milestone [FALLBACK ONLY — fires when mechanical completion didn't run]
```

Note: Rules 2, 11, and 12 are retained as **fallbacks** for cases where mechanical processing fails. They do not fire in the normal path because post-unit processing writes the required artifacts before the next dispatch cycle. This means `deriveState()` is unchanged — it still emits `summarizing`, `validating-milestone`, and `completing-milestone` phases. The change is that these phases are normally resolved mechanically before dispatch evaluates them.

**Removed rules (no longer in default path):**
- `reassess-roadmap` — folded into next plan-slice (or opt-in preference)
- `pre-planning (no research) → research-milestone` — merged into plan-milestone
- `planning (no research, not S01) → research-slice` — merged into plan-slice
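
The first-match-wins evaluation with fallback-only rules can be sketched as below. The `DispatchRule` shape and `mechanicalResolved` flag are illustrative assumptions, not the project's actual types.

```typescript
// Illustrative sketch of first-match-wins dispatch over a declarative rule
// table, where fallback-only rules are skipped when mechanical processing
// already resolved the phase. Names and shapes are assumptions.
type Phase = "summarizing" | "planning" | "executing" | "validating-milestone";

interface DispatchRule {
  matches: (phase: Phase) => boolean;
  unit: string;
  fallbackOnly?: boolean; // fires only if mechanical processing left the phase unresolved
}

function dispatch(phase: Phase, rules: DispatchRule[], mechanicalResolved: boolean): string | null {
  for (const rule of rules) {
    if (rule.fallbackOnly && mechanicalResolved) continue;
    if (rule.matches(phase)) return rule.unit; // first match wins
  }
  return null;
}
```

In the normal path the fallback-only rules never match because the artifacts already exist; only when mechanical completion fails does the same phase fall through to the LLM-session rule.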

### Prompt Changes

#### plan-milestone.md — gains exploration instructions

Add before the planning steps:

```markdown
## Explore First, Then Decompose

You have full tool access. Before decomposing into slices:
1. Explore the relevant codebase — read key files, understand existing patterns, identify constraints.
2. For unfamiliar libraries, use `resolve_library` / `get_library_docs`.
3. Skill Discovery ({{skillDiscoveryMode}}):{{skillDiscoveryInstructions}}

Narrate key findings as you go. If findings are significant enough to benefit downstream slice planners, write {{researchOutputPath}} — but only if the content would genuinely help. Don't write a research doc just because the template exists.
```

#### plan-slice.md — gains exploration + reassessment preamble

Add before the planning steps:

```markdown
## Verify Roadmap Assumptions

Before planning this slice, check whether the roadmap's assumptions still hold:
- Review prior slice summaries (inlined above). Did anything change that affects this slice?
- If the remaining roadmap needs adjustment, modify the unchecked slices in {{roadmapPath}} before proceeding.

## Explore Slice Scope

Read the relevant code for this slice before decomposing:
1. Check the files and modules this slice will touch.
2. Verify the approach described in the roadmap against the actual codebase state.
3. If the roadmap's description of this slice is wrong or outdated, adjust your plan accordingly.
```

### Context Inlining Changes

#### Reduce inlining for planning sessions — provide paths for stable documents

Planning sessions (plan-milestone, plan-slice) currently inline ROADMAP, DECISIONS, REQUIREMENTS, KNOWLEDGE, and PROJECT. Since these sessions now also explore the codebase (merged research), the total prompt size grows. To offset this, stable documents should be provided as file paths rather than inlined content for planning sessions.

**Current pattern:**
```typescript
inlined.push(await inlineFile(roadmapPath, roadmapRel, "Milestone Roadmap"));
```

**New pattern for plan-milestone/plan-slice:**
```typescript
sourcePaths.push(`- Milestone Roadmap: \`${roadmapRel}\` — read this for the full slice decomposition`);
```

The prompt header changes from "All relevant context has been preloaded below" to "Source files are listed below. Read them before proceeding."

**What stays inlined:**
- **Task plan** in execute-task (it's the executor's authoritative contract — must be in prompt)
- **Slice plan excerpt** in execute-task (goal/demo/verification — small and task-specific)
- **Prior task summaries** in execute-task (carry-forward context — already budget-managed)
- **Milestone context** in plan-milestone (it's the starting input — relatively small)

**What moves to file-path references:**
- ROADMAP in plan-slice, complete-slice, reassess, validate, complete-milestone
- DECISIONS.md everywhere except execute-task (where it's already omitted for minimal inline level)
- REQUIREMENTS.md everywhere except execute-task
- KNOWLEDGE.md everywhere (already uses `inlineFileSmart` for execute-task)
- PROJECT.md everywhere

**Interaction with budget engine:** The current budget engine (`context-budget.ts`) truncates inlined content when it exceeds budget. Removing inlining means the LLM reads the full file via tool call. For most documents (ROADMAP ~3–10K chars, DECISIONS ~2–5K chars), the full read is within budget. For very large REQUIREMENTS.md files (>30K chars), the LLM may need to use the DB-scoped query (`inlineRequirementsFromDb` with slice scoping) or the compact formatter. The path reference should note: "For large files, use scoped queries."
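
One way the size cutoff could be wired is sketched below. The 30K-char threshold matches the figure in the text; the helper name and output formats are assumptions for illustration, not the project's actual API.

```typescript
// Illustrative sketch: inline a document when it fits the character budget,
// otherwise emit a path reference with scoped-query guidance.
// The threshold and formats are assumptions, not the real context-budget.ts API.
const INLINE_CHAR_BUDGET = 30_000;

function referenceOrInline(relPath: string, title: string, content: string): string {
  if (content.length <= INLINE_CHAR_BUDGET) {
    // Small enough: inline as before.
    return `## ${title}\n\n${content}`;
  }
  // Too large to inline or read whole: point at the file with guidance.
  return `- ${title}: \`${relPath}\` — large file; use scoped queries instead of reading it whole`;
}
```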

**Risk: LLMs might not read referenced files.**

This is the most significant behavioral risk in this ADR. Inlined content forces processing. Path references require the LLM to decide to read. Mitigation:

1. **Mandatory read directives.** The prompt says "You MUST read the following files before proceeding" with a numbered list of 2–3 critical files. Not "read as needed" — a direct instruction.
2. **Verification.** The plan-slice prompt requires citing the ROADMAP's slice description in its output (slice title, risk level, depends). If these don't match, the planner didn't read it.
3. **Phased rollout.** Phase 4 (context reduction) is separate from Phase 1 (research merge). This allows measuring whether path references degrade plan quality before full rollout.
4. **Fallback.** If path references prove unreliable, restore inlining for critical documents only (ROADMAP in plan-slice). The budget engine still handles truncation.
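
Mitigation 2 amounts to a cheap textual check on the planner's output. A minimal sketch, assuming an illustrative `RoadmapSlice` shape:

```typescript
// Illustrative sketch of the "did the planner read the ROADMAP?" check:
// require the plan-slice output to echo fields from the roadmap entry.
// The RoadmapSlice shape is an assumption for illustration.
interface RoadmapSlice {
  title: string;
  risk: string;
  depends: string[];
}

function planCitesRoadmap(planOutput: string, slice: RoadmapSlice): boolean {
  // A plan that never mentions the slice title or risk level is strong
  // evidence the planner skipped the referenced file.
  return planOutput.includes(slice.title) && planOutput.includes(slice.risk);
}
```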

**Token savings (Phase 4 only):** Eliminates ~150K tokens of re-ingestion per milestone (revised from 208K — the execute-task sessions retain inlined content). The LLM reads files as needed via tool calls, cached by API prompt caching. Net savings are ~50–60% of the re-ingestion overhead, since the LLM still reads most files once per session.

### Post-Unit Processing Changes

#### Mechanical slice completion

After the last execute-task's verification gate passes and post-unit processing detects all tasks done:

```typescript
async function mechanicalSliceCompletion(base: string, mid: string, sid: string): Promise<boolean> {
  const tDir = resolveTasksDir(base, mid, sid);
  if (!tDir) return false;

  const summaryFiles = resolveTaskFiles(tDir, "SUMMARY").sort();
  const taskSummaries = await Promise.all(
    summaryFiles.map(async f => ({ file: f, summary: parseSummary(await loadFile(join(tDir, f)) ?? "") }))
  );

  // Aggregate structured frontmatter
  const allProvides = taskSummaries.flatMap(t => t.summary.frontmatter.provides);
  const allKeyFiles = taskSummaries.flatMap(t => t.summary.frontmatter.key_files);
  const allDecisions = taskSummaries.flatMap(t => t.summary.frontmatter.key_decisions);
  const allPatterns = taskSummaries.flatMap(t => t.summary.frontmatter.patterns_established);
  const allAffects = taskSummaries.flatMap(t => t.summary.frontmatter.affects);

  // Build slice SUMMARY.md from aggregated frontmatter
  const sliceSummary = formatSliceSummary({ sid, provides: allProvides, keyFiles: allKeyFiles, ... });

  // Build UAT.md from slice plan's Verification section
  const slicePlanContent = await loadFile(resolveSliceFile(base, mid, sid, "PLAN"));
  const verificationSection = extractMarkdownSection(slicePlanContent, "Verification");
  const sliceUat = formatSliceUat(sid, verificationSection);

  // Write all three artifacts atomically
  writeFileSync(sliceSummaryPath, sliceSummary);
  writeFileSync(sliceUatPath, sliceUat);
  markSliceDoneInRoadmap(base, mid, sid);

  // Handle REQUIREMENTS.md updates (currently done by complete-slice prompt step 5)
  // Mechanical: mark requirements as Validated if all tasks covering them passed verification.
  await mechanicalRequirementsUpdate(base, mid, sid, taskSummaries);

  // Handle DECISIONS.md appendix (currently done by complete-slice prompt step 8)
  // Mechanical: collect key_decisions from task summaries not already in DECISIONS.md
  await appendNewDecisions(base, taskSummaries);

  // Handle KNOWLEDGE.md appendix (currently done by complete-slice prompt step 9)
  // Not mechanical — skip. Knowledge entries require judgment about what's genuinely useful.
  // The executor tasks already write KNOWLEDGE.md entries during execution (step 13 in execute-task).

  return true;
}
```

**Fallback:** If `mechanicalSliceCompletion()` fails or produces output below a quality threshold (e.g., summary under 200 chars for a multi-task slice), the `summarizing` phase persists in `deriveState()` and the dispatch table's retained fallback rule dispatches a complete-slice LLM session.
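
The threshold check itself is trivial. A minimal sketch — the 200-char floor mirrors the example in the text, and scaling it by task count is an assumption:

```typescript
// Illustrative sketch of the quality-threshold check that decides between
// accepting the mechanical summary and falling back to an LLM session.
// The 200-char multi-task floor comes from the text; the single-task floor is assumed.
function summaryMeetsThreshold(summary: string, taskCount: number): boolean {
  const minChars = taskCount > 1 ? 200 : 100; // multi-task slices demand more substance
  return summary.trim().length >= minChars;
}
```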

#### Mechanical milestone validation

See `aggregateMilestoneVerification()` above (Section 5). Reads `T##-VERIFY.json` and `S##-UAT.md` as canonical sources.

#### Mechanical milestone summary

```typescript
async function generateMilestoneSummary(base: string, mid: string): Promise<string> {
  const roadmap = parseRoadmap(await loadFile(resolveMilestoneFile(base, mid, "ROADMAP")));
  const sliceSummaries = [];

  for (const slice of roadmap.slices) {
    const content = await loadFile(resolveSliceFile(base, mid, slice.id, "SUMMARY"));
    if (content) sliceSummaries.push({ id: slice.id, summary: parseSummary(content) });
  }

  // Aggregate frontmatter fields across all slice summaries
  // Produce structured markdown from the aggregation
  return formatMilestoneSummary(roadmap, sliceSummaries);
}
```
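
The aggregation step can be sketched as below. The `SliceFrontmatter` shape and the output layout are assumptions for illustration; the real `formatMilestoneSummary()` would also fold in success criteria and completion dates.

```typescript
// Illustrative sketch of the frontmatter aggregation behind a mechanical
// milestone summary: deduplicate decisions and patterns across slices while
// preserving first-seen order. Shapes and layout are assumptions.
interface SliceFrontmatter {
  key_decisions: string[];
  patterns_established: string[];
}

function aggregateFrontmatter(slices: { id: string; fm: SliceFrontmatter }[]): string {
  const decisions = [...new Set(slices.flatMap(s => s.fm.key_decisions))];
  const patterns = [...new Set(slices.flatMap(s => s.fm.patterns_established))];
  return [
    "## Key Decisions",
    ...decisions.map(d => `- ${d}`),
    "",
    "## Patterns Established",
    ...patterns.map(p => `- ${p}`),
  ].join("\n");
}
```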

## Consequences

### Session Count Reduction

Counts assume no fallback sessions fire (mechanical processing succeeds). "Current" uses quality profile. "New" is the simplified pipeline.

| Milestone Shape | Current Sessions (quality) | New Sessions | Reduction |
|----------------|---------------------------|--------------|-----------|
| 1 slice, 2 tasks | 9 | 3 | 67% |
| 2 slices, 3 tasks | 17 | 8 | 53% |
| 4 slices, 3 tasks | 30 | 16 | 47% |
| 6 slices, 4 tasks | 46 | 31 | 33% |

**Derivation (4-slice, 3-task):**

Current (quality): research-milestone(1) + plan-milestone(1) + [research-slice(0) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] for S01 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(1)] × 2 for S02-S03 + [research-slice(1) + plan-slice(1) + execute(3) + complete-slice(1) + reassess(0)] for S04 + validate(1) + complete-milestone(1) = 2 + 6 + 14 + 6 + 2 = 30.

New: plan-milestone(1) + [execute(3)] for S01 + [plan-slice(1) + execute(3)] × 3 for S02-S04 = 1 + 3 + 12 = 16.
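
The new-pipeline count reduces to a closed form: one plan-milestone, one plan-slice per slice after S01 (S01's plans are written inside plan-milestone), and one execute-task per task.

```typescript
// Illustrative closed form of the new-pipeline session count derived above.
function newPipelineSessions(slices: number, tasksPerSlice: number): number {
  const planMilestone = 1;
  const planSlice = slices - 1; // S01 is planned inside plan-milestone
  const execute = slices * tasksPerSlice;
  return planMilestone + planSlice + execute;
}
```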

### Token Savings

Eliminated sessions are the primary savings mechanism. Context re-ingestion reduction is a secondary effect of having fewer sessions (each of the remaining sessions still ingests some context). These are NOT additive — the re-ingestion savings are already captured in the eliminated session savings.

| Source | Per Milestone (4-slice, 3-task) |
|--------|-------------------------------|
| Eliminated research sessions (1 milestone + 3 slice) | 48–148K tokens |
| Eliminated complete-slice sessions (4) | 48–148K tokens |
| Eliminated reassess sessions (3) | 36–111K tokens |
| Eliminated validate session (1) | 12–37K tokens |
| Eliminated complete-milestone session (1) | 12–37K tokens |
| **Total estimated savings** | **~156–481K tokens** |

At current Opus pricing ($15/MTok input, $75/MTok output — as of March 2026), the input savings alone are **$2.34–$7.22 per milestone**. Output savings are harder to estimate but typically 30–50% of input.
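
The input-cost arithmetic is a straight rate conversion, using the $15/MTok figure quoted above:

```typescript
// Illustrative check of the input-cost arithmetic: tokens saved × $/MTok.
// The $15/MTok input rate is the figure quoted in the text.
function inputCostUSD(tokensSaved: number, usdPerMTok = 15): number {
  return (tokensSaved / 1_000_000) * usdPerMTok;
}
```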

### Code Deletion

| File / Section | Lines | Impact |
|----------------|-------|--------|
| `auto-dispatch.ts` — 3 removed default-path rules | ~40 | Simpler dispatch table |
| `auto-prompts.ts` — 5 builders become fallback-only | ~250 | `buildResearchMilestonePrompt`, `buildResearchSlicePrompt`, `buildCompleteSlicePrompt`, `buildValidateMilestonePrompt`, `buildCompleteMilestonePrompt` move to explicit-dispatch codepath |
| `auto-prompts.ts` — reduced inlining (Phase 4) | ~100 | Remove `inlineFile` calls for static docs in planning prompts, replace with path references |
| Context re-ingestion helpers (Phase 4) | ~50 | `inlineDecisionsFromDb`, `inlineRequirementsFromDb`, `inlineProjectFromDb` simplified for planning paths |
| **Total deletable** | **~440** | |

### Code Added

| File / Section | Lines | Impact |
|----------------|-------|--------|
| `auto-prompts.ts` — plan-milestone exploration | ~30 | Research instructions merged in |
| `auto-prompts.ts` — plan-slice reassessment + exploration | ~25 | Reassessment + exploration preamble |
| `auto-post-unit.ts` — `mechanicalSliceCompletion()` | ~80 | Structured frontmatter aggregation, UAT generation, artifact writes |
| `auto-verification.ts` — `aggregateMilestoneVerification()` | ~60 | T##-VERIFY.json + UAT aggregation |
| `auto-unit-closeout.ts` — `generateMilestoneSummary()` | ~60 | Mechanical summary generation |
| **Total added** | **~255** | |

### Net Impact

- **~185 lines net deleted** (440 deleted - 255 added)
- **3 fewer default-path dispatch rules** (15 → 12, with 3 retained as fallbacks)
- **6 fewer unit types in the default pipeline** (13 → 7 active; 6 retained for fallback/explicit dispatch)
- **~156–481K fewer tokens per milestone**
- **14 fewer session handoffs per 4-slice milestone under quality profile** (each a potential failure/timeout point)
- `auto-prompts.ts` goes from ~1,099 lines to ~924 lines (~175 lines net reduction)

### What Stays Unchanged

- The **discuss** flow (guided-flow.ts, interactive discussion)
- The **dispatch table architecture** (declarative rules, first-match-wins)
- The **fresh session per unit** pattern (still used for plan-slice and execute-task)
- The **state derivation** (`deriveState()` reads files, derives phase — all existing phases preserved)
- The **verification gate** (runs tests/lint after each task)
- The **worktree isolation** model
- The **crash recovery**, **idempotency**, and **stuck detection** systems (fewer sessions means these fire less often, but the safety nets remain)
- The **metrics** and **cost tracking** systems
- The **parallel orchestrator** for independent milestones
- All prompt templates are **retained** — for fallback, recovery, and explicit dispatch via `/gsd dispatch <unit-type>`

### What Gets Simpler Downstream

Less machinery is needed when sessions are fewer:

- **Fewer recovery paths.** 14 fewer sessions means 14 fewer opportunities for timeouts, stuck states, and missing artifacts.
- **Simpler `auto-post-unit.ts`.** Reassess dispatch logic removed (opt-in only). Mechanical completion/validation added but replaces more complex LLM-session dispatch.
- **Simpler `auto-stuck-detection.ts`.** Fewer unit types means fewer dispatch-loop patterns to detect.
- **Simpler `auto-idempotency.ts`.** Fewer completed-key types to track.

These simplifications are downstream effects — they don't need to happen in the same change. But they represent ~500–1000 lines of code that becomes significantly simpler or unnecessary as a consequence of this ADR.
|
||||
|
||||
## Risks
|
||||
|
||||
### 1. Plan-milestone sessions become heavier
|
||||
|
||||
Merging research into planning makes plan-milestone sessions longer. The planner must explore the codebase AND decompose into slices in a single session. Risk: the session hits context pressure before finishing.
|
||||
|
||||
**Mitigation:** Plan-milestone is the session that benefits most from a large context window. Modern context windows (200K+ tokens) easily accommodate exploration + planning. The single-slice fast path (already in plan-milestone.md) already combines planning with slice plan + task plan writing in one session — this extends that pattern. Phase 4 (reducing inlining for planning sessions) further offsets the added exploration work.
|
||||
|
||||
**Phase ordering note:** Phase 1 (merge research into planning) adds exploration to plan-milestone. If Phase 4 (reduce inlining) hasn't landed yet, the plan-milestone prompt includes both exploration instructions AND the full inlined context. This is the most context-heavy state. To mitigate, Phase 1 should also reduce inlining for plan-milestone/plan-slice specifically — moving DECISIONS, REQUIREMENTS, and PROJECT to path references while keeping ROADMAP and CONTEXT inlined. This is a targeted subset of Phase 4, not a separate phase.
|
||||
|
||||
### 2. Mechanical completion quality
|
||||
|
||||
The mechanical slice completion aggregates structured frontmatter but cannot produce narrative context, forward intelligence sections, or nuanced UAT scenarios that the current LLM-driven complete-slice session produces.
|
||||
|
||||
**Mitigation:**
|
||||
- For most slices (2-3 tasks, straightforward work), structured aggregation is sufficient. The frontmatter fields (provides, requires, affects, key_files, key_decisions, patterns_established) capture the essential information.
|
||||
- The quality threshold fallback dispatches a complete-slice LLM session for complex slices.
|
||||
- The LLM fallback is zero-cost to implement — the complete-slice template and dispatch rule are retained.
|
||||
|
||||
### 3. Loss of research artifacts
|
||||
|
||||
RESEARCH.md files provided a useful paper trail for debugging plan quality. Without them, it's harder to understand why a planner made certain decisions.
|
||||
|
||||
**Mitigation:**
|
||||
- The planner's narration (visible in the conversation transcript) captures exploration reasoning.
|
||||
- RESEARCH.md is optional, not eliminated. Planners can write one when exploration is complex.
|
||||
- The KNOWLEDGE.md file captures non-obvious patterns and decisions.
|
||||
- DECISIONS.md captures structural choices.
|
||||
|
||||
### 4. Reassessment gaps
|
||||
|
||||
Without mandatory reassessment, a slice might complete with findings that invalidate the remaining roadmap, and the next planner might not notice.
|
||||
|
||||
**Mitigation:**
|
||||
- The plan-slice prompt includes a reassessment preamble that explicitly checks prior slice summaries.
|
||||
- The `blocker_discovered` flag in task summaries already triggers automatic replanning.
|
||||
- Users who want explicit reassessment can enable the `reassess_after_slice` preference.
|
||||
|
||||
### 5. Mechanical completion doesn't cover all complete-slice responsibilities

The current complete-slice prompt (steps 5, 8, 9) updates REQUIREMENTS.md, appends to DECISIONS.md, and appends to KNOWLEDGE.md. Mechanical completion handles REQUIREMENTS.md and DECISIONS.md but cannot produce KNOWLEDGE.md entries, which require judgment about what's genuinely useful.

**Mitigation:**

- Execute-task prompt step 13 already instructs executors to append to KNOWLEDGE.md during task execution. Most knowledge entries are discovered during implementation, not during completion.
- DECISIONS.md appending is handled mechanically by collecting `key_decisions` from task summaries and deduplicating against existing entries.
- REQUIREMENTS.md updates are handled mechanically by cross-referencing task verification results against requirement-to-slice mappings.
- For the LLM fallback path (complex slices), the complete-slice prompt retains all responsibilities.

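The DECISIONS.md deduplication described above is purely mechanical string work. A minimal sketch of what an `appendNewDecisions()` helper could look like (the function name comes from the action items in this ADR; the lowercase/whitespace normalization rule is an assumption, not the shipped behavior):

```typescript
// Sketch only: append key_decisions from task summaries that are not already
// present in DECISIONS.md. Matching by lowercased, whitespace-collapsed bullet
// text is an assumed normalization, not necessarily what GSD ships.
function appendNewDecisions(existingDoc: string, keyDecisions: string[]): string {
  const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const seen = new Set(
    existingDoc
      .split("\n")
      .filter((line) => line.startsWith("- "))
      .map((line) => normalize(line.slice(2)))
  );
  const fresh = keyDecisions.filter((d) => !seen.has(normalize(d)));
  if (fresh.length === 0) return existingDoc;
  return existingDoc.trimEnd() + "\n" + fresh.map((d) => `- ${d}`).join("\n") + "\n";
}
```

Because the operation is idempotent (re-running it adds nothing new), it is safe to invoke on every mechanical completion pass.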
### 6. Migration path

Milestones in progress when this change deploys will have state files (RESEARCH.md, etc.) that the new pipeline doesn't produce. The dispatch table must gracefully handle both old-style and new-style state.

**Mitigation:**

- Dispatch rules check for file existence, not file absence. A milestone with an existing RESEARCH.md still works — the plan-milestone rule fires regardless of whether research exists.
- The idempotency system already handles "completed research unit → dispatch plan" transitions.
- All `deriveState()` phases are preserved — old-style state resolves correctly.
- No migration needed. The new pipeline is strictly more permissive than the old one.

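The compatibility rule above (dispatch on phase, never on the absence of a file) can be shown with a toy dispatch predicate. The rule shape, phase names, and unit names below are illustrative of the invariant, not the actual dispatch table:

```typescript
// Illustrative sketch: the new rule fires on phase alone, so a leftover
// old-style RESEARCH.md on disk never changes the dispatch outcome.
interface MilestoneState {
  phase: "pre-planning" | "planning" | "executing";
  files: Set<string>; // artifact filenames present on disk (intentionally unused)
}

function nextUnit(state: MilestoneState): string {
  // Old pipeline: pre-planning + no RESEARCH.md -> "research-milestone".
  // New pipeline: pre-planning always dispatches plan-milestone, with or
  // without research artifacts present.
  if (state.phase === "pre-planning") return "plan-milestone";
  if (state.phase === "planning") return "plan-slice";
  return "execute-task";
}
```

Both a legacy milestone (with RESEARCH.md) and a fresh one resolve to the same unit, which is exactly why no migration step is needed.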
## Alternatives Considered

### A. Keep research as a separate session, just make it optional

Add a `skip_research` preference (already exists) and make it default to true. This is the minimal change — one boolean flip.

**Rejected:** This saves sessions but doesn't address the context re-ingestion problem, the lossy handoff problem, or the ceremony session overhead. It's a preference toggle, not an architectural improvement.

### B. Keep all unit types but share context via a persistent cache

Instead of fresh sessions, maintain a shared context store that persists across units. Each unit reads from the store instead of re-inlining files.

**Rejected:** This requires a fundamentally different session model — either a long-running session (which hits context limits) or a cache mechanism that the LLM can query (which doesn't exist in the Claude API). The fresh-session-per-unit model is correct; the problem is what we put in each session, not the session model itself.

### C. Collapse everything into a single session per slice

One session per slice: plan + execute all tasks + complete. Maximum context efficiency.

**Rejected:** This hits real context limits for slices with 4+ tasks. Task execution is legitimately heavy — reading code, writing code, running tests, debugging failures. A single session for all of this would exhaust the context window. The plan-slice / execute-task boundary is a genuine engineering constraint, not ceremony.

### D. Fold completion into the last executor's prompt instead of post-unit processing

The original design had the last execute-task writing SUMMARY.md, UAT.md, and marking the slice done.

**Rejected (per Codex audit):** This creates a verification-retry ordering problem. If the executor writes SUMMARY.md and marks the slice done in the ROADMAP before the verification gate runs, a gate failure retries against incorrect derived state. Post-gate mechanical processing avoids this by running only after verification succeeds.

### E. Keep complete-slice as a separate session

The mechanical summary quality might be insufficient for complex slices.

**Addressed:** The mechanical approach with LLM fallback provides the best of both worlds. Simple slices get fast mechanical completion. Complex slices fall back to the existing LLM session. The quality threshold is tunable.

## Action Items

### Phase 1: Merge research into planning (+ targeted inlining reduction)

1. Update `buildPlanMilestonePrompt()` — add exploration instructions, skill discovery, drop "Trust the research"
2. Update `buildPlanSlicePrompt()` — add exploration instructions, reassessment preamble, drop "Trust the research"
3. Remove dispatch rule "pre-planning (no research) → research-milestone" — merge with "pre-planning (has research) → plan-milestone" into a single "pre-planning → plan-milestone" rule
4. Remove dispatch rule "planning (no research, not S01) → research-slice"
5. Update `plan-milestone.md` and `plan-slice.md` prompt templates
6. Make `skip_research` and `skip_slice_research` preferences default to true (backwards compat)
7. Retain research templates for explicit `/gsd dispatch research` use
8. **Targeted inlining reduction for planning sessions:** Move DECISIONS, REQUIREMENTS, PROJECT to path references in plan-milestone and plan-slice prompts. Keep ROADMAP and CONTEXT inlined. This prevents context pressure from the added exploration work.

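Item 8's split between inlined and path-referenced documents can be sketched as a small prompt-assembly helper. The `buildPlanningContext` function and the exact "Required reading" wording here are hypothetical; only the inlined/referenced file split comes from this ADR:

```typescript
// Sketch of the item-8 split: stable documents become mandatory-read path
// references, while only ROADMAP and CONTEXT are inlined into the prompt.
const INLINED = ["ROADMAP.md", "CONTEXT.md"];
const REFERENCED = ["DECISIONS.md", "REQUIREMENTS.md", "PROJECT.md"];

function buildPlanningContext(read: (name: string) => string): string {
  const inlined = INLINED
    .map((name) => `## ${name}\n\n${read(name)}`)
    .join("\n\n");
  const refs = REFERENCED
    .map((name) => `- You MUST read .gsd/${name} before planning.`)
    .join("\n");
  return `${inlined}\n\n## Required reading\n${refs}`;
}
```

The token win comes from the `read()` calls that never happen for the referenced files; the agent pulls them in only when the session actually needs them.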
### Phase 2: Mechanical slice completion

9. Implement `mechanicalSliceCompletion()` in `auto-post-unit.ts`
10. Wire into post-unit processing: detect all-tasks-done after the verification gate passes, run mechanical completion
11. Implement quality threshold check (summary length, artifact presence)
12. Retain `summarizing → complete-slice` dispatch rule as fallback for mechanical failures
13. Implement `mechanicalRequirementsUpdate()` and `appendNewDecisions()`

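The quality gate in item 11 decides whether the mechanical result stands or the LLM complete-slice fallback fires. A sketch of that check, where the 200-character floor and the required artifact names are assumptions chosen for illustration:

```typescript
// Hypothetical quality gate for mechanical completion (item 11): fall back to
// the LLM complete-slice session when the generated summary looks too thin.
interface MechanicalResult {
  summary: string;
  artifacts: string[]; // e.g. ["SUMMARY.md", "UAT.md"]
}

function passesQualityThreshold(result: MechanicalResult): boolean {
  const longEnough = result.summary.trim().length >= 200; // assumed floor
  const hasArtifacts =
    result.artifacts.includes("SUMMARY.md") && result.artifacts.includes("UAT.md");
  return longEnough && hasArtifacts;
}
```

Keeping the threshold as data (a length floor plus a required-artifact list) makes it tunable without touching dispatch logic, which is the "quality threshold is tunable" claim from Alternative E.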
### Phase 3: Mechanical milestone validation + completion

14. Implement `aggregateMilestoneVerification()` reading `T##-VERIFY.json` and `S##-UAT.md`
15. Implement `generateMilestoneSummary()` from slice summary aggregation
16. Wire into post-unit processing: after last slice completion, run mechanical validation + summary
17. Make reassess-roadmap opt-in via `reassess_after_slice` preference (default: false)
18. Retain `validating-milestone` and `completing-milestone` dispatch rules as fallbacks

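The aggregation in item 14 reduces to folding per-task verification records into a single milestone verdict. A sketch, where the `{ task, passed }` record shape is an assumption about what `T##-VERIFY.json` contains:

```typescript
// Sketch of item 14: aggregate per-task verification records into a milestone
// verdict. The record shape is assumed, not the actual T##-VERIFY.json schema.
interface TaskVerification {
  task: string; // e.g. "T01"
  passed: boolean;
}

function aggregateMilestoneVerification(
  records: TaskVerification[]
): { passed: boolean; failures: string[] } {
  const failures = records.filter((r) => !r.passed).map((r) => r.task);
  // An empty record set fails closed: no evidence means no pass.
  return { passed: records.length > 0 && failures.length === 0, failures };
}
```

Failing closed on an empty record set matters here, because a missing `T##-VERIFY.json` is exactly the kind of gap that should route to the `validating-milestone` fallback rule rather than silently passing.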
### Phase 4: Full context re-ingestion reduction

19. Replace remaining `inlineFile()` calls for stable documents with mandatory-read path references
20. Update prompt headers with explicit "You MUST read" directives for critical files
21. Add plan output verification (must cite ROADMAP slice description)
22. Measure plan quality metrics before/after to validate the change

### Phase 5: Downstream simplification (optional, deferred)

23. Simplify `auto-post-unit.ts` — remove reassess dispatch logic (opt-in only)
24. Simplify `auto-stuck-detection.ts` — fewer unit type patterns
25. Simplify `auto-idempotency.ts` — fewer completed-key types
26. Review `auto-recovery.ts` — simplify recovery paths for unit types that are now fallback-only
27. Update auto-mode documentation (`docs/auto-mode.md`)

## Audit Trail

### Round 1 — Three-model review (March 18, 2026)

**Claude Opus 4.6** identified 8 issues:

1. ✅ Session count math inconsistent about S01 plan-slice skip — **fixed**: explicit derivation added with per-slice breakdown
2. ✅ `discuss` session counted in pipeline but not in math — **fixed**: noted as interactive session, not auto-mode unit
3. ✅ Token savings double-counting (eliminated sessions + re-ingestion) — **fixed**: removed overlap, noted savings are not additive
4. ✅ Context inlining change (file paths vs inline) underanalyzed — **fixed**: expanded to dedicated risk section with enforcement strategy, phased rollout, and interaction with budget engine
5. ✅ Budget engine interaction not discussed — **fixed**: addressed in context inlining section
6. ✅ `aggregateMilestoneVerification()` reads wrong data source — **fixed**: now reads `T##-VERIFY.json` as primary source, supplemented by `S##-UAT.md`
7. ✅ Phase ordering creates heavy intermediate state (Phase 1 without Phase 4) — **fixed**: Phase 1 now includes targeted inlining reduction for planning sessions
8. ✅ ADR number conflict — **fixed**: confirmed no ADR-003 exists in `docs/` (the referenced file doesn't exist in current git)

**OpenAI Codex** identified 6 issues:

1. ✅ HIGH: Folding completion into execute-task breaks verification-retry model — **fixed**: moved completion to post-gate mechanical processing instead of executor prompt. Added Alternative D explaining why.
2. ✅ HIGH: Mechanical validation reads nonexistent `verification_evidence` frontmatter — **fixed**: now reads `T##-VERIFY.json` (canonical machine-readable source from `verification-evidence.ts`)
3. ✅ HIGH: Replacement validation drops UAT evidence — **fixed**: aggregator now reads both `T##-VERIFY.json` and `S##-UAT.md`
4. ✅ HIGH: "State derivation stays unchanged" is false — **fixed**: explicitly documented that `deriveState()` phases are preserved, mechanical processing resolves them synchronously, and fallback dispatch rules handle failures
5. ✅ MEDIUM: Folded completion omits REQUIREMENTS.md and KNOWLEDGE.md updates — **fixed**: mechanical completion handles REQUIREMENTS.md and DECISIONS.md; KNOWLEDGE.md addressed in Risk 5
6. ✅ MEDIUM: Session and token math inconsistent — **fixed**: complete rederivation with per-slice breakdown, corrected to 30 baseline sessions, noted profile variations

**Gemini 2.5 Pro**'s audit was not usable — it hallucinated the ADR as a CI/CD pipeline document about GitHub Actions, matrix builds, and nx workspace tooling. No findings were applicable to the actual content.

# PRD: Branchless Worktree Architecture

**Author:** Lex Christopherson
**Date:** 2026-03-15
**ADR:** [ADR-001-branchless-worktree-architecture.md](./ADR-001-branchless-worktree-architecture.md)
**Priority:** Critical — blocks reliable auto-mode operation

---

## Problem Statement

GSD's auto-mode is unreliable. Users experience:

1. **Infinite loop detection failures** — the agent writes planning artifacts on slice branches that become invisible after branch switching, causing `verifyExpectedArtifact()` to fail repeatedly. Auto-mode burns budget retrying the same unit 3-6 times before hard-stopping. This is the #1 user complaint.

2. **State corruption across branches** — `.gsd/` planning artifacts (roadmaps, plans, decisions) are gitignored but branch-specific. Multiple branches sharing a single `.gsd/` directory clobber each other's state. Users see wrong milestones marked complete, wrong roadmaps loaded, and auto-mode starting from the wrong phase.

3. **Excessive complexity** — 770+ lines of merge, conflict resolution, branch switching, and self-healing code exist solely to manage slice branches inside worktrees. This code has required 15+ bug fixes across versions and remains the primary source of auto-mode failures.

These problems are architectural. They cannot be fixed by patching individual symptoms.

## Vision

Auto-mode uses git worktrees for isolation and sequential commits for history. No branch switching. No merge conflicts within a worktree. Planning artifacts are tracked in git and travel with the branch. The git layer is so simple it can't break.

## Success Criteria

| Criterion | Measurement |
|-----------|-------------|
| Zero loop detection failures from branch visibility | No `verifyExpectedArtifact()` failures caused by branch mismatch in 50 consecutive auto-mode runs |
| Zero `.gsd/` state corruption | Manual worktrees created via `git worktree add` have correct `.gsd/` state without any GSD-specific initialization |
| Code deletion | Net removal of ≥500 lines of merge/conflict/branch-switching code |
| Test simplification | Removal or simplification of ≥6 merge-specific test files |
| Backwards compatibility | Existing projects with `gsd/M001/S01` slice branches continue to work (read-only; new work uses new model) |
| No new git primitives | The implementation uses only worktrees, commits, and squash-merge. No new branch types, merge strategies, or conflict resolution. |

## Non-Goals

- Parallel slice execution within a single worktree (if needed later, use separate worktrees)
- Changing how milestones relate to `main` (squash-merge stays)
- Modifying the dispatch unit types or state machine (except removing `fix-merge`)
- Changing the worktree-manager.ts manual worktree API (`/worktree` command)

## Current Architecture

### Branch Model (M003, v2.13.0)

```
main
 └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
     ├─ gsd/M001/S01 (slice branch — code + .gsd/ artifacts)
     │    └── merge --no-ff → milestone/M001
     ├─ gsd/M001/S02
     │    └── merge --no-ff → milestone/M001
     └── squash merge → main
```

### Data Flow

```
Agent writes file → on slice branch → handleAgentEnd → auto-commit on slice branch
→ switch to milestone branch → verifyExpectedArtifact → FILE NOT FOUND (it's on slice branch)
→ loop counter++ → retry → same result → HARD STOP
```

### Code Involved

| File | Lines | Purpose |
|------|-------|---------|
| `auto-worktree.ts` | 512 | Worktree lifecycle + slice→milestone merge |
| `git-service.ts` | 915 | Branch creation, switching, merge with conflict resolution |
| `git-self-heal.ts` | 198 | Merge failure recovery |
| `auto.ts` | ~150 lines | Merge dispatch guards, fix-merge routing, branch-mode vs worktree-mode branching |
| `worktree.ts` | ~40 lines | Slice branch delegates |
| 11 test files | ~2000 lines | Merge/branch/worktree test coverage |

### `.gsd/` Tracking (Current — Contradictory)

- `.gitignore` line 52: `.gsd/` — ignores everything
- `smartStage()` lines 338-349: force-adds `GSD_DURABLE_PATHS` — tracks milestones/, DECISIONS.md, PROJECT.md, REQUIREMENTS.md, QUEUE.md
- Result: `.gsd/milestones/` is partially tracked on some branches and fully ignored on others. The code fights the config.

## Proposed Architecture

### Branch Model

```
main
 └─ milestone/M001 (worktree at .gsd/worktrees/M001/)
     │
     commit: feat(M001): context + roadmap
     commit: feat(M001/S01): research
     commit: feat(M001/S01): plan
     commit: feat(M001/S01/T01): implement auth service
     commit: feat(M001/S01/T02): implement auth tests
     commit: feat(M001/S01): summary + UAT
     commit: docs(M001): reassess roadmap after S01
     commit: feat(M001/S02): research
     commit: feat(M001/S02): plan
     commit: ...
     commit: feat(M001): milestone complete
     │
     └── squash merge → main
```

One branch. Sequential commits. No merges within the worktree.

### Data Flow

```
Agent writes file → on milestone branch → handleAgentEnd → auto-commit on milestone branch
→ verifyExpectedArtifact → FILE FOUND (same branch) → persist completion → next dispatch
```

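The flow above can be reduced to a toy model that makes the key invariant explicit: write, commit, and verification all observe the same working tree, so the artifact cannot go invisible. The class and method names below are illustrative, not the real implementation:

```typescript
// Toy model of the branchless flow: with no branch switch between write and
// verify, verifyExpectedArtifact always sees what the agent wrote.
class MilestoneBranch {
  private files = new Set<string>();
  private commits: string[] = [];

  write(path: string): void {
    this.files.add(path);
  }

  autoCommit(message: string): void {
    this.commits.push(message);
  }

  verifyExpectedArtifact(path: string): boolean {
    // Same branch, same working tree — no visibility gap to fall into.
    return this.files.has(path);
  }
}
```

In the old model the equivalent of `verifyExpectedArtifact` ran against a *different* branch's tree, which is the entire loop-detection failure class this PRD removes.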
### `.gsd/` Tracking (Proposed — Coherent)

**Tracked (travels with branch):**
```
.gsd/milestones/**/*.md      (except CONTINUE markers)
.gsd/milestones/**/*.json    (META.json integration records)
.gsd/PROJECT.md
.gsd/DECISIONS.md
.gsd/REQUIREMENTS.md
.gsd/QUEUE.md
```

**Gitignored (ephemeral):**
```
.gsd/auto.lock
.gsd/completed-units.json
.gsd/STATE.md
.gsd/metrics.json
.gsd/gsd.db
.gsd/activity/
.gsd/runtime/
.gsd/worktrees/
.gsd/DISCUSSION-MANIFEST.json
.gsd/milestones/**/*-CONTINUE.md
.gsd/milestones/**/continue.md
```

### Why This Works

| Problem | How It's Solved |
|---------|----------------|
| Artifact invisibility after branch switch | No branch switching. Artifacts commit on the one branch. |
| `.gsd/` state clobbering | Artifacts tracked in git. Each branch carries its own `.gsd/`. `git worktree add` and `git checkout` give correct state. |
| Merge conflict complexity | No merges within a worktree. Only merge is milestone→main (squash). |
| Manual worktree initialization | Tracked artifacts are checked out with the branch. No GSD-specific bootstrap needed. |
| Dual isolation mode maintenance | Single mode: worktree. Branch-mode (`git.isolation: "branch"`) deprecated. |

## Implementation Plan

### Phase 1: `.gitignore` + Tracking Fix

**Goal:** Planning artifacts are tracked in git. `.gitignore` reflects reality.

1. Update `.gitignore`:
   - Remove blanket `.gsd/` ignore
   - Add explicit runtime-only ignores (see proposed list above)

2. Force-add existing planning artifacts on current branch:
   ```
   git add --force .gsd/milestones/ .gsd/PROJECT.md .gsd/DECISIONS.md .gsd/REQUIREMENTS.md .gsd/QUEUE.md
   ```

3. Ensure runtime files are NOT tracked:
   ```
   git rm --cached -r .gsd/runtime/ .gsd/activity/ .gsd/STATE.md .gsd/metrics.json .gsd/completed-units.json .gsd/auto.lock
   ```

4. Update the README's suggested `.gitignore` section

5. Remove `smartStage()` force-add of `GSD_DURABLE_PATHS` — no longer needed since `.gitignore` doesn't block them

**Verification:** `git status` shows planning artifacts tracked, runtime files untracked. A fresh `git worktree add` produces correct `.gsd/milestones/` state.

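Steps 1-3 imply a `.gitignore` along these lines. This is a sketch assembled from the tracked/ignored lists earlier in this PRD, not the shipped file; with no blanket `.gsd/` rule, the planning artifacts are trackable without `--force` going forward:

```
# Runtime-only GSD state (ephemeral — never tracked)
.gsd/auto.lock
.gsd/completed-units.json
.gsd/STATE.md
.gsd/metrics.json
.gsd/gsd.db
.gsd/activity/
.gsd/runtime/
.gsd/worktrees/
.gsd/DISCUSSION-MANIFEST.json
.gsd/milestones/**/*-CONTINUE.md
.gsd/milestones/**/continue.md
# Note: no blanket `.gsd/` ignore — planning artifacts stay tracked.
```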
### Phase 2: Remove Slice Branch Creation + Switching

**Goal:** No code creates, switches to, or references slice branches for new work.

1. Remove `ensureSliceBranch()` from `git-service.ts` (lines 485-544)
2. Remove `switchToMain()` from `git-service.ts` (lines 549-563)
3. Remove `getSliceBranchName()` from `worktree.ts` (lines 94-98)
4. Remove `isOnSliceBranch()` and `getActiveSliceBranch()` from `worktree.ts`
5. Update `auto.ts` dispatch paths — remove branch creation before `execute-task`
6. Update `handleAgentEnd` — remove branch-switching logic post-dispatch

**Verification:** Auto-mode runs a full slice (research → plan → execute → complete) without creating any branches. All commits land on `milestone/<MID>`.

### Phase 3: Remove Slice Merge Code

**Goal:** All slice→milestone and slice→main merge code is deleted.

1. Remove `mergeSliceToMilestone()` from `auto-worktree.ts` (lines 253-350)
2. Remove `mergeSliceToMain()` from `git-service.ts` (lines 705-893)
3. Remove merge dispatch guards from `auto.ts` (lines 1635-1679)
4. Remove the `fix-merge` dispatch unit type from `auto.ts`
5. Remove `buildPromptForFixMerge()` from `auto.ts`
6. Remove `withMergeHeal()` from `git-self-heal.ts` (lines 99-136)
7. Remove `abortAndReset()` from `git-self-heal.ts` (lines 37-84) — or simplify to crash-recovery-only
8. Remove `shouldUseWorktreeIsolation()` preference resolution — worktree is the only mode
9. Remove `getMergeToMainMode()` — milestone merge is the only mode
10. Deprecate the `git.isolation: "branch"` and `git.merge_to_main: "slice"` preferences

**Verification:** `git grep mergeSliceToMilestone`, `git grep mergeSliceToMain`, and `git grep fix-merge` all return zero results (outside of changelog/docs).

### Phase 4: Simplify `mergeMilestoneToMain()`

**Goal:** Milestone→main merge is clean and minimal.

The function becomes:

1. Auto-commit any dirty state in the worktree
2. `process.chdir(originalBasePath)` — back to the main repo
3. `git checkout main`
4. `git merge --squash milestone/<MID>`
5. Build the commit message with the milestone summary + slice manifest
6. `git commit`
7. Optional: `git push`
8. `removeWorktree()` + `git branch -D milestone/<MID>`

No conflict categorization. No runtime file stripping (runtime files are gitignored, so they never enter the merge). No `.gsd/` special handling.

If the squash-merge conflicts (the parallel-milestone edge case): stop auto-mode with a clear error, and the user resolves manually or GSD dispatches a one-time resolution session.

**Verification:** Complete a full milestone in auto-mode. `main` receives one squash commit with all code and planning artifacts.

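The git portion of the steps above (3-4 and 6-8) can be expressed as a pure command planner, which keeps the sequence testable without touching a real repository. The function name and shape are illustrative, not the actual `mergeMilestoneToMain()` signature:

```typescript
// Sketch: the merge sequence as data. A caller would execute these in order
// (e.g. via child_process), aborting on the first failure.
function planMilestoneMerge(mid: string, commitMessage: string, push = false): string[] {
  const cmds = [
    "git checkout main",
    `git merge --squash milestone/${mid}`,
    `git commit -m ${JSON.stringify(commitMessage)}`,
  ];
  if (push) cmds.push("git push");
  cmds.push(
    `git worktree remove .gsd/worktrees/${mid}`,
    `git branch -D milestone/${mid}`,
  );
  return cmds;
}
```

Ordering matters: the worktree must be removed before `git branch -D`, because git refuses to delete a branch that is still checked out in a linked worktree.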
### Phase 5: Test Cleanup

**Goal:** Test suite reflects the simplified architecture.

1. Delete or rewrite:
   - `auto-worktree-merge.test.ts` — tests slice→milestone merge (deleted)
   - `auto-worktree-milestone-merge.test.ts` — rewrite for simplified milestone→main
   - `worktree-e2e.test.ts` — rewrite for branchless flow
   - `worktree-integration.test.ts` — rewrite for branchless flow
   - Merge-related test cases in `git-service.test.ts`

2. Add new tests:
   - Branchless worktree lifecycle: create → commit → commit → squash-merge → cleanup
   - `.gsd/` tracking: planning artifacts tracked, runtime files ignored
   - Manual worktree: `git worktree add` has correct `.gsd/` state
   - Crash recovery: dirty state on milestone branch, restart, auto-commit, continue

3. Remove or simplify merge-specific doctor checks:
   - `corrupt_merge_state` — keep (still relevant for milestone→main)
   - `orphaned_auto_worktree` — keep
   - `stale_milestone_branch` — keep
   - `tracked_runtime_files` — keep

**Verification:** `npm run test` passes. No test references `mergeSliceToMilestone`, `mergeSliceToMain`, or `ensureSliceBranch`.

### Phase 6: Migration + Backwards Compatibility

**Goal:** Existing projects with slice branches continue to work.

1. State derivation (`deriveState()`) continues to read `gsd/M001/S01` branch naming for legacy detection
2. On first run after upgrade:
   - Detect existing slice branches
   - Notify user: "GSD no longer creates slice branches. Existing branches are preserved but new work commits directly to the milestone branch."
   - No forced migration — legacy branches are read-only context
3. Doctor check: `legacy_slice_branches` — informational, not auto-fix
4. Update `shouldUseWorktreeIsolation()` preference handling:
   - `git.isolation: "worktree"` → default behavior (only option)
   - `git.isolation: "branch"` → warning, treated as worktree
   - Remove preference UI for isolation mode

**Verification:** Open a project with existing `gsd/M001/S01` branches. GSD reads state correctly, and new work commits on the milestone branch without slice branches.

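The legacy detection in steps 1-3 is a branch-name pattern match. A sketch, where the `gsd/<MID>/<SID>` naming comes from this PRD but the helper itself and the exact digit widths are illustrative assumptions:

```typescript
// Sketch of the Phase 6 legacy check: old-style slice branches follow
// gsd/M###/S## naming (e.g. gsd/M001/S01). Digit widths are assumed from
// the examples in this PRD.
function findLegacySliceBranches(branches: string[]): string[] {
  const pattern = /^gsd\/M\d{3}\/S\d{2}$/;
  return branches.filter((b) => pattern.test(b));
}
```

A non-empty result would drive both the one-time upgrade notification and the informational `legacy_slice_branches` doctor check.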
## Stress Test Results

Validated by three independent models:

### Gemini 2.5 Pro — 6 Attack Vectors

| Attack | Severity | Mitigation |
|--------|----------|------------|
| Parallel milestone code conflict at squash-merge | Medium | `git rebase main` before squash. Rare in single-user. |
| SQLite desync after `git reset --hard` | Low | DB rebuilt from tracked markdown on startup (M001/S02 importers). |
| Ghost lock after SIGKILL | Low | Existing heartbeat lock detection handles this. |
| Squash merge loses bisect granularity | Low | Commit messages tag slices. Branch preservable if needed. |
| Disk space with multiple worktrees | Low | Single active milestone at a time. Immediate cleanup. |
| Plan-action atomicity gap (crash between write and commit) | Low | `handleAgentEnd` auto-commits. Sequential model simplifies recovery. |

### GPT-5.4 (Codex) — Codebase-Informed Analysis

- Confirmed `smartStage()` force-add already implements tracked-artifact intent
- Confirmed `resolveMainWorktreeRoot` (PR #487) contradicts this architecture
- Confirmed `.gsd/milestones/` partially tracked on `main` despite `.gitignore`
- Verdict: **Model is sound. Removes only accidental complexity.**

### GPT-5.4 (Codex) — Dissenting Opinion

Codex agreed on tracked artifacts and worktree-per-milestone, but pushed back on removing slice branches, calling it "a redesign, not a simplification." Specific concerns:

| Concern | Rebuttal |
|---------|----------|
| Crash recovery for orphaned slice branches disappears | The failure mode (orphaned branch needing merge) is caused by slice branches. Removing branches removes the failure. Sequential commits on one branch need no orphan recovery. |
| Concurrent edits to shared root docs (DECISIONS.md) from two terminals | Standard content conflict at squash-merge time. Not caused by or solved by slice branches. |
| Continuous integration via slice→milestone merges | In sequential single-user work, there's nothing to integrate against within the worktree. A pre-flight rebase before squash-merge is more direct. |
| Need a replacement slice-boundary primitive | Accepted: conventional commit tags (`feat(M001/S01):`) + optional git tags (`gsd/M001/S01-complete`) serve as boundaries. |

Codex's analysis confirms the tracked-artifact approach but recommends treating branchless as a deliberate redesign with explicit replacement primitives, not a casual deletion.

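The replacement slice-boundary primitives accepted in the table above are small enough to pin down as two formatting helpers. The formats follow the examples in this PRD (`feat(M001/S01):` and `gsd/M001/S01-complete`); the helper functions themselves are illustrative:

```typescript
// Slice boundaries without slice branches: a conventional-commit prefix marks
// every commit belonging to a slice, and an optional annotated tag marks the
// slice's completion point for later bisecting or archaeology.
function sliceCommitPrefix(mid: string, slice: string): string {
  return `feat(${mid}/${slice}):`;
}

function sliceCompletionTag(mid: string, slice: string): string {
  return `gsd/${mid}/${slice}-complete`;
}
```

Tags reuse the retired `gsd/<MID>/<SID>` namespace, which is safe precisely because no new slice *branches* are ever created there.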
### Edge Case: Two Milestones Touching Same Source Files

Scenario: M001 and M002 both modify `src/auth.ts`. M001 squash-merges first.

Resolution: Before M002 squash-merges, rebase onto the updated `main`:
```
cd .gsd/worktrees/M002
git fetch origin main
git rebase origin/main
# Resolve any conflicts (code-only, never .gsd/)
# Then squash-merge
```

This is standard git workflow. GSD can automate the rebase step as a pre-merge check.

### Edge Case: Agent Crash Mid-Commit

Scenario: Power loss during `git commit` on the milestone branch.

Resolution: Git's internal journaling protects the object store. On restart:

- If the commit completed: state is consistent
- If the commit didn't complete: the working directory has uncommitted changes, and `handleAgentEnd` auto-commits on the next dispatch
- No branch to be "stuck between" — a single branch means no split-brain state

### Edge Case: User Edits Main While Worktree Active

Scenario: User makes manual commits on `main` while the M001 worktree is active.

Resolution: The worktree is on the `milestone/M001` branch, independent of `main`. Manual `main` commits don't affect the worktree. At squash-merge time, `git merge --squash` handles the divergence normally. If there's a conflict, it's resolved once.

## Metrics

### Before (Current)

| Metric | Value |
|--------|-------|
| Merge/conflict/branch code | 770+ lines across 4 files |
| Merge-related test files | 11 files |
| Branch types | 4 (main, milestone/*, gsd/*/*, worktree/*) |
| Merge strategies | 3 (--no-ff, --squash, conflict resolution) |
| Dispatch unit types with merge logic | 2 (complete-slice, fix-merge) |
| Isolation modes | 2 (branch, worktree) |
| Doctor git checks | 4 |

### After (Proposed)

| Metric | Value |
|--------|-------|
| Merge/conflict/branch code | ~50 lines (simplified `mergeMilestoneToMain` only) |
| Merge-related test files | 3-4 files (rewritten) |
| Branch types | 2 (main, milestone/*) |
| Merge strategies | 1 (--squash) |
| Dispatch unit types with merge logic | 0 |
| Isolation modes | 1 (worktree) |
| Doctor git checks | 3-4 (simplified) |

### Net Impact

- **~720 lines deleted** (net, after simplified replacements)
- **~7 test files deleted or consolidated**
- **2 branch types eliminated**
- **2 merge strategies eliminated**
- **1 dispatch unit type eliminated** (fix-merge)
- **1 isolation mode eliminated** (branch)
- **0 merge conflicts possible within a worktree**

## Dependencies

- **M001 (Memory Database):** The SQLite database (`gsd.db`) must remain gitignored. The M001/S02 importer layer rebuilds it from tracked markdown. This PRD's `.gitignore` update explicitly ignores `gsd.db`.

- **PR #487:** Must be closed. The `resolveMainWorktreeRoot` approach (sharing `.gsd/` across worktrees) contradicts the tracked-artifact architecture.

## Open Questions

1. **Squash vs `--no-ff` for milestone→main merge?** Squash gives clean history on `main` but loses bisect granularity. `--no-ff` preserves granular commits but clutters `main`. Current proposal: squash (matching existing behavior), with the option to preserve the milestone branch for debugging.

2. **Should `worktrees/` move outside `.gsd/`?** Having worktrees inside `.gsd/` creates a nesting-doll pattern (each worktree contains a `.gsd/` that itself lives under the outer `.gsd/worktrees/`). Relocating to `.gsd-worktrees/` or `~/.gsd/worktrees/<repo-hash>/` is cleaner but changes the filesystem layout. Recommendation: defer, and address separately if it causes issues.

3. **Pre-flight rebase automation?** Before the milestone→main squash-merge, should GSD automatically `git rebase main`? Gemini recommends yes. Risk: rebase can fail with conflicts, adding a code path. Recommendation: implement as a doctor check ("milestone branch is behind main by N commits") with manual resolution, and automate later if needed.

# GSD Documentation

Welcome to the GSD documentation. This covers everything from getting started to advanced configuration, auto-mode internals, and extending GSD with the Pi SDK.

## User Documentation

| Guide | Description |
|-------|-------------|
| [Getting Started](./getting-started.md) | Installation, first run, and basic usage |
| [Auto Mode](./auto-mode.md) | How autonomous execution works — the state machine, crash recovery, and steering |
| [Commands Reference](./commands.md) | All commands, keyboard shortcuts, and CLI flags |
| [Remote Questions](./remote-questions.md) | Discord and Slack integration for headless auto-mode |
| [Configuration](./configuration.md) | Preferences, model selection, git settings, and token profiles |
| [Custom Models](./custom-models.md) | Add custom providers (Ollama, vLLM, LM Studio, proxies) via models.json |
| [Token Optimization](./token-optimization.md) | Token profiles, context compression, complexity routing, and adaptive learning (v2.17) |
| [Dynamic Model Routing](./dynamic-model-routing.md) | Complexity-based model selection, cost tables, escalation, and budget pressure (v2.19) |
| [Captures & Triage](./captures-triage.md) | Fire-and-forget thought capture during auto-mode with automated triage (v2.19) |
| [Workflow Visualizer](./visualizer.md) | Interactive TUI overlay for progress, dependencies, metrics, and timeline (v2.19) |
| [Cost Management](./cost-management.md) | Budget ceilings, cost tracking, projections, and enforcement modes |
| [Git Strategy](./git-strategy.md) | Worktree isolation, branching model, and merge behavior |
| [Parallel Orchestration](./parallel-orchestration.md) | Run multiple milestones simultaneously with worker isolation and coordination |
| [Working in Teams](./working-in-teams.md) | Unique milestone IDs, `.gitignore` setup, and shared planning artifacts |
| [Skills](./skills.md) | Bundled skills, skill discovery, and custom skill authoring |
| [Migration from v1](./migration.md) | Migrating `.planning` directories from the original GSD |
| [Troubleshooting](./troubleshooting.md) | Common issues, `/gsd doctor` (real-time visibility, v2.40), `/gsd forensics` (full debugger, v2.40), and recovery procedures |
| [Web Interface](./web-interface.md) | Browser-based project management with `gsd --web` (v2.41) |
| [VS Code Extension](../vscode-extension/README.md) | Chat participant, sidebar dashboard, and RPC integration for VS Code |

## Architecture & Internals
|
||||
|
||||
| Guide | Description |
|
||||
|-------|-------------|
|
||||
| [Architecture Overview](./architecture.md) | System design, extension model, state-on-disk, and dispatch pipeline |
|
||||
| [Native Engine](../native/README.md) | Rust N-API modules for performance-critical operations |
|
||||
| [ADR-001: Branchless Worktree Architecture](./ADR-001-branchless-worktree-architecture.md) | Decision record for the v2.14 git architecture |
|
||||
| [ADR-003: Pipeline Simplification](./ADR-003-pipeline-simplification.md) | Research merged into planning, mechanical completion (v2.30) |
|
||||
|
||||
## Pi SDK Documentation
|
||||
|
||||
These guides cover the underlying Pi SDK that GSD is built on. Useful if you want to extend GSD or build your own agent application.
|
||||
|
||||
| Guide | Description |
|
||||
|-------|-------------|
|
||||
| [What is Pi](./what-is-pi/README.md) | Core concepts — modes, agent loop, sessions, tools, providers |
|
||||
| [Extending Pi](./extending-pi/README.md) | Building extensions — tools, commands, UI, events, state |
|
||||
| [Context & Hooks](./context-and-hooks/README.md) | Context pipeline, hook reference, inter-extension communication |
|
||||
| [Pi UI / TUI](./pi-ui-tui/README.md) | Terminal UI components, theming, keyboard input, rendering |
|
||||
|
||||
## Research
|
||||
|
||||
| Guide | Description |
|
||||
|-------|-------------|
|
||||
| [Building Coding Agents](./building-coding-agents/README.md) | Research notes on agent design — decomposition, context engineering, cost/quality tradeoffs |
|
||||
|
|
@@ -1,222 +0,0 @@
# Agent Knowledge Index

Use this file as a machine-operational routing table for pi docs and research references.

Rules:

- Read only the specific files relevant to the current task.
- Prefer the primary bundle first.
- Read files in parallel when the task clearly maps to multiple known references.
- Use absolute paths directly with `read`.
- Follow conditional references only when the primary bundle does not answer the question.

## Pi architecture

Use when:

- understanding how pi works end to end
- tracing subsystem relationships
- understanding sessions, compaction, models, tools, or prompt flow
- deciding how to embed pi in a branded app, custom CLI, desktop app, or web product

Read first:

- `/Users/lexchristopherson/.gsd/docs/what-is-pi/01-what-pi-is.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/04-the-architecture-how-everything-fits-together.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/05-the-agent-loop-how-pi-thinks.md`

Read together when relevant:

- `/Users/lexchristopherson/.gsd/docs/what-is-pi/06-tools-how-pi-acts-on-the-world.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/07-sessions-memory-that-branches.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/08-compaction-how-pi-manages-context-limits.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/09-the-customization-stack.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/10-providers-models-multi-model-by-default.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/13-context-files-project-instructions.md`

Follow-up if needed:

- `/Users/lexchristopherson/.gsd/docs/what-is-pi/03-the-four-modes-of-operation.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/11-the-interactive-tui.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/12-the-message-queue-talking-while-pi-thinks.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/14-the-sdk-rpc-embedding-pi.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/15-pi-packages-the-ecosystem.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/16-why-pi-matters-what-makes-it-different.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/17-file-reference-all-documentation.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/18-quick-reference-commands-shortcuts.md`
- `/Users/lexchristopherson/.gsd/docs/what-is-pi/19-building-branded-apps-on-top-of-pi.md`

## Context engineering, hooks, and context flow

Use when:

- understanding how user prompts flow through to the LLM
- working with before_agent_start, context, tool_call, tool_result, input hooks
- injecting, filtering, or transforming LLM context
- understanding message types and what the LLM actually sees
- coordinating multiple extensions
- building mode systems, presets, or context management extensions
- debugging why the LLM does or doesn't see certain information

Read first:

- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/01-the-context-pipeline.md`
- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/02-hook-reference.md`

Read together when relevant:

- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/03-context-injection-patterns.md`
- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/04-message-types-and-llm-visibility.md`
- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/05-inter-extension-communication.md`
- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/06-advanced-patterns-from-source.md`
- `/Users/lexchristopherson/.gsd/docs/context-and-hooks/07-the-system-prompt-anatomy.md`

## Extension development

Use when:

- building or modifying extensions
- adding tools, commands, hooks, renderers, state, or packaging

Read first:

- `/Users/lexchristopherson/.gsd/docs/extending-pi/01-what-are-extensions.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/02-architecture-mental-model.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/03-getting-started.md`

Read together when relevant:

- `/Users/lexchristopherson/.gsd/docs/extending-pi/06-the-extension-lifecycle.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/07-events-the-nervous-system.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/08-extensioncontext-what-you-can-access.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/09-extensionapi-what-you-can-do.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/10-custom-tools-giving-the-llm-new-abilities.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/11-custom-commands-user-facing-actions.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/14-custom-rendering-controlling-what-the-user-sees.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/25-slash-command-subcommand-patterns.md` (subcommand-style slash command UX via getArgumentCompletions())
- `/Users/lexchristopherson/.gsd/docs/extending-pi/15-system-prompt-modification.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/22-key-rules-gotchas.md`

Follow-up if needed:

- `/Users/lexchristopherson/.gsd/docs/extending-pi/04-extension-locations-discovery.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/05-extension-structure-styles.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/12-custom-ui-visual-components.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/13-state-management-persistence.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/16-compaction-session-control.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/17-model-provider-management.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/18-remote-execution-tool-overrides.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/19-packaging-distribution.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/20-mode-behavior.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/21-error-handling.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/23-file-reference-documentation.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/24-file-reference-example-extensions.md`

## Pi UI and TUI

Use when:

- building dialogs, widgets, overlays, custom editors, or UI renderers
- working on TUI layout or display behavior

Read first:

- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/01-the-ui-architecture.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/03-entry-points-how-ui-gets-on-screen.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/22-quick-reference-all-ui-apis.md`

Read together when relevant:

- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/04-built-in-dialog-methods.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/05-persistent-ui-elements.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/06-ctx-ui-custom-full-custom-components.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/07-built-in-components-the-building-blocks.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/12-overlays-floating-modals-and-panels.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/13-custom-editors-replacing-the-input.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/14-tool-rendering-custom-tool-display.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/15-message-rendering-custom-message-display.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/21-common-mistakes-and-how-to-avoid-them.md`

Follow-up if needed:

- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/02-the-component-interface-foundation-of-everything.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/08-high-level-components-from-pi-coding-agent.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/09-keyboard-input-how-to-handle-keys.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/10-line-width-the-cardinal-rule.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/11-theming-colors-and-styles.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/16-performance-caching-and-invalidation.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/17-theme-changes-and-invalidation.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/18-ime-support-the-focusable-interface.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/19-building-a-complete-component-step-by-step.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/20-real-world-patterns-from-examples.md`
- `/Users/lexchristopherson/.gsd/docs/pi-ui-tui/23-file-reference-example-extensions-with-ui.md`

## Building coding agents

Use when:

- designing agent behavior
- improving autonomy, speed, context handling, or decomposition
- solving hard ambiguity, safety, or verification problems

Read first:

- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/01-work-decomposition.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/06-maximizing-agent-autonomy-superpowers.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/11-god-tier-context-engineering.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/12-handling-ambiguity-contradiction.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/26-cross-cutting-themes-where-all-4-models-converge.md`

Read together when relevant:

- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/03-state-machine-context-management.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/04-optimal-storage-for-project-context.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/05-parallelization-strategy.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/07-system-prompt-llm-vs-deterministic-split.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/08-speed-optimization.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/10-top-10-pitfalls-to-avoid.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/17-irreversible-operations-safety-architecture.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/20-error-taxonomy-routing.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/24-security-trust-boundaries.md`

Follow-up if needed:

- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/02-what-to-keep-discard-from-human-engineering.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/09-top-10-tips-for-a-world-class-agent.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/13-long-running-memory-fidelity.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/14-multi-agent-semantic-conflict-resolution.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/15-legacy-code-brownfield-onboarding.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/16-encoding-taste-aesthetics.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/18-the-handoff-problem-agent-human-maintainability.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/19-when-to-scrap-and-start-over.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/21-cost-quality-tradeoff-model-routing.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/22-cross-project-learning-reusable-intelligence.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/23-evolution-across-project-scale.md`
- `/Users/lexchristopherson/.gsd/docs/building-coding-agents/25-designing-for-non-technical-users-vibe-coders.md`

## Pi product docs

Use when:

- the user asks about pi itself, its SDK, extensions, themes, skills, packages, TUI, prompt templates, keybindings, or custom providers

Read first:

- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/README.md`

Read together when relevant:

- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/extensions.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/themes.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/skills.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/prompt-templates.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/tui.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/keybindings.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/sdk.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/custom-provider.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/models.md`
- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/packages.md`

Follow-up if needed:

- `/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/examples`
@@ -1,162 +0,0 @@
# Architecture Overview

GSD is a TypeScript application built on the [Pi SDK](https://github.com/badlogic/pi-mono). It embeds the Pi coding agent and extends it with the GSD workflow engine, auto mode state machine, and project management primitives.

## System Structure

```
gsd (CLI binary)
└─ loader.ts                 Sets PI_PACKAGE_DIR, GSD env vars, dynamic-imports cli.ts
   └─ cli.ts                 Wires SDK managers, loads extensions, starts InteractiveMode
      ├─ onboarding.ts       First-run setup wizard (LLM provider + tool keys)
      ├─ wizard.ts           Env hydration from stored auth.json credentials
      ├─ app-paths.ts        ~/.gsd/agent/, ~/.gsd/sessions/, auth.json
      ├─ resource-loader.ts  Syncs bundled extensions + agents to ~/.gsd/agent/
      └─ src/resources/
         ├─ extensions/gsd/  Core GSD extension
         ├─ extensions/...   12 supporting extensions
         ├─ agents/          scout, researcher, worker
         ├─ AGENTS.md        Agent routing instructions
         └─ GSD-WORKFLOW.md  Manual bootstrap protocol

gsd headless        Headless mode — CI/cron orchestration via RPC child process
gsd --mode mcp      MCP server mode — exposes tools over stdin/stdout

vscode-extension/   VS Code extension — chat participant (@gsd), sidebar dashboard, RPC integration
```

## Key Design Decisions

### State Lives on Disk

`.gsd/` is the sole source of truth. Auto mode reads it, writes it, and advances based on what it finds. No in-memory state survives across sessions. This enables crash recovery, multi-terminal steering, and session resumption.
### Two-File Loader Pattern

`loader.ts` sets all environment variables with zero SDK imports, then dynamically imports `cli.ts`, which does static SDK imports. This ensures `PI_PACKAGE_DIR` is set before any SDK code evaluates.
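The ordering constraint can be shown with a self-contained simulation. Here `evaluateSdkModule` stands in for module-level code that captures `PI_PACKAGE_DIR` at import time; the module split and env handling are illustrative, not GSD's actual files.

```typescript
// Self-contained simulation of the two-file loader pattern. A static import
// chain is hoisted and evaluated before any statements run, so env vars set
// afterwards arrive too late; a dynamic import defers evaluation.
function evaluateSdkModule(): { packageDir: string | undefined } {
  // Stands in for module-level code that runs once, at import time.
  return { packageDir: process.env.PI_PACKAGE_DIR };
}

// Static-import order (broken): the "SDK" evaluates before the env var is set.
delete process.env.PI_PACKAGE_DIR;
const eagerSdk = evaluateSdkModule();
process.env.PI_PACKAGE_DIR = "/opt/gsd/pkg";

// Loader order (correct): configure the environment first, then "import",
// which is when the SDK's module-level code finally runs.
const lazySdk = evaluateSdkModule();

console.log(eagerSdk.packageDir); // undefined
console.log(lazySdk.packageDir); // "/opt/gsd/pkg"
```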
### `pkg/` Shim Directory

`PI_PACKAGE_DIR` points to `pkg/` (not the project root) to avoid Pi's theme resolution colliding with GSD's `src/` directory. It contains only `piConfig` and theme assets.

### Always-Overwrite Sync

Bundled extensions and agents are synced to `~/.gsd/agent/` on every launch, not just on first run. This means `npm update -g` takes effect immediately.
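A minimal sketch of an always-overwrite sync, assuming Node's `fs.cpSync`; the helper name and paths are illustrative, not GSD's actual resource-loader API.

```typescript
import { cpSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

// Copy bundled resources over the installed copy on every launch, replacing
// stale files unconditionally so a fresh `npm update -g` is picked up.
function syncResources(bundledDir: string, installedDir: string): void {
  mkdirSync(installedDir, { recursive: true });
  cpSync(bundledDir, installedDir, { recursive: true, force: true });
}

// Demonstration in a temp directory: the installed copy holds an old version,
// the bundle holds a new one, and sync overwrites without checking timestamps.
const root = mkdtempSync(join(tmpdir(), "gsd-sync-"));
const bundled = join(root, "bundled");
const installed = join(root, "installed");
mkdirSync(bundled, { recursive: true });
mkdirSync(installed, { recursive: true });
writeFileSync(join(bundled, "ext.js"), "v2");
writeFileSync(join(installed, "ext.js"), "v1");

syncResources(bundled, installed);
console.log(readFileSync(join(installed, "ext.js"), "utf8")); // "v2"
```

The trade-off is a few extra milliseconds of file copying at startup in exchange for never serving stale extension code.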
### Lazy Provider Loading

LLM provider SDKs (Anthropic, OpenAI, Google, etc.) are lazy-loaded on first use rather than imported at startup. This significantly reduces cold-start time — only the provider you actually connect to gets loaded.
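The pattern can be sketched as a cached factory registry; in GSD the factory body would be a dynamic `import()` of the provider SDK, but everything below (names, shapes, the `loadCount` instrumentation) is illustrative.

```typescript
// Lazy provider registry sketch: each factory runs at most once, on first use.
type Provider = { complete: (prompt: string) => string };

const loadCount: Record<string, number> = {};

const factories: Record<string, () => Provider> = {
  anthropic: () => {
    loadCount.anthropic = (loadCount.anthropic ?? 0) + 1; // heavy SDK import would happen here
    return { complete: (p) => `anthropic:${p}` };
  },
  openai: () => {
    loadCount.openai = (loadCount.openai ?? 0) + 1;
    return { complete: (p) => `openai:${p}` };
  },
};

const cache = new Map<string, Provider>();

function getProvider(name: string): Provider {
  let provider = cache.get(name);
  if (!provider) {
    provider = factories[name]();
    cache.set(name, provider);
  }
  return provider;
}

// Only the provider actually used gets loaded, and only once.
getProvider("anthropic").complete("hi");
getProvider("anthropic").complete("again");
console.log(loadCount); // { anthropic: 1 }
```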
### Fresh Session Per Unit

Every dispatch creates a new agent session. The LLM starts with a clean context window containing only the pre-inlined artifacts it needs. This prevents quality degradation from context accumulation.

## Bundled Extensions

| Extension | What It Provides |
|-----------|-----------------|
| **GSD** | Core workflow engine — auto mode, state machine, commands, dashboard |
| **Browser Tools** | Playwright-based browser automation — navigation, forms, screenshots, PDF export, device emulation, visual regression, structured data extraction, route mocking, accessibility tree inspection, and semantic actions |
| **Search the Web** | Brave Search, Tavily, or Jina page extraction |
| **Google Search** | Gemini-powered web search with AI-synthesized answers |
| **Context7** | Up-to-date library/framework documentation |
| **Background Shell** | Long-running process management with readiness detection |
| **Subagent** | Delegated tasks with isolated context windows |
| **Mac Tools** | macOS native app automation via Accessibility APIs |
| **MCP Client** | Native MCP server integration via @modelcontextprotocol/sdk |
| **Voice** | Real-time speech-to-text (macOS, Linux) |
| **Slash Commands** | Custom command creation |
| **LSP** | Language Server Protocol — diagnostics, definitions, references, hover, rename |
| **Ask User Questions** | Structured user input with single/multi-select |
| **Secure Env Collect** | Masked secret collection |
| **Async Jobs** | Background command execution with `async_bash`, `await_job`, `cancel_job` |
| **Remote Questions** | Discord, Slack, and Telegram integration for headless question routing |
| **TTSR** | Tool-triggered system rules — conditional context injection based on tool usage |
| **Universal Config** | Discovery of existing AI tool configurations (Claude Code, Cursor, Windsurf, etc.) |

## Bundled Agents

| Agent | Role |
|-------|------|
| **Scout** | Fast codebase recon — compressed context for handoff |
| **Researcher** | Web research — finds and synthesizes current information |
| **Worker** | General-purpose execution in an isolated context window |

## Native Engine

Performance-critical operations use a Rust N-API engine:

- **grep** — ripgrep-backed content search
- **glob** — gitignore-aware file discovery
- **ps** — cross-platform process tree management
- **highlight** — syntect-based syntax highlighting
- **ast** — structural code search via ast-grep
- **diff** — fuzzy text matching and unified diff generation
- **text** — ANSI-aware text measurement and wrapping
- **html** — HTML-to-Markdown conversion
- **image** — decode, encode, resize images
- **fd** — fuzzy file path discovery
- **clipboard** — native clipboard access
- **git** — libgit2-backed git read operations (v2.16+)
- **parser** — GSD file parsing and frontmatter extraction

## Dispatch Pipeline

The auto mode dispatch pipeline:

```
1. Read disk state (STATE.md, roadmap, plans)
2. Determine next unit type and ID
3. Classify complexity → select model tier
4. Apply budget pressure adjustments
5. Check routing history for adaptive adjustments
6. Dynamic model routing (if enabled) → select cheapest model for tier
7. Resolve effective model (with fallbacks)
8. Check pending captures → triage if needed
9. Build dispatch prompt (applying inline level compression)
10. Create fresh agent session
11. Inject prompt and let LLM execute
12. On completion: snapshot metrics, verify artifacts, persist state
13. Loop to step 1
```

Phase skipping (from token profile) gates steps 2-3: if a phase is skipped, the corresponding unit type is never dispatched.
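The 13 steps above can be condensed into a loop skeleton. All types and callback names here are hypothetical; steps 3-8 collapse into `selectModel`, and the real loop awaits each agent session rather than running synchronously.

```typescript
// Skeleton of the dispatch loop: read disk, pick a unit, dispatch, persist, repeat.
interface Unit {
  id: string;
  type: "plan" | "execute" | "complete";
}

interface DiskState {
  nextUnit: Unit | null;
}

function runAutoLoop(
  readDiskState: () => DiskState,                    // step 1: disk is the source of truth
  selectModel: (unit: Unit) => string,               // steps 3-7: tier, budget, routing
  buildPrompt: (unit: Unit) => string,               // step 9: inline artifacts
  dispatch: (model: string, prompt: string) => void, // steps 10-11: fresh session
  persist: (unit: Unit) => void,                     // step 12: metrics, artifacts, state
): string[] {
  const dispatched: string[] = [];
  for (;;) {
    const unit = readDiskState().nextUnit;           // step 2: derive next unit
    if (!unit) break;                                // no unit left: milestone is done
    dispatch(selectModel(unit), buildPrompt(unit));
    persist(unit);
    dispatched.push(unit.id);                        // step 13: loop back to step 1
  }
  return dispatched;
}

// Simulated run: two units "on disk", consumed in order.
const queue: Unit[] = [
  { id: "S01-plan", type: "plan" },
  { id: "S01-T01", type: "execute" },
];
const order = runAutoLoop(
  () => ({ nextUnit: queue[0] ?? null }),
  () => "tier-light-model",
  (u) => `do ${u.id}`,
  () => {},
  () => { queue.shift(); },
);
console.log(order.join(" -> ")); // S01-plan -> S01-T01
```

Note how termination is implicit: the loop never tracks progress in memory, it simply stops when the disk no longer yields a next unit.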
## Key Modules (v2.33)

| Module | Purpose |
|--------|---------|
| `auto.ts` | Auto-mode state machine and orchestration |
| `auto/session.ts` | `AutoSession` class — all mutable auto-mode state in one encapsulated instance |
| `auto-dispatch.ts` | Declarative dispatch table (phase → unit mapping) |
| `auto-idempotency.ts` | Completed-key checks, skip loop detection, key eviction |
| `auto-stuck-detection.ts` | Stuck loop recovery and unit retry escalation |
| `auto-start.ts` | Fresh-start bootstrap — git/state init, crash lock detection, worktree setup |
| `auto-post-unit.ts` | Post-unit processing — commit, doctor, state rebuild, hooks |
| `auto-verification.ts` | Post-unit verification gate (lint/test/typecheck with auto-fix retries) |
| `auto-prompts.ts` | Prompt builders with inline level compression |
| `auto-worktree.ts` | Worktree lifecycle (create, enter, merge, teardown) |
| `auto-recovery.ts` | Expected artifact resolution, completed-key persistence, self-healing |
| `auto-timeout-recovery.ts` | Timed-out unit recovery and continuation |
| `auto-timers.ts` | Unit supervision — soft/idle/hard timeouts, continue-here monitor |
| `complexity-classifier.ts` | Unit complexity classification (light/standard/heavy) |
| `model-router.ts` | Dynamic model routing with cost-aware selection |
| `model-cost-table.ts` | Built-in per-model cost data for cross-provider comparison |
| `routing-history.ts` | Adaptive learning from routing outcomes |
| `captures.ts` | Fire-and-forget thought capture and triage classification |
| `triage-resolution.ts` | Capture resolution (inject, defer, replan, quick-task) |
| `visualizer-overlay.ts` | Workflow visualizer TUI overlay |
| `visualizer-data.ts` | Data loading for visualizer tabs |
| `visualizer-views.ts` | Tab renderers (progress, deps, metrics, timeline, discussion status) |
| `metrics.ts` | Token and cost tracking ledger |
| `state.ts` | State derivation from disk |
| `session-lock.ts` | OS-level exclusive session locking (proper-lockfile) |
| `crash-recovery.ts` | Lock file management for crash detection and recovery |
| `preferences.ts` | Preference loading, merging, validation |
| `git-service.ts` | Git operations — commit, merge, worktree sync, completed-units cross-boundary sync |
| `unit-id.ts` | Centralized `parseUnitId()` — milestone/slice/task extraction from unit IDs |
| `error-utils.ts` | `getErrorMessage()` — unified error-to-string conversion |
| `roadmap-slices.ts` | Roadmap parser with prose fallback for LLM-generated variants |
| `memory-extractor.ts` | Extract reusable knowledge from session transcripts |
| `memory-store.ts` | Persistent memory store for cross-session knowledge |
| `queue-order.ts` | Milestone queue ordering |
@@ -1,301 +0,0 @@
# Auto Mode

Auto mode is GSD's autonomous execution engine. Run `/gsd auto`, walk away, come back to built software with clean git history.

## How It Works

Auto mode is a **state machine driven by files on disk**. It reads `.gsd/STATE.md`, determines the next unit of work, creates a fresh agent session, injects a focused prompt with all relevant context pre-inlined, and lets the LLM execute. When the LLM finishes, auto mode reads disk state again and dispatches the next unit.

### The Loop

Each slice flows through phases automatically:

```
Plan (with integrated research) → Execute (per task) → Complete → Reassess Roadmap → Next Slice
                                                            ↓ (all slices done)
                                              Validate Milestone → Complete Milestone
```

- **Plan** — scouts the codebase, researches relevant docs, and decomposes the slice into tasks with must-haves
- **Execute** — runs each task in a fresh context window
- **Complete** — writes summary, UAT script, marks roadmap, commits
- **Reassess** — checks if the roadmap still makes sense
- **Validate Milestone** — reconciliation gate after all slices complete; compares roadmap success criteria against actual results, catches gaps before sealing the milestone

## Key Properties

### Fresh Session Per Unit

Every task, research phase, and planning step gets a clean context window. No accumulated garbage. No degraded quality from context bloat. The dispatch prompt includes everything needed — task plans, prior summaries, dependency context, decisions register — so the LLM starts oriented instead of spending tool calls reading files.
### Context Pre-Loading

The dispatch prompt is carefully constructed with:

| Inlined Artifact | Purpose |
|------------------|---------|
| Task plan | What to build |
| Slice plan | Where this task fits |
| Prior task summaries | What's already done |
| Dependency summaries | Cross-slice context |
| Roadmap excerpt | Overall direction |
| Decisions register | Architectural context |

The amount of context inlined is controlled by your [token profile](./token-optimization.md). Budget mode inlines minimal context; quality mode inlines everything.
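Profile-gated assembly can be sketched as follows. The section names come from the table above; the `TokenProfile` values, artifact shape, and which sections each profile drops are assumptions for illustration, not GSD's actual prompt builder.

```typescript
// Sketch: assemble the dispatch prompt from pre-inlined artifacts, gated by profile.
interface InlineArtifacts {
  taskPlan: string;
  slicePlan: string;
  priorSummaries: string[];
  dependencySummaries: string[];
  roadmapExcerpt: string;
  decisions: string;
}

type TokenProfile = "budget" | "balanced" | "quality";

function buildDispatchPrompt(a: InlineArtifacts, profile: TokenProfile): string {
  // Budget mode inlines only the task plan; quality mode inlines everything.
  const sections = [`## Task Plan\n${a.taskPlan}`];
  if (profile !== "budget") {
    sections.push(`## Slice Plan\n${a.slicePlan}`);
    sections.push(`## Prior Task Summaries\n${a.priorSummaries.join("\n")}`);
  }
  if (profile === "quality") {
    sections.push(`## Dependency Summaries\n${a.dependencySummaries.join("\n")}`);
    sections.push(`## Roadmap Excerpt\n${a.roadmapExcerpt}`);
    sections.push(`## Decisions Register\n${a.decisions}`);
  }
  return sections.join("\n\n");
}
```

The key property is that the prompt is fully assembled before the session starts: the LLM never has to spend tool calls rediscovering context that the dispatcher already had on disk.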
### Git Isolation

GSD isolates milestone work using one of three modes (configured via `git.isolation` in preferences):

- **`worktree`** (default): Each milestone runs in its own git worktree at `.gsd/worktrees/<MID>/` on a `milestone/<MID>` branch. All slice work commits sequentially — no branch switching, no merge conflicts mid-milestone. When the milestone completes, it's squash-merged to main as one clean commit.
- **`branch`**: Work happens in the project root on a `milestone/<MID>` branch. Useful for submodule-heavy repos where worktrees don't work well.
- **`none`**: Work happens directly on your current branch. No worktree, no milestone branch. Ideal for hot-reload workflows where file isolation breaks dev tooling.

See [Git Strategy](./git-strategy.md) for details.

### Parallel Execution

When your project has independent milestones, you can run them simultaneously. Each milestone gets its own worker process and worktree. See [Parallel Orchestration](./parallel-orchestration.md) for setup and usage.
### Crash Recovery

A lock file tracks the current unit. If the session dies, the next `/gsd auto` reads the surviving session file, synthesizes a recovery briefing from every tool call that made it to disk, and resumes with full context.

**Headless auto-restart (v2.26):** When running `gsd headless auto`, crashes trigger automatic restart with exponential backoff (5s → 10s → 30s cap, default 3 attempts). Configure with `--max-restarts N`. SIGINT/SIGTERM bypasses restart. Combined with crash recovery, this enables true overnight "run until done" execution.
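One reading of that backoff policy ("5s → 10s → 30s cap") is a doubling delay with a 30s ceiling; the helper below is a sketch under that assumption, and the function name is hypothetical.

```typescript
// Restart backoff sketch: delay doubles from 5s per attempt, capped at 30s,
// and becomes null once the restart budget is spent.
const BASE_DELAY_MS = 5_000;
const MAX_DELAY_MS = 30_000;

// Delay before restart attempt N (1-based); null once attempts are exhausted.
function restartDelayMs(attempt: number, maxRestarts = 3): number | null {
  if (attempt > maxRestarts) return null; // give up and surface the crash
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

console.log([1, 2, 3, 4].map((n) => restartDelayMs(n))); // [ 5000, 10000, 20000, null ]
```

Passing a larger `maxRestarts` (mirroring `--max-restarts N`) extends the schedule, with every later attempt pinned at the 30s cap.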
### Provider Error Recovery

GSD classifies provider errors and auto-resumes when safe:

| Error type | Examples | Action |
|-----------|----------|--------|
| **Rate limit** | 429, "too many requests" | Auto-resume after retry-after header or 60s |
| **Server error** | 500, 502, 503, "overloaded", "api_error" | Auto-resume after 30s |
| **Permanent** | "unauthorized", "invalid key", "billing" | Pause indefinitely (requires manual resume) |

No manual intervention needed for transient errors — the session pauses briefly and continues automatically.
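The table above can be sketched as a classifier. The match patterns are taken from the table's examples; the function name, return shape, and the choice to pause on unknown errors are assumptions.

```typescript
// Provider-error classifier sketch: map an error message (plus an optional
// retry-after hint) to a resume-or-pause decision.
type ErrorAction =
  | { kind: "resume"; afterMs: number }
  | { kind: "pause" };

function classifyProviderError(message: string, retryAfterMs?: number): ErrorAction {
  const m = message.toLowerCase();
  // Permanent failures are checked first so "invalid key" never auto-retries.
  if (/unauthorized|invalid key|billing/.test(m)) return { kind: "pause" };
  if (/429|too many requests|rate limit/.test(m)) {
    return { kind: "resume", afterMs: retryAfterMs ?? 60_000 };
  }
  if (/\b(500|502|503)\b|overloaded|api_error/.test(m)) {
    return { kind: "resume", afterMs: 30_000 };
  }
  // Unknown errors pause rather than retry blindly.
  return { kind: "pause" };
}
```

Ordering matters: permanent markers are matched before transient ones, so a billing failure that also mentions a retry header still parks the session for a human.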
### Incremental Memory (v2.26)

GSD maintains a `KNOWLEDGE.md` file — an append-only register of project-specific rules, patterns, and lessons learned. The agent reads it at the start of every unit and appends to it when discovering recurring issues, non-obvious patterns, or rules that future sessions should follow. This gives auto-mode cross-session memory that survives context window boundaries.

### Context Pressure Monitor (v2.26)

When context usage reaches 70%, GSD sends a wrap-up signal to the agent, nudging it to finish durable output (commit, write summaries) before the context window fills. This prevents sessions from hitting the hard context limit mid-task with no artifacts written.
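The threshold check itself is tiny. This is a minimal sketch with assumed names; the once-per-session latch is a design guess to keep the agent from being nagged on every subsequent token.

```typescript
// Wrap-up signal sketch: fire once when context usage crosses 70%.
const WRAP_UP_THRESHOLD = 0.7;

function shouldSignalWrapUp(
  usedTokens: number,
  contextWindow: number,
  alreadySignaled: boolean,
): boolean {
  return !alreadySignaled && usedTokens / contextWindow >= WRAP_UP_THRESHOLD;
}

console.log(shouldSignalWrapUp(150_000, 200_000, false)); // true (75% used)
console.log(shouldSignalWrapUp(150_000, 200_000, true)); // false (already sent)
```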
### Meaningful Commit Messages (v2.26)
|
||||
|
||||
Commits are generated from task summaries — not generic "complete task" messages. Each commit message reflects what was actually built, giving clean `git log` output that reads like a changelog.
|
||||
|
||||
### Stuck Detection (v2.39)
|
||||
|
||||
GSD uses a sliding-window analysis to detect stuck loops. Instead of a simple "same unit dispatched twice" counter, the detector examines recent dispatch history for repeated patterns — catching cycles like A→B→A→B as well as single-unit repeats. On detection, GSD retries once with a deep diagnostic prompt. If it fails again, auto mode stops with the exact file it expected, so you can intervene.
|
||||
|
||||
The sliding-window approach reduces false positives on legitimate retries (e.g., verification failures that self-correct) while catching genuine stuck loops faster.
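A sliding-window detector along these lines might look like the following sketch. The window size and the period-1/period-2 cycle check are assumptions; GSD's actual detector may differ:

```typescript
// Flag a stuck loop when the last `window` dispatched unit IDs form a
// repeating cycle of period 1 (A→A→A→A) or period 2 (A→B→A→B).
function isStuck(history: string[], window = 4): boolean {
  const recent = history.slice(-window);
  if (recent.length < window) return false; // not enough history to judge
  for (const period of [1, 2]) {
    // every position must repeat the element `period` steps earlier
    if (recent.every((id, i) => id === recent[i % period])) return true;
  }
  return false;
}
```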
### Post-Mortem Investigation (v2.40)

`/gsd forensics` is a full-access GSD debugger for post-mortem analysis of auto-mode failures. It provides:

- **Anomaly detection** — structured identification of stuck loops, cost spikes, timeouts, missing artifacts, and crashes with severity levels
- **Unit traces** — last 10 unit executions with error details and execution times
- **Metrics analysis** — cost, token counts, and execution time breakdowns
- **Doctor integration** — includes structural health issues from `/gsd doctor`
- **LLM-guided investigation** — an agent session with full tool access to investigate root causes

```
/gsd forensics [optional problem description]
```

See [Troubleshooting](./troubleshooting.md) for more on diagnosing issues.
### Timeout Supervision

Three timeout tiers prevent runaway sessions:

| Timeout | Default | Behavior |
|---------|---------|----------|
| Soft | 20 min | Warns the LLM to wrap up |
| Idle | 10 min | Detects stalls, intervenes |
| Hard | 30 min | Pauses auto mode |

Recovery steering nudges the LLM to finish durable output before timing out. Configure in preferences:

```yaml
auto_supervisor:
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
```
### Cost Tracking

Every unit's token usage and cost are captured, broken down by phase, slice, and model. The dashboard shows running totals and projections. Budget ceilings can pause auto mode before overspending.

See [Cost Management](./cost-management.md).

### Adaptive Replanning

After each slice completes, the roadmap is reassessed. If the work revealed new information that changes the plan, slices are reordered, added, or removed before continuing. This step can be skipped by token profile; the `budget` profile skips it (see Phase Skipping below).

### Verification Enforcement (v2.26)

Configure shell commands that run automatically after every task execution:

```yaml
verification_commands:
  - npm run lint
  - npm run test
verification_auto_fix: true # auto-retry on failure (default)
verification_max_retries: 2 # max retry attempts (default: 2)
```

Failures trigger auto-fix retries — the agent sees the verification output and attempts to fix the issues before advancing. This ensures code quality gates are enforced mechanically, not by LLM compliance.
### Slice Discussion Gate (v2.26)

For projects where you want human review before each slice begins:

```yaml
require_slice_discussion: true
```

Auto-mode pauses before each slice, presenting the slice context for discussion. After you confirm, execution continues. Useful for high-stakes projects where you want to review the plan before the agent builds.

### HTML Reports (v2.26)

After a milestone completes, GSD auto-generates a self-contained HTML report in `.gsd/reports/`. Reports include project summary, progress tree, slice dependency graph (SVG DAG), cost/token metrics with bar charts, execution timeline, changelog, and knowledge base. No external dependencies — all CSS and JS are inlined.

```yaml
auto_report: true # enabled by default
```

Generate manually anytime with `/gsd export --html`, or generate reports for all milestones at once with `/gsd export --html --all` (v2.28).

### Failure Recovery (v2.28)

v2.28 hardens auto-mode reliability with multiple safeguards: atomic file writes prevent corruption on crash, OAuth fetch timeouts (30s) prevent indefinite hangs, RPC subprocess exit is detected and reported, and blob garbage collection prevents unbounded disk growth. Combined with the existing crash recovery and headless auto-restart, auto-mode is designed for true "fire and forget" overnight execution.
### Pipeline Architecture (v2.40)

The auto-loop is structured as a linear phase pipeline rather than recursive dispatch. Each iteration flows through explicit stages:

1. **Pre-Dispatch** — validate state, check guards, resolve model preferences
2. **Dispatch** — execute the unit with a focused prompt
3. **Post-Unit** — close out the unit, update caches, run cleanup
4. **Verification** — optional validation gate (lint, test, etc.)
5. **Stuck Detection** — sliding-window pattern analysis

This linear flow is easier to debug, uses less memory (no recursive call stack), and provides cleaner error recovery since each phase has well-defined entry and exit conditions.
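The linear flow can be sketched as an ordered list of phase functions, each with an explicit exit condition. This is an illustrative model, not GSD's internals; all names are assumptions:

```typescript
// One auto-loop iteration: walk the phases in order, stop at the first
// failure. Each phase has a well-defined entry and exit, so errors surface
// with the phase that produced them instead of unwinding a recursive stack.
type PhaseResult = { ok: boolean; reason?: string };
type Phase = (unitId: string) => PhaseResult;

function runIteration(unitId: string, phases: Phase[]): PhaseResult {
  for (const phase of phases) {
    const result = phase(unitId);
    if (!result.ok) return result; // clean exit at the failing phase
  }
  return { ok: true };
}

// Placeholder phases matching the five stages above.
const pipeline: Phase[] = [
  () => ({ ok: true }),                                        // 1. pre-dispatch
  (u) => (u === "bad-unit" ? { ok: false, reason: "dispatch failed" } : { ok: true }), // 2. dispatch
  () => ({ ok: true }),                                        // 3. post-unit
  () => ({ ok: true }),                                        // 4. verification gate
  () => ({ ok: true }),                                        // 5. stuck detection
];
```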
### Real-Time Health Visibility (v2.40)

Doctor issues (from `/gsd doctor`) now surface in real time across three places:

- **Dashboard widget** — health indicator with issue count and severity
- **Workflow visualizer** — issues shown in the status panel
- **HTML reports** — health section with all issues at report generation time

Issues are classified by severity: `error` (blocks auto-mode), `warning` (non-blocking), and `info` (advisory). Auto-mode checks health at dispatch time and can pause on critical issues.

### Skill Activation in Prompts (v2.39)

Configured skills are automatically resolved and injected into dispatch prompts. The agent receives an "Available Skills" block listing skills that match the current context, based on:

- `always_use_skills` — always included
- `prefer_skills` — included with preference indicator
- `skill_rules` — conditional activation based on `when` clauses

See [Configuration](./configuration.md) for skill routing preferences.
## Controlling Auto Mode

### Start

```
/gsd auto
```

### Pause

Press **Escape**. The conversation is preserved. You can interact with the agent, inspect state, or resume.

### Resume

```
/gsd auto
```

Auto mode reads disk state and picks up where it left off.

### Stop

```
/gsd stop
```

Stops auto mode gracefully. Can be run from a different terminal.

### Steer

```
/gsd steer
```

Hard-steer plan documents during execution without stopping the pipeline. Changes are picked up at the next phase boundary.

### Capture

```
/gsd capture "add rate limiting to API endpoints"
```

Fire-and-forget thought capture. Captures are triaged automatically between tasks. See [Captures & Triage](./captures-triage.md).

### Visualize

```
/gsd visualize
```

Open the workflow visualizer — interactive tabs for progress, dependencies, metrics, and timeline. See [Workflow Visualizer](./visualizer.md).
## Dashboard

`Ctrl+Alt+G` or `/gsd status` shows real-time progress:

- Current milestone, slice, and task
- Auto mode elapsed time and phase
- Per-unit cost and token breakdown
- Cost projections
- Completed and in-progress units
- Pending capture count (when captures are awaiting triage)
- Parallel worker status (when running parallel milestones — includes 80% budget alert)

## Phase Skipping

Token profiles can skip certain phases to reduce cost:

| Phase | `budget` | `balanced` | `quality` |
|-------|----------|------------|-----------|
| Milestone Research | Skipped | Runs | Runs |
| Slice Research | Skipped | Skipped | Runs |
| Reassess Roadmap | Skipped | Runs | Runs |

See [Token Optimization](./token-optimization.md) for details.
## Dynamic Model Routing

When enabled, auto-mode automatically selects cheaper models for simple units (slice completion, UAT) and reserves expensive models for complex work (replanning, architectural tasks). See [Dynamic Model Routing](./dynamic-model-routing.md).

## Reactive Task Execution (v2.38)

When `reactive_execution: true` is set in preferences, GSD derives a dependency graph from IO annotations in task plans. Tasks that don't conflict (no shared file reads/writes) are dispatched in parallel via subagents, while dependent tasks wait for their predecessors to complete.

```yaml
reactive_execution: true # disabled by default
```

The graph derivation is pure and deterministic — it resolves a ready-set of tasks, detects conflicts, and guards against deadlocks. Verification results carry forward across parallel batches, so tasks that pass verification don't need to be re-verified when subsequent tasks in the same slice complete.

The implementation lives in `reactive-graph.ts` (graph derivation, ready-set resolution, conflict/deadlock detection) with integration into `auto-dispatch.ts` and `auto-prompts.ts`.
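Ready-set resolution of this kind can be sketched as a pure function over IO annotations. The field names (`reads`, `writes`, `deps`) are assumptions for illustration, not GSD's actual schema:

```typescript
// A task is ready when it is neither done nor running, all of its
// dependencies are done, and none of its files overlap with the read/write
// sets of currently running tasks (the "no shared file reads/writes" rule).
interface TaskIO { id: string; reads: string[]; writes: string[]; deps: string[] }

function readySet(tasks: TaskIO[], done: Set<string>, running: TaskIO[]): string[] {
  const busy = new Set(running.flatMap((t) => [...t.reads, ...t.writes]));
  return tasks
    .filter((t) => !done.has(t.id) && !running.some((r) => r.id === t.id))
    .filter((t) => t.deps.every((d) => done.has(d)))
    .filter((t) => [...t.reads, ...t.writes].every((f) => !busy.has(f)))
    .map((t) => t.id);
}
```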
@@ -1,34 +0,0 @@
# Work Decomposition

**The universal consensus:** Elite engineers never jump from vision to code. They use **progressive decomposition** through layers of abstraction.

### The Compression Ladder

```
Vision → Capabilities → Systems/Architecture → Features → Tasks
```

Each layer answers a different question:

| Layer | Question |
|-------|----------|
| Vision | What world are we creating? |
| Capabilities | What must the product be able to do? |
| Systems | What infrastructure enables those capabilities? |
| Features | What does the user interact with? |
| Tasks | What exact code gets written? |

### Core Principles (All 4 Models Agree)

- **Start with outcomes, not features.** Define "done" before anything else. Not "build a login page" but "a user can securely access their dashboard using OAuth."
- **Vertical slices over horizontal layers.** Build thin end-to-end slices (UI → API → DB) rather than completing all backend before all frontend. Each slice is independently demoable and testable.
- **The 1-Day Rule.** If a task takes longer than a day, it's not a task — it's a milestone. Break it down further until each item is a single, clear action completable in one sitting.
- **Risk-first exploration.** Identify the hardest/most uncertain parts first. Spike on unknowns before committing to architecture. "Kill the biggest risks while they are still cheap to fix."
- **Interface-first design.** Define contracts between components before building them. This enables parallel work and creates natural verification checkpoints.
- **MECE decomposition.** Tasks should be Mutually Exclusive (no overlap) and Collectively Exhaustive (complete the vision when all are done).

### The Recursive Heuristic

> If something feels fuzzy, break it down one level deeper. Keep decomposing until it is obvious how to start each task.

---
@@ -1,38 +0,0 @@
# What to Keep & Discard from Human Engineering

### KEEP & Amplify

| Practice | Why It Matters More for AI |
|----------|---------------------------|
| **Clear product intent & experience specs** | AI needs direction, not instructions. "How should it feel?" drives architecture. |
| **Acceptance criteria as the backbone** | Becomes TDD at its logical extreme — human writes tests in natural language, AI makes them true. |
| **Vertical slicing** | Even more critical — prevents AI from going deep down a wrong path fast and confidently. |
| **Interface-first approach** | Creates natural checkpoints, makes systems modular and replaceable. |
| **Explicit constraints & non-functional requirements** | Narrows the search space. Without them AI may produce technically correct but strategically wrong systems. |
| **Architecture Decision Records (ADRs)** | Prevents AI from "accidentally" undoing decisions made weeks ago. |
| **Feedback loops** | Build → test → observe → refine. Accelerated to machine speed. |

### DISCARD

| Practice | Why It's Dead Weight |
|----------|---------------------|
| **Estimation rituals** (story points, velocity, sprint planning) | AI doesn't get tired, doesn't context-switch, works at machine speed. |
| **Communication overhead** (standups, design reviews, PR reviews) | Only one communication channel matters: human ↔ agent. |
| **Manual code review for style** | Automated linting + formatting handles this deterministically. |
| **Step-by-step instructions** | Provide outcomes, not "how." |
| **Heavy upfront documentation** | AI can read the entire repo instantly. Document *intent* and *why*, not *how*. |
| **Gradual skill-building** | No ramp-up, no knowledge silos, no "only Sarah knows how that module works." |
| **Defensive architecture against human error** | Tests still needed, but for a different reason: verifying AI's interpretation of intent. |

### The New Human Role

| Responsibility | Description |
|---------------|-------------|
| **Defining "good"** | Vision, personas, experience specs, success metrics |
| **Taste & judgment** | Aesthetics, emotional experience, brand voice |
| **Strategic decisions** | Which problems matter, product pivots |
| **Gut checks at milestones** | Does this *feel* right? |

> **The core shift:** Human = intention + taste. AI = exploration + execution.

---
@@ -1,46 +0,0 @@
# State Machine & Context Management

### The Fundamental Tension

The agent needs to understand the whole project to make good decisions, but any single context window degrades with too much information — not just from token limits but from **attention dilution**.

### Layered Memory Architecture (Universal Agreement)

```
Project Manifest (always loaded, <1000 tokens)
    ↓
Task Context (per-task, relevant files + specs)
    ↓
Retrieval Layer (pull-based, on-demand)
    ↓
Ground Truth (filesystem, git, actual code)
```

| Layer | Content | Access Pattern | Token Impact |
|-------|---------|---------------|--------------|
| **Working Context** (L1) | Current task + 3–5 relevant files | Dynamically assembled per LLM call | 8k–25k tokens |
| **Session/Episodic** (L2) | Compressed history + recent decisions | Auto-summarized at transitions | Summary only |
| **Project Semantic** (L3) | Full codebase summaries, dependency graph, ADRs | Vector + Graph retrieval | Pointers only |
| **Ground Truth** (L4) | Actual files, git history, test results | Agent reads via tools | Zero in prompt |

### The State Machine

The agent should always be in one explicit state:

```
PLAN → IMPLEMENT → TEST → DEBUG → VERIFY → DOCUMENT
```

**Critical transitions that matter:**
- **Task completion:** Defined by automated tests passing + acceptance criteria met
- **Stuck detection:** Triggered by repeated failed attempts or missing information
- **Plan revision:** Triggered when completed tasks reveal wrong assumptions

### Key Principles

- **Summarize aggressively between phases.** Don't carry full implementation context forward — carry compressed summaries: what was built, what decisions were made, what interfaces were created.
- **Pull-based, not push-based context.** Don't preload everything the agent might need. Let it ask for what it discovers it needs.
- **Use structured state for reliability.** Natural language summaries drift. Use JSON/typed configs for anything the system needs to track. Reserve natural language for reasoning.
- **The filesystem is external memory.** The codebase itself is the most detailed representation of current state. Hold *understanding* about code in context, not the code itself.

---
@@ -1,56 +0,0 @@
# Optimal Storage for Project Context

### The Universal Answer: Plain Text Files in the Repo + Structured State Store

All four models converge on a hybrid approach. The key insight: **don't over-engineer with databases and vector stores, but don't under-engineer with a single massive file either.**

### The Optimal Stack

| Storage | What Lives Here | Why |
|---------|----------------|-----|
| **Project Manifest** (`PROJECT.md`) | Vision, principles, architecture overview, component status | Always loaded, <1000 tokens, single source of truth |
| **Structured State** (JSON/SQLite/Postgres) | Task status, phase, dependencies, verification results | Machine-parseable, drives state machine transitions |
| **Context Directory** (`.context/` or `.ai/`) | Architecture docs, task specs, decision records | Organized for retrieval, not human browsing |
| **Git Repository** | Actual source code, test results | Ultimate ground truth, never duplicated |
| **Knowledge Graph** (optional at scale) | File → function → dependency relationships | Enables "what breaks if I change this?" queries |

### Why Plain Files Win

- AI reads files directly — no query language, no ORM, no API calls
- Version control comes free via git
- Human can read and edit with any text editor
- Survives tooling changes — not locked into any system

### Why NOT Vector Stores (as primary)

- Project context is **structured** — you know where things are
- Vector stores return **approximately relevant** results — approximate is often wrong in codebases
- They can't represent state, relationships, or task progress

### The Hybrid Format

Individual files use **YAML frontmatter + Markdown body**:

```yaml
---
status: in_progress
dependencies: [AUTH-01, DB-02]
acceptance_criteria:
  - User can reset password via email
  - Token expires after 30 minutes
---

## Task: Password Reset Flow
[Rich narrative description and context here]
```

### Size Discipline

| File | Target Size |
|------|------------|
| Project Manifest | <1,000 tokens |
| Individual task files (completed) | <500 tokens |
| Architecture doc | <2,000 tokens |

> The context system isn't just storage — it's a **compression engine**. Its job is to maintain maximum useful understanding in minimum token footprint.

---
@@ -1,62 +0,0 @@
# Parallelization Strategy

### Core Principle

> Parallelize across boundaries, serialize within them.

The quality of parallelization is directly determined by the quality of interface definitions.

### The Diamond Pattern

```
Planning (narrow, serial)
    ↓
Fan Out (parallel execution)
    ↓
Convergence (integration verification)
    ↓
Fan Out (next parallel set)
```

### Phase-by-Phase Strategy

#### Planning: Mostly Serial, with Parallel Spikes
- High-level decomposition must be serial (one coherent act of reasoning)
- **Parallelize uncertainty resolution:** Multiple spikes investigating different risks simultaneously
- Output: A dependency graph that explicitly identifies what can be parallelized

#### Execution: Massive Parallelization with the Right Topology

| Work Type | Strategy |
|-----------|----------|
| **Independent leaf tasks** | Embarrassingly parallel — one agent per module |
| **Dependent chains** | Serial within chain, but chains run in parallel |
| **Convergence points** | Strictly serial — integration verification |

**Critical insight:** The frontend doesn't need the real API — it needs the API *contract*. Once contracts exist, both sides build in parallel.
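To make the contract-first idea concrete, here is a hypothetical sketch: freeze a typed interface, then let both sides build against it while a stub unblocks the frontend. All names here are illustrative, not from any real API:

```typescript
// A frozen API contract. The frontend codes against the type; the backend
// implements it later; the stub lets frontend work start immediately.
interface PasswordResetApi {
  requestReset(email: string): Promise<{ ok: boolean }>;
  confirmReset(token: string, newPassword: string): Promise<{ ok: boolean }>;
}

// Trivial stub honoring the contract (no real email or token logic).
const stubApi: PasswordResetApi = {
  requestReset: async () => ({ ok: true }),
  confirmReset: async (token) => ({ ok: token.length > 0 }),
};

// Frontend code written against the contract, oblivious to which side it got.
async function submitResetForm(api: PasswordResetApi, email: string): Promise<string> {
  const res = await api.requestReset(email);
  return res.ok ? "Check your email" : "Something went wrong";
}
```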
#### Testing: The Most Interesting Story
- **Unit tests:** Same agent, same context, atomic with code
- **Cross-task tests:** All parallel by definition
- **Integration tests:** Parallel across different boundaries
- **E2E tests:** Serial (exercises whole system)

#### Verification: Deliberate Redundancy
- **Adversarial verification:** Separate reviewer agent with fresh context evaluates against spec
- **Red-team parallelism:** Agent tries to break the implementation

### Coordination Rules

- Agents communicate through the **filesystem**, never directly
- Each agent works on a **branch** — merge on success, discard on failure
- One agent per file at a time (file locking)
- Optimal concurrency: **3–8 simultaneous agents** for most projects

### Anti-Patterns

- ❌ Don't parallelize tasks that modify the same files
- ❌ Don't parallelize interacting decisions
- ❌ Don't skip convergence/integration verification
- ❌ Don't over-parallelize (coordination tax eats gains above ~8 agents)

---
@@ -1,44 +0,0 @@
# Maximizing Agent Autonomy & Superpowers

### The Foundational Insight

> Autonomy comes from **self-correction**, not from getting it right the first time. The power isn't in the initial generation — it's in iteration speed and feedback signal quality.

### The Essential Tool Arsenal

| Category | Tools | Why |
|----------|-------|-----|
| **Execution Environment** | Terminal, filesystem, git, package manager | Closes the write → run → debug → verify loop |
| **Verification** | Test runner, linter, type checker, security scanner | Ground truth over self-assessment |
| **Observation** | Logs, browser/renderer, performance profiler | Sees what users would see |
| **Exploration** | Code search, documentation lookup, web research | Self-directed learning |
| **Recovery** | Git revert, branch management, checkpoints | Safety net that enables boldness |

### Self-Verification Architecture

Every task completion should self-evaluate against a checklist:
1. Does the code compile?
2. Do all existing tests still pass?
3. Do new tests pass?
4. Does the application actually start?
5. Can I exercise the feature and see expected behavior?
6. Does this match acceptance criteria point by point?

### Debugging Superpowers

- **Temporary instrumentation:** Add logging, remove after diagnosis
- **Bisection:** Walk back through changes to find where regression was introduced
- **Minimal reproduction:** Strip away everything except exact conditions that trigger failure
- **Exploratory tests:** Quick throwaway scripts to test hypotheses

### Meta-Cognitive Layer

- **Scratchpad:** External reasoning space to track hypotheses, attempts, and outcomes
- **Stuck detection:** After N failed attempts, trigger step-back with fresh context and explicitly different approach
- **Structured escalation:** "Here's what I'm trying, here's what I've tried, here's what I think the issue is, here's what I need from you"

### The Philosophy

> You're not trying to build an agent that doesn't make mistakes. You're building one that **catches and fixes its own mistakes faster than a human would notice them**. Not intelligence — **closed-loop execution with rich feedback**.

---
@@ -1,82 +0,0 @@
# System Prompt & LLM vs Deterministic Split

### The Core Separation Principle

> If you could write an if-else statement that handles it correctly every time, **it should not be in the LLM's context**. Every token the model spends reasoning about something deterministic is wasted and introduces hallucination risk.

### What the LLM Owns

| Capability | Why LLM |
|-----------|---------|
| Understanding intent | Interpretation, judgment |
| Architectural reasoning | Weighing tradeoffs |
| Code generation | Creative, context-dependent |
| Debugging & diagnosis | Abductive reasoning, hypothesis formation |
| Self-critique & quality assessment | Judgment calls |

### What TypeScript/Deterministic Code Owns

| Capability | Why Deterministic |
|-----------|-------------------|
| State machine transitions | Typed state object, no ambiguity |
| Context assembly | Predict + pre-load what agent needs |
| File operations | Validate paths, handle encoding, manage permissions |
| Test execution & result parsing | Structured results, not raw terminal output |
| Build & environment management | Install deps, start servers, manage ports |
| Code formatting | Run prettier automatically, never waste LLM tokens |
| Task scheduling & dependency resolution | Graph traversal, instant vs 5-second LLM call |
| Summarization triggers | Mechanical workflow, LLM provides content |

### Modular System Prompt Architecture

```
Base Layer (always present, ~500 tokens)
  → Identity, core behavioral rules, general approach

Phase-Specific Layer (swapped based on state)
  → Planning mode: decomposition, interfaces, risks
  → Execution mode: implementation, testing, iteration
  → Debugging mode: diagnosis, hypothesis testing, isolation

Task-Specific Layer (assembled fresh per task)
  → Current spec, acceptance criteria, relevant contracts, prior attempts

Tools Layer
  → Available tool definitions and parameters
```

### Tool Design Philosophy

> Each tool should do one thing, do it completely, and return structured results the LLM can immediately act on.

**Bad:** LLM calls `readFile` → `parseJSON` → `runCommand` (3 calls, 3 failure points)
**Good:** LLM calls `runTests(filter)` → gets structured pass/fail with locations (1 call, clean result)
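The "good" shape can be sketched as a typed result plus a parser that converts raw runner output into it. The line format parsed here is made up purely for illustration; no real test runner's output format is assumed:

```typescript
// The tool returns structured pass/fail data the LLM can act on directly,
// instead of raw terminal output it would have to re-parse every call.
interface TestFailure { name: string; file: string; line: number; message: string }
interface TestResult { passed: number; failed: number; failures: TestFailure[] }

function parseRunnerOutput(raw: string): TestResult {
  const failures: TestFailure[] = [];
  let passed = 0;
  for (const lineText of raw.trim().split("\n")) {
    if (lineText.startsWith("PASS")) {
      passed++;
    } else if (lineText.startsWith("FAIL")) {
      // Assumed line format: FAIL <name> <file>:<line> <message...>
      const [, name, loc, ...msg] = lineText.split(" ");
      const [file, line] = loc.split(":");
      failures.push({ name, file, line: Number(line), message: msg.join(" ") });
    }
  }
  return { passed, failed: failures.length, failures };
}
```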
### Essential Tools

| Tool | Returns |
|------|---------|
| `runTests` | Structured results: pass count, fail count, per-failure details |
| `readFiles` | Batched file contents (array of paths, not one at a time) |
| `writeFile` | Auto-formats before writing |
| `searchCodebase` | Grep-like results with file paths and line numbers |
| `getProjectState` | Manifest + current task spec + related task statuses |
| `updateTaskStatus` | Handles downstream state updates automatically |
| `buildProject` | Structured errors with file paths and line numbers |
| `browserCheck` | Screenshot or structured description of rendered output |
| `commitChanges` | Enforces conventions, runs pre-commit hooks |
| `revertToCheckpoint` | Rolls back to last known good state |

### Prompt Patterns That Maximize Agency

1. **Tell it what it CAN do, not what it can't.** "Full authority as long as acceptance criteria and tests pass."
2. **Explicit permission to iterate.** "First attempt doesn't need to be perfect. Write, run, observe, improve."
3. **Clear exit conditions.** Concrete, measurable, unambiguous definition of "done."
4. **Built-in scratchpad.** "Write reasoning in thinking blocks. Track attempts and outcomes."
5. **Recovery protocol.** "After 3 failed approaches, produce structured escalation."

### The Meta-Principle

> Your TypeScript orchestrator is the deterministic skeleton — workflow, state, context, tools, coordination. The LLM is the reasoning muscle — understanding, creativity, judgment, problem-solving. **Neither should do the other's job.** When you get this right, the LLM becomes dramatically more capable because it's only doing what it's good at, with exactly the context it needs.

---
@@ -1,60 +0,0 @@
# Speed Optimization

### The #1 Speed Principle

> The fastest possible operation is the one you don't perform. Before optimizing any step, ask: does this step need to exist at all?

### Speed Levers (Ranked by Impact)

#### 1. Minimize LLM Calls
- **Batch intent into single calls.** Don't generate code, then tests, then docs separately. One call: "implement, test, and document." TypeScript splits the output.
- **Deterministic fast paths.** Missing import? Syntax error? Fix without an LLM call if the fix is mechanical.
- Audit call chains ruthlessly — most systems have 50%+ unnecessary sequential calls.

#### 2. Make Feedback Loops Instantaneous
- Use test watch mode (no cold start)
- Run only relevant test subsets (track which files affect which tests)
- Incremental builds (hot module reloading)
- Async, non-blocking file writes

#### 3. Precompute Context
- Predict what the agent will need based on task definition
- Pre-load into the prompt — no tool calls needed mid-generation
- **Speculative pre-fetching** (like CPU cache prefetching)

#### 4. Parallelize Independent Work
- Minimize startup cost for new parallel agents (pre-built templates, warm connections)
- Use the dependency graph to identify independent work automatically

#### 5. Stream Everything, Block on Nothing
- Process tokens as they arrive
- Pipeline parallelism: start formatting code while the commit message is still generating

#### 6. Cache Aggressively
- In-memory cache of everything the agent might need
- Cross-task caching for unchanged files
- Cache LLM results for deterministic inputs (boilerplate, type definitions)

#### 7. Minimize Token Waste
- Dense context, not verbose context
- Structured formats for structured data
- Minify reference code that's informational, not for modification

### Anti-Patterns That Murder Speed

| Anti-Pattern | Fix |
|-------------|-----|
| Re-verifying things that can't have changed | Dependency-aware selective re-verification |
| Excessive self-reflection on simple tasks | Complexity-based workflow routing |
| Over-summarization between micro-steps | Only full context reset at task boundaries |
| Waiting for human approval on auto-verifiable work | Human checkpoints at milestones, not tasks |
| Quadratic history growth | Aggressive compression at every transition |
| Synchronous blocking tools | Async everything, pipeline parallelism |

### The Speed Multiplier Nobody Talks About

**Failure prediction.** Track patterns across tasks. If certain task types fail on the first attempt, pre-load extra guidance. Preventing a failed iteration is faster than executing one.

> The magical feeling of speed comes from only doing things that matter, and then doing those things as fast as possible. The system should feel like the agent knew what to do and just did it.

---
@@ -1,33 +0,0 @@
# Top 10 Tips for a World-Class Agent

### 1. The Orchestrator Is the Product, Not the Model

The model is a commodity. Two teams using the same model produce wildly different results based on orchestration quality. Invest 70% of effort in the orchestrator, 30% in prompt engineering.

### 2. Context Assembly Is a Craft

Profile your context like you'd profile code. Measure which context elements correlate with first-attempt success. Prune relentlessly. The right files, in the right order, with the right framing, at the right level of detail.

### 3. Make the Feedback Loop the Fastest Thing

Treat feedback loop latency like a game engine treats frame rate. Incremental builds, targeted tests, pre-warmed servers, cached deps. Put it on a dashboard you look at every day.

### 4. Build First-Class Error Recovery Into Every Layer

Retry with variation (never the same way twice), automatic rollback, structured escalation, ability to park blocked tasks. **Design failure paths first** — they'll get more use than you expect.

### 5. Verify Through Execution, Not Self-Assessment

An agent that asks itself "is this correct?" says yes 90% of the time regardless. Run the code, observe results, get ground truth. Self-assessment supplements execution-based verification, never replaces it.

### 6. Return Structured, Actionable Data from Every Tool

Don't return raw terminal output. Return structured objects: what passed, what failed, where, why. Remove cognitive load from the model — it directly translates to better decisions.
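A structured tool result might look like the sketch below; the field names are illustrative, not a real GSD type.

```typescript
// Structured test-run result: what passed, what failed, where, and why,
// instead of raw terminal output.
interface TestRunResult {
  passed: number;
  failed: number;
  failures: Array<{ test: string; file: string; line: number; reason: string }>;
}

// Renders the structured result into a dense, model-friendly summary.
function summarizeForModel(result: TestRunResult): string {
  if (result.failed === 0) return `All ${result.passed} tests passed.`;
  const lines = result.failures.map(
    (f) => `${f.test} (${f.file}:${f.line}): ${f.reason}`,
  );
  return `${result.failed} of ${result.passed + result.failed} tests failed:\n${lines.join("\n")}`;
}
```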
### 7. Use a DAG, Not a Flat List

Explicit inputs, outputs, dependencies, acceptance criteria per task. Maximizes parallelism, identifies critical path, enables smart impact tracing when things change.

### 8. Keep the Manifest Small and Always Current

One file, <1000 tokens, always included. Updated automatically after every task completion. If it drifts from reality, everything downstream suffers.

### 9. Build Observability From Day One

Log every LLM call. Track iterations per task type, token usage, failure rates, first-attempt success rates. This is your training data for improving the orchestrator. Teams that instrument well improve 10x faster.

### 10. Make Human Touchpoints High-Leverage and Low-Friction

Present specific questions with context, not walls of text. "The API could return nested or flat fields — which fits your vision?" is a 5-second decision. "Please review everything" takes 20 minutes.

---
@@ -1,33 +0,0 @@
# Top 10 Pitfalls to Avoid

### 1. Putting Workflow Logic in the Prompt

Control flow belongs in TypeScript with actual conditionals and state tracking. Prompts that describe workflows are fragile, inconsistently followed, and impossible to step through with a debugger.

### 2. Unbounded Context Accumulation

Each iteration adds noise. After 7 iterations, context is bloated with stale information from attempts 1–5. **Carry forward only current state and the most recent error.** Summarize or discard everything else.

### 3. Trusting the Model's Self-Assessment of Completion

Models are biased toward completion. Never let the model be the sole judge. Use deterministic checks: tests pass, it builds, acceptance criteria have corresponding passing tests.

### 4. Over-Engineering Tools Before Understanding Workflows

Start with a small general-purpose set (file read/write, execute command, run tests). Watch where the agent struggles in real tasks. Then build specialized tools to solve observed problems.

### 5. Neglecting the Cold-Start Problem

The first task is fundamentally different from the twentieth. Use deterministic templates for project scaffolding, conventions, and test infrastructure before handing off to the agent.

### 6. Too Much Autonomy Too Early

An agent going slightly wrong for 2 hours produces a mountain of throwaway code. Start with more checkpoints than needed. Earn autonomy incrementally for proven task types.

### 7. Ignoring Compounding Inconsistency

Different naming, different patterns, different structures across files = technical debt that confuses the agent itself later. Enforce consistency through linting or by showing existing examples before new code.

### 8. Building for the Demo, Not the Recovery

The demo is the happy path. The product is what happens when tests fail, builds break, APIs change. **Spend 2x as much time on failure/recovery paths.** The agent spends more time recovering than succeeding on the first attempt.

### 9. Treating All Tasks as Equally Complex

Simple utility functions and complex state management shouldn't go through the same workflow. Classify by complexity. Simple tasks get a fast path. Complex tasks get the full treatment.
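Complexity-based routing can be a small deterministic classifier. The signals and thresholds below are assumptions for illustration, not GSD's real heuristics.

```typescript
// Route tasks to a workflow tier based on cheap, observable signals.
type Workflow = "fast_path" | "standard" | "full_treatment";

interface TaskSignals {
  filesTouched: number;
  hasNewInterfaces: boolean;
  touchesSharedState: boolean;
}

function routeWorkflow(t: TaskSignals): Workflow {
  // Anything touching shared state or new interfaces gets full treatment.
  if (t.touchesSharedState || t.hasNewInterfaces) return "full_treatment";
  if (t.filesTouched <= 2) return "fast_path"; // simple utility-level change
  return "standard";
}
```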
### 10. Not Measuring What Actually Matters

Don't just track tokens and costs. Measure: first-attempt success rate, iterations to completion, human intervention frequency, code survival rate (does it survive the next 3 tasks?), stuck-detection accuracy. These guide real improvement.

---
@@ -1,97 +0,0 @@
# God-Tier Context Engineering

### The Core Principle

> God-tier context engineering treats the context window as a **designed experience for the model**, not as a bucket you throw information into. The context window is the UX of your agent. Design it accordingly.

### The 10 Commandments of Context Engineering

#### 1. The Pyramid of Relevance

- **Sharp focus:** Active files at full detail
- **Present but compressed:** Interface contracts, manifest, task definition
- **Summarized or absent:** Other components' internals, completed task histories

Each tier has a token budget. If the full-resolution tier is large, outer tiers compress harder.

#### 2. Context Is a Cache, Not a History

Treat it like a CPU cache: it holds exactly what's needed now, everything else evicted. The question isn't "what has happened" but "what does the model need to see right now?"

#### 3. Separate Reference from Instruction

- **Instruction context** (what to do) → beginning and end of prompt (highest attention)
- **Reference context** (helpful info) → middle, clearly delineated

Manage them independently. Compress reference aggressively while keeping instructions at full detail.

#### 4. Earn Every Token's Place

Implement a token budget system:

| Category | Budget |
|----------|--------|
| System prompt + behavioral instructions | ~15% |
| Manifest | ~5% |
| Task spec + acceptance criteria | ~20% |
| Active code files | ~40% |
| Interface contracts | ~10% |
| Reserve (tool results, errors) | ~10% |

When any category exceeds budget, intelligently summarize (not truncate).
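The budget table above can be enforced mechanically. A sketch, with the category names and overflow hook as assumptions:

```typescript
// Per-category token budgets matching the percentages in the table.
const BUDGET_SHARES: Record<string, number> = {
  system: 0.15,
  manifest: 0.05,
  task: 0.2,
  activeFiles: 0.4,
  interfaces: 0.1,
  reserve: 0.1,
};

function categoryBudgets(totalTokens: number): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [name, share] of Object.entries(BUDGET_SHARES)) {
    out[name] = Math.round(totalTokens * share);
  }
  return out;
}

// Categories over budget should be intelligently summarized, not truncated.
function overBudget(usage: Record<string, number>, totalTokens: number): string[] {
  const budgets = categoryBudgets(totalTokens);
  return Object.keys(usage).filter((k) => usage[k] > (budgets[k] ?? 0));
}
```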
#### 5. Write for the Model's Attention Pattern

- Critical info at the very beginning and reiterated at the end
- Structured blocks with clear headers and delimiters
- Consistent formatting conventions

```
TASK: Implement password reset flow
STATUS: New
DEPENDS ON: auth-module (complete), email-service (complete)
ACCEPTANCE CRITERIA:
- User can request reset via email
- Token expires after 30 minutes
- New password meets existing validation rules
- All existing auth tests pass
RELEVANT INTERFACES: [below]
ACTIVE FILES: [below]
```

#### 6. Compress at Every State Transition

- Task completion → 50–100 token completion record
- Use a **dedicated summarization call** with a tight prompt (not the working agent self-summarizing)
- **Cascading summarization:** Task summaries → milestone summaries → phase summaries (5:1 compression ratio at each level)
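The cascading 5:1 roll-up can be sketched structurally; `summarize` stands in for the dedicated summarization LLM call, and the function name is ours.

```typescript
// Roll one summary level up into the next: task summaries -> milestone
// summaries -> phase summaries, batching `fanIn` items per summary.
function rollUp(
  summaries: string[],
  fanIn: number,
  summarize: (batch: string[]) => string,
): string[] {
  const out: string[] = [];
  for (let i = 0; i < summaries.length; i += fanIn) {
    out.push(summarize(summaries.slice(i, i + fanIn)));
  }
  return out;
}
```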
#### 7. Use the Filesystem as Your Infinite Context Window

- Organize files for retrieval, not human browsing
- Predictable naming conventions = instant lookup
- Essentially a custom database on top of the filesystem

#### 8. Profile Context Quality, Not Just Size

Track first-attempt success rate as a function of context composition. What was in context when it succeeded vs failed? Let data guide what constitutes high-quality context.

#### 9. Dynamic Context Based on Task Phase

Different phases need different context:

| Phase | Optimal Context |
|-------|----------------|
| Understanding | Spec, acceptance criteria, broad architectural context |
| Implementation | Active files, interface contracts, coding patterns |
| Debugging | Failing test output, relevant code, test code |
| Verification | Acceptance criteria prominently, ability to exercise feature |

#### 10. Design for Context Recovery

- **Checkpoint** context state at task starts and phase transitions
- On detected confusion (repeated failures, increasing iterations, off-task output): **roll back to checkpoint** and re-enter with fresh context + concise failure info + strategy hint
- Structured recovery ≠ naive retry. It rebuilds context from scratch with learned information.

### The God-Tier Strategy in One Sentence

> Orchestrator-assembled minimal slice + persistent hierarchical memory. Every single LLM call stays 8k–25k tokens while the agent has perfect knowledge of a 500k-line codebase and months of project history.

---

---

# Part II: The Hard Problems (Grey Area Synthesis)

> Synthesized from a second round of deep conversations with all four models, targeting the 13 hardest unsolved problems in autonomous coding agents — plus a critical question on accessibility for non-technical users.

---
@@ -1,56 +0,0 @@
# Handling Ambiguity & Contradiction

**The universal consensus:** This is the highest-cost failure mode. An agent confidently building the wrong thing based on a reasonable-but-incorrect interpretation burns hours of work discovered only at milestone reviews.

### The Three-Layer Strategy (All 4 Models Agree)

#### Layer 1: Classification of Ambiguity Type

Every requirement should be classified during planning:

| Classification | Action |
|---------------|--------|
| **Clear and actionable** | Proceed autonomously |
| **Ambiguous but decidable with sensible defaults** | Proceed + document assumptions |
| **Genuinely unclear or contradictory** | Halt and escalate to human |

> The middle category is where most real work lives. "The user should be able to reset their password" has a hundred implied decisions. A good agent resolves these with sensible defaults and **documents the assumptions it made** — it doesn't ask about every one.

#### Layer 2: The Assumption Ledger

Every task completion includes an `assumptions.md` update listing every interpretive decision the agent made:

```json
{
  "assumptions": [
    "Password reset tokens expire after 30 minutes (common security practice)",
    "Email delivery, not SMS",
    "No password history check"
  ],
  "confidence": 0.82
}
```

The human reviews these at **milestones, not in real-time** — preserving speed while maintaining correctness.
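A ledger entry plus escalation filter might look like the sketch below; the types, threshold, and field names are assumptions layered on the JSON example above.

```typescript
// An assumption ledger entry plus the rule that decides what a human sees
// at the milestone review: low confidence or interface-touching decisions.
interface AssumptionEntry {
  assumption: string;
  confidence: number; // 0..1
  touchesInterface: boolean;
}

function needsEscalation(e: AssumptionEntry, threshold = 0.7): boolean {
  return e.touchesInterface || e.confidence < threshold;
}

function reviewQueue(ledger: AssumptionEntry[]): string[] {
  return ledger.filter((e) => needsEscalation(e)).map((e) => e.assumption);
}
```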
#### Layer 3: Contradiction Detection Pass

Before execution begins, a **dedicated reasoning pass** (separate from planning) scans for conflicts:

- Do requirements contradict each other?
- Do acceptance criteria conflict with stated architecture?
- Are there implicit assumptions in one requirement that violate another?

### Escalation Threshold

- **Impact confined to current task** → decide and document
- **Impact touches interface contracts** → escalate (wrong interpretation cascades)

Grok adds a **"Multi-Hypothesis Planning"** approach: when underspecification is detected, generate three distinct "Intent Hypotheses" (The Minimalist Path, The Scalable Path, The Feature-Rich Path). If the semantic distance between them exceeds a threshold, hard-halt and present a decision matrix to the human.

### The Deepest Pitfall

Models don't naturally express uncertainty — they pick an interpretation and run with it as if it's obviously correct. The system prompt must explicitly instruct confidence-level flagging, and the orchestrator must treat low-confidence decisions differently from high-confidence ones.

> **Proven result:** Grok reports this pattern cuts wrong-path rework by ~65% in 2026 evaluations.

---
@@ -1,34 +0,0 @@
# Long-Running Memory Fidelity

**The core problem:** Every compression loses information. Over enough compressions, summaries drift from reality like a photocopy of a photocopy. The system can't easily tell it's happening because it only sees the current summary, not what was lost.

### Multi-Tier Memory with Different Decay Rates

| Tier | Decay Rate | Content | Update Strategy |
|------|-----------|---------|-----------------|
| **Manifest** | Fast (updates every task) | Current state only, <1000 tokens | Continuous overwrite — no history |
| **Decision Log** | Never decays (append-only) | Every significant architectural decision + rationale | Never summarized, grows linearly |
| **Task Archive** | Medium | Compressed task completion records | Available for retrieval, not routinely loaded |

### The Critical Mechanism: Periodic Reconciliation

All four models converge on some form of automated audit:

- **Claude:** Every milestone or N tasks — agent compares manifest against actual codebase
- **Gemini:** Every N commits, spawn a "History Auditor" agent whose sole job is manifest-vs-code comparison
- **GPT:** Self-healing summaries with checksums — when source files change, invalidate and regenerate
- **Grok:** Deterministic "Memory Fidelity Audit" node every 5 checkpoints — samples key invariants, scores drift 0-100, auto-rebuilds if drift >15%
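The checksum idea in the GPT bullet can be sketched as follows; the `Summary` shape and function names are illustrative.

```typescript
import { createHash } from "node:crypto";

// Self-healing summary: store the hash of the source at summarization time;
// when the source changes, the summary is detected as stale and must be
// regenerated from the source (never from another summary).
interface Summary {
  text: string;
  sourceChecksum: string;
}

function checksum(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

function makeSummary(source: string, text: string): Summary {
  return { text, sourceChecksum: checksum(source) };
}

function isStale(summary: Summary, currentSource: string): boolean {
  return summary.sourceChecksum !== checksum(currentSource);
}
```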
### The Golden Rule

> **Never summarize summaries.** Each compression layer regenerates from the one below. The codebase is always the lossless source of truth.

### The Most Dangerous Form of Drift

Not factual inaccuracy — **the loss of "why."** The manifest says "auth uses JWT tokens." Three months ago there was a long discussion about why JWT was chosen over session-based auth. That context is exactly what gets compressed away. The **append-only decision log** solves this by preserving *why* indefinitely even as *what* gets continuously compressed.

### Phase Boundary Refresh

For very long projects (weeks/months), **rebuild the manifest from scratch** at phase boundaries by having the agent read the actual codebase + decision log — rather than carrying forward the old manifest with incremental updates. This is the equivalent of defragmenting a hard drive.

---
@@ -1,25 +0,0 @@
# Multi-Agent Semantic Conflict Resolution

**The hard case:** Git-level merge conflicts are easy. The real problem is code that merges cleanly but doesn't work — agents honoring the same typed interface while disagreeing on semantics (e.g., Agent A returns `null` for "not found," Agent B treats `null` as "error").

### Three Lines of Defense (Universal Agreement)

#### 1. Semantically Rich Interface Contracts

Don't just define type signatures — define **behavioral contracts**: What does `null` mean? What are the error semantics? What invariants does the caller rely on? Contracts should be miniature specs, not just type definitions.
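A behavioral contract can live as doc comments that pin down the semantics a bare signature leaves open. The interface and names below are illustrative.

```typescript
// Behavioral contract: the comments define what `null` means and what
// throws, so parallel agents cannot disagree on semantics.
interface User {
  id: string;
  email: string;
}

interface UserStore {
  /**
   * Returns the user, or null when no user has this id ("not found" is a
   * normal outcome, NOT an error). Throws only on infrastructure failure
   * (e.g. the store is unreachable). Callers must not treat null as an error.
   */
  findById(id: string): User | null;
}

// A conforming in-memory implementation of the contract.
class MemoryUserStore implements UserStore {
  constructor(private users: Map<string, User>) {}
  findById(id: string): User | null {
    return this.users.get(id) ?? null;
  }
}
```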
#### 2. Pre-Written Integration Tests

Write integration tests **during planning, before parallel execution begins** — tests that exercise semantic expectations, not just types. These are waiting when parallel branches converge.

#### 3. Dedicated Integration/Reconciliation Agent

After parallel branches merge, a focused agent gets: interface contracts + both implementations + integration tests. Its job is finding semantic mismatches, not rebuilding.

### The Highest-Value Technique

**Adversarial edge-case generation at integration points.** The integration agent reads both implementations, sees how each handles boundaries, and generates new tests that specifically probe the assumption gaps between them. This catches the subtlest bugs.

Gemini adds the concept of a **"Shadow Merge"** agent that runs "Cross-Impact Analysis" before actual merge — looking for "Logical Race Conditions" where Worker A changed a utility that Worker B relied on, even when the git merge is clean.

---
@@ -1,35 +0,0 @@
# Legacy Code & Brownfield Onboarding

**The fundamental difference:** Greenfield = design → implement. Brownfield = **observe → infer → validate → modify.**

### The Onboarding Pipeline (All 4 Models Agree)

#### Phase 1: Structural Analysis (Deterministic)

- Dependency graph mapping
- Module identification, LOC per component
- Test coverage analysis, entry point discovery
- Database schema mapping

#### Phase 2: Convention Extraction (LLM-Assisted)

- Sample representative files across modules
- Identify: error handling patterns, naming conventions, API structure, DB access patterns, testing patterns
- Output: a **conventions document** that becomes critical reference context

#### Phase 3: Pattern Mining

- Extract implicit "tribal knowledge" — workarounds for browser bugs, special customer cases, performance hacks that look like mistakes
- Generate decision records into project state

### The Cardinal Rules

| Rule | Why |
|------|-----|
| **Observe first, edit later** | Agents must never modify code they don't understand |
| **Preserve local consistency over global ideals** | Resist the "Junior Refactor" — don't "fix" legacy code to modern standards |
| **Add characterization tests before modifying** | Tests that document *current behavior*, not *correct behavior* |
| **Minimal, surgical modifications** | Refactoring is a separate task requiring explicit human approval |
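A characterization test records what the legacy code does today, quirks included. The `legacySlugify` function below is a made-up stand-in for real legacy code, used only to show the shape of the technique.

```typescript
// Legacy function with a quirk preserved on purpose: it keeps the dash
// produced by a trailing space.
function legacySlugify(title: string): string {
  return title.toLowerCase().replace(/\s+/g, "-");
}

// Golden outputs recorded from CURRENT behavior before any modification.
const characterization: Array<[string, string]> = [
  ["Hello World", "hello-world"],
  ["Trailing ", "trailing-"], // looks wrong, but it is today's behavior
];

// Any modification must keep every recorded behavior intact.
function behaviorUnchanged(fn: (s: string) => string): boolean {
  return characterization.every(([input, expected]) => fn(input) === expected);
}
```

A "cleanup" that trims input would fail this check, which is exactly the point: the test catches silent behavior changes, not incorrectness.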
### The Biggest Pitfall

The agent will try to refactor legacy code to match its sense of good patterns. Left unchecked, this produces massive diffs that change behavior in subtle ways. **Enforce strict rules:** modifications to legacy code should be minimal and surgical.

---
@ -1,34 +0,0 @@
|
|||
# Encoding Taste & Aesthetics
|
||||
|
||||
**The honest frontier:** This is where all four models are most candid about current limitations.
|
||||
|
||||
### What CAN Be Automated
|
||||
|
||||
| Technique | Description |
|
||||
|-----------|-------------|
|
||||
| **Reference-based extraction** | "Feels like Linear" → extract concrete attributes: spacing ratios, animation timing curves, color relationships, typography |
|
||||
| **Style specification** | Convert extracted attributes to verifiable parameters: "transitions 150-200ms ease-out, 8px grid spacing, specific contrast ratios" |
|
||||
| **Automated verification** | Lighthouse scores, visual regression tests, accessibility audits, performance budgets, design system linting |
|
||||
| **Visual comparison** | Render output, compare against reference screenshots using vision-capable models |
|
||||
| **A/B comparison** | Show two versions, human picks which "feels better" — faster than absolute judgment |
|
||||
|
||||
### What CANNOT Be Automated
|
||||
|
||||
The **gestalt** — the overall feeling, emotional response, sense of quality emerging from a thousand small interacting decisions. *Does this feel premium? Fast? Trustworthy?* These are fundamentally subjective.
|
||||
|
||||
### The Optimal Strategy
|
||||
|
||||
**Narrow the gap** by converting as much "taste" as possible into **concrete, verifiable specifications upfront:**
|
||||
|
||||
- Not "use nice spacing" → "16px between sections, 8px between related elements, 4px between tightly coupled elements"
|
||||
- Exact animation timing curves, color values with contrast ratios, typography weights and sizes
|
||||
|
||||
Then **reserve human review for the remaining subjective layer** with structured, specific questions:
|
||||
|
||||
> "Does the density feel right? Does the transition timing feel snappy enough? Does the empty state feel intentional or broken?"
|
||||
|
||||
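Once taste is converted into concrete parameters, it can be linted. A sketch under assumed rules (4px sub-grid for spacing, 150-200ms transitions); the token shape is illustrative.

```typescript
// Lint declared design tokens against the verifiable spec derived above.
interface StyleTokens {
  spacingPx: number[];
  transitionMs: number[];
}

function lintTokens(t: StyleTokens): string[] {
  const issues: string[] = [];
  for (const s of t.spacingPx) {
    if (s % 4 !== 0) issues.push(`spacing ${s}px is off the 4px sub-grid`);
  }
  for (const ms of t.transitionMs) {
    if (ms < 150 || ms > 200) issues.push(`transition ${ms}ms outside 150-200ms`);
  }
  return issues;
}
```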
### The Emerging Frontier

Vision-capable models for aesthetic evaluation — render output, capture a screenshot, compare against references on specific visual dimensions. Imperfect but improving rapidly. Grok reports ~80-85% of taste can be automated this way; the remainder stays human-only.

---
@@ -1,31 +0,0 @@
# Irreversible Operations & Safety Architecture

**The core principle (universal agreement):** Irreversible operations should **never be executed by the agent.** The agent prepares them; the human executes them.

### Risk-Graded Action Classification

| Class | Examples | Policy |
|-------|----------|--------|
| **Reversible** | Code edits, UI changes, unit tests | Full autonomy + auto-revert on failure |
| **Semi-Reversible** | New files, dependencies | Auto-execute + git checkpoint |
| **Irreversible** | DB migrations, external API changes, data transformations | Human-in-the-loop required |
| **External Side-Effect** | Payment charges, third-party API calls with side effects | Human approval + dry-run + rollback plan |

### Per-Operation Protocols

| Operation | Agent Does | Human Does |
|-----------|-----------|-----------|
| **Database migrations** | Write migration + rollback + tests, run against test DB, produce review package | Review package, execute migration |
| **External APIs** | Build + test against sandbox/mock versions | Switch from sandbox to production |
| **Deployment** | Produce artifacts, verify in staging | Trigger production deployment |

### The Classification Must Be

- **Static and deterministic** (not left to the agent's judgment)
- **Conservative** (if there's doubt, classify as irreversible)
- **Enforced by the orchestrator** (the agent never encounters an irreversible operation without interception)
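A static, conservative classifier following the rules above might look like this; the risk classes mirror the table, while the operation names are illustrative.

```typescript
// Static risk classification, enforced by the orchestrator before any
// tool call reaches the agent's execution path.
type RiskClass = "reversible" | "semi_reversible" | "irreversible" | "external_side_effect";

const OPERATION_CLASS: Record<string, RiskClass> = {
  edit_file: "reversible",
  run_unit_tests: "reversible",
  create_file: "semi_reversible",
  add_dependency: "semi_reversible",
  run_db_migration: "irreversible",
  charge_payment: "external_side_effect",
};

function classify(operation: string): RiskClass {
  // Conservative default: if in doubt, classify as irreversible.
  return OPERATION_CLASS[operation] ?? "irreversible";
}

function requiresHuman(operation: string): boolean {
  const c = classify(operation);
  return c === "irreversible" || c === "external_side_effect";
}
```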
### The Subtlety Most Miss

Data transformations that technically don't delete anything but **lose information through reformatting**. Converting a nullable column to non-nullable with a default value permanently destroys the distinction between rows that had real values and rows that got the default. These must be flagged with the same severity as deletions.

---
@@ -1,31 +0,0 @@
# The Handoff Problem: Agent → Human Maintainability

**The failure modes of AI-generated code** that all four models identify:

### Known Anti-Patterns

| Pattern | Problem | Fix |
|---------|---------|-----|
| **Flat code** | Everything in one function/file to reduce inconsistency risk | Enforce human-friendly modular patterns |
| **Clever solutions** | Dense functional chains (`filter().map().reduce().flatMap()`) | Max 3 chained operations; extract named intermediates |
| **Useless comments** | `// filter active users` above a filter call | Require *why* comments, skip *what* comments |
| **Over-abstraction** | Creates clever custom abstractions no human can follow | Enforce standard framework patterns over custom inventions |
| **Missing breadcrumbs** | No README files in directories, no ADRs, no diagrams | Include documentation in task completion checklist |
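The "extract named intermediates" fix is easiest to see side by side. Both versions below are illustrative; the `Order` type and logic are invented for the example.

```typescript
interface Order { userId: string; total: number; cancelled: boolean }

// Dense version: the chained style the anti-pattern table warns against.
const bigSpenderIdsDense = (orders: Order[]) =>
  Array.from(new Set(orders.filter((o) => !o.cancelled && o.total > 100).map((o) => o.userId)));

// "Boring" version: each step named, short chains only.
function bigSpenderIds(orders: Order[]): string[] {
  const active = orders.filter((o) => !o.cancelled);
  const bigSpenders = active.filter((o) => o.total > 100);
  const ids = bigSpenders.map((o) => o.userId);
  return Array.from(new Set(ids));
}
```

Both compute the same result; only the second can be scanned line by line by a maintainer who has never seen the code.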
### The Architecture That Maximizes Handoff Quality

**Enforce well-known frameworks and conventions** over custom patterns. A codebase using standard Next.js/Express/React patterns is immediately navigable. A codebase with custom-invented patterns requires learning a new system.

### Verification Mechanism

**Automated readability test:** Periodically have a **separate agent** (with no knowledge of the building agent's decisions) attempt to add a feature using only the code and docs. If it struggles, a human will too.

### Gemini's "Boring Code" Principle

> Humans hate "clever" AI code; they love "boring" AI code. Run a **Complexity Linter** — if a function has cyclomatic complexity >10, the reviewer agent rejects it.

### Grok's Maintainability Checklist

Every file gets auto-generated JSDoc/TS comments, plus an ADR for every major decision. No magic numbers, no over-abstraction. A mandatory "maintainability score" (cyclomatic complexity + test coverage + comment density) in the critic node.

---
@@ -1,27 +0,0 @@
# When to Scrap and Start Over

### The Four Signals (Cross-Model Convergence)

| Signal | What It Looks Like |
|--------|-------------------|
| **Iteration count trending upward** | Task 1: 3 iterations. Task 2: 5. Task 3: 8. Complexity compounding, not resolving. |
| **Test flakiness increasing** | Previously passing tests intermittently fail — hidden coupling being strained |
| **Same files modified repeatedly** | Every task touches the same core module — god object absorbing too much responsibility |
| **Acceptance criteria requiring exceptions** | "Works except when X" / "Passes if you ignore test Y" — agent negotiating with criteria |
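The first signal can be detected mechanically. A minimal sketch; the window size and function name are assumptions.

```typescript
// Detect the "iteration count trending upward" signal: strictly increasing
// iterations over the last few tasks triggers the reassessment protocol.
function iterationsTrendingUp(iterationsPerTask: number[], window = 3): boolean {
  if (iterationsPerTask.length < window) return false; // not enough data yet
  const recent = iterationsPerTask.slice(-window);
  for (let i = 1; i < recent.length; i++) {
    if (recent[i] <= recent[i - 1]) return false;
  }
  return true;
}
```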
### The Reassessment Protocol

When thresholds are crossed, trigger a **focused LLM call** with: manifest + original spec + task summaries + signal data. Prompt: *"Is the current approach viable or would a different architecture serve better? If different, what and why?"*

### The Critical Architectural Enabler: Make Rewrites Cheap

- Clean interface contracts + good test suites → rewriting internals while preserving interfaces is low-risk
- Tests verify new implementation against same criteria
- Interface contracts ensure nothing downstream breaks
- **Every major approach on a branch** that can be discarded without affecting anything else

Gemini's **"Sunk-Cost Heuristic"**: Monitor "Task Re-entry Rate." If the same 3 tests have been attempted >5 times, or if the refactor-to-feature ratio exceeds 4:1, trigger a "Whiteboard Session."

Grok adds **parallel experimentation**: create a "Rewrite Branch" subgraph, run the same vision on a clean slate for one vertical slice, compare metrics. Only merge if superior. Cost is near-zero because it runs in parallel and is discarded on failure.

---
@@ -1,28 +0,0 @@
# Error Taxonomy & Routing

**The key insight:** Different errors have fundamentally different causes and optimal resolution strategies. Treating them uniformly is one of the biggest sources of wasted iterations.

### The Optimal Taxonomy

| Error Class | Context Needed | Optimal Handler | Escalation |
|-------------|---------------|-----------------|------------|
| **Syntax/Type** | Error message + offending file + types | Deterministic fast path (no LLM needed) | Only if fast path fails |
| **Logic** | Failing test (expected vs actual) + implementation + spec | LLM with medium, focused context | After 3 attempts |
| **Design** | Original spec + architecture + interface contracts + implementation | LLM with broad context | Often needs human input |
| **Performance** | Profiling data + benchmarks + code | Specialist optimization agent | If regression >2x |
| **Security** | Static analysis results + secure pattern reference | Conservative fix prompt | Always flag for review |
| **Environment** | Environment config + recent dep changes + error output | Specialized env context | If not auto-resolved |
| **Flaky Tests** | Run test multiple times to confirm flakiness | Quarantine, don't fix | Infrastructure agent |

### Critical Routing Rules

- **Flaky tests:** Detect by running failing tests multiple times. If inconsistent, **quarantine** — never trigger a fix cycle.
- **Environment errors:** Classify as potentially environmental when they appear in build/startup rather than tests.
- **Security:** Caught by static analysis in the deterministic layer, not by the LLM. Run security linting after every task.
- **Syntax/Type:** Hit a deterministic fast path first. Missing import? Search codebase for the export. Only escalate to LLM if mechanical fix fails.
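The flaky-test rule can be sketched as a triage function. The runner is injected so the policy stays deterministic; names and retry count are assumptions.

```typescript
// Re-run a failing test several times; inconsistent results mean
// quarantine (infrastructure agent), never a fix cycle.
type Verdict = "passed" | "real_failure" | "quarantine";

function triage(testName: string, run: (name: string) => boolean, retries = 3): Verdict {
  const results: boolean[] = [];
  for (let i = 0; i < retries; i++) results.push(run(testName));
  if (results.every((r) => r)) return "passed";
  if (results.every((r) => !r)) return "real_failure"; // route to fix cycle
  return "quarantine"; // flaky: do not burn LLM iterations on it
}
```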
### The Architecture

The orchestrator classifies every error → selects the appropriate context assembly strategy → optionally selects a different prompt framing. The agent experiences this as *"I got exactly the information I need"* rather than *"I got a dump of everything."*

---
@@ -1,26 +0,0 @@
# Cost-Quality Tradeoff & Model Routing

### The Key Insight

Quality requirements vary enormously across task types, but most systems use the same model for everything.

### The Optimal Model Routing Strategy (All 4 Agree)

| Task Type | Model Tier | Rationale |
|-----------|-----------|-----------|
| **Planning, architecture, critique** | Frontier (always) | Planning errors cascade through every downstream task |
| **Ambiguity resolution** | Frontier | Wrong interpretation = wasted execution |
| **Well-specified implementation** (CRUD, standard UI, utilities) | Mid-tier / capable but cheaper | Task is well-defined, patterns established |
| **Code review, test generation** | Mid-tier | Evaluating against known criteria, not generating novel solutions |
| **Summarization** (task records, manifest updates) | Lightest viable | Language competence, minimal reasoning depth |
| **Boilerplate** | Small/fast model | Predictable output, low reasoning requirements |
### The Non-Obvious Cost Optimization
|
||||
|
||||
> **Reducing wasted tokens is higher leverage than reducing token price.** A bloated context window costs money on every single call. Trimming 500 unnecessary tokens from context assembly saves more over a project than switching to a model that's 10% cheaper.
|
||||
|
||||
### Measurement
|
||||
|
||||
Track **cost-per-successful-task**, not cost-per-task. If the cheaper model requires twice as many iterations, it's not actually cheaper. Grok reports 60-70% cost reduction with zero quality loss when routing is done at the orchestrator level.
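The routing table and the metric can be sketched together; the tier names and task taxonomy below are illustrative assumptions, not a real pricing model:

```typescript
type TaskType = "planning" | "ambiguity" | "implementation" | "review" | "summarization" | "boilerplate";
type Tier = "frontier" | "mid" | "light";

// Orchestrator-level routing: judgment-heavy work goes to the frontier
// tier because its errors cascade; well-specified work goes cheaper.
const tierFor: Record<TaskType, Tier> = {
  planning: "frontier",
  ambiguity: "frontier",
  implementation: "mid",
  review: "mid",
  summarization: "light",
  boilerplate: "light",
};

// The metric that matters: total spend divided by *successful* tasks.
// A cheap model that needs twice the iterations is not actually cheaper.
function costPerSuccess(totalCost: number, successes: number): number {
  return successes === 0 ? Infinity : totalCost / successes;
}
```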
---
@@ -1,30 +0,0 @@
# Cross-Project Learning & Reusable Intelligence

### What Transfers Well

| Type | Transferability | Example |
|------|----------------|---------|
| **Problem-solving patterns** (abstract) | ✅ High | "When implementing OAuth, these are the common pitfalls and the architecture that avoids them" |
| **Code templates & scaffolding** | ✅ With adaptation | Proven auth module structure, tested payment integration pattern |
| **Learned pitfalls** | ✅ High | "When integrating Stripe, these are the edge cases around webhooks that most implementations miss" |
| **Project-specific conventions** | ❌ Does not transfer | Architectural decisions are contextual |
| **Domain logic** | ❌ Does not transfer | Business rules are project-specific |

### The Optimal Architecture: A Pattern Library

Each pattern includes:
- A description of the problem it solves
- The approach and its tradeoffs
- Common pitfalls
- Verification tests
- A reference implementation

### Growth Through Extraction, Not Manual Curation

When a task completes with high quality (first-attempt success, no subsequent modifications, clean review), flag it as a **candidate for pattern extraction.** A dedicated pass determines whether the solution embodies a generalizable pattern.

### The Critical Constraint

Patterns should be **descriptive, not prescriptive** — "here's an approach that has worked well, with these tradeoffs" not "always do it this way." Grok adds an overfitting guard: require **3+ project examples** before promoting a pattern to reusable.
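A pattern-library entry and the overfitting guard can be sketched as follows; every field name here is an illustrative assumption:

```typescript
// One entry in the pattern library, populated by the extraction pass.
interface PatternCandidate {
  name: string;
  problem: string;            // what it solves
  tradeoffs: string[];
  pitfalls: string[];
  referenceImplPath: string;  // where the extracted example lives
  seenInProjects: Set<string>;
}

// Descriptive, not prescriptive: promotion only records that the approach
// has worked in enough independent projects; it never becomes a hard rule.
function promotable(p: PatternCandidate): boolean {
  return p.seenInProjects.size >= 3; // the 3+ project overfitting guard
}
```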
---
@@ -1,31 +0,0 @@
# Evolution Across Project Scale

### Phase Transitions (All 4 Models Converge)

#### 0–1k LOC: The Monolithic Phase
- Everything fits in one context window
- Agent reads the entire codebase, makes globally coherent decisions
- Orchestrator is simple, manifest barely needed
- **This is where most demos live**

#### 1k–10k LOC: The Modular Phase
- Codebase no longer fits in one context window
- **What breaks first: consistency** — the agent sees fragments that gradually diverge
- Requirements: modular context assembly, manifest as essential map, interface contracts, convention enforcement (linting, formatting)

#### 10k–50k LOC: The Architectural Phase
- Relationships between components become non-obvious
- Changing one thing might affect ten others through indirect dependencies
- **What breaks:** planning quality — the planner can't understand the full system
- Requirements: dependency-aware context assembly, impact analysis before execution, more conservative/incremental plans

#### 50k–100k+ LOC: The Organizational Phase
- System of systems — no single agent context can reason about the whole thing
- **What breaks:** integration — interactions between components become so numerous that integration testing becomes the bottleneck
- Requirements: hierarchical planning (system-level planner → component-level agents), continuous integration verification, possibly a distributed orchestrator, a hierarchy of manifests

### The Meta-Insight

> The architecture of your agentic system should **mirror the architecture of the software it's building.** Microservices projects need a more distributed orchestrator. Monolithic projects can use a simpler one.

---
@@ -1,35 +0,0 @@
# Security & Trust Boundaries

### Hard Boundaries — Things the Agent Should NEVER Do (Universal Agreement)

| Forbidden Action | Why |
|-----------------|-----|
| Access production systems directly | The agent's world is the dev environment, full stop |
| Access or embed secrets | API keys and credentials should never appear in agent context or output |
| Make network requests to arbitrary destinations | Restrictive firewall, whitelist only required services |
| Modify its own orchestrator, prompts, or tools | Prevents removing safety constraints |
| Execute commands outside the project directory | Sandbox to project dir + temp working dirs only |

### The Sandboxing Architecture

| Layer | Mechanism |
|-------|-----------|
| **Execution** | Containerized (Docker + seccomp), restricted filesystem, network policy |
| **Filesystem** | Content-addressable storage — agent *proposes* changes, backend validates before writing |
| **Secrets** | Vault proxy with short-lived tokens, never direct credentials |
| **Commands** | Parsed and blocked for dangerous patterns (`rm -rf /`, `curl` to unknown hosts) |
| **Dependencies** | Approved dependency list — new deps are auto-approved if on the pre-approved list, otherwise they require human approval |

### The Capability-Based Security Model

The orchestrator runs **outside** the sandbox. The agent requests operations through a controlled API. The orchestrator validates every request before executing. The agent doesn't have direct access to anything — it has access to **tools that the orchestrator mediates.**
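The mediation step can be sketched as a validator the orchestrator runs before executing any agent request; the paths, blocked patterns, and request shape are illustrative assumptions:

```typescript
// The agent *proposes* an operation; the orchestrator checks it against
// the hard boundaries before anything actually runs.
interface ToolRequest {
  tool: "read" | "write" | "exec";
  target: string; // file path or shell command
}

const PROJECT_ROOT = "/workspace/project"; // hypothetical sandbox root
const BLOCKED_COMMANDS = [/rm\s+-rf\s+\//, /curl\s+http/];

function validate(req: ToolRequest): { ok: boolean; reason?: string } {
  if (req.tool === "exec") {
    if (BLOCKED_COMMANDS.some((p) => p.test(req.target))) {
      return { ok: false, reason: "dangerous command pattern" };
    }
    return { ok: true };
  }
  // Filesystem access is confined to the project directory.
  if (!req.target.startsWith(PROJECT_ROOT + "/")) {
    return { ok: false, reason: "outside project sandbox" };
  }
  return { ok: true };
}
```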
### The Subtle Risk

The agent introduces vulnerabilities not through malice but through **plausible-looking insecure patterns**: string concatenation for SQL queries, disabling CORS for convenience, logging sensitive data for debugging. Security linting rules should be tuned to catch these **AI-common patterns** specifically.

### The Trust Model

> Think of the agent as a **highly capable but unvetted contractor.** Give them the codebase and dev environment. Don't give them production credentials, deployment access, or the ability to modify security infrastructure. The goal isn't to make the agent safe by limiting capabilities — it's to make the **environment** safe so the agent can be maximally capable within it.

---
@@ -1,116 +0,0 @@
# Designing for Non-Technical Users ("Vibe Coders")

**The question that matters most** — because everything else is worthless if only engineers can use it.

### The Fundamental Principle (All 4 Models Converge)

> The human should **never have to think in code.** Not in input, not in output, not in error messages, not in verification, not in debugging. The entire technical layer should be absorbed by the system. The human operates purely in **intent, vision, preference, and judgment.**

### The Core Philosophy

| Human Provides | System Provides |
|---------------|-----------------|
| Vision & imagination | Engineering intelligence |
| Taste & aesthetic judgment | Technical translation |
| Direction & priorities | Architecture & implementation |
| "This feels off — calmer, like Linear" | Concrete CSS/animation/spacing changes |

### The 10 Pillars of a Magical Non-Technical Experience

#### 1. Intent-Based Input, Not Specification
Users speak naturally: *"I want an app where people can upload recipes and find them by ingredient."* The system runs a **discovery conversation** that feels like talking to a brilliant product partner — not filling out a requirements form. Behind the scenes, answers compile into structured specs, acceptance criteria, and interface contracts the human never sees.

> **Critical:** Questions should be about the *experience*, not the *implementation.* Never "relational or document store?" Always "should search find exact matches only, or also substitutable ingredients?"

#### 2. Show the Thing, Not the Process
After each milestone: a **working preview**, not a task list. The human interacts with the real thing at every checkpoint — clicks around, feels it, reacts. Progress is communicated as capability, not code: *"Your app can now save workouts and retrieve them later"* — not *"implemented REST endpoint."*

#### 3. Collaborative Builder, Not Command Executor
The agent should feel like a senior co-founder:
```
User: I want something like Notion but for recipes.

Agent: Here's how I'd approach that:
- Recipe database with tagging
- Search by ingredient
- Meal planner

Would you like to prioritize simplicity or advanced features?
```
This implicitly educates the user while avoiding wrong builds from vague specs.

#### 4. Problems, Not Errors
The human should **never see a stack trace**. Technical failures are either resolved silently or translated to domain-level questions:

| ❌ Never Show | ✅ Show Instead |
|--------------|----------------|
| `TypeError: Cannot read property 'map' of undefined` | "The recipe list isn't displaying correctly. I'm fixing it now — should be ready in a few minutes." |
| `ECONNREFUSED localhost:5432` | "I'm having trouble connecting to the database. Working on it." |
| Ambiguous technical decision | "When someone searches 'chicken,' should results include recipes where chicken is optional?" |
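The table above is essentially a lookup from raw failure text to user-facing messages. A minimal sketch, assuming a regex-to-message mapping (the patterns and wording are illustrative):

```typescript
// Map raw technical failures to the domain-level messages a
// non-technical user actually sees; unknown failures get a calm default.
const translations: Array<{ pattern: RegExp; message: string }> = [
  {
    pattern: /TypeError: Cannot read propert(y|ies)/,
    message: "Part of the app isn't displaying correctly. I'm fixing it now.",
  },
  {
    pattern: /ECONNREFUSED/,
    message: "I'm having trouble connecting to the database. Working on it.",
  },
];

function translateError(raw: string): string {
  const hit = translations.find((t) => t.pattern.test(raw));
  return hit ? hit.message : "Something went wrong behind the scenes. I'm looking into it.";
}
```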
#### 5. Reactions, Not Reviews
Design for **reactions** to the running app, not code reviews. Like working with an interior designer: *"I love the color but the couch feels too big."* Visual, spatial, experiential feedback. **A/B comparison** is the most powerful pattern: show two versions, and the human picks which "feels better" in seconds.

#### 6. Engineering Tradeoffs as Simple Choices
Instead of *"Which auth provider?"* → ask *"Which matters more: A) Simplicity B) Maximum customization C) Enterprise security"* — the system maps answers to technical decisions automatically.

#### 7. Safety Blanket
- Auto-backups every slice + an "undo entire feature" button
- **"Vibe Checkpoints"** — before every major change, a save point. "Go back to how it was ten minutes ago."
- Deployment previews before anything goes live
- No irreversible actions without plain-English confirmation

#### 8. Progressive Disclosure
Start ultra-simple. Offer an "Advanced mode" toggle only if the user ever asks. The system should **progressively reveal engineering** — at first pure vision → later architecture tweaking → eventually deep collaboration. Many users will never leave the simple mode, and that's fine.

#### 9. Implicit Teaching
When the user asks *"why is that taking longer?"*:
> "The recipe search needs to look through all recipes every time. I'm adding an index — think of it like a table of contents — so it can find things faster."

Optional, triggered by curiosity, expressed in analogy. Over time, users develop useful mental models of software **without it ever being mandatory.**

#### 10. Invisible Deployment & Operations
"I want to share this with people" → receive a URL. Behind the scenes: hosting, domain, database, SSL, CI/CD. Ongoing maintenance is equally invisible. A simple dashboard: *"Your recipe app had 340 visitors this week. Everything is running smoothly."*

### The Translation Layer (The Magic Glue)

A deterministic "Human Translator" node at the front of every orchestrator cycle:

```
Raw user message + references
        ↓
  [Human Translator]
        ↓
Precise assumptions, invariants, success criteria
        ↓
[Rest of the god-tier orchestrator pipeline]
```

The rest of the graph never sees "vibe language" — only clean spec. This preserves all technical quality while shielding the user.
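A toy version of that translator node, purely as a sketch: the `Spec` shape and the "like \<Product\>" heuristic are invented for illustration and stand in for what would really be an LLM call:

```typescript
// Vibe language in, structured spec out. Downstream nodes see only this.
interface Spec {
  intent: string;
  assumptions: string[];      // inferred, logged, reviewed at milestones
  successCriteria: string[];
}

function translateIntent(raw: string): Spec {
  const spec: Spec = { intent: raw.trim(), assumptions: [], successCriteria: [] };
  // Example heuristic: "like <Product>" becomes a documented assumption
  // rather than a question fired back at the user.
  const m = raw.match(/like (\w+)/i);
  if (m) spec.assumptions.push(`Visual/UX reference: ${m[1]} (inferred, review at milestone)`);
  spec.successCriteria.push("User can complete the described flow end-to-end");
  return spec;
}
```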
### The Scope Protection Layer

Non-technical users often don't realize how complex their requests are. The system must be honest — gently:

> *"That's a great idea. Adding social features is significant — it involves user profiles, a follow system, a feed algorithm, and notifications. It'll take as long as everything we've built so far. Want me to go ahead, or finish core recipe features first?"*

This respects the user's agency while providing the information needed for good decisions.

### The Meta-Principle

> The system is a **creative tool**, not a development tool. It should feel like Photoshop or Ableton — a powerful instrument that lets a person with vision manifest that vision without understanding the underlying mechanics. A music producer doesn't need to understand digital signal processing. A filmmaker doesn't need to understand codec compression. **A person with a great app idea shouldn't need to understand React component lifecycle.**

### What Makes It Feel Magical

The most powerful systems feel magical when they:
- Understand vague ideas
- Ask smart clarifying questions
- Translate intent into architecture
- Show visible progress quickly
- Make experimentation safe
- Explain decisions clearly
- Hide complexity without blocking power

> When these align, the user experiences: **"I can build anything I imagine."** That feeling is the real product.

---
@@ -1,33 +0,0 @@
# Cross-Cutting Themes (Where All 4 Models Converge)

### Original Themes (Reinforced)

These ideas appeared independently in all four conversations across both rounds, indicating the highest-confidence principles:

1. **The LLM should only do what requires judgment.** Everything deterministic belongs in code.
2. **Vertical slices are non-negotiable.** End-to-end working increments at every stage.
3. **Context leanness = quality.** Less (but more relevant) context produces better outputs than more context.
4. **Execution-based verification beats self-assessment.** Run the code. Trust test results over the model's opinion.
5. **The orchestrator is the product.** The model is a commodity; the system around it is the differentiator.
6. **State must be structured and deterministic.** Never let the LLM manage its own lifecycle or memory.
7. **Speed comes from removing unnecessary work.** Not from doing the same work faster.
8. **Failure recovery matters more than happy-path perfection.** Design the error paths first.
9. **Human involvement should be high-leverage.** Specific questions with context, not open-ended reviews.
10. **The system improves over time.** Track patterns, cache solutions, learn from failures.

### New Themes (From Grey Area Deep-Dives)

11. **Document assumptions, don't ask about every one.** Proceed with sensible defaults + transparent logging. Review at milestones, not in real-time.
12. **The codebase is the lossless source of truth.** Summaries are lossy caches that must be periodically reconciled against actual code. Never summarize summaries.
13. **Semantic conflicts are harder than syntactic ones.** Interface contracts must be behavioral specs, not just type signatures. Integration testing is a first-class concern, not an afterthought.
14. **Observe before modifying.** Especially in legacy codebases — the agent must understand existing patterns before changing them. Preserve local consistency over global ideals.
15. **Taste can be ~80-85% automated.** Convert subjective preferences to concrete, verifiable specs. Reserve human judgment for the remaining gestalt. The gap is closing fast with vision-capable models.
16. **Irreversible operations are categorically different.** The agent prepares; the human executes. No exceptions.
17. **"Boring" code is good code.** For handoff, enforce standard patterns, limit complexity, and write *why* comments. Automated readability testing catches problems before humans encounter them.
18. **Make rewrites cheap, not rare.** Clean interfaces + good tests + branch-based experimentation = rewriting is a safe, routine operation rather than a crisis.
19. **Route errors by type, not by severity.** Different error classes need different context, different handlers, and different escalation thresholds. Flaky tests should be quarantined, not fixed.
20. **The magic is the translation layer.** For non-technical users, the entire value proposition is the invisible bridge between human intent and technical execution. Every moment the user has to think like a developer is a failure.

---

*Generated March 2026. Updated with grey-area deep-dive synthesis. Source material: two rounds of parallel deep-dive conversations with Claude (Anthropic), Gemini (Google), GPT (OpenAI), and Grok (xAI) on optimal autonomous AI coding agent architecture — including the 13 hardest unsolved problems and designing for non-technical users.*
@@ -1,37 +0,0 @@
# Building Coding Agents — Research

> Split into individual files for easier consumption.

## Table of Contents

- [01. Work Decomposition](./01-work-decomposition.md)
- [02. What to Keep & Discard from Human Engineering](./02-what-to-keep-discard-from-human-engineering.md)
- [03. State Machine & Context Management](./03-state-machine-context-management.md)
- [04. Optimal Storage for Project Context](./04-optimal-storage-for-project-context.md)
- [05. Parallelization Strategy](./05-parallelization-strategy.md)
- [06. Maximizing Agent Autonomy & Superpowers](./06-maximizing-agent-autonomy-superpowers.md)
- [07. System Prompt & LLM vs Deterministic Split](./07-system-prompt-llm-vs-deterministic-split.md)
- [08. Speed Optimization](./08-speed-optimization.md)
- [09. Top 10 Tips for a World-Class Agent](./09-top-10-tips-for-a-world-class-agent.md)
- [10. Top 10 Pitfalls to Avoid](./10-top-10-pitfalls-to-avoid.md)
- [11. God-Tier Context Engineering](./11-god-tier-context-engineering.md)
- [12. Handling Ambiguity & Contradiction](./12-handling-ambiguity-contradiction.md)
- [13. Long-Running Memory Fidelity](./13-long-running-memory-fidelity.md)
- [14. Multi-Agent Semantic Conflict Resolution](./14-multi-agent-semantic-conflict-resolution.md)
- [15. Legacy Code & Brownfield Onboarding](./15-legacy-code-brownfield-onboarding.md)
- [16. Encoding Taste & Aesthetics](./16-encoding-taste-aesthetics.md)
- [17. Irreversible Operations & Safety Architecture](./17-irreversible-operations-safety-architecture.md)
- [18. The Handoff Problem: Agent → Human Maintainability](./18-the-handoff-problem-agent-human-maintainability.md)
- [19. When to Scrap and Start Over](./19-when-to-scrap-and-start-over.md)
- [20. Error Taxonomy & Routing](./20-error-taxonomy-routing.md)
- [21. Cost-Quality Tradeoff & Model Routing](./21-cost-quality-tradeoff-model-routing.md)
- [22. Cross-Project Learning & Reusable Intelligence](./22-cross-project-learning-reusable-intelligence.md)
- [23. Evolution Across Project Scale](./23-evolution-across-project-scale.md)
- [24. Security & Trust Boundaries](./24-security-trust-boundaries.md)
- [25. Designing for Non-Technical Users ("Vibe Coders")](./25-designing-for-non-technical-users-vibe-coders.md)
- [26. Cross-Cutting Themes (Where All 4 Models Converge)](./26-cross-cutting-themes-where-all-4-models-converge.md)

---

*Split into per-section files for surgical context loading.*
@@ -1,82 +0,0 @@
# Captures & Triage

*Introduced in v2.19.0*

Captures let you fire-and-forget thoughts during auto-mode execution. Instead of pausing auto-mode to steer, you can capture ideas, bugs, or scope changes and let GSD triage them at natural seams between tasks.

## Quick Start

While auto-mode is running (or any time):

```
/gsd capture "add rate limiting to the API endpoints"
/gsd capture "the auth flow should support OAuth, not just JWT"
```

Captures are appended to `.gsd/CAPTURES.md` and triaged automatically between tasks.

## How It Works

### Pipeline

```
capture → triage → confirm → resolve → resume
```

1. **Capture** — `/gsd capture "thought"` appends to `.gsd/CAPTURES.md` with a timestamp and unique ID
2. **Triage** — at natural seams between tasks (in `handleAgentEnd`), GSD detects pending captures and classifies them
3. **Confirm** — the user is shown the proposed resolution and confirms or adjusts
4. **Resolve** — the resolution is applied (task injection, replan trigger, deferral, etc.)
5. **Resume** — auto-mode continues

### Classification Types

Each capture is classified into one of five types:

| Type | Meaning | Resolution |
|------|---------|------------|
| `quick-task` | Small, self-contained fix | Inline quick task executed immediately |
| `inject` | New task needed in current slice | Task injected into the active slice plan |
| `defer` | Important but not urgent | Deferred to roadmap reassessment |
| `replan` | Changes the current approach | Triggers slice replan with capture context |
| `note` | Informational, no action needed | Acknowledged, no plan changes |
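The five types map naturally onto a discriminated union; a minimal sketch, assuming payload shapes that are illustrative rather than GSD's internals:

```typescript
// One resolution per capture, chosen by the triage LLM.
type Resolution =
  | { type: "quick-task"; command: string }
  | { type: "inject"; task: string }
  | { type: "defer" }
  | { type: "replan"; reason: string }
  | { type: "note" };

// Only plan-modifying resolutions pause for user confirmation
// before auto-mode resumes.
function needsConfirmation(r: Resolution): boolean {
  return r.type === "inject" || r.type === "replan";
}
```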
### Automatic Triage

Triage fires automatically between tasks during auto-mode. The triage prompt receives:
- All pending captures
- The current slice plan
- The active roadmap

The LLM classifies each capture and proposes a resolution. Plan-modifying resolutions (inject, replan) require user confirmation.

### Manual Triage

Trigger triage manually at any time:

```
/gsd triage
```

This is useful when you've accumulated several captures and want to process them before the next natural seam.

## Dashboard Integration

The progress widget shows a pending-capture count badge when captures are waiting for triage. This is visible in both the `Ctrl+Alt+G` dashboard and the auto-mode progress widget.

## Context Injection

Capture context is automatically injected into:
- **Replan-slice prompts** — so the replan knows what triggered it
- **Reassess-roadmap prompts** — so deferred captures influence roadmap decisions

## Worktree Awareness

Captures always resolve to the **original project root's** `.gsd/CAPTURES.md`, not the worktree's local copy. This ensures captures from a steering terminal are visible to the auto-mode session running in a worktree.

## Commands

| Command | Description |
|---------|-------------|
| `/gsd capture "text"` | Capture a thought (quotes optional for single words) |
| `/gsd triage` | Manually trigger triage of pending captures |
@@ -1,196 +0,0 @@
# CI/CD Pipeline Guide

## Overview

GSD 2 uses a three-stage promotion pipeline that automatically moves merged PRs through **Dev → Test → Prod** environments using npm dist-tags.

```
PR merged to main
        │
        ▼
   ┌─────────┐   ci.yml passes (build, test, typecheck)
   │   DEV   │   → publishes gsd-pi@<version>-dev.<sha> with @dev tag
   └────┬────┘
        ▼   (automatic if green)
   ┌─────────┐   CLI smoke tests + LLM fixture replay
   │  TEST   │   → promotes to @next tag
   └────┬────┘   → pushes Docker image as :next
        ▼   (manual approval required)
   ┌─────────┐   optional real-LLM integration tests
   │  PROD   │   → promotes to @latest tag
   └─────────┘   → creates GitHub Release
```

## For Contributors: Testing Your PR Before It Ships

### Install the Dev Build

Every merged PR is immediately installable:

```bash
# Latest dev build (bleeding edge, every merged PR)
npx gsd-pi@dev

# Test candidate (passed smoke + fixture tests)
npx gsd-pi@next

# Stable production release
npx gsd-pi@latest   # or just: npx gsd-pi
```

### Using Docker

```bash
# Test candidate
docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:next --version

# Stable
docker run --rm -v $(pwd):/workspace ghcr.io/gsd-build/gsd-pi:latest --version
```

### Checking if a Fix Landed

1. Find the PR's merge commit SHA (first 7 chars)
2. Check if it's in `@dev`: `npm view gsd-pi@dev version`
   - If the version ends in `-dev.<your-sha>`, your PR is in dev
3. Check if it promoted to `@next`: `npm view gsd-pi@next version`
4. Check if it's in production: `npm view gsd-pi@latest version`

## For Maintainers

### Pipeline Workflows

| Workflow | File | Trigger | Purpose |
|----------|------|---------|---------|
| CI | `ci.yml` | PR + push to main | Build, test, typecheck — **gate for all promotions** |
| Release Pipeline | `pipeline.yml` | After CI succeeds on main | Three-stage promotion |
| Native Binaries | `build-native.yml` | `v*` tags | Cross-compile platform binaries |
| Dev Cleanup | `cleanup-dev-versions.yml` | Weekly (Monday 06:00 UTC) | Unpublish `-dev.` versions older than 30 days |
| AI Triage | `triage.yml` | New issues + PRs | Automated classification via Claude Haiku (v2.36) |

**CI optimization (v2.38):** GitHub Actions minutes were reduced ~60-70% (~10k → ~3-4k/month) through workflow consolidation and caching improvements.

**Pipeline optimization (v2.41):**
- **Shallow clones** — CI lint and build jobs use `fetch-depth: 1` or `fetch-depth: 2` instead of full history, saving ~30-60s per job
- **npm cache in pipeline** — dev-publish, test-verify, and prod-release now use `cache: 'npm'` on setup-node, saving ~1-2 min per job on repeat runs
- **Exponential backoff** — npm registry propagation waits in `build-native.yml` replaced a hardcoded `sleep 30` + fixed 15s retries with exponential backoff (5s → 10s → 20s → 30s cap), typically finishing in <15s when the registry is fast
- **Security hardening** — pipeline.yml moved `${{ }}` expressions from `run:` blocks to `env:` variables to prevent command injection vectors
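The backoff schedule above is simple to state precisely. A sketch of the delay computation, with the helper name invented for illustration (the real logic lives in `build-native.yml` as shell):

```typescript
// Capped exponential backoff: base, 2x, 4x, ... up to the cap.
// With base 5s and cap 30s this yields the 5 → 10 → 20 → 30 schedule.
function backoffDelays(baseMs: number, capMs: number, attempts: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(baseMs * 2 ** i, capMs));
  }
  return delays;
}
```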
### Docs-Only PR Detection (v2.41)

CI automatically detects when a PR contains only documentation changes (`.md` files and `docs/` content). When docs-only:

- **Skipped:** `build`, `windows-portability` (no code to compile or test)
- **Still runs:** `lint` (secret scanning, `.gsd/` check), `docs-check` (prompt injection scan)

This saves CI minutes on documentation PRs while still enforcing security checks.

### Prompt Injection Scan (v2.41)

The `docs-check` job runs `scripts/docs-prompt-injection-scan.sh` on every PR that touches markdown files. It scans documentation prose (excluding fenced code blocks) for patterns that could manipulate LLM behavior when docs are ingested as context:

- **System prompt markers** — `<system-prompt>`, `<|im_start|>system`, `[SYSTEM]:`
- **Role/instruction overrides** — `ignore previous instructions`, `you are now`, `new instructions:`
- **Hidden HTML directives** — `<!-- PROMPT:`, `<!-- INSTRUCTION:`
- **Tool call injection** — `<tool_call>`, `<function_call>`, `<invoke`
- **Invisible Unicode** — zero-width character sequences that hide directives

Content inside fenced code blocks (` ``` `) is excluded — patterns in code examples are expected and legitimate.
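The core idea of the scan, re-expressed as a sketch (the real implementation is the shell script above; this pattern list is a subset and the function name is hypothetical):

```typescript
// A few of the pattern classes the scan looks for.
const SUSPICIOUS: RegExp[] = [
  /<system-prompt>/i,
  /ignore previous instructions/i,
  /<tool_call>/i,
  /[\u200b\u200c\u200d]/, // zero-width characters
];

// Strip fenced code blocks first, then test only the remaining prose,
// so that patterns inside code examples are exempt.
function scanMarkdown(doc: string): RegExp[] {
  const prose = doc.replace(/```[\s\S]*?```/g, "");
  return SUSPICIOUS.filter((p) => p.test(prose));
}
```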
|
||||
|
||||
**False positives:** Add exceptions to `.prompt-injection-scanignore` using the same format as `.secretscanignore` (one pattern per line, `file:regex` for file-scoped exceptions).
|
||||
|
||||
### Gating Tests
|
||||
|
||||
The pipeline only triggers after `ci.yml` passes. Key gating tests include:
|
||||
|
||||
- **Unit tests** (`npm run test:unit`) — includes `auto-session-encapsulation.test.ts` which enforces that all auto-mode state is encapsulated in `AutoSession`, plus dispatch loop regression tests that exercise the full `deriveState → resolveDispatch → idempotency` chain without an LLM. Any PR adding module-level mutable state to `auto.ts` will fail CI and block the pipeline.
|
||||
- **Integration tests** (`npm run test:integration`)
|
||||
- **Extension typecheck** (`npm run typecheck:extensions`)
|
||||
- **Package validation** (`npm run validate-pack`)
|
||||
- **Smoke tests** (`npm run test:smoke`) — run post-build in the pipeline against the local binary and again against the globally-installed `@dev` package
|
||||
- **Fixture tests** (`npm run test:fixtures`) — replay recorded LLM conversations without hitting real APIs
|
||||
- **Live regression tests** (`npm run test:live-regression`) — run against the installed binary in the Test stage to catch runtime regressions before promotion to `@next`
|
||||
|
||||
### Approving a Prod Release
|
||||
|
||||
1. A version reaches the Test stage automatically
|
||||
2. In GitHub Actions, the `prod-release` job will show "Waiting for review"
|
||||
3. Click **Review deployments** → select `prod` → **Approve**
|
||||
4. The version is promoted to `@latest` and a GitHub Release is created
|
||||
|
||||
To enable live LLM tests during Prod promotion:
|
||||
- Set the `RUN_LIVE_TESTS` environment variable to `true` on the `prod` environment
|
||||
|
||||
### Rolling Back a Release
|
||||
|
||||
If a broken version reaches production:
|
||||
|
||||
```bash
|
||||
# Roll back npm
|
||||
npm dist-tag add gsd-pi@<previous-good-version> latest
|
||||
|
||||
# Roll back Docker
|
||||
docker pull ghcr.io/gsd-build/gsd-pi:<previous-good-version>
|
||||
docker tag ghcr.io/gsd-build/gsd-pi:<previous-good-version> ghcr.io/gsd-build/gsd-pi:latest
|
||||
docker push ghcr.io/gsd-build/gsd-pi:latest
|
||||
```
|
||||
|
||||
For `@dev` or `@next` rollbacks, the next successful merge will overwrite the tag automatically.
|
||||
|
||||
### GitHub Configuration Required

| Setting | Value |
|---------|-------|
| Environment: `dev` | No protection rules |
| Environment: `test` | No protection rules |
| Environment: `prod` | Required reviewers: maintainers |
| Secret: `NPM_TOKEN` | All environments |
| Secret: `ANTHROPIC_API_KEY` | Prod environment only |
| Secret: `OPENAI_API_KEY` | Prod environment only |
| Variable: `RUN_LIVE_TESTS` | `false` (set to `true` to enable live LLM tests) |
| GHCR | Enabled for the `gsd-build` org |
### Docker Images

| Image | Base | Purpose | Tags |
|-------|------|---------|------|
| `ghcr.io/gsd-build/gsd-ci-builder` | `node:24-bookworm` | CI build environment with Rust toolchain | `:latest`, `:<date>` |
| `ghcr.io/gsd-build/gsd-pi` | `node:24-slim` | User-facing runtime | `:latest`, `:next`, `:v<version>` |

The CI builder image is rebuilt automatically when its `Dockerfile` changes. It eliminates ~3-5 minutes of toolchain setup per CI run.
## LLM Fixture Tests

The fixture system records and replays LLM conversations without hitting real APIs (zero cost).

### Running Fixture Tests

```bash
npm run test:fixtures
```
### Recording New Fixtures

```bash
# Set your API key, then record
GSD_FIXTURE_MODE=record GSD_FIXTURE_DIR=./tests/fixtures/recordings \
  node --experimental-strip-types tests/fixtures/record.ts
```

Fixtures are JSON files in `tests/fixtures/recordings/`. Each one captures a conversation's request/response pairs and replays them by turn index.
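A recorded fixture might look roughly like this. The shape below is a hypothetical illustration of turn-indexed request/response pairs; the field names are assumptions, not the recorder's actual output:

```json
{
  "model": "claude-sonnet-4-6",
  "turns": [
    {
      "index": 0,
      "request": { "messages": [{ "role": "user", "content": "Plan slice S01" }] },
      "response": { "role": "assistant", "content": "Here is the plan..." }
    }
  ]
}
```

On replay, the harness matches each outgoing request to the recorded turn at the same index, which is why a system-prompt change that alters turn counts forces a re-record.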
### When to Re-Record

Re-record fixtures when:

- Provider wire format changes (e.g., a new field in an Anthropic response)
- Tool definitions change (affects request shape)
- The system prompt changes (may cause a turn-count mismatch)

## Version Strategy

| Tag | Published | Format | Who uses it |
|-----|-----------|--------|-------------|
| `@dev` | Every merged PR | `2.27.0-dev.a3f2c1b` | Developers verifying fixes |
| `@next` | Auto-promoted from dev | Same version | Early adopters, beta testers |
| `@latest` | Manually approved | Same version | Production users |

Old `-dev.` versions are cleaned up weekly (30-day retention).
@@ -1,307 +0,0 @@
# Commands Reference

## Session Commands

| Command | Description |
|---------|-------------|
| `/gsd` | Step mode — execute one unit at a time, pausing between each |
| `/gsd next` | Explicit step mode (same as `/gsd`) |
| `/gsd auto` | Autonomous mode — research, plan, execute, commit, repeat |
| `/gsd quick` | Execute a quick task with GSD guarantees (atomic commits, state tracking) without full planning overhead |
| `/gsd stop` | Stop auto mode gracefully |
| `/gsd pause` | Pause auto mode (preserves state; `/gsd auto` to resume) |
| `/gsd steer` | Hard-steer plan documents during execution |
| `/gsd discuss` | Discuss architecture and decisions (works alongside auto mode) |
| `/gsd status` | Progress dashboard |
| `/gsd widget` | Cycle dashboard widget: full / small / min / off |
| `/gsd queue` | Queue and reorder future milestones (safe during auto mode) |
| `/gsd capture` | Fire-and-forget thought capture (works during auto mode) |
| `/gsd triage` | Manually trigger triage of pending captures |
| `/gsd dispatch` | Dispatch a specific phase directly (research, plan, execute, complete, reassess, uat, replan) |
| `/gsd history` | View execution history (supports `--cost`, `--phase`, `--model` filters) |
| `/gsd forensics` | Full-access GSD debugger — structured anomaly detection, unit traces, and LLM-guided root-cause analysis for auto-mode failures |
| `/gsd cleanup` | Clean up GSD state files and stale worktrees |
| `/gsd visualize` | Open the workflow visualizer (progress, deps, metrics, timeline) |
| `/gsd export --html` | Generate a self-contained HTML report for the current or a completed milestone |
| `/gsd export --html --all` | Generate retrospective reports for all milestones at once |
| `/gsd update` | Update GSD to the latest version in-session |
| `/gsd knowledge` | Add persistent project knowledge (rule, pattern, or lesson) |
| `/gsd fast` | Toggle service tier for supported models (prioritized API routing) |
| `/gsd rate` | Rate the last unit's model tier (over/ok/under) — improves adaptive routing |
| `/gsd changelog` | Show categorized release notes |
| `/gsd logs` | Browse activity logs, debug logs, and metrics |
| `/gsd remote` | Control remote auto mode |
| `/gsd help` | Categorized command reference with descriptions for all GSD subcommands |
## Configuration & Diagnostics

| Command | Description |
|---------|-------------|
| `/gsd prefs` | Model selection, timeouts, budget ceiling |
| `/gsd mode` | Switch workflow mode (solo/team) with coordinated defaults for milestone IDs, git commit behavior, and documentation |
| `/gsd config` | Re-run the provider setup wizard (LLM provider + tool keys) |
| `/gsd keys` | API key manager — list, add, remove, test, rotate, doctor |
| `/gsd doctor` | Runtime health checks with auto-fix — issues surface in real time across the widget, visualizer, and HTML reports (v2.40) |
| `/gsd inspect` | Show SQLite DB diagnostics |
| `/gsd init` | Project init wizard — detect, configure, bootstrap `.gsd/` |
| `/gsd setup` | Global setup status and configuration |
| `/gsd skill-health` | Skill lifecycle dashboard — usage stats, success rates, token trends, staleness warnings |
| `/gsd skill-health <name>` | Detailed view for a single skill |
| `/gsd skill-health --declining` | Show only skills flagged for declining performance |
| `/gsd skill-health --stale N` | Show skills unused for N+ days |
| `/gsd hooks` | Show configured post-unit and pre-dispatch hooks |
| `/gsd run-hook` | Manually trigger a specific hook |
| `/gsd migrate` | Migrate a v1 `.planning` directory to `.gsd` format |
## Milestone Management

| Command | Description |
|---------|-------------|
| `/gsd new-milestone` | Create a new milestone |
| `/gsd skip` | Exclude a unit from auto-mode dispatch |
| `/gsd undo` | Revert the last completed unit |
| `/gsd undo-task` | Reset a specific task's completion state (DB + markdown) |
| `/gsd reset-slice` | Reset a slice and all its tasks (DB + markdown) |
| `/gsd park` | Park a milestone — skip it without deleting |
| `/gsd unpark` | Reactivate a parked milestone |
| Discard milestone | Available via the `/gsd` wizard → "Milestone actions" → "Discard" |
## Parallel Orchestration

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze eligibility, confirm, and start workers |
| `/gsd parallel status` | Show all workers with state, progress, and cost |
| `/gsd parallel stop [MID]` | Stop all workers or a specific milestone's worker |
| `/gsd parallel pause [MID]` | Pause all workers or a specific one |
| `/gsd parallel resume [MID]` | Resume paused workers |
| `/gsd parallel merge [MID]` | Merge completed milestones back to main |

See [Parallel Orchestration](./parallel-orchestration.md) for full documentation.
## Workflow Templates (v2.42)

| Command | Description |
|---------|-------------|
| `/gsd start` | Start a workflow template (bugfix, spike, feature, hotfix, refactor, security-audit, dep-upgrade, full-project) |
| `/gsd start resume` | Resume an in-progress workflow |
| `/gsd templates` | List available workflow templates |
| `/gsd templates info <name>` | Show detailed template info |
## Custom Workflows (v2.42)

| Command | Description |
|---------|-------------|
| `/gsd workflow new` | Create a new workflow definition (via skill) |
| `/gsd workflow run <name>` | Create a run and start auto mode |
| `/gsd workflow list` | List workflow runs |
| `/gsd workflow validate <name>` | Validate a workflow definition YAML |
| `/gsd workflow pause` | Pause custom-workflow auto mode |
| `/gsd workflow resume` | Resume paused custom-workflow auto mode |
## Extensions

| Command | Description |
|---------|-------------|
| `/gsd extensions list` | List all extensions and their status |
| `/gsd extensions enable <id>` | Enable a disabled extension |
| `/gsd extensions disable <id>` | Disable an extension |
| `/gsd extensions info <id>` | Show extension details |
## cmux Integration

| Command | Description |
|---------|-------------|
| `/gsd cmux status` | Show cmux detection, prefs, and capabilities |
| `/gsd cmux on` | Enable cmux integration |
| `/gsd cmux off` | Disable cmux integration |
| `/gsd cmux notifications on/off` | Toggle cmux desktop notifications |
| `/gsd cmux sidebar on/off` | Toggle cmux sidebar metadata |
| `/gsd cmux splits on/off` | Toggle cmux visual subagent splits |
## GitHub Sync (v2.39)

| Command | Description |
|---------|-------------|
| `/github-sync bootstrap` | Initial setup — creates GitHub Milestones, Issues, and draft PRs from the current `.gsd/` state |
| `/github-sync status` | Show sync mapping counts (milestones, slices, tasks) |

Enable with `github.enabled: true` in preferences. Requires the `gh` CLI to be installed and authenticated. The sync mapping is persisted in `.gsd/.github-sync.json`.
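The toggle is a single preferences key. A minimal fragment, assuming the nested YAML form of `github.enabled`:

```yaml
# preferences frontmatter fragment (nested form assumed)
github:
  enabled: true
```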
## Git Commands

| Command | Description |
|---------|-------------|
| `/worktree` (`/wt`) | Git worktree lifecycle — create, switch, merge, remove |
## Session Management

| Command | Description |
|---------|-------------|
| `/clear` | Start a new session (alias for `/new`) |
| `/exit` | Graceful shutdown — saves session state before exiting |
| `/kill` | Kill the GSD process immediately |
| `/model` | Switch the active model |
| `/login` | Log in to an LLM provider |
| `/thinking` | Toggle the thinking level during sessions |
| `/voice` | Toggle real-time speech-to-text (macOS, Linux) |
## Keyboard Shortcuts

| Shortcut | Action |
|----------|--------|
| `Ctrl+Alt+G` | Toggle dashboard overlay |
| `Ctrl+Alt+V` | Toggle voice transcription |
| `Ctrl+Alt+B` | Show background shell processes |
| `Ctrl+V` / `Alt+V` | Paste an image from the clipboard (screenshot → vision input) |
| `Escape` | Pause auto mode (preserves conversation) |

> **Note:** In terminals without Kitty keyboard protocol support (macOS Terminal.app, JetBrains IDEs), slash-command fallbacks are shown instead of `Ctrl+Alt` shortcuts.
>
> **Tip:** If `Ctrl+V` is intercepted by your terminal (e.g. Warp), use `Alt+V` instead for clipboard image paste.
## CLI Flags

| Flag | Description |
|------|-------------|
| `gsd` | Start a new interactive session |
| `gsd --continue` (`-c`) | Resume the most recent session for the current directory |
| `gsd --model <id>` | Override the default model for this session |
| `gsd --print "msg"` (`-p`) | Single-shot prompt mode (no TUI) |
| `gsd --mode <text\|json\|rpc\|mcp>` | Output mode for non-interactive use |
| `gsd --list-models [search]` | List available models and exit |
| `gsd --web [path]` | Start the browser-based web interface (optional project path) |
| `gsd --worktree [name]` (`-w`) | Start the session in a git worktree (auto-generates a name if omitted) |
| `gsd --no-session` | Disable session persistence |
| `gsd --extension <path>` | Load an additional extension (can be repeated) |
| `gsd --append-system-prompt <text>` | Append text to the system prompt |
| `gsd --tools <list>` | Comma-separated list of tools to enable |
| `gsd --version` (`-v`) | Print version and exit |
| `gsd --help` (`-h`) | Print help and exit |
| `gsd sessions` | Interactive session picker — list all saved sessions for the current directory and choose one to resume |
| `gsd --debug` | Enable structured JSONL diagnostic logging for troubleshooting dispatch and state issues |
| `gsd config` | Set up global API keys for search and docs tools (saved to `~/.gsd/agent/auth.json`, applies to all projects). See [Global API Keys](./configuration.md#global-api-keys-gsd-config). |
| `gsd update` | Update GSD to the latest version |
| `gsd headless new-milestone` | Create a new milestone from a context file (headless — no TUI required) |
## Headless Mode

`gsd headless` runs `/gsd` commands without a TUI — designed for CI, cron jobs, and scripted automation. It spawns a child process in RPC mode, auto-responds to interactive prompts, detects completion, and exits with meaningful exit codes.

```bash
# Run auto mode (default)
gsd headless

# Run a single unit
gsd headless next

# Instant JSON snapshot — no LLM, ~50ms
gsd headless query

# With a timeout for CI
gsd headless --timeout 600000 auto

# Force a specific phase
gsd headless dispatch plan

# Create a new milestone from a context file and start auto mode
gsd headless new-milestone --context brief.md --auto

# Create a milestone from inline text
gsd headless new-milestone --context-text "Build a REST API with auth"

# Pipe context from stdin
echo "Build a CLI tool" | gsd headless new-milestone --context -
```

| Flag | Description |
|------|-------------|
| `--timeout N` | Overall timeout in milliseconds (default: 300000 / 5 min) |
| `--max-restarts N` | Auto-restart on crash with exponential backoff (default: 3). Set to 0 to disable |
| `--json` | Stream all events as JSONL to stdout |
| `--model ID` | Override the model for the headless session |
| `--context <file>` | Context file for `new-milestone` (use `-` for stdin) |
| `--context-text <text>` | Inline context text for `new-milestone` |
| `--auto` | Chain into auto mode after milestone creation |

**Exit codes:** `0` = complete, `1` = error or timeout, `2` = blocked.

Any `/gsd` subcommand works as a positional argument — `gsd headless status`, `gsd headless doctor`, `gsd headless dispatch execute`, etc.
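In CI, those exit codes can drive follow-up actions. A minimal sketch: `simulate_gsd` below is a stand-in for a real `gsd headless --timeout 600000 auto` invocation, used here only so the branching is visible.

```shell
#!/bin/sh
# Stand-in for `gsd headless ... auto`; pretend GSD reported "blocked".
simulate_gsd() { return 2; }

simulate_gsd
case $? in
  0) echo "milestone complete" ;;            # exit 0: done
  2) echo "blocked: needs human input" ;;    # exit 2: blocked, page a human
  *) echo "error or timeout" ;;              # exit 1 (or anything else): failure
esac
# prints "blocked: needs human input"
```

Swapping `simulate_gsd` for the real command gives a cron- or pipeline-friendly wrapper that distinguishes "blocked" from hard failure.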
### `gsd headless query`

Returns a single JSON object with the full project snapshot — no LLM session, no RPC child, instant response (~50ms). This is the recommended way for orchestrators and scripts to inspect GSD state.

```bash
gsd headless query | jq '.state.phase'
# "executing"

gsd headless query | jq '.next'
# {"action":"dispatch","unitType":"execute-task","unitId":"M001/S01/T03"}

gsd headless query | jq '.cost.total'
# 4.25
```

**Output schema:**

```json
{
  "state": {
    "phase": "executing",
    "activeMilestone": { "id": "M001", "title": "..." },
    "activeSlice": { "id": "S01", "title": "..." },
    "activeTask": { "id": "T01", "title": "..." },
    "registry": [{ "id": "M001", "status": "active" }, ...],
    "progress": { "milestones": { "done": 0, "total": 2 }, "slices": { "done": 1, "total": 3 } },
    "blockers": []
  },
  "next": {
    "action": "dispatch",
    "unitType": "execute-task",
    "unitId": "M001/S01/T01"
  },
  "cost": {
    "workers": [{ "milestoneId": "M001", "cost": 1.50, "state": "running", ... }],
    "total": 1.50
  }
}
```
## MCP Server Mode

`gsd --mode mcp` runs GSD as a [Model Context Protocol](https://modelcontextprotocol.io) server over stdin/stdout. This exposes all GSD tools (read, write, edit, bash, etc.) to external AI clients — Claude Desktop, VS Code Copilot, and any MCP-compatible host.

```bash
# Start GSD as an MCP server
gsd --mode mcp
```

The server registers all tools from the agent session and maps MCP `tools/list` and `tools/call` requests to GSD tool definitions. It runs until the transport closes.
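On the client side, registration typically follows the stdio-server config shape that most MCP hosts use. A hypothetical sketch; the `gsd` server name and the config file location depend on your client:

```json
{
  "mcpServers": {
    "gsd": {
      "command": "gsd",
      "args": ["--mode", "mcp"]
    }
  }
}
```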
## In-Session Update

`/gsd update` checks npm for a newer version of GSD and installs it without leaving the session.

```bash
/gsd update
# Current version: v2.36.0
# Checking npm registry...
# Updated to v2.37.0. Restart GSD to use the new version.
```

If already up to date, it reports so and takes no action.

## Export

`/gsd export` generates reports of milestone work.

```bash
# Generate an HTML report for the active milestone
/gsd export --html

# Generate retrospective reports for ALL milestones at once
/gsd export --html --all
```

Reports are saved to `.gsd/reports/` with a browseable `index.html` that links to all generated snapshots.
@@ -1,781 +0,0 @@
# Configuration

GSD preferences live in `~/.gsd/preferences.md` (global) or `.gsd/preferences.md` (project-local). Manage them interactively with `/gsd prefs`.

## `/gsd prefs` Commands

| Command | Description |
|---------|-------------|
| `/gsd prefs` | Open the global preferences wizard (default) |
| `/gsd prefs global` | Interactive wizard for global preferences (`~/.gsd/preferences.md`) |
| `/gsd prefs project` | Interactive wizard for project preferences (`.gsd/preferences.md`) |
| `/gsd prefs status` | Show current preference files, merged values, and skill resolution status |
| `/gsd prefs wizard` | Alias for `/gsd prefs global` |
| `/gsd prefs setup` | Alias for `/gsd prefs wizard` — creates the preferences file if missing |
| `/gsd prefs import-claude` | Import Claude marketplace plugins and skills as namespaced GSD components |
| `/gsd prefs import-claude global` | Import to global scope |
| `/gsd prefs import-claude project` | Import to project scope |
## Preferences File Format

Preferences use YAML frontmatter in a markdown file:

```yaml
---
version: 1
models:
  research: claude-sonnet-4-6
  planning: claude-opus-4-6
  execution: claude-sonnet-4-6
  completion: claude-sonnet-4-6
skill_discovery: suggest
auto_supervisor:
  soft_timeout_minutes: 20
  idle_timeout_minutes: 10
  hard_timeout_minutes: 30
budget_ceiling: 50.00
token_profile: balanced
---
```
## Global vs Project Preferences

| Scope | Path | Applies to |
|-------|------|-----------|
| Global | `~/.gsd/preferences.md` | All projects |
| Project | `.gsd/preferences.md` | Current project only |

**Merge behavior:**

- **Scalar fields** (`skill_discovery`, `budget_ceiling`): project wins if defined
- **Array fields** (`always_use_skills`, etc.): concatenated (global first, then project)
- **Object fields** (`models`, `git`, `auto_supervisor`): shallow-merged; project overrides per key
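For example, given these two hypothetical preference files, the rules above yield the merged values shown in the trailing comments:

```yaml
# ~/.gsd/preferences.md (global)
models:
  planning: claude-opus-4-6
  execution: claude-sonnet-4-6
budget_ceiling: 50.00

# .gsd/preferences.md (project)
models:
  execution: claude-haiku-4-5
budget_ceiling: 20.00

# Effective merged result:
# models.planning:  claude-opus-4-6   # global value kept (no project override)
# models.execution: claude-haiku-4-5  # object field: project overrides per key
# budget_ceiling:   20.00             # scalar field: project wins
```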
## Global API Keys (`/gsd config`)

Tool API keys are stored globally in `~/.gsd/agent/auth.json` and apply to all projects automatically. Set them once with `/gsd config` — no need to configure per-project `.env` files.

```bash
/gsd config
```

This opens an interactive wizard showing which keys are configured and which are missing. Select a tool to enter its key.

### Supported keys

| Tool | Environment Variable | Purpose | Get a key |
|------|---------------------|---------|-----------|
| Tavily Search | `TAVILY_API_KEY` | Web search for non-Anthropic models | [tavily.com/app/api-keys](https://tavily.com/app/api-keys) |
| Brave Search | `BRAVE_API_KEY` | Web search for non-Anthropic models | [brave.com/search/api](https://brave.com/search/api) |
| Context7 Docs | `CONTEXT7_API_KEY` | Library documentation lookup | [context7.com/dashboard](https://context7.com/dashboard) |

### How it works

1. `/gsd config` saves keys to `~/.gsd/agent/auth.json`
2. On every session start, `loadToolApiKeys()` reads the file and sets environment variables
3. Keys apply to all projects — no per-project setup required
4. Environment variables (`export BRAVE_API_KEY=...`) take precedence over saved keys
5. Anthropic models don't need Brave/Tavily — they have built-in web search
## MCP Servers

GSD can connect to external MCP servers configured in project files. This is useful for local tools, internal APIs, self-hosted services, or integrations that aren't built in as native GSD extensions.

### Config file locations

GSD reads MCP client configuration from these project-local paths:

- `.mcp.json`
- `.gsd/mcp.json`

If both files exist, server names are merged and the first definition found wins. Use:

- `.mcp.json` for repo-shared MCP configuration you may want to commit
- `.gsd/mcp.json` for local-only MCP configuration you do **not** want to share

### Supported transports

| Transport | Config shape | Use when |
|-----------|--------------|----------|
| `stdio` | `command` + optional `args`, `env`, `cwd` | Launching a local MCP server process |
| `http` | `url` | Connecting to an already-running MCP server over HTTP |

### Example: stdio server

```json
{
  "mcpServers": {
    "my-server": {
      "type": "stdio",
      "command": "/absolute/path/to/python3",
      "args": ["/absolute/path/to/server.py"],
      "env": {
        "API_URL": "http://localhost:8000"
      }
    }
  }
}
```

### Example: HTTP server

```json
{
  "mcpServers": {
    "my-http-server": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
```

### Verifying a server

After adding config, verify it from a GSD session:

```text
mcp_servers
mcp_discover(server="my-server")
mcp_call(server="my-server", tool="<tool_name>", args={...})
```

Recommended verification order:

1. `mcp_servers` — confirms GSD can see the config file and parse the server entry
2. `mcp_discover` — confirms the server process starts and responds to `tools/list`
3. `mcp_call` — confirms at least one real tool invocation works

### Notes

- Use absolute paths for local executables and scripts when possible.
- For `stdio` servers, prefer setting required environment variables directly in the MCP config instead of relying on an interactive shell profile.
- If a server is team-shared and safe to commit, `.mcp.json` is usually the better home.
- If a server depends on machine-local paths, personal services, or local-only secrets, prefer `.gsd/mcp.json`.
## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `GSD_HOME` | `~/.gsd` | Global GSD directory. All paths derive from this unless individually overridden. Affects preferences, skills, sessions, and per-project state. (v2.39) |
| `GSD_PROJECT_ID` | (auto-hash) | Override the automatic project identity hash. Per-project state goes to `$GSD_HOME/projects/<GSD_PROJECT_ID>/` instead of the computed hash. Useful for CI/CD or for sharing state across clones of the same repo. (v2.39) |
| `GSD_STATE_DIR` | `$GSD_HOME` | Per-project state root. Controls where `projects/<repo-hash>/` directories are created. Takes precedence over `GSD_HOME` for project state. |
| `GSD_CODING_AGENT_DIR` | `$GSD_HOME/agent` | Agent directory containing managed resources, extensions, and auth. Takes precedence over `GSD_HOME` for agent paths. |
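For instance, a CI job might pin both the state root and the project identity so every clone of a repo resolves to the same state directory. A sketch using the variables above; the paths are arbitrary examples, not defaults:

```shell
#!/bin/sh
# Keep all GSD state in a job-local directory and pin the project identity.
export GSD_HOME=/tmp/ci-gsd
export GSD_PROJECT_ID=my-repo-ci

# Per-project state then lands under $GSD_HOME/projects/$GSD_PROJECT_ID/
echo "state root: $GSD_HOME/projects/$GSD_PROJECT_ID"
# prints "state root: /tmp/ci-gsd/projects/my-repo-ci"
```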
## All Settings

### `models`

Per-phase model selection. Each key accepts a model string or an object with fallbacks.

```yaml
models:
  research: claude-sonnet-4-6
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
  execution_simple: claude-haiku-4-5-20250414
  completion: claude-sonnet-4-6
  subagent: claude-sonnet-4-6
```

**Phases:** `research`, `planning`, `execution`, `execution_simple`, `completion`, `subagent`

- `execution_simple` — used for tasks classified as "simple" by the [complexity router](./token-optimization.md#complexity-based-task-routing)
- `subagent` — model for delegated subagent tasks (scout, researcher, worker)
- Provider targeting: use `provider/model` format (e.g., `bedrock/claude-sonnet-4-6`) or the `provider` field in object format
- Omit a key to use whatever model is currently active
### Custom Model Definitions (`models.json`)

Define custom models and providers in `~/.gsd/agent/models.json`. This lets you add models not included in the default registry — useful for self-hosted endpoints (Ollama, vLLM, LM Studio), fine-tuned models, proxies, or new provider releases.

GSD resolves `models.json` with fallback logic:

1. `~/.gsd/agent/models.json` — primary (GSD)
2. `~/.pi/agent/models.json` — fallback (Pi)
3. If neither exists, GSD creates `~/.gsd/agent/models.json`

**Quick example for local models (Ollama):**

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```

The file reloads each time you open `/model` — no restart needed.

For full documentation including provider configuration, model overrides, OpenAI compatibility settings, and advanced examples, see the [Custom Models Guide](./custom-models.md).

**With fallbacks:**

```yaml
models:
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
      - openrouter/moonshotai/kimi-k2.5
    provider: bedrock  # optional: target a specific provider
```

When a model fails to switch (provider unavailable, rate limited, credits exhausted), GSD automatically tries the next model in the `fallbacks` list.
### Community Provider Extensions

For providers not built into GSD, community extensions can add full provider support with proper model definitions, thinking-format configuration, and interactive API key setup.

| Extension | Provider | Models | Install |
|-----------|----------|--------|---------|
| [`pi-dashscope`](https://www.npmjs.com/package/pi-dashscope) | Alibaba DashScope (ModelStudio) | Qwen3, GLM-5, MiniMax M2.5, Kimi K2.5 | `gsd install npm:pi-dashscope` |

Community extensions are recommended over the built-in `alibaba-coding-plan` provider for DashScope models — they use the correct OpenAI-compatible endpoint and include per-model compatibility flags for thinking mode.

### `token_profile`

Coordinates model selection, phase skipping, and context compression. See [Token Optimization](./token-optimization.md).

Values: `budget`, `balanced` (default), `quality`

| Profile | Behavior |
|---------|----------|
| `budget` | Skips research + reassessment phases, uses cheaper models |
| `balanced` | Default behavior — all phases run, standard model selection |
| `quality` | All phases run, prefers higher-quality models |

### `phases`

Fine-grained control over which phases run in auto mode:

```yaml
phases:
  skip_research: false             # skip milestone-level research
  skip_reassess: false             # skip roadmap reassessment after each slice
  skip_slice_research: true        # skip per-slice research
  reassess_after_slice: true      # enable roadmap reassessment after each slice (required for reassessment)
  require_slice_discussion: false  # pause auto mode before each slice for discussion
```

These are usually set automatically by `token_profile`, but can be overridden explicitly.

> **Note:** Roadmap reassessment requires `reassess_after_slice: true` to be set explicitly. Without it, reassessment is skipped regardless of `skip_reassess`.
### `skill_discovery`

Controls how GSD finds and applies skills during auto mode.

| Value | Behavior |
|-------|----------|
| `auto` | Skills found and applied automatically |
| `suggest` | Skills identified during research but not auto-installed (default) |
| `off` | Skill discovery disabled |

### `auto_supervisor`

Timeout thresholds for auto-mode supervision:

```yaml
auto_supervisor:
  model: claude-sonnet-4-6   # optional: model for the supervisor (defaults to the active model)
  soft_timeout_minutes: 20   # warn the LLM to wrap up
  idle_timeout_minutes: 10   # detect stalls
  hard_timeout_minutes: 30   # pause auto mode
```

### `budget_ceiling`

Maximum USD to spend during auto mode. No `$` sign — just the number.

```yaml
budget_ceiling: 50.00
```

### `budget_enforcement`

How the budget ceiling is enforced:

| Value | Behavior |
|-------|----------|
| `warn` | Log a warning but continue |
| `pause` | Pause auto mode (default when a ceiling is set) |
| `halt` | Stop auto mode entirely |

### `context_pause_threshold`

Context-window usage percentage (0-100) at which auto mode pauses for checkpointing. Set to `0` to disable.

```yaml
context_pause_threshold: 80  # pause at 80% context usage
```

Default: `0` (disabled)

### `uat_dispatch`

Enable automatic UAT (User Acceptance Test) runs after slice completion:

```yaml
uat_dispatch: true
```
### Verification (v2.26)

Configure shell commands that run automatically after every task execution. Failures trigger auto-fix retries before advancing.

```yaml
verification_commands:
  - npm run lint
  - npm run test
verification_auto_fix: true     # auto-retry on failure (default: true)
verification_max_retries: 2     # max retry attempts (default: 2)
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `verification_commands` | string[] | `[]` | Shell commands to run after task execution |
| `verification_auto_fix` | boolean | `true` | Auto-retry when verification fails |
| `verification_max_retries` | number | `2` | Maximum auto-fix retry attempts |
|
||||
|
||||
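The verify-then-retry flow can be pictured as a small loop. A sketch under stated assumptions — `Runner` and the auto-fix step are stand-ins for illustration, not GSD internals:

```typescript
// true = command passed
type Runner = (cmd: string) => boolean;

// Run every verification command; retry the failing set up to maxRetries times.
function verifyWithRetries(
  commands: string[],
  runner: Runner,
  maxRetries = 2,
): { passed: boolean; attempts: number } {
  let attempts = 0;
  for (let tryNo = 0; tryNo <= maxRetries; tryNo++) {
    attempts++;
    const failed = commands.filter((cmd) => !runner(cmd));
    if (failed.length === 0) return { passed: true, attempts };
    // In GSD, a failure here would trigger an auto-fix pass before retrying.
  }
  return { passed: false, attempts };
}
```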
### `auto_report` (v2.26)

Auto-generate HTML reports after milestone completion:

```yaml
auto_report: true  # default: true
```

Reports are written to `.gsd/reports/` as self-contained HTML files with embedded CSS/JS.

### `unique_milestone_ids`

Generate milestone IDs with a random suffix to avoid collisions in team workflows:

```yaml
unique_milestone_ids: true
# Produces: M001-eh88as instead of M001
```

### `git`

Git behavior configuration. All fields optional:

```yaml
git:
  auto_push: false          # push commits to remote after committing
  push_branches: false      # push milestone branch to remote
  remote: origin            # git remote name
  snapshots: false          # WIP snapshot commits during long tasks
  pre_merge_check: false    # run checks before worktree merge (true/false/"auto")
  commit_type: feat         # override conventional commit prefix
  main_branch: main         # primary branch name
  merge_strategy: squash    # how worktree branches merge: "squash" or "merge"
  isolation: worktree       # git isolation: "worktree", "branch", or "none"
  commit_docs: true         # commit .gsd/ artifacts to git (set false to keep local)
  manage_gitignore: true    # set false to prevent GSD from modifying .gitignore
  worktree_post_create: .gsd/hooks/post-worktree-create  # script to run after worktree creation
  auto_pr: false            # create a PR on milestone completion (requires push_branches)
  pr_target_branch: develop # target branch for auto-created PRs (default: main branch)
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `auto_push` | boolean | `false` | Push commits to remote after committing |
| `push_branches` | boolean | `false` | Push milestone branch to remote |
| `remote` | string | `"origin"` | Git remote name |
| `snapshots` | boolean | `false` | WIP snapshot commits during long tasks |
| `pre_merge_check` | bool/string | `false` | Run checks before merge (`true`/`false`/`"auto"`) |
| `commit_type` | string | (inferred) | Override conventional commit prefix (`feat`, `fix`, `refactor`, `docs`, `test`, `chore`, `perf`, `ci`, `build`, `style`) |
| `main_branch` | string | `"main"` | Primary branch name |
| `merge_strategy` | string | `"squash"` | How worktree branches merge: `"squash"` (combine all commits) or `"merge"` (preserve individual commits) |
| `isolation` | string | `"worktree"` | Auto-mode isolation: `"worktree"` (separate directory), `"branch"` (work in project root — useful for submodule-heavy repos), or `"none"` (no isolation — commits on current branch, no worktree or milestone branch) |
| `commit_docs` | boolean | `true` | Commit `.gsd/` planning artifacts to git. Set `false` to keep local-only |
| `manage_gitignore` | boolean | `true` | When `false`, GSD will not modify `.gitignore` at all — no baseline patterns, no self-healing. Use if you manage your own `.gitignore` |
| `worktree_post_create` | string | (none) | Script to run after worktree creation. Receives `SOURCE_DIR` and `WORKTREE_DIR` env vars |
| `auto_pr` | boolean | `false` | Automatically create a pull request when a milestone completes. Requires `auto_push: true` and `gh` CLI installed and authenticated |
| `pr_target_branch` | string | (main branch) | Target branch for auto-created PRs (e.g. `develop`, `qa`). Defaults to `main_branch` if not set |

#### `git.worktree_post_create`

Script to run after a worktree is created (both auto-mode and manual `/worktree`). Useful for copying `.env` files, symlinking asset directories, or running setup commands that worktrees don't inherit from the main tree.

```yaml
git:
  worktree_post_create: .gsd/hooks/post-worktree-create
```

The script receives two environment variables:
- `SOURCE_DIR` — the original project root
- `WORKTREE_DIR` — the newly created worktree path

Example hook script (`.gsd/hooks/post-worktree-create`):

```bash
#!/bin/bash
# Copy environment files and symlink assets into the new worktree
cp "$SOURCE_DIR/.env" "$WORKTREE_DIR/.env"
cp "$SOURCE_DIR/.env.local" "$WORKTREE_DIR/.env.local" 2>/dev/null || true
ln -sf "$SOURCE_DIR/assets" "$WORKTREE_DIR/assets"
```

The path can be absolute or relative to the project root. The script runs with a 30-second timeout. Failure is non-fatal — GSD logs a warning and continues.

#### `git.auto_pr`

Automatically create a pull request when a milestone completes. Designed for teams using Gitflow or branch-based workflows where work should go through PR review before merging to a target branch.

```yaml
git:
  auto_push: true
  auto_pr: true
  pr_target_branch: develop  # or qa, staging, etc.
```

**Requirements:**
- `auto_push: true` — the milestone branch must be pushed before a PR can be created
- [`gh` CLI](https://cli.github.com/) installed and authenticated (`gh auth login`)

**How it works:**
1. Milestone completes → GSD squash-merges the worktree to the main branch
2. Pushes the main branch to remote (if `auto_push: true`)
3. Pushes the milestone branch to remote
4. Creates a PR from the milestone branch to `pr_target_branch` via `gh pr create`

If `pr_target_branch` is not set, the PR targets the `main_branch` (or auto-detected main branch). PR creation failure is non-fatal — GSD logs and continues.

### `github` (v2.39)

GitHub sync configuration. When enabled, GSD auto-syncs milestones, slices, and tasks to GitHub Issues, PRs, and Milestones.

```yaml
github:
  enabled: true
  repo: "owner/repo"             # auto-detected from git remote if omitted
  labels: [gsd, auto-generated]  # labels applied to created issues/PRs
  project: "Project ID"          # optional GitHub Project board
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled` | boolean | `false` | Enable GitHub sync |
| `repo` | string | (auto-detected) | GitHub repository in `owner/repo` format |
| `labels` | string[] | `[]` | Labels to apply to created issues and PRs |
| `project` | string | (none) | GitHub Project ID for project board integration |

**Requirements:**
- `gh` CLI installed and authenticated (`gh auth login`)
- Sync mapping is persisted in `.gsd/.github-sync.json`
- Rate-limit aware — skips sync when GitHub API rate limit is low

**Commands:**
- `/github-sync bootstrap` — initial setup and sync
- `/github-sync status` — show sync mapping counts

### `notifications`

Control what notifications GSD sends during auto mode:

```yaml
notifications:
  enabled: true
  on_complete: true   # notify on unit completion
  on_error: true      # notify on errors
  on_budget: true     # notify on budget thresholds
  on_milestone: true  # notify when milestone finishes
  on_attention: true  # notify when manual attention needed
```

### `remote_questions`

Route interactive questions to Slack or Discord for headless auto mode:

```yaml
remote_questions:
  channel: slack              # or discord
  channel_id: "C1234567890"
  timeout_minutes: 15         # question timeout (1-30 minutes)
  poll_interval_seconds: 10   # poll interval (2-30 seconds)
```

### `post_unit_hooks`

Custom hooks that fire after specific unit types complete:

```yaml
post_unit_hooks:
  - name: code-review
    after: [execute-task]
    prompt: "Review the code changes for quality and security issues."
    model: claude-opus-4-6     # optional: model override
    max_cycles: 1              # max fires per trigger (1-10, default: 1)
    artifact: REVIEW.md        # optional: skip if this file exists
    retry_on: NEEDS-REWORK.md  # optional: re-run trigger unit if this file appears
    agent: review-agent        # optional: agent definition to use
    enabled: true              # optional: disable without removing
```

**Known unit types for `after`:** `research-milestone`, `plan-milestone`, `research-slice`, `plan-slice`, `execute-task`, `complete-slice`, `replan-slice`, `reassess-roadmap`, `run-uat`

**Prompt substitutions:** `{milestoneId}`, `{sliceId}`, `{taskId}` are replaced with current context values.

### `pre_dispatch_hooks`

Hooks that intercept units before dispatch. Three actions available:

**Modify** — prepend/append text to the unit prompt:

```yaml
pre_dispatch_hooks:
  - name: add-standards
    before: [execute-task]
    action: modify
    prepend: "Follow our coding standards document."
    append: "Run linting after changes."
```

**Skip** — skip the unit entirely:

```yaml
pre_dispatch_hooks:
  - name: skip-research
    before: [research-slice]
    action: skip
    skip_if: RESEARCH.md  # optional: only skip if this file exists
```

**Replace** — replace the unit prompt entirely:

```yaml
pre_dispatch_hooks:
  - name: custom-execute
    before: [execute-task]
    action: replace
    prompt: "Execute the task using TDD methodology."
    unit_type: execute-task-tdd  # optional: override unit type label
    model: claude-opus-4-6       # optional: model override
```

All pre-dispatch hooks support `enabled: true/false` to toggle without removing.

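The three actions compose into a single dispatch decision. A sketch of that composition — the types and function are illustrative assumptions, not GSD's implementation:

```typescript
type Hook =
  | { action: "modify"; prepend?: string; append?: string }
  | { action: "skip" }
  | { action: "replace"; prompt: string };

// Apply hooks in order; returning null means the unit is not dispatched.
function applyHooks(prompt: string, hooks: Hook[]): string | null {
  for (const hook of hooks) {
    if (hook.action === "skip") return null;            // unit skipped entirely
    if (hook.action === "replace") prompt = hook.prompt; // prompt replaced wholesale
    if (hook.action === "modify") {
      if (hook.prepend) prompt = hook.prepend + "\n" + prompt;
      if (hook.append) prompt = prompt + "\n" + hook.append;
    }
  }
  return prompt;
}
```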
### `always_use_skills` / `prefer_skills` / `avoid_skills`

Skill routing preferences:

```yaml
always_use_skills:
  - debug-like-expert
prefer_skills:
  - frontend-design
avoid_skills: []
```

Skills can be bare names (looked up in `~/.agents/skills/` and `.agents/skills/`) or absolute paths.

### `skill_rules`

Situational skill routing with human-readable triggers:

```yaml
skill_rules:
  - when: task involves authentication
    use: [clerk]
  - when: frontend styling work
    prefer: [frontend-design]
  - when: working with legacy code
    avoid: [aggressive-refactor]
```

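To make the rule shape concrete, here is a naive keyword-overlap matcher. This is only a sketch of the data structure — GSD's actual trigger matching is more sophisticated (the `when` clauses are human-readable and interpreted by the model), so treat the matching logic as a hypothetical:

```typescript
type SkillRule = { when: string; use?: string[]; prefer?: string[]; avoid?: string[] };

// Hypothetical matcher: a rule fires when a significant word from its
// trigger appears in the task description.
function matchRules(taskDescription: string, rules: SkillRule[]): SkillRule[] {
  const words = taskDescription.toLowerCase().split(/\W+/);
  return rules.filter((rule) =>
    rule.when.toLowerCase().split(/\W+/).some(
      (w) => w.length > 3 && words.includes(w),
    ),
  );
}
```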
### `custom_instructions`

Durable instructions appended to every session:

```yaml
custom_instructions:
  - "Always use TypeScript strict mode"
  - "Prefer functional patterns over classes"
```

For project-specific knowledge (patterns, gotchas, lessons learned), use `.gsd/KNOWLEDGE.md` instead — it's injected into every agent prompt automatically. Add entries with `/gsd knowledge rule|pattern|lesson <description>`.

### `RUNTIME.md` — Runtime Context (v2.39)

Declare project-level runtime context in `.gsd/RUNTIME.md`. This file is inlined into task execution prompts, giving the agent accurate information about your runtime environment without relying on hallucinated paths or URLs.

**Location:** `.gsd/RUNTIME.md`

**Example:**

```markdown
# Runtime Context

## API Endpoints
- Main API: https://api.example.com
- Cache: redis://localhost:6379

## Environment Variables
- DEPLOYMENT_ENV: staging
- DB_POOL_SIZE: 20

## Local Services
- PostgreSQL: localhost:5432
- Redis: localhost:6379
```

Use this for information that the agent needs during execution but that doesn't belong in `DECISIONS.md` (architectural) or `KNOWLEDGE.md` (patterns/rules). Common examples: API base URLs, service ports, deployment targets, and environment-specific configuration.

### `dynamic_routing`

Complexity-based model routing. See [Dynamic Model Routing](./dynamic-model-routing.md).

```yaml
dynamic_routing:
  enabled: true
  tier_models:
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true
  budget_pressure: true
  cross_provider: true
```

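The tier config above maps a complexity estimate to a model tier. A minimal sketch of such a selector, assuming a 0-10 complexity score and a "shift down under budget pressure" policy — both assumptions for illustration; the real scoring lives in the dynamic-model-routing docs:

```typescript
type Tier = "light" | "standard" | "heavy";

// Hypothetical selector: higher complexity picks a heavier tier;
// budget pressure shifts the choice one tier down where possible.
function pickTier(complexity: number, budgetPressure: boolean): Tier {
  if (complexity >= 8) return budgetPressure ? "standard" : "heavy";
  if (complexity >= 4) return budgetPressure ? "light" : "standard";
  return "light";
}
```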
### `service_tier` (v2.42)

OpenAI service tier preference for supported models. Toggle with `/gsd fast`.

| Value | Behavior |
|-------|----------|
| `"priority"` | Priority tier — 2x cost, faster responses |
| `"flex"` | Flex tier — 0.5x cost, slower responses |
| (unset) | Default tier |

```yaml
service_tier: priority
```

### `forensics_dedup` (v2.43)

Opt-in: search existing issues and PRs before filing from `/gsd forensics`. Uses additional AI tokens.

```yaml
forensics_dedup: true  # default: false
```

### `show_token_cost` (v2.44)

Opt-in: show per-prompt and cumulative session token cost in the footer.

```yaml
show_token_cost: true  # default: false
```

### `auto_visualize`

Show the workflow visualizer automatically after milestone completion:

```yaml
auto_visualize: true
```

See [Workflow Visualizer](./visualizer.md).

### `parallel`

Run multiple milestones simultaneously. Disabled by default.

```yaml
parallel:
  enabled: false                   # Master toggle
  max_workers: 2                   # Concurrent workers (1-4)
  budget_ceiling: 50.00            # Aggregate cost limit in USD
  merge_strategy: "per-milestone"  # "per-slice" or "per-milestone"
  auto_merge: "confirm"            # "auto", "confirm", or "manual"
```

See [Parallel Orchestration](./parallel-orchestration.md) for full documentation.

## Full Example

```yaml
---
version: 1

# Model selection
models:
  research: openrouter/deepseek/deepseek-r1
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
  execution_simple: claude-haiku-4-5-20250414
  completion: claude-sonnet-4-6

# Token optimization
token_profile: balanced

# Dynamic model routing
dynamic_routing:
  enabled: true
  escalate_on_failure: true
  budget_pressure: true

# Budget
budget_ceiling: 25.00
budget_enforcement: pause
context_pause_threshold: 80

# Supervision
auto_supervisor:
  soft_timeout_minutes: 15
  hard_timeout_minutes: 25

# Git
git:
  auto_push: true
  merge_strategy: squash
  isolation: worktree  # "worktree", "branch", or "none"
  commit_docs: true

# Skills
skill_discovery: suggest
skill_staleness_days: 60  # Skills unused for N days get deprioritized (0 = disabled)
always_use_skills:
  - debug-like-expert
skill_rules:
  - when: task involves authentication
    use: [clerk]

# Notifications
notifications:
  on_complete: false
  on_milestone: true
  on_attention: true

# Visualizer
auto_visualize: true

# Service tier
service_tier: priority  # "priority" or "flex" (for /gsd fast)

# Diagnostics
forensics_dedup: true   # deduplicate before filing forensics issues
show_token_cost: true   # show per-prompt cost in footer

# Hooks
post_unit_hooks:
  - name: code-review
    after: [execute-task]
    prompt: "Review {sliceId}/{taskId} for quality and security."
    artifact: REVIEW.md
---
```

# The Context Pipeline

The full journey of a user prompt from keypress to LLM input, through every transformation stage. Understanding this pipeline is the foundation of all context engineering in pi.

---

## The Pipeline at a Glance

```
User types prompt and hits Enter
  │
  ├─► Extension command check (/command)
  │     If match → run handler, skip everything below
  │
  ├─► input event
  │     Extensions can: transform text/images, intercept entirely, or pass through
  │
  ├─► Skill expansion (/skill:name)
  │     Skill file content injected into prompt text
  │
  ├─► Prompt template expansion (/template)
  │     Template file content merged into prompt text
  │
  ├─► before_agent_start event  [ONCE per user prompt]
  │     Extensions can:
  │       • Inject custom messages (appended after user message)
  │       • Modify the system prompt (chained across extensions)
  │
  ├─► Agent.prompt(messages)
  │     Messages array: [user message, ...nextTurn messages, ...extension messages]
  │
  │   ┌── Turn loop (repeats while LLM calls tools) ──────────────
  │   │
  │   │  transformContext (= context event)  [EVERY turn]
  │   │    Extensions receive AgentMessage[] deep copy
  │   │    Can filter, reorder, inject, or replace messages
  │   │    Multiple handlers chain: each sees previous output
  │   │
  │   │  convertToLlm  [EVERY turn, AFTER context event]
  │   │    AgentMessage[] → Message[]
  │   │    Custom types mapped to user role
  │   │    bashExecution (!! prefix) filtered out
  │   │    Not extensible — hardcoded in messages.ts
  │   │
  │   │  LLM call
  │   │    System prompt + converted messages + tool definitions
  │   │
  │   │  Tool execution (if LLM calls tools)
  │   │    tool_call event → can block
  │   │    execute runs
  │   │    tool_result event → can modify result
  │   │    Steering check → may skip remaining tools
  │   │
  │   │  Follow-up check (if no more tool calls)
  │   │    Queued follow-up messages become next turn input
  │   │
  │   └────────────────────────────────────────────────────────────
  │
  └─► agent_end event
```


---

## Stage-by-Stage Detail

### Stage 1: Extension Command Check

The first thing that happens. If the text starts with `/` and matches a registered extension command, the command handler runs and **the prompt never reaches the agent**. No events fire. No LLM call happens.

This means extension commands are fully synchronous escape hatches — they execute even during streaming (they're checked before any queuing logic).

### Stage 2: Input Event

```typescript
pi.on("input", async (event, ctx) => {
  // event.text — the raw user input
  // event.images — attached images, if any
  // event.source — "interactive" | "rpc" | "extension"

  // Three possible return values:
  return { action: "continue" };                    // pass through unchanged
  return { action: "transform", text: "new text" }; // rewrite the input
  return { action: "handled" };                     // swallow entirely
});
```

**Chaining:** Multiple `input` handlers chain. If handler A returns `transform`, handler B sees the transformed text. If any handler returns `handled`, the pipeline stops — no LLM call.

**Timing:** Fires before skill/template expansion. Your handler sees the raw `/skill:name args` text, not the expanded content.

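The chaining semantics above can be sketched as a small reducer over handlers. A simplified illustration — the types here are reduced from pi's actual event shapes:

```typescript
type InputResult =
  | { action: "continue" }
  | { action: "transform"; text: string }
  | { action: "handled" };
type Handler = (text: string) => InputResult;

// Each handler sees the previous handler's transformed text;
// the first "handled" stops the chain (null = no LLM call).
function runInputChain(text: string, handlers: Handler[]): string | null {
  for (const handler of handlers) {
    const result = handler(text);
    if (result.action === "handled") return null;
    if (result.action === "transform") text = result.text;
  }
  return text;
}
```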
### Stage 3: Skill and Template Expansion

Deterministic text substitution. `/skill:name args` becomes the skill file content wrapped in `<skill>` tags. `/template args` becomes the template file content. These are string replacements — no events fire.

### Stage 4: before_agent_start

```typescript
pi.on("before_agent_start", async (event, ctx) => {
  // event.prompt — the expanded user prompt text
  // event.images — attached images
  // event.systemPrompt — current system prompt (may be modified by earlier extensions)

  return {
    message: {
      customType: "my-context",
      content: "Context the LLM should see",
      display: false, // UI rendering only — LLM ALWAYS sees it
    },
    systemPrompt: event.systemPrompt + "\nExtra instructions",
  };
});
```

**Critical facts:**
- Fires **once** per user prompt, not per turn
- System prompts **chain**: Extension A modifies it, Extension B sees the modified version in `event.systemPrompt`
- Messages **accumulate**: all extensions' messages are collected and injected as separate entries
- If no extension returns a `systemPrompt`, the base system prompt is restored (previous turn's modifications don't persist)

**Message injection order in the final array:**

```
[user message] → [nextTurn messages] → [extension messages from before_agent_start]
```

### Stage 5: The Turn Loop

This is where the LLM is actually called. The turn loop repeats for each LLM response that includes tool calls.

#### 5a: transformContext / context event

The `context` event is wired as the `transformContext` callback on the Agent. It fires on **every turn** within the agent loop.

```typescript
// Inside the agent loop (agent-loop.ts):
let messages = context.messages;
if (config.transformContext) {
  messages = await config.transformContext(messages, signal);
}
const llmMessages = await config.convertToLlm(messages);
```

The `context` event handler in the runner creates a `structuredClone` deep copy:

```typescript
// runner.ts emitContext():
let currentMessages = structuredClone(messages);
// ...each handler receives and can modify currentMessages
```

**This means:**
- You get a deep copy — safe to mutate, splice, filter, or replace
- You work at the `AgentMessage[]` level (includes custom types)
- Multiple handlers chain: each sees the output of the previous
- **You cannot modify the system prompt here** — only `before_agent_start` can do that
- The messages include everything: user messages, assistant responses, tool results, custom messages, bash executions, compaction summaries, branch summaries

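A typical use of this hook is context pruning. A minimal sketch of a filter you might run inside a `context` handler — the message shape is simplified here, and `trimOldToolResults` is a hypothetical helper, not a pi API:

```typescript
type Msg = { role: string; content: string };

// Elide the bodies of tool results that fall outside the most recent
// `keepLast` messages, keeping the turn structure intact.
function trimOldToolResults(messages: Msg[], keepLast = 10): Msg[] {
  const cutoff = messages.length - keepLast;
  return messages.map((m, i) =>
    i < cutoff && m.role === "toolResult"
      ? { ...m, content: "[tool result elided]" }
      : m,
  );
}
```

Because the `context` event hands you a deep copy, a transform like this is safe: it changes what the LLM sees on this turn without touching session history.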
#### 5b: convertToLlm

After `context` event processing, `convertToLlm` maps `AgentMessage[]` to `Message[]`:

| AgentMessage role | Converted to | Notes |
|---|---|---|
| `user` | `user` | Pass through |
| `assistant` | `assistant` | Pass through |
| `toolResult` | `toolResult` | Pass through |
| `custom` | `user` | Content preserved, `display` field ignored |
| `bashExecution` | `user` | Unless `excludeFromContext` (`!!` prefix) → filtered out |
| `compactionSummary` | `user` | Wrapped in `<summary>` tags |
| `branchSummary` | `user` | Wrapped in `<summary>` tags |

**`convertToLlm` is not extensible.** It's a hardcoded function in `messages.ts`. If you need to change how messages appear to the LLM, do it in the `context` event handler before this stage.

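The role mapping in the table reduces to a small function. A sketch for orientation only — the real `convertToLlm` in `messages.ts` also handles content shapes and the `<summary>` wrapping:

```typescript
type AgentRole =
  | "user" | "assistant" | "toolResult"
  | "custom" | "bashExecution" | "compactionSummary" | "branchSummary";

// null = message is filtered out entirely.
function mapRole(role: AgentRole, excludeFromContext = false): string | null {
  if (role === "bashExecution" && excludeFromContext) return null; // "!!" prefix
  if (role === "user" || role === "assistant" || role === "toolResult") return role;
  return "user"; // custom, bashExecution, and summaries all become user messages
}
```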
#### 5c: LLM Call

The converted messages plus system prompt plus tool definitions go to the LLM provider. The system prompt used is whatever was set by `before_agent_start` (or the base prompt if no extension modified it).

#### 5d: Tool Execution and Interception

When the LLM responds with tool calls, they execute sequentially:

```
For each tool call:
  tool_call event → can { block: true, reason: "..." }
    If blocked → Error("reason") becomes the tool result
  tool_execution_start event (informational)
  tool.execute() runs
  tool_execution_end event (informational)
  tool_result event → can modify { content, details, isError }

Steering check → if steering messages queued:
  Remaining tools get "Skipped due to queued user message"
  Steering messages become input for next turn
```

### Stage 6: Follow-up and Continuation

When the LLM finishes and has no more tool calls:

1. Check for steering messages → if any, start new turn with them
2. Check for follow-up messages → if any, start new turn with them
3. If neither → `agent_end` fires, agent goes idle

---

## What the LLM Actually Sees

For any given turn, the LLM receives:

```
System prompt (base + before_agent_start modifications)
  +
Messages (after context event filtering, after convertToLlm mapping)
  +
Tool definitions (active tools with names, descriptions, parameter schemas)
```

The system prompt includes:
- Base prompt (tool descriptions, guidelines, pi docs reference, date/time, cwd)
- `promptSnippet` overrides from active tools (replaces tool description in "Available tools")
- `promptGuidelines` from active tools (appended to "Guidelines" section)
- `appendSystemPrompt` from settings/config
- Project context files (AGENTS.md, CLAUDE.md from cwd ancestors)
- Skills listing (names + descriptions, agent uses `read` to load them)
- Any `before_agent_start` modifications

---

## Key Timing Distinctions

| Hook | When | How often | Can modify |
|------|------|-----------|-----------|
| `input` | Before expansion | Once per user input | Input text |
| `before_agent_start` | After expansion, before agent loop | Once per user prompt | System prompt + inject messages |
| `context` | Before each LLM call | Every turn in agent loop | Message array |
| `tool_call` | Before each tool execution | Per tool call | Block execution |
| `tool_result` | After each tool execution | Per tool call | Result content/details |

---

## The Deep Copy Question

When do you get a safe-to-mutate copy vs a reference?

| Hook | What you receive | Safe to mutate? |
|------|-----------------|-----------------|
| `context` | `structuredClone` deep copy | Yes |
| `before_agent_start` | `event.systemPrompt` is a string (immutable) | Return new string |
| `tool_call` | `event.input` is the raw args object | Do not mutate — return `block` |
| `tool_result` | `{ ...event }` shallow spread | Return new values, don't mutate |
| `input` | `event.text` is a string (immutable) | Return new text via `transform` |

# Hook Reference

Complete behavioral specification of every hook in pi's extension system. Covers timing, chaining semantics, return shapes, and edge cases not in the extending-pi docs.

---

## Hook Categories

1. **Input hooks** — intercept user input before the agent
2. **Agent lifecycle hooks** — control the agent loop boundary
3. **Per-turn hooks** — fire on every LLM call within an agent run
4. **Tool hooks** — intercept individual tool executions
5. **Session hooks** — respond to session lifecycle changes
6. **Model hooks** — respond to model changes
7. **Resource hooks** — provide dynamic resources at startup

---

## 1. Input Hooks

### `input`

**When:** User submits text (Enter in editor, RPC message, or `pi.sendUserMessage` from an extension with `source: "extension"`).

**Before:** Skill expansion, template expansion, command check (extension commands are checked before `input` fires, but built-in commands are checked after).

**Chaining:** Sequential through all extensions. Each handler sees the text output of the previous handler's `transform`. First `handled` stops the chain and the pipeline.

```typescript
pi.on("input", async (event, ctx) => {
  // event.text: string — current text (possibly transformed by earlier handler)
  // event.images: ImageContent[] | undefined
  // event.source: "interactive" | "rpc" | "extension"

  // Option 1: Pass through
  return { action: "continue" };
  // or return nothing (undefined) — same as continue

  // Option 2: Transform
  return { action: "transform", text: "rewritten", images: newImages };

  // Option 3: Swallow (no LLM call, no further handlers)
  return { action: "handled" };
});
```

**Edge cases:**
- Extension commands (`/mycommand`) are checked **before** `input` fires. If one matches, `input` never fires.
- Built-in commands (`/new`, `/model`, etc.) are checked **after** `input` transforms. So `input` can transform text into a built-in command, or transform a built-in command into something else.
- Images can be replaced via `transform`. Omitting `images` in the transform result preserves the original images.

---

## 2. Agent Lifecycle Hooks
|
||||
|
||||
### `before_agent_start`
|
||||
|
||||
**When:** After input processing, skill/template expansion, and the user message is constructed — but before `agent.prompt()` is called.
|
||||
|
||||
**Fires:** Once per user prompt. Does NOT fire on subsequent turns within the same agent run.
|
||||
|
||||
**Chaining:**
|
||||
- **System prompt:** Chains. Extension A modifies `event.systemPrompt`, Extension B sees that modified version. If no extension returns a `systemPrompt`, the base prompt is used (resetting any previous turn's modifications).
|
||||
- **Messages:** Accumulate. All `message` results are collected into an array. Each becomes a separate `CustomMessage` with `role: "custom"` injected after the user message.
|
||||
|
||||
```typescript
|
||||
pi.on("before_agent_start", async (event, ctx) => {
|
||||
// event.prompt: string — expanded user prompt text
|
||||
// event.images: ImageContent[] | undefined
|
||||
// event.systemPrompt: string — current system prompt (may be chained from earlier extension)
|
||||
|
||||
return {
|
||||
// Optional: inject a custom message into the session
|
||||
message: {
|
||||
customType: "my-extension", // identifies the message type
|
||||
content: "Text the LLM sees", // string or (TextContent | ImageContent)[]
|
||||
display: true, // controls UI rendering, NOT LLM visibility
|
||||
details: { any: "data" }, // for custom rendering and state reconstruction
|
||||
},
|
||||
|
||||
// Optional: modify the system prompt for this agent run
|
||||
systemPrompt: event.systemPrompt + "\nNew instructions",
|
||||
};
|
||||
});
|
||||
```
|
||||
|
||||
**Critical detail:** The `display` field controls whether the message shows in the TUI chat log. The LLM **always** sees the message content regardless of `display`. All custom messages become `user` role messages in `convertToLlm`.
|
||||
|
||||
**Error handling:** If a handler throws, the error is captured and reported via `emitError`. Other handlers still run. The pipeline is not stopped.
|
||||
|
### `agent_start`

**When:** The agent loop begins (after `before_agent_start`, after `agent.prompt()` is called).

**Fires:** Once per agent run. Informational only — no return value.

```typescript
pi.on("agent_start", async (event, ctx) => {
  // event: { type: "agent_start" }
  // Useful for: starting timers, resetting per-run state
});
```

### `agent_end`

**When:** The agent loop finishes (all turns complete, no more tool calls, no queued messages).

**Fires:** Once per agent run.

```typescript
pi.on("agent_end", async (event, ctx) => {
  // event.messages: AgentMessage[] — all messages produced during this run
  // Useful for: final summaries, state persistence, triggering follow-up actions
});
```

**Subtlety:** `event.messages` contains only the NEW messages from this agent run, not the full conversation history. Use `ctx.sessionManager.getBranch()` for the full history.

---

## 3. Per-Turn Hooks
### `turn_start`

**When:** Each turn within the agent loop begins (before the LLM call).

```typescript
pi.on("turn_start", async (event, ctx) => {
  // event.turnIndex: number — 0-based index of this turn within the agent run
  // event.timestamp: number — when the turn started
});
```

### `context`

**When:** Before each LLM call, after the turn starts. This is the last chance to modify what the LLM sees.

**Fires:** Every turn. If the LLM calls 3 tools and loops back, `context` fires 4 times (once for the initial call, plus once per loop-back).

**Chaining:** Sequential. Each handler receives the output of the previous one. The first handler gets a `structuredClone` deep copy of the agent's message array.

```typescript
pi.on("context", async (event, ctx) => {
  // event.messages: AgentMessage[] — deep copy, safe to mutate

  // Filter out messages
  const filtered = event.messages.filter(m => !isIrrelevant(m));
  return { messages: filtered };

  // Or inject messages
  return { messages: [...event.messages, syntheticMessage] };

  // Or return nothing to pass through unchanged
});
```

**What `event.messages` contains:**

- All roles: `user`, `assistant`, `toolResult`, `custom`, `bashExecution`, `compactionSummary`, `branchSummary`
- The user message from the current prompt
- Custom messages injected by `before_agent_start`
- Tool results from earlier turns in this agent run
- Steering/follow-up messages that became turn inputs
- Historical messages from the session (including compaction summaries)

**What it does NOT contain:**

- The system prompt (use `before_agent_start` for that)
- Tool definitions (use `pi.setActiveTools()` for that)
### `turn_end`

**When:** After the LLM responds and all tool calls for this turn complete.

```typescript
pi.on("turn_end", async (event, ctx) => {
  // event.turnIndex: number
  // event.message: AgentMessage — the assistant's response message
  // event.toolResults: ToolResultMessage[] — results from tools called this turn
});
```

### `message_start` / `message_update` / `message_end`

**When:** Message lifecycle events. `update` only fires for assistant messages during streaming (token by token).

```typescript
pi.on("message_start", async (event, ctx) => {
  // event.message: AgentMessage — user, assistant, toolResult, or custom
});

pi.on("message_update", async (event, ctx) => {
  // event.message: AgentMessage — partial assistant message (streaming)
  // event.assistantMessageEvent: AssistantMessageEvent — the specific token event
});

pi.on("message_end", async (event, ctx) => {
  // event.message: AgentMessage — final message
  // Messages are persisted to the session file at this point
});
```

---
## 4. Tool Hooks

### `tool_call`

**When:** After the LLM requests a tool call, before it executes.

**Chaining:** Sequential. If any handler returns `{ block: true }`, execution stops immediately. The block reason becomes an Error that is caught and returned as the tool result with `isError: true`.

```typescript
pi.on("tool_call", async (event, ctx) => {
  // event.toolCallId: string
  // event.toolName: string
  // event.input: typed based on tool (use isToolCallEventType for narrowing)

  // Block execution
  return { block: true, reason: "Not allowed in read-only mode" };

  // Allow execution (return nothing or undefined)
});
```

**Type narrowing:**

```typescript
import { isToolCallEventType } from "@mariozechner/pi-coding-agent";

pi.on("tool_call", async (event, ctx) => {
  if (isToolCallEventType("bash", event)) {
    event.input.command; // string — typed!
  }
  if (isToolCallEventType("write", event)) {
    event.input.path; // string
    event.input.content; // string
  }
  // Custom tools need explicit type params:
  if (isToolCallEventType<"my_tool", { action: string }>("my_tool", event)) {
    event.input.action; // string
  }
});
```
### `tool_execution_start` / `tool_execution_update` / `tool_execution_end`

Informational events during tool execution. No return values.

```typescript
pi.on("tool_execution_start", async (event) => {
  // event.toolCallId, event.toolName, event.args
});

pi.on("tool_execution_update", async (event) => {
  // event.partialResult — streaming progress from the onUpdate callback
});

pi.on("tool_execution_end", async (event) => {
  // event.result, event.isError
});
```

### `tool_result`

**When:** After a tool finishes executing, before the result is returned to the agent loop.

**Chaining:** Sequential. Each handler can modify the result, and modifications accumulate across handlers: every handler sees the evolving event, with `content`/`details`/`isError` updated by previous handlers.

```typescript
pi.on("tool_result", async (event, ctx) => {
  // event.toolCallId: string
  // event.toolName: string
  // event.input: Record<string, unknown>
  // event.content: (TextContent | ImageContent)[]
  // event.details: unknown
  // event.isError: boolean

  // Modify the result
  return {
    content: [...event.content, { type: "text", text: "\n\nAudit: logged" }],
    isError: false, // can flip the error state
  };

  // Return nothing to pass through unchanged
});
```

**Also fires for errors:** If tool execution throws, `tool_result` still fires with `isError: true` and the error message as content. Extensions can modify even error results.
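The accumulation semantics can be sketched as a plain reduction. This is an illustrative model, not pi internals — `ToolResultEvent` and `runChain` are simplified stand-ins:

```typescript
// Illustrative sketch of sequential tool_result chaining: each handler
// sees the event as modified by previous handlers; returning undefined
// passes the event through unchanged.
interface ToolResultEvent {
  content: string[];
  isError: boolean;
}
type Handler = (event: ToolResultEvent) => Partial<ToolResultEvent> | undefined;

function runChain(initial: ToolResultEvent, handlers: Handler[]): ToolResultEvent {
  let current = { ...initial };
  for (const handler of handlers) {
    const patch = handler(current);
    if (patch) current = { ...current, ...patch };
  }
  return current;
}

const result = runChain({ content: ["output"], isError: true }, [
  // Handler A appends an audit line
  (e) => ({ content: [...e.content, "Audit: logged"] }),
  // Handler B flips the error state (and would see A's appended content)
  () => ({ isError: false }),
  // Handler C passes through unchanged
  () => undefined,
]);

console.log(result); // { content: ["output", "Audit: logged"], isError: false }
```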
---
## 5. Session Hooks

### `session_start`

**When:** Initial session load (startup) and after a session switch/fork. Also fires after `/reload`.

**Use for:** State restoration from session entries, initial setup.

```typescript
pi.on("session_start", async (_event, ctx) => {
  // Restore state from the session
  for (const entry of ctx.sessionManager.getBranch()) {
    if (entry.type === "custom" && entry.customType === "my-state") {
      myState = entry.data;
    }
  }
});
```

### `session_before_switch` / `session_switch`

**When:** Before/after `/new` or `/resume`.

```typescript
pi.on("session_before_switch", async (event) => {
  // event.reason: "new" | "resume"
  // event.targetSessionFile?: string (only for resume)
  return { cancel: true }; // prevent the switch
});
```

### `session_before_fork` / `session_fork`

**When:** Before/after `/fork`.

```typescript
pi.on("session_before_fork", async (event) => {
  // event.entryId: string — the entry being forked from
  return { cancel: true };
  // or
  return { skipConversationRestore: true }; // fork without restoring messages
});
```

### `session_before_compact` / `session_compact`

**When:** Before/after compaction (manual or auto).

```typescript
pi.on("session_before_compact", async (event) => {
  // event.preparation: CompactionPreparation
  // event.branchEntries: SessionEntry[]
  // event.customInstructions?: string
  // event.signal: AbortSignal

  return { cancel: true };
  // or provide custom compaction:
  return {
    compaction: {
      summary: "My custom summary",
      firstKeptEntryId: event.preparation.firstKeptEntryId,
      tokensBefore: event.preparation.tokensBefore,
    },
  };
});
```

### `session_before_tree` / `session_tree`

**When:** Before/after `/tree` navigation.

```typescript
pi.on("session_before_tree", async (event) => {
  // event.preparation: TreePreparation
  // event.signal: AbortSignal

  return { cancel: true };
  // or provide a custom summary:
  return {
    summary: { summary: "Custom branch summary" },
    label: "my-label",
  };
});
```

### `session_shutdown`

**When:** Process exit (Ctrl+C, Ctrl+D, SIGTERM, `ctx.shutdown()`).

```typescript
pi.on("session_shutdown", async (_event, ctx) => {
  // Last chance to persist state
  // Keep it fast — the process is exiting
});
```

---
## 6. Model Hooks

### `model_select`

**When:** The model changes via `/model`, Ctrl+P cycling, or session restore.

```typescript
pi.on("model_select", async (event, ctx) => {
  // event.model: Model — the new model
  // event.previousModel: Model | undefined
  // event.source: "set" | "cycle" | "restore"
});
```

---
## 7. Resource Hooks

### `resources_discover`

**When:** At startup and after `/reload`. Lets extensions provide additional skill, prompt template, and theme paths.

**Not documented in the extending-pi docs.** This is how extensions ship their own resources.

```typescript
pi.on("resources_discover", async (event, ctx) => {
  // event.cwd: string
  // event.reason: "startup" | "reload"

  return {
    skillPaths: [join(__dirname, "skills", "SKILL.md")],
    promptPaths: [join(__dirname, "prompts", "my-template.md")],
    themePaths: [join(__dirname, "themes", "dark.json")],
  };
});
```

**Behavior:** Returned paths are loaded by the resource loader and integrated into the system prompt (skills) and available commands (prompts/themes). The system prompt is rebuilt after resources are extended.

---
## 8. User Bash Hooks

### `user_bash`

**When:** The user executes a command via the `!` or `!!` prefix in the editor.

```typescript
pi.on("user_bash", async (event, ctx) => {
  // event.command: string
  // event.excludeFromContext: boolean (true if !! prefix)
  // event.cwd: string

  // Provide custom execution (e.g., SSH)
  return {
    operations: { execute: (cmd) => sshExec(remote, cmd) },
  };

  // Or provide a full replacement result
  return {
    result: { output: "custom output", exitCode: 0, cancelled: false, truncated: false },
  };
});
```

---
## Execution Order Across Extensions

All hooks iterate through extensions in **load order** (project-local first, then global, then explicitly configured via `-e`). Within each extension, handlers for the same event run in registration order.

For hooks that chain (e.g., `context`, `before_agent_start.systemPrompt`, `input`, `tool_result`):

- Extension A's handler runs first; Extension B sees A's output
- Load order determines priority

For hooks that short-circuit (e.g., `tool_call` with `block`, `input` with `handled`, session `cancel`):

- The first extension to return the short-circuit value wins
- Remaining handlers are skipped
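The short-circuit behavior can be sketched as a plain loop. This is an illustrative model of the semantics above, not pi's dispatcher — `firstBlock` and the handler list are made up for the example:

```typescript
// Illustrative sketch of short-circuit semantics for blocking hooks:
// handlers run in load order; the first to return { block: true } wins
// and the remaining handlers never run.
type BlockResult = { block: true; reason: string } | undefined;

function firstBlock(handlers: Array<() => BlockResult>): BlockResult {
  for (const handler of handlers) {
    const result = handler();
    if (result?.block) return result; // short-circuit: skip the rest
  }
  return undefined;
}

let ranThird = false;
const outcome = firstBlock([
  () => undefined, // extension A: allows
  () => ({ block: true, reason: "read-only mode" }), // extension B: blocks
  () => { ranThird = true; return undefined; }, // extension C: never runs
]);

console.log(outcome?.reason); // "read-only mode"
console.log(ranThird); // false
```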
# Context Injection Patterns

Practical recipes for injecting, filtering, transforming, and managing context through pi's hook system. Each pattern includes when to use it, which hook to use, and the exact implementation.

---

## Pattern 1: Per-Prompt System Prompt Modification

**Use when:** You want to change the LLM's behavior for the entire agent run based on some condition.

**Hook:** `before_agent_start`

```typescript
let debugMode = false;

pi.registerCommand("debug", {
  handler: async (_args, ctx) => {
    debugMode = !debugMode;
    ctx.ui.notify(debugMode ? "Debug mode ON" : "Debug mode OFF");
  },
});

pi.on("before_agent_start", async (event) => {
  if (debugMode) {
    return {
      systemPrompt: event.systemPrompt + `

## Debug Mode
- Show your reasoning for each decision
- Before executing any tool, explain what you expect to happen
- After each tool result, explain what you learned
- If something unexpected happens, stop and explain before continuing`,
    };
  }
});
```

**Why `before_agent_start` and not `context`:** The system prompt is separate from the message array. `context` can only modify messages, not the system prompt.

---
## Pattern 2: Invisible Context Injection

**Use when:** You need the LLM to know something without the user seeing it in the chat.

**Hook:** `before_agent_start` with `display: false`

```typescript
pi.on("before_agent_start", async (event, ctx) => {
  const gitBranch = await getBranch();
  const recentCommits = await getRecentCommits(5);

  return {
    message: {
      customType: "git-context",
      content: `[Git Context] Branch: ${gitBranch}\nRecent commits:\n${recentCommits}`,
      display: false, // User doesn't see this in chat
      // But the LLM DOES see it — display only controls UI rendering
    },
  };
});
```

**Important:** `display: false` hides the message from the UI only. The LLM always receives custom messages as `user` role content. There is no way to inject LLM-invisible metadata through `sendMessage` or `before_agent_start`.

---
## Pattern 3: Conditional Context Filtering

**Use when:** Some messages in the history are no longer relevant and waste context tokens.

**Hook:** `context`

```typescript
pi.on("context", async (event) => {
  // Count bash executions up front so we can keep only the last 10
  const totalBash = event.messages.filter(m => m.role === "bashExecution").length;
  let bashIndex = 0;

  return {
    messages: event.messages.filter(m => {
      // Remove custom messages from a previous mode
      if (m.role === "custom" && m.customType === "plan-mode-context") {
        return currentMode === "plan"; // only keep if still in plan mode
      }

      // Remove old bash executions beyond the last 10
      if (m.role === "bashExecution") {
        return bashIndex++ >= totalBash - 10;
      }

      return true;
    }),
  };
});
```

**Why `context` and not `before_agent_start`:** `context` fires every turn and can see the full message array, including tool results from earlier turns. `before_agent_start` fires once and can only inject — it can't filter existing messages.

---
## Pattern 4: Dynamic Context Injection Per Turn

**Use when:** You want to add context that changes between turns (e.g., current file state, running process output).

**Hook:** `context`

```typescript
pi.on("context", async (event, ctx) => {
  // Inject a synthetic message at the end of the conversation
  const liveStatus = await getProcessStatus();

  const contextMessage = {
    role: "user" as const,
    content: [{ type: "text" as const, text: `[Live Status] ${liveStatus}` }],
    timestamp: Date.now(),
  };

  return {
    messages: [...event.messages, contextMessage],
  };
});
```

**Caution:** Messages injected in `context` are NOT persisted to the session. They exist only for the LLM call; next turn, you'll need to inject again. This is actually useful — it means the context is always fresh.

---
## Pattern 5: Deferred Context (Next Turn)

**Use when:** You want to attach context to the user's next prompt without interrupting the current conversation.

**Mechanism:** `pi.sendMessage` with `deliverAs: "nextTurn"`

```typescript
// Queue context for the next user prompt
pi.sendMessage(
  {
    customType: "deferred-context",
    content: "The test suite passed with 47/47 tests",
    display: false,
  },
  { deliverAs: "nextTurn" }
);
```

**How it works internally:** The message is stored in `_pendingNextTurnMessages` and injected into the `messages` array when the next `agent.prompt()` is called, after the user message. Unlike `context` hook injection, these messages ARE persisted to the session.

---
## Pattern 6: Context Window Management

**Use when:** You're approaching the context limit and need to prune intelligently.

**Hook:** `context`

```typescript
pi.on("context", async (event, ctx) => {
  const usage = ctx.getContextUsage();
  if (!usage || usage.percent === null || usage.percent < 70) {
    return; // plenty of room
  }

  // Aggressive pruning: remove tool results beyond the last 20
  let toolResultCount = 0;
  const total = event.messages.filter(m => m.role === "toolResult").length;

  return {
    messages: event.messages.filter(m => {
      if (m.role === "toolResult") {
        toolResultCount++;
        // Keep the last 20 tool results
        return toolResultCount > total - 20;
      }
      return true;
    }),
  };
});
```

---
## Pattern 7: Steering with Context

**Use when:** You want to redirect the agent mid-run with additional context.

**Mechanism:** `pi.sendMessage` with `deliverAs: "steer"`

```typescript
// During an agent run, inject a steering message
pi.sendMessage(
  {
    customType: "user-feedback",
    content: "IMPORTANT: The user just updated the config file. Re-read config.json before continuing.",
    display: true,
  },
  { deliverAs: "steer" }
);
```

**What happens:** The current tool call finishes, remaining queued tool calls are skipped (they get error results saying "Skipped due to queued user message"), and the steering message becomes input for the next turn.

---
## Pattern 8: Follow-Up Context After Completion

**Use when:** You want to trigger another LLM turn after the agent finishes, with additional context.

**Mechanism:** `pi.sendMessage` with `deliverAs: "followUp"`

```typescript
pi.on("agent_end", async (event, ctx) => {
  // Check if the agent made changes that need verification
  const hasEdits = event.messages.some(m =>
    m.role === "toolResult" && m.toolName === "edit"
  );

  if (hasEdits) {
    pi.sendMessage(
      {
        customType: "auto-verify",
        content: "You just made edits. Please verify them by running the test suite.",
        display: true,
      },
      { deliverAs: "followUp", triggerTurn: true }
    );
  }
});
```

---
## Pattern 9: Tool-Scoped Context via promptGuidelines

**Use when:** You want context that only appears when specific tools are active.

**Mechanism:** `promptGuidelines` on tool registration

```typescript
pi.registerTool({
  name: "deploy",
  label: "Deploy",
  description: "Deploy the application",
  promptSnippet: "Deploy the application to staging or production",
  promptGuidelines: [
    "Always run tests before deploying",
    "Never deploy to production without explicit user confirmation",
    "After deploying, verify the health check endpoint",
  ],
  parameters: Type.Object({ /* ... */ }),
  async execute(toolCallId, params, signal, onUpdate, ctx) { /* ... */ },
});
```

**Behavior:** The `promptGuidelines` are added to the "Guidelines" section of the system prompt ONLY when the `deploy` tool is in the active tool set. If the tool is disabled via `pi.setActiveTools(...)`, the guidelines disappear.

---
## Pattern 10: Persistent State as Context

**Use when:** You need state that survives session resume AND is visible to the LLM.

**Mechanism:** Tool result `details` + `session_start` reconstruction + `before_agent_start` injection

```typescript
let projectFacts: string[] = [];

pi.on("session_start", async (_event, ctx) => {
  // Reconstruct from the session
  projectFacts = [];
  for (const entry of ctx.sessionManager.getBranch()) {
    if (entry.type === "message" && entry.message.role === "toolResult") {
      if (entry.message.toolName === "learn_fact") {
        projectFacts = entry.message.details?.facts ?? [];
      }
    }
  }
});

pi.registerTool({
  name: "learn_fact",
  label: "Learn Fact",
  description: "Record a fact about the project",
  parameters: Type.Object({ fact: Type.String() }),
  async execute(toolCallId, params) {
    projectFacts.push(params.fact);
    return {
      content: [{ type: "text", text: `Learned: ${params.fact}` }],
      details: { facts: [...projectFacts] }, // snapshot in details for branching
    };
  },
});

pi.on("before_agent_start", async (event) => {
  if (projectFacts.length > 0) {
    return {
      message: {
        customType: "project-facts",
        content: `Known project facts:\n${projectFacts.map(f => `- ${f}`).join("\n")}`,
        display: false,
      },
    };
  }
});
```

**Why this works for branching:** State lives in tool result `details`, so when the user forks from an earlier point, `session_start` reconstructs from `getBranch()` (the current path), not the full history. Old branches' facts don't leak into new branches.

---
## Pattern 11: Input Preprocessing / Macros

**Use when:** You want custom syntax that expands before the LLM sees it.

**Hook:** `input`

```typescript
import { readFileSync } from "node:fs";

pi.on("input", async (event) => {
  // Expand @file references to file contents
  const expanded = event.text.replace(/@(\S+)/g, (match, filePath) => {
    try {
      const content = readFileSync(filePath, "utf-8");
      return `\`\`\`${filePath}\n${content}\n\`\`\``;
    } catch {
      return match; // leave unchanged if the file can't be read
    }
  });

  if (expanded !== event.text) {
    return { action: "transform", text: expanded };
  }
  return { action: "continue" };
});
```

---
## Pattern 12: Context-Aware Tool Blocking

**Use when:** You want to prevent certain tool usage based on conversation context.

**Hook:** `tool_call` with `context` awareness

```typescript
let inPlanMode = false;

pi.on("tool_call", async (event, ctx) => {
  if (!inPlanMode) return;

  const destructiveTools = ["edit", "write", "bash"];

  if (event.toolName === "bash" && isToolCallEventType("bash", event)) {
    // Allow read-only bash commands
    if (isSafeCommand(event.input.command)) return;
  }

  if (destructiveTools.includes(event.toolName)) {
    return {
      block: true,
      reason: `Plan mode active: ${event.toolName} is not allowed. Use /plan to exit plan mode.`,
    };
  }
});
```

---
## Anti-Patterns

### ❌ Don't: Modify the system prompt in `context`

```typescript
// WRONG — the context event can only modify messages, not the system prompt
pi.on("context", async (event, ctx) => {
  // This does nothing to the system prompt
  return { systemPrompt: "new prompt" }; // ← not a valid return field
});
```

### ❌ Don't: Rely on `display: false` for security

```typescript
// WRONG — display: false only hides from the UI; the LLM still sees it
pi.on("before_agent_start", async () => ({
  message: {
    customType: "secret",
    content: "API_KEY=sk-1234", // LLM receives this as a user message!
    display: false,
  },
}));
```

### ❌ Don't: Use `context` for one-time injection

```typescript
// WRONG — context fires every turn, so this injects repeatedly
let injected = false;
pi.on("context", async (event) => {
  if (!injected) {
    injected = true;
    return { messages: [...event.messages, myMessage] };
  }
});
// Problem: after compaction or session restore, injected resets to false
```

Use `before_agent_start` with `message` for one-time, per-prompt injection instead.

### ❌ Don't: Use `getEntries()` for branch-aware state

```typescript
// WRONG — getEntries() returns ALL entries, including dead branches
for (const entry of ctx.sessionManager.getEntries()) { /* ... */ }

// CORRECT — getBranch() returns only entries on the current branch path
for (const entry of ctx.sessionManager.getBranch()) { /* ... */ }
```
# Message Types and LLM Visibility

Every message in pi has an `AgentMessage` type. These messages go through `convertToLlm` before the LLM sees them. This document specifies exactly what the LLM receives for each message type and what it never sees.

---

## The AgentMessage Type Hierarchy

Pi uses `AgentMessage` as its internal message type, which is a union of standard LLM messages and custom application messages:

```typescript
// Standard LLM messages
type Message = UserMessage | AssistantMessage | ToolResultMessage;

// Custom messages added by pi's coding agent
interface CustomAgentMessages {
  bashExecution: BashExecutionMessage;
  custom: CustomMessage;
  branchSummary: BranchSummaryMessage;
  compactionSummary: CompactionSummaryMessage;
}

// The union
type AgentMessage = Message | CustomAgentMessages[keyof CustomAgentMessages];
```

---

## Message Type → LLM Conversion Table

| AgentMessage type | `role` seen by LLM | Content transformation | When excluded |
|---|---|---|---|
| `user` | `user` | Pass through unchanged | Never |
| `assistant` | `assistant` | Pass through unchanged | Never |
| `toolResult` | `toolResult` | Pass through unchanged | Never |
| `custom` | `user` | `content` preserved as-is (string → `[{type:"text",text}]`) | Never — ALL custom messages reach the LLM |
| `bashExecution` | `user` | Formatted: `` Ran `cmd`\n```\noutput\n``` `` | When `excludeFromContext: true` (`!!` prefix) |
| `compactionSummary` | `user` | Wrapped: `The conversation history before this point was compacted into the following summary:\n<summary>\n...\n</summary>` | Never |
| `branchSummary` | `user` | Wrapped: `The following is a summary of a branch that this conversation came back from:\n<summary>\n...\n</summary>` | Never |
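The table can be condensed into a runnable sketch. The types here are deliberately simplified stand-ins (the real `convertToLlm` in messages.ts works on full `AgentMessage` shapes), but each branch mirrors a row of the table:

```typescript
// Simplified sketch of the conversion table — illustrative types only.
type SketchMessage =
  | { role: "user" | "assistant" | "toolResult"; text: string }
  | { role: "custom"; content: string }
  | { role: "bashExecution"; command: string; output: string; excludeFromContext: boolean }
  | { role: "compactionSummary" | "branchSummary"; summary: string };

type LlmMessage = { role: string; text: string };

function convertToLlm(m: SketchMessage): LlmMessage | undefined {
  switch (m.role) {
    case "user":
    case "assistant":
    case "toolResult":
      return { role: m.role, text: m.text }; // pass through unchanged
    case "custom":
      return { role: "user", text: m.content }; // ALL custom messages reach the LLM
    case "bashExecution":
      if (m.excludeFromContext) return undefined; // `!!` prefix: dropped entirely
      return { role: "user", text: `Ran \`${m.command}\`\n\`\`\`\n${m.output}\n\`\`\`` };
    case "compactionSummary":
      return {
        role: "user",
        text: `The conversation history before this point was compacted into the following summary:\n<summary>\n${m.summary}\n</summary>`,
      };
    case "branchSummary":
      return {
        role: "user",
        text: `The following is a summary of a branch that this conversation came back from:\n<summary>\n${m.summary}\n</summary>`,
      };
  }
}

console.log(convertToLlm({ role: "bashExecution", command: "ls", output: "a", excludeFromContext: true }));
// undefined
```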
## Custom Messages In Detail

Custom messages are created by:

1. `pi.sendMessage()` — extension-injected messages
2. `before_agent_start` returning a `message` — per-prompt context injection

### The `display` Field Misconception

```typescript
pi.sendMessage({
  customType: "my-context",
  content: "This text goes to the LLM",
  display: false, // ← ONLY controls UI rendering
});
```

**What `display` controls:**

- `true`: The message appears in the TUI chat log (rendered via `registerMessageRenderer` if one exists, or default rendering)
- `false`: The message is hidden from the TUI chat log

**What `display` does NOT control:**

- LLM visibility — the LLM ALWAYS receives the content as a `user` role message
- Session persistence — the message is ALWAYS persisted to the session file

### How Custom Messages Become User Messages

In `convertToLlm` (messages.ts):

```typescript
case "custom": {
  const content = typeof m.content === "string"
    ? [{ type: "text", text: m.content }]
    : m.content;
  return {
    role: "user",
    content,
    timestamp: m.timestamp,
  };
}
```

The `customType`, `display`, and `details` fields are all stripped. The LLM sees a plain user message with the content.

---
||||
## Bash Execution Messages
|
||||
|
||||
Created when the user runs commands via `!` or `!!` prefix.
|
||||
|
||||
### `!` (included in context)
|
||||
|
||||
```typescript
|
||||
// User types: !ls -la
|
||||
// LLM sees:
|
||||
{
|
||||
role: "user",
|
||||
content: [{ type: "text", text: "Ran `ls -la`\n```\n<output>\n```" }]
|
||||
}
|
||||
```
|
||||
|
||||
With exit code, cancellation, and truncation info appended as needed:
|
||||
- Non-zero exit: `\n\nCommand exited with code N`
|
||||
- Cancelled: `\n\n(command cancelled)`
|
||||
- Truncated: `\n\n[Output truncated. Full output: /path/to/file]`
|
||||
|
||||
### `!!` (excluded from context)
|
||||
|
||||
```typescript
|
||||
// User types: !!echo secret
|
||||
// LLM sees: NOTHING — filtered out by convertToLlm
|
||||
```
|
||||
|
||||
The `excludeFromContext` flag on `BashExecutionMessage` causes `convertToLlm` to return `undefined` for this message, effectively removing it.
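
As a minimal, hypothetical sketch (the type and function names here are illustrative, not pi's actual source), the exclusion behavior amounts to returning `undefined` for flagged messages during conversion and filtering those out of the array:

```typescript
// Illustrative types — not pi's real definitions.
type BashExecutionMessage = {
  command: string;
  output: string;
  excludeFromContext?: boolean;
};

type LlmMessage = { role: "user"; content: { type: "text"; text: string }[] };

// Sketch of the conversion: `!!` messages carry excludeFromContext and are dropped.
function convertBashToLlm(m: BashExecutionMessage): LlmMessage | undefined {
  if (m.excludeFromContext) return undefined; // filtered out entirely
  return {
    role: "user",
    content: [{ type: "text", text: `Ran \`${m.command}\`\n\`\`\`\n${m.output}\n\`\`\`` }],
  };
}

// Messages that convert to undefined simply never reach the LLM array.
const kept = [
  { command: "ls -la", output: "total 0" },
  { command: "echo secret", output: "secret", excludeFromContext: true },
]
  .map(convertBashToLlm)
  .filter((m): m is LlmMessage => m !== undefined);
```

The same `return undefined` convention is how any message type opts out of LLM visibility in the conversion step.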

---

## Compaction and Branch Summary Messages

These are synthetic messages created by pi's session management.

### Compaction Summary

When the context is compacted, older messages are replaced with a summary:

```typescript
// LLM sees:
{
  role: "user",
  content: [{
    type: "text",
    text: "The conversation history before this point was compacted into the following summary:\n\n<summary>\n[LLM-generated summary of the compacted conversation]\n</summary>"
  }]
}
```

### Branch Summary

When navigating away from a branch and back, the abandoned branch gets summarized:

```typescript
// LLM sees:
{
  role: "user",
  content: [{
    type: "text",
    text: "The following is a summary of a branch that this conversation came back from:\n\n<summary>\n[summary of the branch]\n</summary>"
  }]
}
```

---

## What the LLM Never Sees

1. **`appendEntry` data** — Extension-private entries (`pi.appendEntry("my-state", data)`) are stored in the session file but NEVER included in the message array. They're not `AgentMessage` types at all — they're `CustomEntry` session entries.

2. **`details` on custom messages** — The `details` field is for rendering and state reconstruction. `convertToLlm` strips it.

3. **`details` on tool results** — Tool result `details` are stripped by the LLM message conversion. Only `content` reaches the LLM.

4. **`!!` bash execution output** — Explicitly excluded from context.

5. **Tool definitions not in the active set** — If a tool is registered but not in `getActiveTools()`, the LLM doesn't know it exists.

6. **`promptSnippet` and `promptGuidelines` from inactive tools** — Only active tools contribute to the system prompt.

---

## The Message Array Order

For a typical conversation, the message array the LLM sees (after the `context` event and `convertToLlm`) looks like:

```
1. [compactionSummary → user] (if compaction happened)
2. [branchSummary → user] (if navigated back from a branch)
3. [user] (first user message after compaction)
4. [assistant] (LLM response)
5. [toolResult] (tool results)
6. [user] (next user message)
7. [custom → user] (extension-injected message)
8. ...continues...
9. [user] (current prompt)
10. [custom → user] (before_agent_start injected messages)
11. [custom → user] (nextTurn queued messages)
```

---

## Implications for Extension Authors

### If you want the LLM to see something:

- Use `before_agent_start` → `message` for per-prompt context
- Use the `context` event to inject into the message array per-turn
- Use `pi.sendMessage` for standalone messages
- Use `before_agent_start` → `systemPrompt` for system-level instructions

### If you want to hide something from the LLM:

- Use `pi.appendEntry` — never reaches the message array
- Use tool result `details` — stored in the session but stripped before the LLM
- Use the `context` event to filter messages OUT of the array
- There is NO way to inject UI-only messages that participate in the conversation flow — `display: false` only hides from the TUI, not from the LLM

### If you want something to survive compaction:

- Store it in tool result `details` (survives in the kept entries)
- Store it in `appendEntry` (survives as session data, not messages)
- Re-inject it via `before_agent_start` every time (survives because you regenerate it)
- Messages in the compacted range are replaced by the compaction summary — they're gone from the LLM's perspective

@@ -1,233 +0,0 @@

# Inter-Extension Communication

How extensions communicate with each other, share state, and coordinate behavior.

---

## pi.events — The Shared Event Bus

Every extension receives the same `pi.events` instance. It's a simple typed pub/sub bus.

### API

```typescript
// Emit an event on a channel
pi.events.emit("my-channel", { action: "started", id: 123 });

// Subscribe to a channel — returns an unsubscribe function
const unsub = pi.events.on("my-channel", (data) => {
  // data is typed as `unknown` — you must cast
  const payload = data as { action: string; id: number };
  console.log(payload.action); // "started"
});

// Later: stop listening
unsub();
```

### Characteristics

| Property | Behavior |
|---|---|
| **Typing** | `data` is `unknown`. No generics. Cast at the consumer. |
| **Error handling** | Handlers are wrapped in async try/catch. Errors log to `console.error` but don't propagate to the emitter or crash the session. |
| **Ordering** | Handlers fire in subscription order (order of `pi.events.on` calls). |
| **Persistence** | No replay, no persistence. If you emit before anyone subscribes, the event is lost. |
| **Scope** | Shared across ALL extensions in the session. The bus is created once and passed to every extension's `createExtensionAPI`. |
| **Lifecycle** | The bus is cleared on extension reload (`/reload`). Subscriptions from the old extension instances are gone. |

### Example: Extension A Signals Extension B

```typescript
// Extension A: plan-mode.ts
export default function (pi: ExtensionAPI) {
  pi.registerCommand("plan", {
    handler: async (_args, ctx) => {
      planEnabled = !planEnabled;
      pi.events.emit("mode-change", { mode: planEnabled ? "plan" : "normal" });
    },
  });
}

// Extension B: status-display.ts
export default function (pi: ExtensionAPI) {
  pi.events.on("mode-change", (data) => {
    const { mode } = data as { mode: string };
    // React to mode change
  });
}
```

### Limitations

- **No request/response** — emit is fire-and-forget. If you need a response, use shared state or a callback pattern.
- **No guaranteed delivery** — if the subscriber hasn't loaded yet (load order matters), the event is missed.
- **No channel namespacing** — use descriptive channel names to avoid collisions (e.g., `"myext:event"` rather than `"update"`).
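
The semantics described above — untyped payloads, per-handler error isolation, subscription-order dispatch, no replay — can be sketched as a self-contained bus. This is an illustrative reimplementation to make the behavior concrete, not pi's actual source:

```typescript
// Minimal sketch of a pi.events-style bus. Names are illustrative.
type Handler = (data: unknown) => unknown;

class EventBus {
  private channels = new Map<string, Handler[]>();

  // Subscribe; returns an unsubscribe function, like pi.events.on.
  on(channel: string, handler: Handler): () => void {
    const list = this.channels.get(channel) ?? [];
    list.push(handler);
    this.channels.set(channel, list);
    return () => {
      const handlers = this.channels.get(channel) ?? [];
      const i = handlers.indexOf(handler);
      if (i !== -1) handlers.splice(i, 1);
    };
  }

  // Fire-and-forget emit: handlers run in subscription order, and a
  // throwing handler is logged but never propagates to the emitter.
  emit(channel: string, data: unknown): void {
    for (const handler of [...(this.channels.get(channel) ?? [])]) {
      try {
        const result = handler(data);
        if (result instanceof Promise) result.catch((err) => console.error(err));
      } catch (err) {
        console.error(err);
      }
    }
  }
}

const bus = new EventBus();
const seen: string[] = [];
const unsub = bus.on("mode-change", (data) => {
  seen.push((data as { mode: string }).mode);
});
bus.emit("mode-change", { mode: "plan" });
unsub();
bus.emit("mode-change", { mode: "normal" }); // lost — no subscribers remain
```

Note how the second emit is silently dropped: with no persistence or replay, delivery depends entirely on subscription timing.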

---

## Shared State Patterns

### Pattern 1: Shared Module State

If two extensions are loaded from the same package (via the `package.json` `pi.extensions` array), they can share state through module-level variables in a shared file.

```
my-extension/
├── package.json   # pi.extensions: ["./a.ts", "./b.ts"]
├── a.ts           # import { state } from "./shared.ts"
├── b.ts           # import { state } from "./shared.ts"
└── shared.ts      # export const state = { count: 0 }
```

**Caveat:** jiti module caching means the shared module is loaded once. But on `/reload`, everything is re-imported from scratch — shared state resets.

### Pattern 2: Event Bus as State Channel

Use `pi.events` to broadcast state changes. Each extension maintains its own copy.

```typescript
// Extension A: authoritative state owner
let items: string[] = [];

function addItem(item: string) {
  items.push(item);
  pi.events.emit("items:updated", { items: [...items] });
}

// Extension B: state consumer
let mirroredItems: string[] = [];

pi.events.on("items:updated", (data) => {
  mirroredItems = (data as { items: string[] }).items;
});
```

### Pattern 3: Session Entries as Coordination Points

Extensions can read each other's `appendEntry` data from the session:

```typescript
// Extension A writes:
pi.appendEntry("ext-a-config", { theme: "dark", verbose: true });

// Extension B reads during session_start:
pi.on("session_start", async (_event, ctx) => {
  for (const entry of ctx.sessionManager.getEntries()) {
    if (entry.type === "custom" && entry.customType === "ext-a-config") {
      const config = entry.data as { theme: string; verbose: boolean };
      // Use config from Extension A
    }
  }
});
```

**Downside:** This only works after `session_start`. Not suitable for real-time coordination during a turn.

---

## Multi-Extension Coordination Patterns

### Pattern: Mode Manager

One extension acts as the mode authority, others react:

```typescript
// mode-manager.ts — the authority
export default function (pi: ExtensionAPI) {
  let currentMode: "plan" | "execute" | "review" = "execute";

  pi.registerCommand("mode", {
    handler: async (args, ctx) => {
      const newMode = args.trim() as typeof currentMode;
      if (!["plan", "execute", "review"].includes(newMode)) {
        ctx.ui.notify(`Invalid mode: ${newMode}`, "error");
        return;
      }
      currentMode = newMode;
      pi.events.emit("mode:changed", { mode: currentMode });
      ctx.ui.notify(`Mode: ${currentMode}`);
    },
  });

  // Other extensions can query the current mode via an event
  pi.events.on("mode:query", () => {
    pi.events.emit("mode:current", { mode: currentMode });
  });
}

// tool-guard.ts — reacts to mode changes
export default function (pi: ExtensionAPI) {
  let currentMode = "execute";

  pi.events.on("mode:changed", (data) => {
    currentMode = (data as { mode: string }).mode;
  });

  pi.on("tool_call", async (event) => {
    if (currentMode === "plan" && ["edit", "write"].includes(event.toolName)) {
      return { block: true, reason: "Plan mode: write operations disabled" };
    }
    if (currentMode === "review" && event.toolName === "bash") {
      return { block: true, reason: "Review mode: bash disabled" };
    }
  });
}
```

### Pattern: Extension Priority Chain

When multiple extensions handle the same hook, load order determines priority. Project-local extensions load before global ones. Within a directory, files are discovered in filesystem order.

If you need explicit priority control:

```typescript
// priority-extension.ts
export default function (pi: ExtensionAPI) {
  // Register with a known channel so other extensions can defer
  pi.events.emit("priority:registered", { name: "security-guard" });

  pi.on("tool_call", async (event) => {
    // This runs first if loaded first
    if (isUnsafe(event)) {
      return { block: true, reason: "Security policy violation" };
    }
  });
}
```

---

## The ExtensionContext in Tools

Tools registered by extensions receive `ExtensionContext` as the fifth parameter of `execute`. This is the same context event handlers get:

```typescript
pi.registerTool({
  name: "my_tool",
  // ...
  async execute(toolCallId, params, signal, onUpdate, ctx) {
    // ctx.ui — dialog methods, notifications, widgets
    // ctx.sessionManager — read session state
    // ctx.model — current model
    // ctx.cwd — working directory
    // ctx.hasUI — false in print/json mode
    // ctx.isIdle() — agent state
    // ctx.abort() — abort current operation
    // ctx.getContextUsage() — token usage
    // ctx.compact() — trigger compaction
    // ctx.getSystemPrompt() — current system prompt

    if (ctx.hasUI) {
      const confirmed = await ctx.ui.confirm("Proceed?", "This will modify files");
      if (!confirmed) {
        return { content: [{ type: "text", text: "Cancelled by user" }] };
      }
    }

    // ... do work
  },
});
```

**Important:** The `ctx` is freshly created via `runner.createContext()` for each tool execution. It reflects the current state at call time (current model, current session, etc.), not the state when the tool was registered.

@@ -1,382 +0,0 @@

# Advanced Patterns from Source

Production patterns extracted from the pi codebase, built-in extensions, and real extension examples. Each pattern shows the mechanism, the source of truth, and when to use it.

---

## Pattern 1: Mode-Aware Tool Sets with Context Injection

**Source:** `plan-mode/index.ts` — the built-in plan mode extension.

This pattern combines tool set management, tool call blocking, context event filtering, and `before_agent_start` injection into a cohesive mode system.

### The Architecture

```
/plan toggle → sets planModeEnabled
  ├─► setActiveTools(PLAN_MODE_TOOLS)  # restrict available tools
  ├─► tool_call guard                  # block unsafe bash even if tool is active
  ├─► before_agent_start               # inject mode-specific instructions
  ├─► context                          # filter stale mode messages on mode exit
  └─► agent_end                        # check plan output, offer execution
```

### Key Insight: Defense in Depth

The plan mode uses THREE layers of tool control:

1. **`setActiveTools`** — removes write tools from the active set entirely. The LLM doesn't even know they exist.
2. **`tool_call` guard** — even for allowed tools like `bash`, blocks destructive commands via an allowlist.
3. **`context` filter** — when exiting plan mode, removes stale plan mode context messages so they don't confuse the LLM in normal mode.

```typescript
// Layer 1: Tool set
if (planModeEnabled) {
  pi.setActiveTools(["read", "bash", "grep", "find", "ls"]);
}

// Layer 2: Bash guard
pi.on("tool_call", async (event) => {
  if (!planModeEnabled || event.toolName !== "bash") return;
  if (!isSafeCommand(event.input.command)) {
    return { block: true, reason: "Plan mode: command blocked" };
  }
});

// Layer 3: Context cleanup on mode exit
pi.on("context", async (event) => {
  if (planModeEnabled) return; // keep plan context when in plan mode
  return {
    messages: event.messages.filter(m => {
      // Remove plan mode markers from context
      if (m.customType === "plan-mode-context") return false;
      return true;
    }),
  };
});
```

### Why This Matters

A naive implementation would just change the tool set. But:

- `bash` with `rm -rf` is technically a "read-only" tool by name
- Stale context messages from a previous mode can confuse the LLM
- The LLM might try to work around restrictions if it sees the mode instructions but has the tools available
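
The `isSafeCommand` check referenced in Layer 2 is not shown in this doc. A hypothetical prefix-allowlist version (illustrative only — the command list and the chaining check are assumptions, not pi's actual policy) might look like:

```typescript
// Hypothetical allowlist guard for a plan-mode bash filter.
const SAFE_PREFIXES = ["ls", "cat", "rg", "grep", "find", "git log", "git diff", "git status"];

function isSafeCommand(command: string): boolean {
  const trimmed = command.trim();
  // Reject shell metacharacters so `ls; rm -rf /` can't smuggle a write through.
  if (/[;&|><`$]/.test(trimmed)) return false;
  // Allow only exact matches or "prefix + space + args".
  return SAFE_PREFIXES.some(
    (p) => trimmed === p || trimmed.startsWith(p + " "),
  );
}
```

The chaining rejection is the important part: any allowlist that only inspects the first word is trivially bypassed by `;`, `&&`, or command substitution.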

---

## Pattern 2: Preset System with Dynamic Model + Tool + Prompt Configuration

**Source:** `preset.ts` — the built-in preset extension.

This pattern shows how to build a full configuration management system that coordinates model, thinking level, tools, and system prompt from a single config file.

### The Architecture

```
presets.json → load on session_start
  │
  ├─► /preset command → applyPreset(name)
  ├─► Ctrl+Shift+U    → cyclePreset()
  ├─► --preset flag   → applyPreset on startup
  │
applyPreset:
  ├─► pi.setModel()         → switch model
  ├─► pi.setThinkingLevel() → adjust thinking
  ├─► pi.setActiveTools()   → reconfigure tools
  └─► store activePreset    → before_agent_start reads it

before_agent_start:
  └─► append preset.instructions to system prompt
```

### Key Insight: Deferred System Prompt Application

The preset doesn't modify the system prompt during `applyPreset`. It stores `activePreset` and lets `before_agent_start` read it:

```typescript
// On apply — just store
activePresetName = name;
activePreset = preset;

// On each prompt — inject
pi.on("before_agent_start", async (event) => {
  if (activePreset?.instructions) {
    return {
      systemPrompt: `${event.systemPrompt}\n\n${activePreset.instructions}`,
    };
  }
});
```

This is better than calling `agent.setSystemPrompt()` directly because:

- `before_agent_start` fires on every prompt, keeping the system prompt current
- The base system prompt is rebuilt by pi when tools change — a direct set would be overwritten
- Other extensions can see and further modify the prompt in the chain

---

## Pattern 3: Progress Tracking with Widget + State Persistence

**Source:** `plan-mode/index.ts` — todo item tracking during plan execution.

### The Architecture

```
Plan created (assistant message with "Plan:" section)
  → extractTodoItems() parses numbered steps
  → todoItems stored in memory
  → ui.setWidget() shows progress
  → appendEntry() persists state

Each turn:
  → turn_end checks for [DONE:n] markers
  → markCompletedSteps() updates todoItems
  → updateStatus() refreshes widget

Session resume:
  → session_start restores from appendEntry
  → Re-scans messages after last execute marker for [DONE:n]
  → Rebuilds completion state
```

### Key Insight: Dual State Reconstruction

On session resume, the extension does TWO things:

1. **Reads the persisted state** from `appendEntry`:

   ```typescript
   const planModeEntry = entries
     .filter(e => e.type === "custom" && e.customType === "plan-mode")
     .pop();
   ```

2. **Re-scans assistant messages** for completion markers:

   ```typescript
   // Only scan messages AFTER the last plan-mode-execute marker
   const allText = messages.map(getTextContent).join("\n");
   markCompletedSteps(allText, todoItems);
   ```

This handles the case where the extension crashed or was reloaded mid-execution — the persisted state might be stale, but the messages are the source of truth.
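
A minimal sketch of the marker re-scan (the name `markCompletedSteps` follows the doc; the regex and the todo shape are assumptions for illustration):

```typescript
// Illustrative todo shape — not pi's actual type.
type TodoItem = { step: number; text: string; done: boolean };

// Scan accumulated assistant text for [DONE:n] markers and mark matching steps.
function markCompletedSteps(text: string, todos: TodoItem[]): void {
  for (const match of text.matchAll(/\[DONE:(\d+)\]/g)) {
    const step = Number(match[1]);
    const todo = todos.find((t) => t.step === step);
    if (todo) todo.done = true;
  }
}

const todos: TodoItem[] = [
  { step: 1, text: "Write tests", done: false },
  { step: 2, text: "Implement feature", done: false },
];
markCompletedSteps("Finished step one. [DONE:1] Moving on.", todos);
```

Because the scan is idempotent, re-running it over the full transcript after a crash converges on the same completion state.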

---

## Pattern 4: Dynamic Resource Injection

**Source:** `dynamic-resources/index.ts` — an extension that ships its own skills and themes.

```typescript
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";

const baseDir = dirname(fileURLToPath(import.meta.url));

export default function (pi: ExtensionAPI) {
  pi.on("resources_discover", () => {
    return {
      skillPaths: [join(baseDir, "SKILL.md")],
      promptPaths: [join(baseDir, "dynamic.md")],
      themePaths: [join(baseDir, "dynamic.json")],
    };
  });
}
```

### How It Works Internally

After `session_start`, the runner calls `emitResourcesDiscover()`. The returned paths are processed through the `ResourceLoader`:

1. Skills → loaded, added to the system prompt's skill listing
2. Prompts → loaded as prompt templates, available via `/templatename`
3. Themes → loaded, available via `/theme` or `ctx.ui.setTheme()`

The system prompt is rebuilt after resources are extended, so new skills appear in the same prompt turn.

### When to Use

- Extension packages that need custom skills (e.g., a deployment extension with a "deploy checklist" skill)
- Theme packs distributed as extensions
- Dynamic prompt templates that depend on the project context

---

## Pattern 5: Claude Rules Integration

**Source:** `claude-rules.ts` — scanning `.claude/rules/` for per-project rules.

### The Architecture

```
session_start:
  → Scan .claude/rules/ for .md files (recursive)
  → Store file list

before_agent_start:
  → Append file list to system prompt
  → Agent uses read tool to load specific rules on demand
```

### Key Insight: Listing, Not Loading

The extension does NOT load rule file contents into the system prompt. It lists the files:

```typescript
pi.on("before_agent_start", async (event) => {
  if (ruleFiles.length === 0) return;

  const rulesList = ruleFiles.map(f => `- .claude/rules/${f}`).join("\n");

  return {
    systemPrompt: event.systemPrompt + `

## Project Rules
The following project rules are available in .claude/rules/:
${rulesList}
When working on tasks related to these rules, use the read tool to load the relevant rule files.`,
  };
});
```

This is context-efficient: the system prompt grows by one line per rule file, not by the full contents of every rule. The LLM loads specific rules via `read` only when relevant.
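
The recursive scan on `session_start` can be sketched with Node's standard fs API (an illustrative version, including a small self-contained demo; pi's actual scanner may differ):

```typescript
import { existsSync, mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Recursively collect .md files under a rules directory, returning
// paths relative to the root (e.g. "git/commits.md").
function scanRuleFiles(root: string, prefix = ""): string[] {
  const dir = join(root, prefix);
  if (!existsSync(dir)) return []; // no .claude/rules/ — nothing to list
  const results: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const rel = prefix ? `${prefix}/${entry.name}` : entry.name;
    if (entry.isDirectory()) {
      results.push(...scanRuleFiles(root, rel));
    } else if (entry.name.endsWith(".md")) {
      results.push(rel);
    }
  }
  return results;
}

// Demo: build a tiny rules tree and scan it.
const root = mkdtempSync(join(tmpdir(), "rules-"));
mkdirSync(join(root, "git"));
writeFileSync(join(root, "style.md"), "# style rules\n");
writeFileSync(join(root, "git", "commits.md"), "# commit rules\n");
writeFileSync(join(root, "notes.txt"), "ignored\n");
const found = scanRuleFiles(root).sort();
```

Only the relative paths are stored; the file contents stay on disk until the agent reads them.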

---

## Pattern 6: Remote Execution via Tool Wrapping

**Source:** The SSH extension pattern and `createBashTool` with pluggable operations.

### The Architecture

Tools support pluggable `operations` that replace the underlying I/O:

```typescript
import { createBashTool } from "@mariozechner/pi-coding-agent";

// Create a bash tool that executes via SSH
const remoteBash = createBashTool(cwd, {
  operations: {
    execute: async (command, options) => {
      return sshExec(remoteHost, command, options);
    },
  },
});

// Register it as the bash tool (overrides built-in)
pi.registerTool({
  ...remoteBash,
  name: "bash", // same name = overrides built-in
});
```

### The spawnHook Alternative

For lighter customization (e.g., environment setup):

```typescript
const bashTool = createBashTool(cwd, {
  spawnHook: ({ command, cwd, env }) => ({
    command: `source ~/.profile\n${command}`,
    cwd: `/mnt/sandbox${cwd}`,
    env: { ...env, CI: "1" },
  }),
});
```

### User Bash Hook for `!` Commands

The `user_bash` event lets you intercept user-typed bash commands (not LLM-initiated ones):

```typescript
pi.on("user_bash", async (event) => {
  // Route user bash commands through SSH too
  return {
    operations: {
      execute: (cmd, opts) => sshExec(remoteHost, cmd, opts),
    },
  };
});
```

---

## Pattern 7: Extension-Aware Compaction

**Source:** `session_before_compact` in agent-session.ts.

### Custom Compaction Summary

Override the default LLM-generated summary:

```typescript
pi.on("session_before_compact", async (event) => {
  // Build a domain-specific summary
  const summary = buildCustomSummary(event.branchEntries);

  return {
    compaction: {
      summary,
      firstKeptEntryId: event.preparation.firstKeptEntryId,
      tokensBefore: event.preparation.tokensBefore,
    },
  };
});
```

### Compaction-Aware State

If your extension stores state in messages that might get compacted away, you need a reconstruction strategy:

```typescript
pi.on("session_start", async (_event, ctx) => {
  // Check if there's been a compaction
  const entries = ctx.sessionManager.getBranch();
  const hasCompaction = entries.some(e => e.type === "compaction");

  if (hasCompaction) {
    // State before compaction is gone from messages
    // Fall back to appendEntry data or re-derive from remaining messages
    restoreFromAppendEntries(entries);
  } else {
    // Full message history available
    restoreFromToolResults(entries);
  }
});
```

---

## Pattern 8: The Complete Extension Initialization Sequence

From the source code, the full initialization order is:

```
1. Extension factory function runs
   ├─► pi.on()                — register event handlers
   ├─► pi.registerTool()      — register tools
   ├─► pi.registerCommand()   — register commands
   ├─► pi.registerShortcut()  — register shortcuts
   ├─► pi.registerFlag()      — register CLI flags
   └─► pi.registerProvider()  — queued (not yet applied)

2. ExtensionRunner created with all extensions

3. bindCore() — action methods become live
   ├─► pi.sendMessage, pi.setActiveTools, etc. now work
   └─► Queued provider registrations flushed to ModelRegistry

4. bindExtensions() — UI context and command context connected
   └─► setUIContext(), bindCommandContext()

5. session_start event fires
   └─► Extensions restore state from session

6. resources_discover event fires
   └─► Extensions provide additional skill/prompt/theme paths

7. System prompt rebuilt with new resources

8. Ready for first user prompt
```

**Important timing:** During step 1, action methods (`sendMessage`, `setActiveTools`, etc.) will throw. You can only register handlers and tools during the factory function. Use `session_start` for anything that needs runtime access.
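
The queue-then-flush behavior in steps 1 and 3 can be sketched in isolation. This is a hypothetical reimplementation to show the timing, not pi's actual code:

```typescript
// Minimal sketch: registrations queue until bindCore() makes actions live.
class ExtensionApiSketch {
  private queuedProviders: string[] = [];
  private live = false;
  flushed: string[] = [];

  registerProvider(name: string): void {
    // Safe during the factory function: just queued.
    this.queuedProviders.push(name);
  }

  sendMessage(_text: string): void {
    // Action methods throw until bindCore() runs.
    if (!this.live) throw new Error("sendMessage called before bindCore()");
    // ...would deliver the message here
  }

  bindCore(): void {
    this.live = true;
    // Queued provider registrations are flushed once the core is bound.
    this.flushed = [...this.queuedProviders];
    this.queuedProviders = [];
  }
}

const api = new ExtensionApiSketch();
api.registerProvider("my-provider"); // step 1: allowed, queued
let threw = false;
try {
  api.sendMessage("too early"); // step 1: actions not yet live
} catch {
  threw = true;
}
api.bindCore(); // step 3: actions live, queue flushed
```

This is why "do registrations in the factory, do runtime work in `session_start`" is the safe division of labor.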

@@ -1,316 +0,0 @@

# The System Prompt Anatomy

How pi's system prompt is built, what goes into it, when it's rebuilt, and every lever you have to shape it.

---

## The Final Prompt Structure

When `buildSystemPrompt()` runs, it assembles sections in this exact order:

```
┌──────────────────────────────────────────────────┐
│ 1. Base prompt (default or SYSTEM.md override)   │
│    ├── Identity statement                        │
│    ├── Available tools list                      │
│    ├── Custom tools note                         │
│    ├── Guidelines                                │
│    └── Pi documentation pointers                 │
│                                                  │
│ 2. Append system prompt (APPEND_SYSTEM.md)       │
│                                                  │
│ 3. Project context files                         │
│    ├── ~/.gsd/agent/AGENTS.md (global)           │
│    ├── Ancestor AGENTS.md / CLAUDE.md files      │
│    └── cwd AGENTS.md / CLAUDE.md                 │
│                                                  │
│ 4. Skills listing                                │
│    └── <available_skills> XML block              │
│                                                  │
│ 5. Date/time and working directory               │
└──────────────────────────────────────────────────┘
```

After `buildSystemPrompt()`, extensions can further modify it via `before_agent_start`.

---

## Section 1: The Base Prompt

### Default Base Prompt (no SYSTEM.md)

When no SYSTEM.md exists, pi uses its built-in base:

```
You are an expert coding assistant operating inside pi, a coding agent harness.
You help users by reading files, executing commands, editing code, and writing new files.

Available tools:
- read: Read file contents
- bash: Execute bash commands (ls, grep, find, etc.)
- edit: Make surgical edits to files (find exact text and replace)
- write: Create or overwrite files
- my_custom_tool: [promptSnippet or description]

In addition to the tools above, you may have access to other custom tools
depending on the project.

Guidelines:
- Use bash for file operations like ls, rg, find
- Use read to examine files before editing. You must use this tool instead of cat or sed.
- Use edit for precise changes (old text must match exactly)
- Use write only for new files or complete rewrites
- [extension tool promptGuidelines inserted here]
- Be concise in your responses
- Show file paths clearly when working with files

Pi documentation (read only when the user asks about pi itself...):
- Main documentation: [path]
- Additional docs: [path]
- Examples: [path]
```

### SYSTEM.md Override (full replacement)

If `.gsd/SYSTEM.md` (project) or `~/.gsd/agent/SYSTEM.md` (global) exists, its contents **completely replace** the default base prompt above. The tools list, guidelines, pi docs pointers — all gone. You own the entire base.

Project takes precedence over global. Only one SYSTEM.md is used (first found wins).

**What still gets appended even with a custom SYSTEM.md:**

- APPEND_SYSTEM.md content
- Project context files (AGENTS.md / CLAUDE.md)
- Skills listing (if the `read` tool is active)
- Date/time and cwd

**What you lose:**

- The entire default prompt structure
- Built-in tool descriptions and guidelines
- Pi documentation pointers
- Dynamic guidelines from `promptGuidelines` on tools

### How Tool Descriptions Appear

Each active tool gets a line in "Available tools":

```
- toolname: [one-line description]
```

The description is determined by priority:

1. `promptSnippet` from the tool registration (if provided)
2. Built-in description from the `toolDescriptions` map (for read, bash, edit, write, grep, find, ls)
3. The tool's `name` as fallback

`promptSnippet` is normalized: newlines collapsed to spaces, trimmed to a single line.
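
That normalization is simple enough to sketch directly (an illustrative one-liner; pi's exact implementation may differ):

```typescript
// Collapse any run of whitespace (including newlines) to a single space,
// then trim — yielding the one-line form used in the tools list.
function normalizeSnippet(snippet: string): string {
  return snippet.replace(/\s+/g, " ").trim();
}

const oneLine = normalizeSnippet("  Query the issue tracker.\nSupports JQL\n  filters.  ");
```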

### How Guidelines Are Built

Guidelines are assembled dynamically based on which tools are active:

| Condition | Guideline |
|---|---|
| bash active, no grep/find/ls | "Use bash for file operations like ls, rg, find" |
| bash active + grep/find/ls | "Prefer grep/find/ls tools over bash for file exploration" |
| read + edit active | "Use read to examine files before editing" |
| edit active | "Use edit for precise changes (old text must match exactly)" |
| write active | "Use write only for new files or complete rewrites" |
| edit or write active | "When summarizing your actions, output plain text directly" |
| Always | "Be concise in your responses" |
| Always | "Show file paths clearly when working with files" |

**Extension tool guidelines** from `promptGuidelines` are appended after the built-in guidelines. They're deduplicated (the same string appears only once even if multiple tools register it).
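
Under the rules in the table above, the assembly plus dedup can be sketched as follows (an illustrative reconstruction — the wording matches the table, but the control flow is an assumption):

```typescript
// Build the guideline list from active tools, then append deduplicated
// extension guidelines — a sketch of the behavior described above.
function buildGuidelines(active: Set<string>, extensionGuidelines: string[]): string[] {
  const lines: string[] = [];
  const hasExplore = ["grep", "find", "ls"].some((t) => active.has(t));
  if (active.has("bash") && !hasExplore) {
    lines.push("Use bash for file operations like ls, rg, find");
  }
  if (active.has("bash") && hasExplore) {
    lines.push("Prefer grep/find/ls tools over bash for file exploration");
  }
  if (active.has("read") && active.has("edit")) {
    lines.push("Use read to examine files before editing");
  }
  if (active.has("edit")) {
    lines.push("Use edit for precise changes (old text must match exactly)");
  }
  if (active.has("write")) {
    lines.push("Use write only for new files or complete rewrites");
  }
  if (active.has("edit") || active.has("write")) {
    lines.push("When summarizing your actions, output plain text directly");
  }
  lines.push("Be concise in your responses");
  lines.push("Show file paths clearly when working with files");
  // Dedup: the same extension guideline registered twice appears once.
  for (const g of new Set(extensionGuidelines)) lines.push(g);
  return lines;
}

const guidelines = buildGuidelines(
  new Set(["read", "bash", "edit"]),
  ["Run tests after edits", "Run tests after edits"],
);
```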

---

## Section 2: Append System Prompt

If `.gsd/APPEND_SYSTEM.md` (project) or `~/.gsd/agent/APPEND_SYSTEM.md` (global) exists, its contents are appended after the base prompt.

This is the safe way to add project-wide instructions without replacing the default prompt. It works with both the default base and a custom SYSTEM.md.

---

## Section 3: Project Context Files

Pi walks the filesystem collecting context files:

```
1. ~/.gsd/agent/AGENTS.md (global)
2. Walk from cwd upward to root:
   - Each directory: check for AGENTS.md, then CLAUDE.md (first found wins per directory)
   - Files are collected root-down (ancestors first, cwd last)
```

All found files are concatenated under a "# Project Context" header:

```markdown
# Project Context

Project-specific instructions and guidelines:

## /Users/you/.gsd/agent/AGENTS.md

[global AGENTS.md content]

## /Users/you/projects/myapp/AGENTS.md

[project AGENTS.md content]
```

**AGENTS.md vs CLAUDE.md:** Both are treated identically. Per directory, AGENTS.md is checked first. If it exists, CLAUDE.md in the same directory is skipped.
|
||||
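The walk order and per-directory precedence can be sketched as follows. This is a simplified illustration (POSIX-style paths, filesystem access injected for testability); names are invented and the real implementation may differ.

```typescript
// Hypothetical sketch of the context-file walk: per directory, AGENTS.md wins
// over CLAUDE.md, and results are ordered root-down (ancestors first).
function collectContextFiles(
  cwd: string,
  exists: (path: string) => boolean,
): string[] {
  const found: string[] = [];
  let dir = cwd;
  while (true) {
    for (const name of ["AGENTS.md", "CLAUDE.md"]) {
      const candidate = `${dir}/${name}`;
      if (exists(candidate)) {
        found.push(candidate); // first found wins per directory
        break;
      }
    }
    const parent = dir.slice(0, dir.lastIndexOf("/"));
    if (parent === "" || parent === dir) break;
    dir = parent;
  }
  return found.reverse(); // root-down: ancestors first, cwd last
}
```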
---

## Section 4: Skills Listing

If the `read` tool is active and skills are loaded, an XML block is appended:

```xml
The following skills provide specialized instructions for specific tasks.
Use the read tool to load a skill's file when the task matches its description.
When a skill file references a relative path, resolve it against the skill directory.

<available_skills>
  <skill>
    <name>commit-outstanding</name>
    <description>Commit all uncommitted files in logical groups</description>
    <location>/Users/you/.agents/skills/commit-outstanding/SKILL.md</location>
  </skill>
</available_skills>
```

Skills with `disable-model-invocation: true` in their frontmatter are excluded from this listing.

**Key design:** Only names, descriptions, and file paths go into the system prompt. The full skill content is NOT loaded. The agent uses the `read` tool to load specific skills on demand. This keeps the system prompt small even with many skills.

---

## Section 5: Date/Time and CWD

Always appended last:

```
Current date and time: Saturday, March 7, 2026 at 08:55:05 AM CST
Current working directory: /Users/you/projects/myapp
```

---

## When the System Prompt Is Rebuilt

The base system prompt (`_baseSystemPrompt`) is rebuilt in these situations:

| Trigger | What happens |
|---|---|
| **Startup** (`_buildRuntime`) | Full rebuild with initial tool set |
| **`setActiveToolsByName()`** | Rebuild with new tool set (guidelines and snippets change) |
| **`reload()`** (`/reload`) | Full rebuild — reloads SYSTEM.md, APPEND_SYSTEM.md, context files, skills, extensions |
| **`extendResourcesFromExtensions()`** | Rebuild after `resources_discover` adds new skills/prompts/themes |
| **`_refreshToolRegistry()`** | Rebuild when extension tools change dynamically |

### Per-Prompt Modifications

On each user prompt, the `before_agent_start` hook can modify the system prompt. This modification is **not persisted** — the base prompt is restored if no extension modifies it on the next prompt:

```
User prompt 1:
  before_agent_start → extensions modify system prompt → LLM sees modified version

User prompt 2:
  before_agent_start → no extensions modify → LLM sees base system prompt (reset)
```

This means `before_agent_start` modifications are truly per-prompt. You cannot make a permanent system prompt change through this hook alone (the change must be re-applied every time).
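In practice, a "permanent" addition is just a handler that re-applies its change on every prompt. The sketch below illustrates the idea; the event shape and handler wiring are assumptions, not pi's exact API.

```typescript
// Hypothetical sketch: because before_agent_start changes are not persisted,
// a standing addition must be re-applied on every prompt. The event shape
// here is illustrative only.
const extraRules = "Always run the linter before declaring a task done.";

function onBeforeAgentStart(event: { systemPrompt: string }) {
  // Re-applied each prompt; the base prompt resets between prompts.
  return { systemPrompt: `${event.systemPrompt}\n\n${extraRules}` };
}
```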
---

## Every Lever for Shaping the System Prompt

From static configuration to dynamic extension hooks, ordered from broadest to most targeted:

### Static (file-based, loaded at startup)

| Mechanism | Scope | Effect |
|---|---|---|
| `SYSTEM.md` | Replace base prompt entirely | Nuclear option — you own everything |
| `APPEND_SYSTEM.md` | Append to base prompt | Safe additive instructions |
| `AGENTS.md` / `CLAUDE.md` | Project context section | Per-project conventions and rules |
| Skill `SKILL.md` files | Skills listing | On-demand capability descriptions |

### Dynamic (extension-based, runtime)

| Mechanism | Scope | Timing | Effect |
|---|---|---|---|
| `before_agent_start` → `systemPrompt` | Full prompt | Per user prompt | Modify/append/replace system prompt |
| `promptSnippet` on tools | Tool description line | When tool set changes | Custom one-liner in "Available tools" |
| `promptGuidelines` on tools | Guidelines section | When tool set changes | Add behavioral bullets |
| `pi.setActiveTools()` | Tool list + guidelines | Immediate, next prompt | Add/remove tools (rebuilds prompt) |
| `resources_discover` event | Skills listing | Startup + reload | Inject additional skills from extensions |

### Per-Turn (message-based, not system prompt)

These don't modify the system prompt but add to what the LLM sees:

| Mechanism | Timing | Effect |
|---|---|---|
| `before_agent_start` → `message` | Per user prompt | Inject custom message (becomes user role) |
| `context` event | Per LLM turn | Filter/inject/transform message array |
| `pi.sendMessage()` | Anytime | Inject custom message into conversation |

---

## Practical Tradeoffs

### SYSTEM.md vs before_agent_start

| | SYSTEM.md | before_agent_start |
|---|---|---|
| **Persistence** | Permanent until file changes | Per-prompt, must re-apply |
| **Dynamism** | Static file content | Can compute based on state |
| **Tool awareness** | Loses built-in tool guidelines | Preserves base prompt, appends |
| **Composability** | Only one SYSTEM.md (project or global) | Multiple extensions can chain |

**Recommendation:** Use SYSTEM.md only when you genuinely need to replace the entire prompt (e.g., custom agent personality, non-coding use case). Use `before_agent_start` for everything else.

### APPEND_SYSTEM.md vs AGENTS.md

Both append content, but they appear in different sections:

- **APPEND_SYSTEM.md** appears immediately after the base prompt, before "# Project Context"
- **AGENTS.md** appears inside "# Project Context" with a `## filepath` header

Functionally equivalent for the LLM. Use APPEND_SYSTEM.md for instructions that feel like system-level directives. Use AGENTS.md for project-specific conventions and context.

### promptGuidelines vs before_agent_start

| | promptGuidelines | before_agent_start |
|---|---|---|
| **Scope** | Only when the tool is active | Always (or conditionally in your code) |
| **Positioning** | Inside "Guidelines" section | Appended to end (or wherever you put it) |
| **Tool coupling** | Automatically appears/disappears with tool | Independent of tool state |

**Recommendation:** Use `promptGuidelines` for instructions directly related to tool usage. Use `before_agent_start` for behavioral modifications independent of tool state.

---

## The Full Context Surface Area

Everything the LLM sees on a given turn:

```
System prompt (built from all sources above + before_agent_start mods)
  +
Message array (after context event filtering + convertToLlm):
  - Compaction summaries (user role)
  - Branch summaries (user role)
  - Historical user/assistant/toolResult messages
  - Bash execution results (user role, unless !! excluded)
  - Custom messages from extensions (user role)
  - Current prompt + before_agent_start injected messages
  +
Tool definitions:
  - name, description, parameter JSON schema
  - Only for active tools (pi.getActiveTools())
```

Understanding this complete surface area — and which levers control which parts — is the key to effective context engineering in pi.

@ -1,17 +0,0 @@
# Context & Hooks — Deep Reference

How context flows through pi, how to intercept and shape it, and advanced patterns for extension authors.

These documents fill gaps between the high-level extending-pi docs and the actual source implementation. Read the extending-pi docs first for fundamentals, then use these for precision work.

## Documents

| # | Document | When to read |
|---|----------|-------------|
| 01 | [The Context Pipeline](01-the-context-pipeline.md) | Understanding the full journey of a user prompt through every transformation stage to the LLM |
| 02 | [Hook Reference](02-hook-reference.md) | Complete behavioral specification of every hook — timing, chaining, return shapes, edge cases |
| 03 | [Context Injection Patterns](03-context-injection-patterns.md) | Practical recipes for injecting, filtering, transforming, and managing context |
| 04 | [Message Types and LLM Visibility](04-message-types-and-llm-visibility.md) | How every message type is converted for the LLM, what it sees, what it doesn't |
| 05 | [Inter-Extension Communication](05-inter-extension-communication.md) | `pi.events`, shared state patterns, and multi-extension coordination |
| 06 | [Advanced Patterns from Source](06-advanced-patterns-from-source.md) | Production patterns extracted from the pi codebase and built-in extensions |
| 07 | [The System Prompt Anatomy](07-the-system-prompt-anatomy.md) | How the system prompt is built, every input source, when it's rebuilt, and every lever to shape it |

@ -1,93 +0,0 @@

# Cost Management

GSD tracks token usage and cost for every unit of work dispatched during auto mode. This data powers the dashboard, budget enforcement, and cost projections.

## Cost Tracking

Every unit's metrics are captured automatically:

- **Token counts** — input, output, cache read, cache write, total
- **Cost** — USD cost per unit
- **Duration** — wall-clock time
- **Tool calls** — number of tool invocations
- **Message counts** — assistant and user messages

Data is stored in `.gsd/metrics.json` and survives across sessions.

### Viewing Costs

**Dashboard:** `Ctrl+Alt+G` or `/gsd status` shows real-time cost breakdown.

**Aggregations available:**

- By phase (research, planning, execution, completion, reassessment)
- By slice (M001/S01, M001/S02, ...)
- By model (which models consumed the most budget)
- Project totals

## Budget Ceiling

Set a maximum spend for a project:

```yaml
---
version: 1
budget_ceiling: 50.00
---
```

### Enforcement Modes

Control what happens when the ceiling is reached:

```yaml
budget_enforcement: pause # default when ceiling is set
```

| Mode | Behavior |
|------|----------|
| `warn` | Log a warning, continue executing |
| `pause` | Pause auto mode, wait for user action |
| `halt` | Stop auto mode entirely |
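The three modes reduce to a small decision function. A minimal sketch, assuming enforcement only triggers once spend reaches the ceiling (names are invented for illustration):

```typescript
// Hypothetical sketch of ceiling enforcement: the mode decides what happens
// once spend reaches the configured budget_ceiling.
type Enforcement = "warn" | "pause" | "halt";

function checkBudget(
  spentUsd: number,
  ceilingUsd: number,
  mode: Enforcement,
): "continue" | "pause" | "halt" {
  if (spentUsd < ceilingUsd) return "continue";
  if (mode === "warn") return "continue"; // log a warning, keep going
  return mode; // "pause" or "halt"
}
```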
## Cost Projections

Once at least two slices have completed, GSD projects the remaining cost:

```
Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)
```

Projections use per-slice averages from completed work. If the budget ceiling has been reached, a warning is appended.
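The projection arithmetic from the example above is simply average times remaining count. A sketch of that calculation (function name invented):

```typescript
// Hypothetical sketch of the projection: per-slice average from completed
// slices, multiplied by the remaining slice count. Returns null until at
// least two slices have completed.
function projectRemaining(
  completedCosts: number[],
  remainingSlices: number,
): number | null {
  if (completedCosts.length < 2) return null;
  const avg =
    completedCosts.reduce((sum, c) => sum + c, 0) / completedCosts.length;
  return avg * remainingSlices;
}
```

With two completed slices averaging $6.20 and two remaining, this yields the $12.40 shown in the example.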
## Budget Pressure & Model Downgrading

When approaching the budget ceiling, the [complexity router](./token-optimization.md#budget-pressure) automatically downgrades model assignments to cheaper tiers. The downgrading is graduated:

- **< 50% used** — no adjustment
- **50-75% used** — standard tasks downgrade to light
- **75-90% used** — the same downgrades, applied more aggressively
- **> 90% used** — nearly everything downgrades; only heavy tasks stay at standard

This ensures the budget is spread across remaining work instead of being exhausted early on complex tasks.

## Token Profiles & Cost

The `token_profile` preference directly affects cost:

| Profile | Typical Savings | How |
|---------|----------------|-----|
| `budget` | 40-60% | Cheaper models, phase skipping, minimal context |
| `balanced` | 10-20% | Default models, skip slice research, standard context |
| `quality` | 0% (baseline) | Full models, all phases, full context |

See [Token Optimization](./token-optimization.md) for details.

## Tips

- Start with the `balanced` profile and a generous `budget_ceiling` to establish baseline costs
- Check `/gsd status` after a few slices to see per-slice cost averages
- Switch to the `budget` profile for well-understood, repetitive work
- Use `quality` only when architectural decisions are being made
- Per-phase model selection lets you use Opus only for planning while keeping execution on Sonnet
- Enable `dynamic_routing` for automatic model downgrading on simple tasks — see [Dynamic Model Routing](./dynamic-model-routing.md)
- Use `/gsd visualize` → Metrics tab to see where your budget is going

@ -1,335 +0,0 @@
# Custom Models

Add custom providers and models (Ollama, vLLM, LM Studio, proxies) via `~/.gsd/agent/models.json`.

## Table of Contents

- [Minimal Example](#minimal-example)
- [Full Example](#full-example)
- [Supported APIs](#supported-apis)
- [Provider Configuration](#provider-configuration)
- [Model Configuration](#model-configuration)
- [Overriding Built-in Providers](#overriding-built-in-providers)
- [Per-model Overrides](#per-model-overrides)
- [OpenAI Compatibility](#openai-compatibility)

## Minimal Example

For local models (Ollama, LM Studio, vLLM), only `id` is required per model:

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        { "id": "llama3.1:8b" },
        { "id": "qwen2.5-coder:7b" }
      ]
    }
  }
}
```

The `apiKey` is required but Ollama ignores it, so any value works.

Some OpenAI-compatible servers do not understand the `developer` role used for reasoning-capable models. For those providers, set `compat.supportsDeveloperRole` to `false` so GSD sends the system prompt as a `system` message instead. If the server also does not support `reasoning_effort`, set `compat.supportsReasoningEffort` to `false` too.

You can set `compat` at the provider level to apply to all models, or at the model level to override a specific model. This commonly applies to Ollama, vLLM, SGLang, and similar OpenAI-compatible servers.

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "compat": {
        "supportsDeveloperRole": false,
        "supportsReasoningEffort": false
      },
      "models": [
        {
          "id": "gpt-oss:20b",
          "reasoning": true
        }
      ]
    }
  }
}
```

## Full Example

Override defaults when you need specific values:

```json
{
  "providers": {
    "ollama": {
      "baseUrl": "http://localhost:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "llama3.1:8b",
          "name": "Llama 3.1 8B (Local)",
          "reasoning": false,
          "input": ["text"],
          "contextWindow": 128000,
          "maxTokens": 32000,
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
        }
      ]
    }
  }
}
```

The file reloads each time you open `/model`. Edit during a session; no restart needed.

## Supported APIs

| API | Description |
|-----|-------------|
| `openai-completions` | OpenAI Chat Completions (most compatible) |
| `openai-responses` | OpenAI Responses API |
| `anthropic-messages` | Anthropic Messages API |
| `google-generative-ai` | Google Generative AI |

Set `api` at the provider level (default for all models) or the model level (override per model).

## Provider Configuration

| Field | Description |
|-------|-------------|
| `baseUrl` | API endpoint URL |
| `api` | API type (see above) |
| `apiKey` | API key (see value resolution below) |
| `headers` | Custom headers (see value resolution below) |
| `authHeader` | Set `true` to add `Authorization: Bearer <apiKey>` automatically |
| `models` | Array of model configurations |
| `modelOverrides` | Per-model overrides for built-in models on this provider |

### Value Resolution

The `apiKey` and `headers` fields support three formats:

- **Shell command:** `"!command"` executes and uses stdout

  ```json
  "apiKey": "!security find-generic-password -ws 'anthropic'"
  "apiKey": "!op read 'op://vault/item/credential'"
  ```

- **Environment variable:** Uses the value of the named variable

  ```json
  "apiKey": "MY_API_KEY"
  ```

- **Literal value:** Used directly

  ```json
  "apiKey": "sk-..."
  ```
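The three formats can be sketched as a resolver that checks them in order. This is an illustrative approximation (injected `env` and `runShell` for testability, invented names); the real precedence and trimming behavior may differ in detail.

```typescript
// Hypothetical sketch of apiKey/header value resolution: "!cmd" runs a
// shell command and uses its stdout, a known environment variable name is
// substituted, and anything else is used literally.
function resolveValue(
  raw: string,
  env: Record<string, string | undefined>,
  runShell: (cmd: string) => string,
): string {
  if (raw.startsWith("!")) return runShell(raw.slice(1)).trim();
  const fromEnv = env[raw];
  if (fromEnv !== undefined) return fromEnv;
  return raw; // literal value
}
```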
### Custom Headers

```json
{
  "providers": {
    "custom-proxy": {
      "baseUrl": "https://proxy.example.com/v1",
      "apiKey": "MY_API_KEY",
      "api": "anthropic-messages",
      "headers": {
        "x-portkey-api-key": "PORTKEY_API_KEY",
        "x-secret": "!op read 'op://vault/item/secret'"
      },
      "models": [...]
    }
  }
}
```

## Model Configuration

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `id` | Yes | — | Model identifier (passed to the API) |
| `name` | No | `id` | Human-readable model label. Used for matching (`--model` patterns) and shown in model details/status text. |
| `api` | No | provider's `api` | Override provider's API for this model |
| `reasoning` | No | `false` | Supports extended thinking |
| `input` | No | `["text"]` | Input types: `["text"]` or `["text", "image"]` |
| `contextWindow` | No | `128000` | Context window size in tokens |
| `maxTokens` | No | `16384` | Maximum output tokens |
| `cost` | No | all zeros | `{"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0}` (per million tokens) |
| `compat` | No | provider `compat` | OpenAI compatibility overrides. Merged with provider-level `compat` when both are set. |

Current behavior:

- `/model` and `--list-models` list entries by model `id`.
- The configured `name` is used for model matching and detail/status text.

## Overriding Built-in Providers

Route a built-in provider through a proxy without redefining models:

```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1"
    }
  }
}
```

All built-in Anthropic models remain available. Existing OAuth or API key auth continues to work.

To merge custom models into a built-in provider, include the `models` array:

```json
{
  "providers": {
    "anthropic": {
      "baseUrl": "https://my-proxy.example.com/v1",
      "apiKey": "ANTHROPIC_API_KEY",
      "api": "anthropic-messages",
      "models": [...]
    }
  }
}
```

Merge semantics:

- Built-in models are kept.
- Custom models are upserted by `id` within the provider.
- If a custom model `id` matches a built-in model `id`, the custom model replaces that built-in model.
- If a custom model `id` is new, it is added alongside built-in models.
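The merge semantics above amount to an upsert keyed by `id`. A minimal sketch of that behavior (the interface and function names are invented for illustration):

```typescript
// Hypothetical sketch of the merge: custom models are upserted into the
// built-in list by id; matching ids replace, new ids append.
interface ModelConfig {
  id: string;
  name?: string;
}

function mergeModels(
  builtIn: ModelConfig[],
  custom: ModelConfig[],
): ModelConfig[] {
  const merged = [...builtIn];
  for (const model of custom) {
    const i = merged.findIndex((m) => m.id === model.id);
    if (i >= 0) merged[i] = model; // same id: custom replaces built-in
    else merged.push(model); // new id: added alongside built-ins
  }
  return merged;
}
```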
## Per-model Overrides

Use `modelOverrides` to customize specific built-in models without replacing the provider's full model list.

```json
{
  "providers": {
    "openrouter": {
      "modelOverrides": {
        "anthropic/claude-sonnet-4": {
          "name": "Claude Sonnet 4 (Bedrock Route)",
          "compat": {
            "openRouterRouting": {
              "only": ["amazon-bedrock"]
            }
          }
        }
      }
    }
  }
}
```

`modelOverrides` supports these fields per model: `name`, `reasoning`, `input`, `cost` (partial), `contextWindow`, `maxTokens`, `headers`, `compat`.

Behavior notes:

- `modelOverrides` are applied to built-in provider models.
- Unknown model IDs are ignored.
- You can combine provider-level `baseUrl`/`headers` with `modelOverrides`.
- If `models` is also defined for a provider, custom models are merged after built-in overrides. A custom model with the same `id` replaces the overridden built-in model entry.

## OpenAI Compatibility

For providers with partial OpenAI compatibility, use the `compat` field.

- Provider-level `compat` applies defaults to all models under that provider.
- Model-level `compat` overrides provider-level values for that model.

```json
{
  "providers": {
    "local-llm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "compat": {
        "supportsUsageInStreaming": false,
        "maxTokensField": "max_tokens"
      },
      "models": [...]
    }
  }
}
```

| Field | Description |
|-------|-------------|
| `supportsStore` | Provider supports `store` field |
| `supportsDeveloperRole` | Use `developer` vs `system` role |
| `supportsReasoningEffort` | Support for `reasoning_effort` parameter |
| `reasoningEffortMap` | Map GSD thinking levels to provider-specific `reasoning_effort` values |
| `supportsUsageInStreaming` | Supports `stream_options: { include_usage: true }` (default: `true`) |
| `maxTokensField` | Use `max_completion_tokens` or `max_tokens` |
| `requiresToolResultName` | Include `name` on tool result messages |
| `requiresAssistantAfterToolResult` | Insert an assistant message before a user message after tool results |
| `requiresThinkingAsText` | Convert thinking blocks to plain text |
| `thinkingFormat` | Use `reasoning_effort`, `zai`, `qwen`, or `qwen-chat-template` thinking parameters |
| `supportsStrictMode` | Include the `strict` field in tool definitions |
| `openRouterRouting` | OpenRouter routing config passed to OpenRouter for model/provider selection |
| `vercelGatewayRouting` | Vercel AI Gateway routing config for provider selection (`only`, `order`) |

`qwen` uses top-level `enable_thinking`. Use `qwen-chat-template` for local Qwen-compatible servers that require `chat_template_kwargs.enable_thinking`.

Example:

```json
{
  "providers": {
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "OPENROUTER_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "openrouter/anthropic/claude-3.5-sonnet",
          "name": "OpenRouter Claude 3.5 Sonnet",
          "compat": {
            "openRouterRouting": {
              "order": ["anthropic"],
              "fallbacks": ["openai"]
            }
          }
        }
      ]
    }
  }
}
```

Vercel AI Gateway example:

```json
{
  "providers": {
    "vercel-ai-gateway": {
      "baseUrl": "https://ai-gateway.vercel.sh/v1",
      "apiKey": "AI_GATEWAY_API_KEY",
      "api": "openai-completions",
      "models": [
        {
          "id": "moonshotai/kimi-k2.5",
          "name": "Kimi K2.5 (Fireworks via Vercel)",
          "reasoning": true,
          "input": ["text", "image"],
          "cost": { "input": 0.6, "output": 3, "cacheRead": 0, "cacheWrite": 0 },
          "contextWindow": 262144,
          "maxTokens": 262144,
          "compat": {
            "vercelGatewayRouting": {
              "only": ["fireworks", "novita"],
              "order": ["fireworks", "novita"]
            }
          }
        }
      ]
    }
  }
}
```

@ -1,127 +0,0 @@
# Dynamic Model Routing

*Introduced in v2.19.0*

Dynamic model routing automatically selects cheaper models for simple work and reserves expensive models for complex tasks. This reduces token consumption by 20-50% on capped plans without sacrificing quality where it matters.

## How It Works

Each unit dispatched by auto-mode is classified into a complexity tier:

| Tier | Typical Work | Default Model Level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Haiku-class |
| **Standard** | Research, planning, execution, milestone completion | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment, complex execution | Opus-class |

The router then selects a model for that tier. The key rule: **downgrade-only semantics**. The user's configured model is always the ceiling — routing never upgrades beyond what you've configured.

## Enabling

Dynamic routing is off by default. Enable it in preferences:

```yaml
---
version: 1
dynamic_routing:
  enabled: true
---
```

## Configuration

```yaml
dynamic_routing:
  enabled: true
  tier_models:              # explicit model per tier (optional)
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true # bump tier on task failure (default: true)
  budget_pressure: true     # auto-downgrade when approaching budget ceiling (default: true)
  cross_provider: true      # consider models from other providers (default: true)
  hooks: true               # apply routing to post-unit hooks (default: true)
```

### `tier_models`

Override which model is used for each tier. When omitted, the router uses a built-in capability mapping that knows common model families:

- **Light:** `claude-haiku-4-5`, `gpt-4o-mini`, `gemini-2.0-flash`
- **Standard:** `claude-sonnet-4-6`, `gpt-4o`, `gemini-2.5-pro`
- **Heavy:** `claude-opus-4-6`, `gpt-4.5-preview`, `gemini-2.5-pro`

### `escalate_on_failure`

When a task fails at a given tier, the router escalates to the next tier on retry: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
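Escalation is a one-step bump along the tier order, capped at heavy. A minimal sketch (names invented for illustration):

```typescript
// Hypothetical sketch of failure escalation: retry one tier up, capped at heavy.
type Tier = "light" | "standard" | "heavy";
const tierOrder: Tier[] = ["light", "standard", "heavy"];

function escalate(tier: Tier): Tier {
  const i = tierOrder.indexOf(tier);
  return tierOrder[Math.min(i + 1, tierOrder.length - 1)];
}
```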
### `budget_pressure`

When approaching the budget ceiling, the router progressively downgrades:

| Budget Used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive downgrading |
| > 90% | Nearly everything → Light; only Heavy stays at Standard |
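A simplified sketch of the downgrade rule, following the table above (the 75-90% band is approximated as the same downgrades as 50-75%; names are invented and the real router may be more granular):

```typescript
// Hypothetical sketch of budget-pressure downgrading per the table above.
type Tier = "light" | "standard" | "heavy";

function applyBudgetPressure(tier: Tier, budgetUsedFraction: number): Tier {
  if (budgetUsedFraction < 0.5) return tier; // no adjustment
  if (budgetUsedFraction > 0.9) {
    // nearly everything -> light; only heavy stays at standard
    return tier === "heavy" ? "standard" : "light";
  }
  return tier === "standard" ? "light" : tier; // standard -> light
}
```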
### `cross_provider`

When enabled, the router may select models from providers other than your primary. This uses the built-in cost table to find the cheapest model at each tier. Requires the target provider to be configured.

## Complexity Classification

Units are classified using pure heuristics — no LLM calls, sub-millisecond:

### Unit Type Defaults

| Unit Type | Default Tier |
|-----------|-------------|
| `complete-slice`, `run-uat` | Light |
| `research-*`, `plan-*`, `complete-milestone` | Standard |
| `execute-task` | Standard (upgraded by task analysis) |
| `replan-slice`, `reassess-roadmap` | Heavy |
| `hook/*` | Light |

### Task Plan Analysis

For `execute-task` units, the classifier analyzes the task plan:

| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |

**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`, `distributed`, `backward compat`
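The signals above can be sketched as a threshold check: any "complex" signal bumps the plan to heavy, and only a plan that is simple on all counted signals drops to light. This is an illustrative approximation with a shortened keyword list and invented names, not the actual classifier.

```typescript
// Hypothetical sketch of the execute-task heuristics: cheap string and
// count checks, no LLM calls. Thresholds mirror the table above.
type Tier = "light" | "standard" | "heavy";
const complexityKeywords = ["refactor", "migrate", "integrate", "security"];

function classifyTaskPlan(plan: {
  steps: number;
  files: number;
  description: string;
}): Tier {
  const hasKeyword = complexityKeywords.some((k) =>
    plan.description.toLowerCase().includes(k),
  );
  if (
    plan.steps >= 8 ||
    plan.files >= 8 ||
    plan.description.length > 2000 ||
    hasKeyword
  )
    return "heavy";
  if (plan.steps <= 3 && plan.files <= 3 && plan.description.length < 500)
    return "light";
  return "standard";
}
```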
|
||||
### Adaptive Learning
|
||||
|
||||
The routing history (`.gsd/routing-history.json`) tracks success/failure per tier per unit type. If a tier's failure rate exceeds 20% for a given pattern, future classifications are bumped up. User feedback (`over`/`under`/`ok`) is weighted 2× vs automatic outcomes.
|
||||
|
||||
## Interaction with Token Profiles
|
||||
|
||||
Dynamic routing and token profiles are complementary:
|
||||
|
||||
- **Token profiles** (`budget`/`balanced`/`quality`) control phase skipping and context compression
|
||||
- **Dynamic routing** controls per-unit model selection within the configured phase model
|
||||
|
||||
When both are active, token profiles set the baseline models and dynamic routing further optimizes within those baselines. The `budget` token profile + dynamic routing provides maximum cost savings.
|
||||
|
||||
## Cost Table
|
||||
|
||||
The router includes a built-in cost table for common models, used for cross-provider cost comparison. Costs are per-million tokens (input/output):
|
||||
|
||||
| Model | Input | Output |
|
||||
|-------|-------|--------|
|
||||
| claude-haiku-4-5 | $0.80 | $4.00 |
|
||||
| claude-sonnet-4-6 | $3.00 | $15.00 |
|
||||
| claude-opus-4-6 | $15.00 | $75.00 |
|
||||
| gpt-4o-mini | $0.15 | $0.60 |
|
||||
| gpt-4o | $2.50 | $10.00 |
|
||||
| gemini-2.0-flash | $0.10 | $0.40 |
|
||||
|
||||
The cost table is used for comparison only — actual billing comes from your provider.
|
||||
|
|
@@ -1,18 +0,0 @@

# What Are Extensions?

Extensions are TypeScript modules that hook into pi's runtime to extend its behavior. They can:

- **Register custom tools** the LLM can call (via `pi.registerTool()`)
- **Intercept and modify events** — block dangerous tool calls, transform user input, inject context
- **Register slash commands** (`/mycommand`) for the user
- **Render custom UI** — dialogs, selectors, games, overlays, custom editors
- **Persist state** across session restarts
- **Control how tool calls and messages appear** in the TUI
- **Modify the system prompt** dynamically per-turn
- **Manage models and providers** — register custom providers, switch models
- **Override built-in tools** — wrap `read`, `bash`, `edit`, `write` with custom logic

**Why this matters:** Extensions are the primary mechanism for customizing pi. They turn pi from a generic coding agent into *your* coding agent — with your guardrails, your tools, your workflow.

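As a concrete flavor of the "intercept events" capability: a guardrail extension can return `{ block: true, reason }` from a `tool_call` handler. The decision logic is sketched below as a plain function — the event shape is a simplified stand-in, and a real extension would wire this into `pi.on("tool_call", ...)`:

```typescript
// Simplified sketch of a tool_call guardrail. A real extension would
// register this via pi.on("tool_call", ...); the event shape here is
// a reduced stand-in for illustration.
type ToolCall = { toolName: string; input: { command?: string } };

const DANGEROUS = [/\brm\s+-rf\s+\//, /\bgit\s+push\s+--force\b/];

function guard(call: ToolCall): { block: true; reason: string } | undefined {
  if (call.toolName !== "bash" || !call.input.command) return undefined;
  for (const pattern of DANGEROUS) {
    if (pattern.test(call.input.command)) {
      return { block: true, reason: `Blocked dangerous command: ${call.input.command}` };
    }
  }
  return undefined; // undefined means "no action" — the call proceeds
}
```
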
---

@@ -1,34 +0,0 @@

# Architecture & Mental Model

```
┌─────────────────────────────────────────────────────┐
│                     Pi Runtime                      │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐   │
│  │ Session  │  │  Agent   │  │  Tool Executor   │   │
│  │ Manager  │  │  Loop    │  │                  │   │
│  └────┬─────┘  └────┬─────┘  └────────┬─────────┘   │
│       │             │                 │             │
│       └─────────────┼─────────────────┘             │
│                     │                               │
│             ┌───────▼────────┐                      │
│             │  Event System  │  ◄── All events flow │
│             └───────┬────────┘      through here    │
│                     │                               │
│        ┌────────────┼────────────┐                  │
│        ▼            ▼            ▼                  │
│   Extension A   Extension B   Extension C           │
│                                                     │
└─────────────────────────────────────────────────────┘
```

**Key concepts:**

- **Extensions are loaded once** when pi starts (or on `/reload`). Your default export function runs, and you subscribe to events and register tools/commands during that function call.
- **Events are the communication mechanism.** Pi emits events at every stage of its lifecycle. Your extension listens and reacts.
- **Tools are the LLM's interface to your extension.** The LLM sees tool descriptions in its system prompt and calls them when appropriate.
- **Commands are the user's interface.** Users type `/mycommand` to invoke your extension directly.
- **State lives in tool result `details`** for proper branching/forking support, or in `pi.appendEntry()` for extension-private state.

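The fan-out through the event system can be modeled as a tiny event bus. This is a toy sketch of the concept only — pi's real handlers are async and can return control values (block, transform) that this model omits:

```typescript
// Toy model of the event fan-out in the diagram above. Not pi's real
// event system — real handlers are async and can return control values.
type Handler = (payload: unknown) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: unknown): number {
    const list = this.handlers.get(event) ?? [];
    for (const h of list) h(payload); // every subscribed extension sees it
    return list.length;
  }
}
```
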
---

@@ -1,36 +0,0 @@

# Getting Started

### Minimal Extension

Create `~/.gsd/agent/extensions/my-extension.ts`:

```typescript
import type { ExtensionAPI } from "@mariozechner/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.on("session_start", async (_event, ctx) => {
    ctx.ui.notify("Extension loaded!", "info");
  });
}
```

### Testing

```bash
# Quick test (doesn't need to be in extensions dir)
pi -e ./my-extension.ts

# Or just place it in the extensions dir and start pi
pi
```

### Hot Reload

Extensions in auto-discovered locations (`~/.gsd/agent/extensions/` or `.gsd/extensions/`) can be hot-reloaded:

```
/reload
```

---

@@ -1,32 +0,0 @@

# Extension Locations & Discovery

### Auto-Discovery Paths

| Location | Scope |
|----------|-------|
| `~/.gsd/agent/extensions/*.ts` | Global (all projects) |
| `~/.gsd/agent/extensions/*/index.ts` | Global (subdirectory) |
| `.gsd/extensions/*.ts` | Project-local |
| `.gsd/extensions/*/index.ts` | Project-local (subdirectory) |

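The table above resolves to four glob candidates per run. A path-joining sketch only — the real loader also checks that files exist, handles `packages`, and validates that each module has a default export:

```typescript
// Candidate extension entry points per the auto-discovery table above.
// Sketch only: the real loader also verifies files exist and that each
// module has a default export function.
import * as path from "node:path";

function discoveryGlobs(home: string, projectRoot: string): string[] {
  return [
    path.join(home, ".gsd/agent/extensions/*.ts"),        // global
    path.join(home, ".gsd/agent/extensions/*/index.ts"),  // global (subdirectory)
    path.join(projectRoot, ".gsd/extensions/*.ts"),       // project-local
    path.join(projectRoot, ".gsd/extensions/*/index.ts"), // project-local (subdirectory)
  ];
}
```
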
### Additional Paths (via settings.json)

```json
{
  "extensions": [
    "/path/to/local/extension.ts",
    "/path/to/local/extension/dir"
  ],
  "packages": [
    "npm:@foo/bar@1.0.0",
    "git:github.com/user/repo@v1"
  ]
}
```

### Security Warning

> Extensions run with your **full system permissions**. They can execute arbitrary code, read/write any file, make network requests. Only install from sources you trust.

---

@@ -1,54 +0,0 @@

# Extension Structure & Styles

### Single File (simplest)

```
~/.gsd/agent/extensions/
└── my-extension.ts
```

### Directory with index.ts (multi-file)

```
~/.gsd/agent/extensions/
└── my-extension/
    ├── index.ts   # Entry point (must export default function)
    ├── tools.ts
    └── utils.ts
```

### Package with Dependencies (npm packages needed)

```
~/.gsd/agent/extensions/
└── my-extension/
    ├── package.json
    ├── package-lock.json
    ├── node_modules/
    └── src/
        └── index.ts
```

```json
// package.json
{
  "name": "my-extension",
  "dependencies": { "zod": "^3.0.0" },
  "pi": { "extensions": ["./src/index.ts"] }
}
```

Run `npm install` in the extension directory. Imports from `node_modules/` resolve automatically.

### Available Imports

| Package | Purpose |
|---------|---------|
| `@mariozechner/pi-coding-agent` | Extension types (`ExtensionAPI`, `ExtensionContext`, event types, utilities) |
| `@sinclair/typebox` | Schema definitions for tool parameters (`Type.Object`, `Type.String`, etc.) |
| `@mariozechner/pi-ai` | AI utilities (`StringEnum` for Google-compatible enums) |
| `@mariozechner/pi-tui` | TUI components (`Text`, `Box`, `Container`, `SelectList`, etc.) |
| Node.js built-ins | `node:fs`, `node:path`, `node:child_process`, etc. |

---

@@ -1,42 +0,0 @@

# The Extension Lifecycle

```
pi starts
  │
  └─► Extension default function runs
        ├── pi.on("event", handler)  ← Subscribe to events
        ├── pi.registerTool({...})   ← Register tools
        ├── pi.registerCommand(...)  ← Register commands
        └── pi.registerShortcut(...) ← Register keyboard shortcuts

  └─► session_start event fires
        │
        ▼
User types a prompt ─────────────────────────────────────┐
  │                                                      │
  ├─► Extension commands checked (bypass if match)       │
  ├─► input event (can intercept/transform)              │
  ├─► Skill/template expansion                           │
  ├─► before_agent_start (inject message, modify         │
  │     system prompt)                                   │
  ├─► agent_start                                        │
  │                                                      │
  │   ┌── Turn loop (repeats while LLM calls tools) ──┐  │
  │   │  turn_start                                   │  │
  │   │  context (can modify messages sent to LLM)    │  │
  │   │  LLM responds → may call tools:               │  │
  │   │    tool_call (can BLOCK)                      │  │
  │   │    tool_execution_start/update/end            │  │
  │   │    tool_result (can MODIFY)                   │  │
  │   │  turn_end                                     │  │
  │   └───────────────────────────────────────────────┘  │
  │                                                      │
  └─► agent_end                                          │
        │                                                │
User types another prompt ◄──────────────────────────────┘
```

**Critical insight:** The event system is your primary mechanism for interacting with pi. Every meaningful thing that happens emits an event, and most events let you modify or block the behavior.

---

@@ -1,93 +0,0 @@

# Events — The Nervous System

Events are the core of the extension system. They fall into six categories:

### 7.1 Session Events

| Event | When | Can Return |
|-------|------|------------|
| `session_start` | Session loads | — |
| `session_before_switch` | Before `/new` or `/resume` | `{ cancel: true }` |
| `session_switch` | After session switch | — |
| `session_before_fork` | Before `/fork` | `{ cancel: true }` or `{ skipConversationRestore: true }` |
| `session_fork` | After fork | — |
| `session_before_compact` | Before compaction | `{ cancel: true }` or `{ compaction: {...} }` (custom summary) |
| `session_compact` | After compaction | — |
| `session_before_tree` | Before `/tree` navigation | `{ cancel: true }` or `{ summary: {...} }` |
| `session_tree` | After tree navigation | — |
| `session_shutdown` | On exit (Ctrl+C, Ctrl+D, SIGTERM) | — |

### 7.2 Agent Events

| Event | When | Can Return |
|-------|------|------------|
| `before_agent_start` | After user prompt, before agent loop | `{ message: {...}, systemPrompt: "..." }` |
| `agent_start` | Agent loop begins | — |
| `agent_end` | Agent loop ends | — |
| `turn_start` | Each LLM turn begins | — |
| `turn_end` | Each LLM turn ends | — |
| `context` | Before each LLM call | `{ messages: [...] }` (modified copy) |
| `message_start/update/end` | Message lifecycle | — |

### 7.3 Tool Events

| Event | When | Can Return |
|-------|------|------------|
| `tool_call` | Before tool executes | `{ block: true, reason: "..." }` |
| `tool_execution_start` | Tool begins executing | — |
| `tool_execution_update` | Tool sends progress | — |
| `tool_execution_end` | Tool finishes | — |
| `tool_result` | After tool executes | `{ content: [...], details: {...}, isError: bool }` (modify result) |

### 7.4 Input Events

| Event | When | Can Return |
|-------|------|------------|
| `input` | User input received (before skill/template expansion) | `{ action: "transform", text: "..." }` or `{ action: "handled" }` or `{ action: "continue" }` |

### 7.5 Model Events

| Event | When | Can Return |
|-------|------|------------|
| `model_select` | Model changes (`/model`, Ctrl+P, restore) | — |

### 7.6 User Bash Events

| Event | When | Can Return |
|-------|------|------------|
| `user_bash` | User runs `!` or `!!` commands | `{ operations: ... }` or `{ result: {...} }` |

### Event Handler Signature

```typescript
pi.on("event_name", async (event, ctx: ExtensionContext) => {
  // event — typed payload for this event
  // ctx — access to UI, session, model, and control flow

  // Return undefined for no action, or a typed response object
});
```

### Type Narrowing for Tool Events

```typescript
import { isToolCallEventType, isToolResultEventType } from "@mariozechner/pi-coding-agent";

pi.on("tool_call", async (event, ctx) => {
  if (isToolCallEventType("bash", event)) {
    // event.input is typed as { command: string; timeout?: number }
  }
  if (isToolCallEventType("write", event)) {
    // event.input is typed as { path: string; content: string }
  }
});

pi.on("tool_result", async (event, ctx) => {
  if (isToolResultEventType("bash", event)) {
    // event.details is typed as BashToolDetails
  }
});
```

---

@@ -1,85 +0,0 @@

# ExtensionContext — What You Can Access

Every event handler receives `ctx: ExtensionContext`. This is your window into pi's runtime state.

### ctx.ui — User Interaction

The primary way to interact with the user. See [Section 12: Custom UI](#12-custom-ui--visual-components) for full details.

```typescript
// Dialogs (blocking, wait for user response)
const choice = await ctx.ui.select("Pick one:", ["A", "B", "C"]);
const ok = await ctx.ui.confirm("Delete?", "This cannot be undone");
const name = await ctx.ui.input("Name:", "placeholder");
const text = await ctx.ui.editor("Edit:", "prefilled text");

// Non-blocking UI
ctx.ui.notify("Done!", "info");           // Toast notification
ctx.ui.setStatus("my-ext", "Active");     // Footer status
ctx.ui.setWidget("my-id", ["Line 1"]);    // Widget above/below editor
ctx.ui.setTitle("pi - my project");       // Terminal title
ctx.ui.setEditorText("Prefill text");     // Set editor content
ctx.ui.setWorkingMessage("Thinking...");  // Working message during streaming
```

### ctx.hasUI

`false` in print mode (`-p`) and JSON mode. `true` in interactive and RPC mode. Always check before calling dialog methods in non-interactive contexts.

### ctx.cwd

Current working directory (string).

### ctx.sessionManager — Session State

Read-only access to the session:

```typescript
ctx.sessionManager.getEntries()       // All entries in session
ctx.sessionManager.getBranch()        // Current branch entries
ctx.sessionManager.getLeafId()        // Current leaf entry ID
ctx.sessionManager.getSessionFile()   // Path to session JSONL file
ctx.sessionManager.getLabel(entryId)  // Get label on entry
```

### ctx.modelRegistry / ctx.model

Access to available models and the current model.

### ctx.isIdle() / ctx.abort() / ctx.hasPendingMessages()

Control flow helpers for checking agent state.

### ctx.shutdown()

Request graceful shutdown. Deferred until agent is idle. Emits `session_shutdown` before exiting.

### ctx.getContextUsage()

Returns current context token usage. Useful for triggering compaction or showing stats.

```typescript
const usage = ctx.getContextUsage();
if (usage && usage.tokens > 100_000) {
  // Context is getting large
}
```

### ctx.compact(options?)

Trigger compaction programmatically:

```typescript
ctx.compact({
  customInstructions: "Focus on recent changes",
  onComplete: (result) => ctx.ui.notify("Compacted!", "info"),
  onError: (error) => ctx.ui.notify(`Failed: ${error.message}`, "error"),
});
```

### ctx.getSystemPrompt()

Returns the current effective system prompt (including any `before_agent_start` modifications).

---

@@ -1,77 +0,0 @@

# ExtensionAPI — What You Can Do

The `pi` object (received in your default export function) is your registration interface. It persists for the lifetime of the extension.

### Core Registration

| Method | Purpose |
|--------|---------|
| `pi.on(event, handler)` | Subscribe to events |
| `pi.registerTool(definition)` | Register a tool the LLM can call |
| `pi.registerCommand(name, options)` | Register a `/command` |
| `pi.registerShortcut(key, options)` | Register a keyboard shortcut |
| `pi.registerFlag(name, options)` | Register a CLI flag |
| `pi.registerMessageRenderer(customType, renderer)` | Custom message rendering |
| `pi.registerProvider(name, config)` | Register/override a model provider |
| `pi.unregisterProvider(name)` | Remove a provider |

### Messaging

| Method | Purpose |
|--------|---------|
| `pi.sendMessage(message, options?)` | Inject a custom message into the session |
| `pi.sendUserMessage(content, options?)` | Send a user message (triggers a turn) |

**`sendMessage` delivery modes:**
- `"steer"` (default) — Interrupts streaming. Delivered after current tool finishes, remaining tools skipped.
- `"followUp"` — Waits for agent to finish. Delivered when agent has no more tool calls.
- `"nextTurn"` — Queued for next user prompt. Does not interrupt.

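The three modes differ only in *when* the injected message reaches the LLM. A decision-table sketch of the behavior described above — the function and return strings are illustrative, not part of the API:

```typescript
// When does a message injected via sendMessage reach the LLM?
// Pure decision-table sketch of the three modes described above;
// the function name and strings are illustrative only.
type DeliveryMode = "steer" | "followUp" | "nextTurn";

function deliveredWhen(mode: DeliveryMode, agentStreaming: boolean): string {
  switch (mode) {
    case "steer":
      // Interrupts: after the current tool finishes, remaining tools skipped.
      return agentStreaming ? "after current tool, remaining skipped" : "immediately";
    case "followUp":
      return "when the agent has no more tool calls";
    case "nextTurn":
      return "with the next user prompt";
  }
}
```
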
### State & Session

| Method | Purpose |
|--------|---------|
| `pi.appendEntry(customType, data?)` | Persist extension state (NOT sent to LLM) |
| `pi.setSessionName(name)` | Set display name for session selector |
| `pi.getSessionName()` | Get current session name |
| `pi.setLabel(entryId, label)` | Bookmark an entry for `/tree` navigation |

### Tool Management

| Method | Purpose |
|--------|---------|
| `pi.getActiveTools()` | Get currently active tool names |
| `pi.getAllTools()` | Get all registered tools (name + description) |
| `pi.setActiveTools(names)` | Enable/disable tools at runtime |

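A typical use of these methods is toggling a single tool at runtime. The set arithmetic is a pure helper sketch; the `pi.getActiveTools()` / `pi.setActiveTools()` calls shown in the comment are as documented above:

```typescript
// Compute the new active-tool list when toggling one tool, as you would
// pass to pi.setActiveTools(). Pure helper; pi calls shown in comments.
function toggleTool(active: string[], name: string): string[] {
  return active.includes(name)
    ? active.filter((t) => t !== name) // disable
    : [...active, name];               // enable
}

// Usage inside an extension (sketch):
//   pi.setActiveTools(toggleTool(pi.getActiveTools(), "bash"));
```
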
### Model Management

| Method | Purpose |
|--------|---------|
| `pi.setModel(model)` | Switch model. Returns `false` if no API key. |
| `pi.getThinkingLevel()` | Get current thinking level |
| `pi.setThinkingLevel(level)` | Set thinking level (`"off"` through `"xhigh"`) |

### Utilities

| Method | Purpose |
|--------|---------|
| `pi.exec(command, args, options?)` | Execute a shell command |
| `pi.events` | Shared event bus for inter-extension communication |
| `pi.getFlag(name)` | Get value of a registered CLI flag |
| `pi.getCommands()` | Get all available slash commands |

### ExtensionCommandContext (commands only)

Command handlers receive `ExtensionCommandContext`, which adds session control methods not available in regular event handlers (they would deadlock there):

| Method | Purpose |
|--------|---------|
| `ctx.waitForIdle()` | Wait for agent to finish streaming |
| `ctx.newSession(options?)` | Create a new session |
| `ctx.fork(entryId)` | Fork from an entry |
| `ctx.navigateTree(targetId, options?)` | Navigate the session tree |
| `ctx.reload()` | Hot-reload extensions, skills, prompts, themes |

---

@@ -1,148 +0,0 @@

# Custom Tools — Giving the LLM New Abilities

Tools are the most powerful extension capability. They appear in the LLM's system prompt and the LLM calls them autonomously when appropriate.

### Tool Definition

```typescript
import { Type } from "@sinclair/typebox";
import { StringEnum } from "@mariozechner/pi-ai";

pi.registerTool({
  name: "my_tool",                // Unique identifier
  label: "My Tool",               // Display name in TUI
  description: "What this does",  // Shown to LLM in system prompt

  // Optional: customize the one-liner in the system prompt's "Available tools" section
  promptSnippet: "List or add items to the project todo list",

  // Optional: add bullets to the system prompt's "Guidelines" section when tool is active
  promptGuidelines: [
    "Use this tool for todo planning instead of direct file edits."
  ],

  // Parameter schema (MUST use TypeBox)
  parameters: Type.Object({
    action: StringEnum(["list", "add"] as const), // ⚠️ Use StringEnum, NOT Type.Union/Type.Literal
    text: Type.Optional(Type.String()),
  }),

  // The execution function
  async execute(toolCallId, params, signal, onUpdate, ctx) {
    // Check for cancellation
    if (signal?.aborted) {
      return { content: [{ type: "text", text: "Cancelled" }] };
    }

    // Stream progress updates to the UI
    onUpdate?.({
      content: [{ type: "text", text: "Working..." }],
      details: { progress: 50 },
    });

    // Do the work
    const result = await doSomething(params);

    // Return result
    return {
      content: [{ type: "text", text: "Done" }], // Sent to LLM as context
      details: { data: result },                 // For rendering & state reconstruction
    };
  },

  // Optional: Custom TUI rendering (see Section 14)
  renderCall(args, theme) { ... },
  renderResult(result, options, theme) { ... },
});
```

### ⚠️ Critical: Use StringEnum

For string enum parameters, you **must** use `StringEnum` from `@mariozechner/pi-ai`. `Type.Union([Type.Literal("a"), Type.Literal("b")])` does NOT work with Google's API.

```typescript
import { StringEnum } from "@mariozechner/pi-ai";

// ✅ Correct
action: StringEnum(["list", "add", "remove"] as const)

// ❌ Broken with Google
action: Type.Union([Type.Literal("list"), Type.Literal("add")])
```

### Dynamic Tool Registration

Tools can be registered at any time — during load, in `session_start`, in command handlers, etc. New tools are available immediately without `/reload`.

```typescript
pi.on("session_start", async (_event, ctx) => {
  pi.registerTool({ name: "dynamic_tool", ... });
});

pi.registerCommand("add-tool", {
  handler: async (args, ctx) => {
    pi.registerTool({ name: "runtime_tool", ... });
    ctx.ui.notify("Tool registered!", "info");
  },
});
```

### Output Truncation

**Tools MUST truncate output** to avoid overwhelming the LLM context. The built-in limit is 50KB / 2000 lines (whichever is hit first).

```typescript
import {
  truncateHead, truncateTail, formatSize,
  DEFAULT_MAX_BYTES, DEFAULT_MAX_LINES,
} from "@mariozechner/pi-coding-agent";

async execute(toolCallId, params, signal, onUpdate, ctx) {
  const output = await runCommand();
  const truncation = truncateHead(output, {
    maxLines: DEFAULT_MAX_LINES,
    maxBytes: DEFAULT_MAX_BYTES,
  });

  let result = truncation.content;
  if (truncation.truncated) {
    result += `\n\n[Output truncated: ${truncation.outputLines}/${truncation.totalLines} lines]`;
  }
  return { content: [{ type: "text", text: result }] };
}
```

### Overriding Built-in Tools

Register a tool with the same name as a built-in (`read`, `bash`, `edit`, `write`, `grep`, `find`, `ls`) to override it. Your implementation **must match the exact result shape** including the `details` type.

```bash
# Start with no built-in tools, only your extensions
pi --no-tools -e ./my-extension.ts
```

### Remote Execution via Pluggable Operations

Built-in tools support pluggable operations for SSH, containers, etc.:

```typescript
import { createReadTool, createBashTool } from "@mariozechner/pi-coding-agent";

const remoteBash = createBashTool(cwd, {
  operations: { execute: (cmd) => sshExec(remote, cmd) }
});

// The bash tool also supports a spawnHook:
const bashTool = createBashTool(cwd, {
  spawnHook: ({ command, cwd, env }) => ({
    command: `source ~/.profile\n${command}`,
    cwd: `/mnt/sandbox${cwd}`,
    env: { ...env, CI: "1" },
  }),
});
```

**Operations interfaces:** `ReadOperations`, `WriteOperations`, `EditOperations`, `BashOperations`, `LsOperations`, `GrepOperations`, `FindOperations`

---

@@ -1,40 +0,0 @@

# Custom Commands — User-Facing Actions

Commands let users invoke your extension directly via `/mycommand`.

```typescript
pi.registerCommand("deploy", {
  description: "Deploy to an environment",

  // Optional: argument auto-completion
  getArgumentCompletions: (prefix: string) => {
    const envs = ["dev", "staging", "prod"];
    return envs
      .filter(e => e.startsWith(prefix))
      .map(e => ({ value: e, label: e }));
  },

  handler: async (args, ctx) => {
    // args = everything after "/deploy "
    // ctx = ExtensionCommandContext (has extra session control methods)

    await ctx.waitForIdle(); // Wait for agent to finish
    ctx.ui.notify(`Deploying to ${args}`, "info");
  },
});
```

### Command Context Extras

Command handlers get `ExtensionCommandContext` which extends `ExtensionContext` with:

- `ctx.waitForIdle()` — Wait for agent to finish
- `ctx.newSession(options?)` — Create a new session
- `ctx.fork(entryId)` — Fork from an entry
- `ctx.navigateTree(targetId, options?)` — Navigate the session tree
- `ctx.reload()` — Hot-reload everything

> **Important:** These methods are only available in commands, not in event handlers, because they would deadlock there.

---

@@ -1,195 +0,0 @@

# Custom UI — Visual Components

Pi's extension UI has multiple layers, from simple notifications to full custom components.

### 12.1 Dialogs (Blocking)

```typescript
// Selection
const choice = await ctx.ui.select("Pick one:", ["A", "B", "C"]);

// Confirmation
const ok = await ctx.ui.confirm("Delete?", "This cannot be undone");

// Text input
const name = await ctx.ui.input("Name:", "placeholder");

// Multi-line editor
const text = await ctx.ui.editor("Edit:", "prefilled text");
```

#### Timed Dialogs

```typescript
// Auto-dismiss after 5s with countdown: "Title (5s)" → "Title (4s)" → ...
const ok = await ctx.ui.confirm("Auto-confirm?", "Proceeds in 5s", { timeout: 5000 });
// Returns false on timeout
```

### 12.2 Persistent UI Elements

```typescript
// Footer status (persistent until cleared)
ctx.ui.setStatus("my-ext", "● Active");
ctx.ui.setStatus("my-ext", undefined); // Clear

// Widget above editor (default placement)
ctx.ui.setWidget("my-widget", ["Line 1", "Line 2"]);

// Widget below editor
ctx.ui.setWidget("my-widget", ["Below!"], { placement: "belowEditor" });

// Widget with theme callback
ctx.ui.setWidget("my-widget", (_tui, theme) => ({
  render: () => [theme.fg("accent", "Styled widget")],
  invalidate: () => {},
}));

// Working message during streaming
ctx.ui.setWorkingMessage("Analyzing code...");
ctx.ui.setWorkingMessage(); // Restore default

// Custom footer (replaces built-in entirely)
ctx.ui.setFooter((tui, theme, footerData) => ({
  render(width) { return [theme.fg("dim", `branch: ${footerData.getGitBranch()}`)]; },
  invalidate() {},
  dispose: footerData.onBranchChange(() => tui.requestRender()),
}));

// Editor control
ctx.ui.setEditorText("Prefill");
const current = ctx.ui.getEditorText();
ctx.ui.pasteToEditor("pasted content");

// Tool expansion
ctx.ui.setToolsExpanded(true);
ctx.ui.setToolsExpanded(false);

// Theme management
const themes = ctx.ui.getAllThemes();
ctx.ui.setTheme("light");
```

### 12.3 Custom Components (ctx.ui.custom)

For complex UI, `ctx.ui.custom()` temporarily replaces the editor with your component:

```typescript
const result = await ctx.ui.custom<string | null>((tui, theme, keybindings, done) => {
  // Return a component object
  return {
    render(width: number): string[] {
      return ["Press Enter to confirm, Escape to cancel"];
    },
    handleInput(data: string) {
      if (matchesKey(data, Key.enter)) done("confirmed");
      if (matchesKey(data, Key.escape)) done(null);
    },
    invalidate() {},
  };
});
```

### 12.4 Overlays (Floating Modals)

```typescript
const result = await ctx.ui.custom<string | null>(
  (tui, theme, keybindings, done) => new MyDialog({ onClose: done }),
  {
    overlay: true,
    overlayOptions: {
      anchor: "center",           // 9 positions: center, top-left, top-right, etc.
      width: "50%",
      maxHeight: "80%",
      margin: 2,
      visible: (w, h) => w >= 80, // Hide on narrow terminals
    },
    onHandle: (handle) => {
      // handle.setHidden(true/false)
    },
  }
);
```

### 12.5 Custom Editor (Replace Main Input)

```typescript
import { CustomEditor } from "@mariozechner/pi-coding-agent";
import { matchesKey, truncateToWidth } from "@mariozechner/pi-tui";

class VimEditor extends CustomEditor {
  private mode: "normal" | "insert" = "insert";

  handleInput(data: string): void {
    if (matchesKey(data, "escape") && this.mode === "insert") {
      this.mode = "normal";
      return;
    }
    if (this.mode === "insert") {
      super.handleInput(data); // Normal text editing + app keybindings
      return;
    }
    // Vim normal mode keys...
    if (data === "i") { this.mode = "insert"; return; }
    super.handleInput(data); // Pass unhandled to parent
  }
}

// Register:
ctx.ui.setEditorComponent((_tui, theme, keybindings) => new VimEditor(theme, keybindings));
ctx.ui.setEditorComponent(undefined); // Restore default
```

> **Key point:** Extend `CustomEditor` (not `Editor`) to get app keybindings (escape to abort, ctrl+d, model switching).

### 12.6 Built-in TUI Components

Import from `@mariozechner/pi-tui`:

| Component | Purpose |
|-----------|---------|
| `Text` | Multi-line text with word wrapping |
| `Box` | Container with padding and background |
| `Container` | Groups children vertically |
| `Spacer` | Empty vertical space |
| `Markdown` | Rendered markdown with syntax highlighting |
| `Image` | Image rendering (Kitty, iTerm2, etc.) |
| `SelectList` | Interactive selection from list |
| `SettingsList` | Toggle settings UI |
| `Input` | Text input field |

Import from `@mariozechner/pi-coding-agent`:

| Component | Purpose |
|-----------|---------|
| `DynamicBorder` | Border line with theming |
| `BorderedLoader` | Spinner with cancel support |

### 12.7 Keyboard Input

```typescript
import { matchesKey, Key } from "@mariozechner/pi-tui";

handleInput(data: string) {
  if (matchesKey(data, Key.up)) { /* arrow up */ }
  if (matchesKey(data, Key.enter)) { /* enter */ }
  if (matchesKey(data, Key.escape)) { /* escape */ }
  if (matchesKey(data, Key.ctrl("c"))) { /* ctrl+c */ }
  if (matchesKey(data, Key.shift("tab"))) { /* shift+tab */ }
}
```

### 12.8 Line Width Rules

**Critical:** Each line from `render()` must not exceed the `width` parameter.

```typescript
import { visibleWidth, truncateToWidth, wrapTextWithAnsi } from "@mariozechner/pi-tui";

render(width: number): string[] {
  return [truncateToWidth(this.text, width)];
}
```

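Width here means *visible* width — ANSI color sequences occupy bytes but no terminal columns, which is why `truncateToWidth` exists. A simplified sketch of that measurement (pi-tui's real implementation also handles double-width glyphs, which this ignores):

```typescript
// Simplified visible-width measurement: strip SGR color sequences before
// counting columns. The real pi-tui version also accounts for
// double-width characters; this sketch does not.
const SGR = /\x1b\[[0-9;]*m/g;

function visibleLen(s: string): number {
  return s.replace(SGR, "").length;
}

function truncateVisible(s: string, width: number): string {
  if (visibleLen(s) <= width) return s;
  let out = "";
  let cols = 0;
  // NOTE: splitting on SGR drops the escape codes themselves — fine for
  // measuring, lossy for display; another simplification of the sketch.
  for (const part of s.split(SGR)) {
    for (const ch of part) {
      if (cols === width) return out;
      out += ch;
      cols++;
    }
  }
  return out;
}
```
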
---
@@ -1,55 +0,0 @@

# State Management & Persistence

### Pattern: State in Tool Result Details

The recommended approach for stateful tools. State lives in `details` so it works correctly with branching/forking.

```typescript
export default function (pi: ExtensionAPI) {
  let items: string[] = [];

  // Reconstruct from session on load
  pi.on("session_start", async (_event, ctx) => {
    items = [];
    for (const entry of ctx.sessionManager.getBranch()) {
      if (entry.type === "message" && entry.message.role === "toolResult") {
        if (entry.message.toolName === "my_tool") {
          items = entry.message.details?.items ?? [];
        }
      }
    }
  });

  pi.registerTool({
    name: "my_tool",
    // ...
    async execute(toolCallId, params, signal, onUpdate, ctx) {
      items.push(params.text);
      return {
        content: [{ type: "text", text: "Added" }],
        details: { items: [...items] }, // ← Snapshot state here
      };
    },
  });
}
```

### Pattern: Extension-Private State (appendEntry)

For state that doesn't participate in LLM context but needs to survive restarts:

```typescript
pi.appendEntry("my-state", { count: 42, lastRun: Date.now() });

// Restore on reload
pi.on("session_start", async (_event, ctx) => {
  for (const entry of ctx.sessionManager.getEntries()) {
    if (entry.type === "custom" && entry.customType === "my-state") {
      const data = entry.data; // { count: 42, lastRun: ... }
    }
  }
});
```

---

@ -1,97 +0,0 @@
# Custom Rendering — Controlling What the User Sees

### Tool Rendering

Tools can provide `renderCall` (how the tool call looks) and `renderResult` (how the result looks):

```typescript
import { Text } from "@mariozechner/pi-tui";
import { keyHint } from "@mariozechner/pi-coding-agent";

pi.registerTool({
  name: "my_tool",
  // ...

  renderCall(args, theme) {
    let text = theme.fg("toolTitle", theme.bold("my_tool "));
    text += theme.fg("muted", args.action);
    return new Text(text, 0, 0); // 0,0 padding — Box handles it
  },

  renderResult(result, { expanded, isPartial }, theme) {
    if (isPartial) {
      return new Text(theme.fg("warning", "Processing..."), 0, 0);
    }

    let text = theme.fg("success", "✓ Done");
    if (!expanded) {
      text += ` (${keyHint("expandTools", "to expand")})`;
    }
    if (expanded && result.details?.items) {
      for (const item of result.details.items) {
        text += "\n  " + theme.fg("dim", item);
      }
    }
    return new Text(text, 0, 0);
  },
});
```

### Message Rendering

Register a renderer for custom message types:

```typescript
import { Text } from "@mariozechner/pi-tui";

pi.registerMessageRenderer("my-extension", (message, options, theme) => {
  const { expanded } = options;
  let text = theme.fg("accent", `[${message.customType}] `) + message.content;
  if (expanded && message.details) {
    text += "\n" + theme.fg("dim", JSON.stringify(message.details, null, 2));
  }
  return new Text(text, 0, 0);
});

// Send messages that use this renderer:
pi.sendMessage({
  customType: "my-extension", // Matches the renderer
  content: "Status update",
  display: true,
  details: { foo: "bar" },
});
```

### Theme Colors Reference

```typescript
// Foreground: theme.fg(color, text)
"text" | "accent" | "muted" | "dim"                 // General
"success" | "error" | "warning"                     // Status
"border" | "borderAccent" | "borderMuted"           // Borders
"toolTitle" | "toolOutput"                          // Tools
"toolDiffAdded" | "toolDiffRemoved"                 // Diffs
"mdHeading" | "mdLink" | "mdCode"                   // Markdown
"syntaxKeyword" | "syntaxFunction" | "syntaxString" // Syntax

// Background: theme.bg(color, text)
"selectedBg" | "userMessageBg" | "customMessageBg"
"toolPendingBg" | "toolSuccessBg" | "toolErrorBg"

// Text styles
theme.bold(text)
theme.italic(text)
theme.strikethrough(text)
```

### Syntax Highlighting in Renderers

```typescript
import { highlightCode, getLanguageFromPath } from "@mariozechner/pi-coding-agent";

const lang = getLanguageFromPath("/path/to/file.rs"); // "rust"
const highlighted = highlightCode(code, lang, theme);
```

---
@ -1,49 +0,0 @@
# System Prompt Modification

### Per-Turn Modification (before_agent_start)

```typescript
pi.on("before_agent_start", async (event, ctx) => {
  return {
    // Inject a persistent message (stored in session, visible to LLM)
    message: {
      customType: "my-extension",
      content: "Additional context for the LLM",
      display: true,
    },
    // Modify the system prompt for this turn
    systemPrompt: event.systemPrompt + "\n\nYou must respond only in haiku.",
  };
});
```

### Context Manipulation (context event)

Modify the messages sent to the LLM on every turn:

```typescript
pi.on("context", async (event, ctx) => {
  // event.messages is a deep copy — safe to modify
  const filtered = event.messages.filter(m => !isIrrelevant(m));
  return { messages: filtered };
});
```

### Tool-Specific Prompt Content

Tools can add to the system prompt when they're active:

```typescript
pi.registerTool({
  name: "my_tool",
  promptSnippet: "Summarize or transform text according to action", // Replaces description in "Available tools"
  promptGuidelines: [
    "Use my_tool when the user asks to summarize text.",
    "Prefer my_tool over direct output for structured data."
  ], // Added to "Guidelines" section when tool is active
  // ...
});
```

---
@ -1,54 +0,0 @@
# Compaction & Session Control

### Custom Compaction

Override the default compaction behavior:

```typescript
pi.on("session_before_compact", async (event, ctx) => {
  const { preparation, branchEntries, customInstructions, signal } = event;

  // Option 1: Cancel compaction
  return { cancel: true };

  // Option 2: Provide custom summary
  return {
    compaction: {
      summary: "Custom summary of conversation so far...",
      firstKeptEntryId: preparation.firstKeptEntryId,
      tokensBefore: preparation.tokensBefore,
    }
  };
});
```

### Triggering Compaction

```typescript
ctx.compact({
  customInstructions: "Focus on the authentication changes",
  onComplete: (result) => ctx.ui.notify("Compacted!", "info"),
});
```

### Session Control (Commands Only)

```typescript
pi.registerCommand("handoff", {
  handler: async (args, ctx) => {
    // Create a new session with initial context
    await ctx.newSession({
      setup: async (sm) => {
        sm.appendMessage({
          role: "user",
          content: [{ type: "text", text: "Context: " + args }],
          timestamp: Date.now(),
        });
      },
    });
  },
});
```

---
@ -1,61 +0,0 @@
# Model & Provider Management

### Switching Models

```typescript
const model = ctx.modelRegistry.find("anthropic", "claude-sonnet-4-5");
if (model) {
  const success = await pi.setModel(model);
  if (!success) ctx.ui.notify("No API key for this model", "error");
}
```

### Registering Custom Providers

```typescript
pi.registerProvider("my-proxy", {
  baseUrl: "https://proxy.example.com",
  apiKey: "PROXY_API_KEY", // Env var name or literal
  api: "anthropic-messages",
  models: [
    {
      id: "claude-sonnet-4-20250514",
      name: "Claude 4 Sonnet (proxy)",
      reasoning: false,
      input: ["text", "image"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: 200000,
      maxTokens: 16384,
    }
  ],
  // Optional: OAuth support for /login
  oauth: {
    name: "Corporate AI (SSO)",
    async login(callbacks) { /* ... */ },
    async refreshToken(credentials) { /* ... */ },
    getApiKey(credentials) { return credentials.access; },
  },
});

// Override just the baseUrl for an existing provider
pi.registerProvider("anthropic", {
  baseUrl: "https://proxy.example.com",
});

// Remove a provider
pi.unregisterProvider("my-proxy");
```

### Reacting to Model Changes

```typescript
pi.on("model_select", async (event, ctx) => {
  // event.model — new model
  // event.previousModel — previous model (undefined if first)
  // event.source — "set" | "cycle" | "restore"
  ctx.ui.setStatus("model", `${event.model.provider}/${event.model.id}`);
});
```

---
@ -1,52 +0,0 @@
# Remote Execution & Tool Overrides

### SSH Example Pattern

```typescript
import { createReadTool, createBashTool, createWriteTool } from "@mariozechner/pi-coding-agent";

export default function (pi: ExtensionAPI) {
  pi.registerFlag("ssh", { description: "SSH target", type: "string" });

  const localBash = createBashTool(process.cwd());

  pi.registerTool({
    ...localBash,
    async execute(id, params, signal, onUpdate, ctx) {
      const sshTarget = pi.getFlag("--ssh");
      if (sshTarget) {
        const remoteBash = createBashTool(process.cwd(), {
          operations: createSSHOperations(sshTarget),
        });
        return remoteBash.execute(id, params, signal, onUpdate);
      }
      return localBash.execute(id, params, signal, onUpdate);
    },
  });
}
```

### Tool Override Pattern (Logging/Access Control)

```typescript
pi.registerTool({
  name: "read", // Same name = overrides built-in
  label: "Read (Logged)",
  description: "Read file contents with logging",
  parameters: Type.Object({
    path: Type.String(),
    offset: Type.Optional(Type.Number()),
    limit: Type.Optional(Type.Number()),
  }),
  async execute(toolCallId, params, signal, onUpdate, ctx) {
    console.log(`[AUDIT] Reading: ${params.path}`);
    // Delegate to built-in implementation
    const builtIn = createReadTool(ctx.cwd);
    return builtIn.execute(toolCallId, params, signal, onUpdate);
  },
  // Omit renderCall/renderResult to use built-in renderer automatically
});
```

---
|
@ -1,56 +0,0 @@
|
|||
# Packaging & Distribution
|
||||
|
||||
|
||||
### Creating a Pi Package
|
||||
|
||||
Add a `pi` manifest to `package.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "my-pi-package",
|
||||
"keywords": ["pi-package"],
|
||||
"pi": {
|
||||
"extensions": ["./extensions"],
|
||||
"skills": ["./skills"],
|
||||
"prompts": ["./prompts"],
|
||||
"themes": ["./themes"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Installing Packages
|
||||
|
||||
```bash
|
||||
pi install npm:@foo/bar@1.0.0
|
||||
pi install git:github.com/user/repo@v1
|
||||
pi install ./local/path
|
||||
|
||||
# Try without installing:
|
||||
pi -e npm:@foo/bar
|
||||
```
|
||||
|
||||
### Convention Directories (no manifest needed)
|
||||
|
||||
If no `pi` manifest exists, pi auto-discovers:
|
||||
- `extensions/` → `.ts` and `.js` files
|
||||
- `skills/` → `SKILL.md` folders
|
||||
- `prompts/` → `.md` files
|
||||
- `themes/` → `.json` files
|
||||
|
||||
### Gallery Metadata
|
||||
|
||||
```json
|
||||
{
|
||||
"pi": {
|
||||
"video": "https://example.com/demo.mp4",
|
||||
"image": "https://example.com/screenshot.png"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Dependencies
|
||||
|
||||
- List `@mariozechner/pi-ai`, `@mariozechner/pi-coding-agent`, `@mariozechner/pi-tui`, `@sinclair/typebox` in `peerDependencies` with `"*"` — they're bundled by pi.
|
||||
- Other npm deps go in `dependencies`. Pi runs `npm install` on package installation.
|
||||
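Putting those two bullets together, a package's `package.json` might look like the following sketch; the `zod` entry is a placeholder standing in for any ordinary runtime dependency, not a requirement:

```json
{
  "name": "my-pi-package",
  "keywords": ["pi-package"],
  "peerDependencies": {
    "@mariozechner/pi-ai": "*",
    "@mariozechner/pi-coding-agent": "*",
    "@mariozechner/pi-tui": "*",
    "@sinclair/typebox": "*"
  },
  "dependencies": {
    "zod": "^3.23.0"
  }
}
```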
---
@ -1,21 +0,0 @@
# Mode Behavior

| Mode | UI Methods | Notes |
|------|-----------|-------|
| **Interactive** (default) | Full TUI | Normal operation |
| **RPC** (`--mode rpc`) | JSON protocol | Host handles UI, dialogs work via sub-protocol |
| **JSON** (`--mode json`) | No-op | Event stream to stdout |
| **Print** (`-p`) | No-op | Extensions run but can't prompt users |

**Always check `ctx.hasUI`** before calling dialog methods in extensions that might run in non-interactive modes:

```typescript
if (ctx.hasUI) {
  const ok = await ctx.ui.confirm("Delete?", "Sure?");
} else {
  // Default behavior for non-interactive mode
}
```

---
@ -1,8 +0,0 @@
# Error Handling

- **Extension errors** are logged but don't crash pi. The agent continues.
- **`tool_call` handler errors** block the tool (fail-safe behavior).
- **Tool `execute` errors** are reported to the LLM with `isError: true`, allowing it to recover.
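The third bullet can be sketched as a plain wrapper. Note that the `ToolResult` shape and the `safeExecute` helper below are illustrative assumptions for this sketch, not pi APIs:

```typescript
// Illustrative sketch — ToolResult shape and safeExecute are assumptions, not pi APIs.
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

// Convert a thrown error into a result the model can see and recover from,
// instead of letting the failure propagate.
function safeExecute(run: () => string): ToolResult {
  try {
    return { content: [{ type: "text", text: run() }] };
  } catch (err) {
    return { content: [{ type: "text", text: String(err) }], isError: true };
  }
}
```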
---
@ -1,25 +0,0 @@
# Key Rules & Gotchas

### Must-Follow Rules

1. **Use `StringEnum` for string enums** — `Type.Union`/`Type.Literal` breaks Google's API.
2. **Truncate tool output** — Large output causes context overflow, compaction failures, and degraded performance.
3. **Use the theme from the callback** — Don't import the theme directly. Use the `theme` parameter from `ctx.ui.custom()` or render functions.
4. **Type the DynamicBorder color param** — Write `(s: string) => theme.fg("accent", s)`.
5. **Call `tui.requestRender()` after state changes** in `handleInput`.
6. **Return `{ render, invalidate, handleInput }`** from custom components.
7. **Lines must not exceed `width`** in `render()` — use `truncateToWidth()`.
8. **Session control methods only in commands** — `waitForIdle()`, `newSession()`, `fork()`, `navigateTree()`, `reload()` will deadlock in event handlers.
9. **Strip leading `@` from path arguments** in custom tools — some models add it.
10. **Store state in tool result `details`** for proper branching support.
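Rule 2 can be followed with a small helper. This is a hypothetical sketch, not pi's built-in truncation utilities; keeping the tail of the output is shown because the end of a command's output is often the most relevant part:

```typescript
// Hypothetical helper (not a pi API): keep only the last maxChars of output
// and prepend a marker so the model knows content was dropped.
function truncateOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) return output;
  const dropped = output.length - maxChars;
  return `[truncated ${dropped} chars]\n` + output.slice(dropped);
}
```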
### Common Patterns

- **Rebuild on `invalidate()`** when your component pre-bakes theme colors
- **Check `signal?.aborted`** in long-running tool executions
- **Use `pi.exec()` instead of `child_process`** for shell commands
- **Overlay components are disposed when closed** — create fresh instances each time
- **Treat `ctx.reload()` as terminal** — code after it runs from the pre-reload version

---
@ -1,24 +0,0 @@
# File Reference — Documentation

All paths relative to:
```
/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/
```

| File | What It Covers |
|------|---------------|
| `docs/extensions.md` | **Primary reference** — Full extensions API (1,972 lines) |
| `docs/tui.md` | TUI component system — `Component` interface, built-in components, keyboard input, theming, overlay system, performance, patterns |
| `docs/packages.md` | Creating and distributing pi packages (npm, git, local) |
| `docs/session.md` | Session file format, entry types, message types, SessionManager API |
| `docs/compaction.md` | Auto-compaction, branch summarization, custom compaction hooks |
| `docs/rpc.md` | RPC mode protocol for headless/embedded operation |
| `docs/sdk.md` | SDK integrations |
| `docs/custom-provider.md` | Custom model providers, OAuth, streaming APIs |
| `docs/keybindings.md` | Keyboard shortcut format and built-in keybindings |
| `docs/themes.md` | Creating custom themes |
| `docs/settings.md` | Settings configuration |
| `README.md` | Main pi documentation |

---
@ -1,132 +0,0 @@
# File Reference — Example Extensions

All paths relative to:
```
/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/examples/extensions/
```

### Lifecycle & Safety
| File | What It Demonstrates |
|------|---------------------|
| `protected-paths.ts` | Blocking writes to `.env`, `.git/`, `node_modules/` via `tool_call` |
| `dirty-repo-guard.ts` | Preventing session changes with uncommitted git changes |

### Custom Tools
| File | What It Demonstrates |
|------|---------------------|
| `todo.ts` | **Best example** — Stateful tool with persistence, custom rendering, command |
| `hello.ts` | Minimal tool registration |
| `question.ts` | Tool with `ctx.ui.select()` for user interaction |
| `questionnaire.ts` | Multi-question wizard with tab navigation |
| `tool-override.ts` | Overriding built-in `read` with logging/access control |
| `dynamic-tools.ts` | Registering tools after startup and at runtime |
| `truncated-tool.ts` | Output truncation with `truncateHead` |
| `built-in-tool-renderer.ts` | Custom compact rendering for built-in tools |
| `antigravity-image-gen.ts` | Image generation tool |
| `ssh.ts` | Full SSH remote execution with pluggable operations |

### Commands & UI
| File | What It Demonstrates |
|------|---------------------|
| `commands.ts` | Basic command registration |
| `preset.ts` | Named presets (model, thinking, tools) with flag and command |
| `plan-mode/` | Full plan mode — commands, shortcuts, flags, widgets, status, tool management |
| `qna.ts` | Extract questions + `BorderedLoader` + `setEditorText` |
| `send-user-message.ts` | `pi.sendUserMessage()` for injecting user messages |
| `modal-editor.ts` | Vim-like modal editor via `CustomEditor` |
| `snake.ts` | Full game with custom UI, keyboard handling, persistence |
| `space-invaders.ts` | Full game with custom UI |
| `doom-overlay/` | DOOM running as an overlay at 35 FPS |
| `timed-confirm.ts` | Dialogs with `timeout` and `AbortSignal` |
| `overlay-test.ts` | Overlay compositing with inline inputs |
| `overlay-qa-tests.ts` | Comprehensive overlay tests: anchors, margins, stacking |

### System Prompt & Context
| File | What It Demonstrates |
|------|---------------------|
| `pirate.ts` | `before_agent_start` system prompt modification |
| `claude-rules.ts` | Loading rules from `.claude/rules/` into system prompt |
| `system-prompt-header.ts` | Displaying system prompt info |
| `input-transform.ts` | Transforming user input via `input` event |
| `inline-bash.ts` | Expanding `!{command}` patterns in prompts |

### Compaction & Sessions
| File | What It Demonstrates |
|------|---------------------|
| `custom-compaction.ts` | Custom compaction summary via `session_before_compact` |
| `trigger-compact.ts` | Triggering compaction at 100k tokens |
| `git-checkpoint.ts` | Git stash on turns, restore on fork |
| `bookmark.ts` | Labeling entries for `/tree` navigation |
| `session-name.ts` | Naming sessions for selector display |

### UI Components
| File | What It Demonstrates |
|------|---------------------|
| `custom-footer.ts` | `setFooter` with git branch and token stats |
| `custom-header.ts` | `setHeader` for custom startup header |
| `status-line.ts` | `setStatus` for footer indicators |
| `widget-placement.ts` | `setWidget` above and below editor |
| `notify.ts` | Desktop notifications via OSC 777 |
| `titlebar-spinner.ts` | Braille spinner in terminal title |
| `message-renderer.ts` | Custom message rendering with `registerMessageRenderer` |
| `model-status.ts` | `model_select` event for status bar |
| `mac-system-theme.ts` | Auto-sync theme with macOS dark/light mode |

### Providers
| File | What It Demonstrates |
|------|---------------------|
| `custom-provider-anthropic/` | Custom Anthropic provider with OAuth |
| `custom-provider-gitlab-duo/` | GitLab Duo via proxy |
| `custom-provider-qwen-cli/` | Qwen CLI with OAuth device flow |

### Communication
| File | What It Demonstrates |
|------|---------------------|
| `event-bus.ts` | Inter-extension communication via `pi.events` |
| `rpc-demo.ts` | All RPC-supported extension UI methods |
| `reload-runtime.ts` | Safe reload flow: command + LLM tool handoff |
| `shutdown-command.ts` | `ctx.shutdown()` for graceful exit |
| `file-trigger.ts` | File watcher injecting messages via `sendMessage` |

### Misc
| File | What It Demonstrates |
|------|---------------------|
| `with-deps/` | Extension with its own `package.json` and npm dependencies |
| `minimal-mode.ts` | Override built-in tool rendering for minimal display |

---

## Quick Reference: "I want to..."

| Goal | Approach | Key API | Example File |
|------|----------|---------|-------------|
| Block dangerous commands | Listen to `tool_call`, return `{ block: true }` | `pi.on("tool_call", ...)` | `protected-paths.ts` |
| Add a tool the LLM can use | Register a tool with schema and execute | `pi.registerTool({...})` | `todo.ts` |
| Add a slash command | Register a command with handler | `pi.registerCommand(...)` | `commands.ts` |
| Ask the user a question | Use dialog methods | `ctx.ui.select()`, `ctx.ui.confirm()` | `question.ts` |
| Show persistent status | Set footer status | `ctx.ui.setStatus(id, text)` | `status-line.ts` |
| Modify the system prompt | Hook `before_agent_start` | Return `{ systemPrompt: "..." }` | `pirate.ts` |
| Filter messages sent to LLM | Hook `context` event | Return `{ messages: [...] }` | — |
| Save state across restarts | Store in tool details or appendEntry | `details: {...}` / `pi.appendEntry(...)` | `todo.ts` |
| Custom compaction | Hook `session_before_compact` | Return `{ compaction: {...} }` | `custom-compaction.ts` |
| Build a full-screen UI | Use `ctx.ui.custom()` | Component with render/handleInput | `snake.ts` |
| Show a floating dialog | Use overlay mode | `ctx.ui.custom(..., { overlay: true })` | `overlay-test.ts` |
| Replace the input editor | Extend `CustomEditor` | `ctx.ui.setEditorComponent(...)` | `modal-editor.ts` |
| Override a built-in tool | Register tool with same name | `pi.registerTool({ name: "read" })` | `tool-override.ts` |
| Run tools via SSH | Use pluggable operations | `createBashTool(cwd, { operations })` | `ssh.ts` |
| Switch models programmatically | Find and set model | `pi.setModel(model)` | `preset.ts` |
| Register a custom provider | Provide config with models | `pi.registerProvider(...)` | `custom-provider-anthropic/` |
| Transform user input | Hook `input` event | Return `{ action: "transform", text }` | `input-transform.ts` |
| Inject messages | Send custom or user messages | `pi.sendMessage()` / `pi.sendUserMessage()` | `send-user-message.ts` |
| React to model changes | Hook `model_select` | `pi.on("model_select", ...)` | `model-status.ts` |
| Add a keyboard shortcut | Register a shortcut | `pi.registerShortcut("ctrl+x", ...)` | `plan-mode/` |
| Package for distribution | Add `pi` key to package.json | `"pi": { "extensions": [...] }` | See packages.md |

---

*This document was generated from the Pi extension documentation and examples. Source docs are at:*
```
/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/docs/
/Users/lexchristopherson/.nvm/versions/node/v22.20.0/lib/node_modules/@mariozechner/pi-coding-agent/examples/extensions/
```
@ -1,324 +0,0 @@
# Slash Command Subcommand Patterns

Pi does not have a separate built-in concept of "nested slash commands" like `/wt new` or `/foo delete`.

Instead, this UX is built by registering a single slash command and using **argument completions** to make the first argument behave like a subcommand.

This is the pattern used by the built-in worktree extension:
- `/wt`
- `/wt new`
- `/wt ls`
- `/wt switch my-branch`

The key API is:
- `pi.registerCommand(name, options)`
- `getArgumentCompletions(prefix)`
- `handler(args, ctx)`

## Mental Model

Treat the command as:

- one top-level slash command
- one or more positional arguments
- the first positional argument acting as a subcommand
- optional later arguments completed dynamically based on the first

So this:

```text
/wt
  new
  ls
  switch
  merge
  rm
  status
```

is really just:

- command: `wt`
- first arg: one of `new | ls | switch | merge | rm | status`

## The Core Pattern

```typescript
pi.registerCommand("foo", {
  description: "Manage foo items: /foo new|list|delete [name]",

  getArgumentCompletions: (prefix: string) => {
    const subcommands = ["new", "list", "delete"];
    const parts = prefix.trim().split(/\s+/);

    // Complete the first argument: /foo <subcommand>
    if (parts.length <= 1) {
      return subcommands
        .filter((cmd) => cmd.startsWith(parts[0] ?? ""))
        .map((cmd) => ({ value: cmd, label: cmd }));
    }

    // Complete the second argument: /foo delete <name>
    if (parts[0] === "delete") {
      const items = ["alpha", "beta", "gamma"];
      const namePrefix = parts[1] ?? "";
      return items
        .filter((name) => name.startsWith(namePrefix))
        .map((name) => ({ value: `delete ${name}`, label: name }));
    }

    return [];
  },

  handler: async (args, ctx) => {
    const parts = args.trim().split(/\s+/);
    const sub = parts[0];
    const name = parts[1];

    await ctx.waitForIdle();

    if (sub === "new") {
      ctx.ui.notify("Create a new foo item", "info");
      return;
    }

    if (sub === "list") {
      ctx.ui.notify("List foo items", "info");
      return;
    }

    if (sub === "delete") {
      if (!name) {
        ctx.ui.notify("Usage: /foo delete <name>", "error");
        return;
      }
      ctx.ui.notify(`Deleting ${name}`, "info");
      return;
    }

    ctx.ui.notify("Usage: /foo <new|list|delete> [name]", "info");
  },
});
```

## How `getArgumentCompletions()` Behaves

`getArgumentCompletions(prefix)` receives everything after the slash command name.

Examples for `/foo`:

- typing `/foo ` gives `prefix === ""`
- typing `/foo de` gives `prefix === "de"`
- typing `/foo delete a` gives `prefix === "delete a"`

That means you can parse the prefix into words and decide what suggestions to show next.

A common structure is:

1. If the user is on the first argument, show available subcommands.
2. If the first argument selects a branch like `delete`, show completions for the next argument.
3. Otherwise return `[]`.
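That three-step structure can be exercised as a pure function. The sketch below mirrors the `/foo` completion logic as a standalone, testable helper; the item names are placeholders, and returning plain strings (rather than `{ value, label }` objects) is a simplification for illustration:

```typescript
// Standalone sketch of the completion flow described above (not the pi API shape).
function completeFoo(prefix: string): string[] {
  const subcommands = ["new", "list", "delete"];
  const items = ["alpha", "beta", "gamma"]; // placeholder item names

  const parts = prefix.trim().split(/\s+/);

  // Step 1: first argument — suggest subcommands ("" matches all via startsWith).
  if (parts.length <= 1) {
    return subcommands.filter((cmd) => cmd.startsWith(parts[0] ?? ""));
  }

  // Step 2: second argument after "delete" — suggest item names.
  if (parts[0] === "delete") {
    const namePrefix = parts[1] ?? "";
    return items
      .filter((name) => name.startsWith(namePrefix))
      .map((name) => `delete ${name}`);
  }

  // Step 3: nothing sensible to suggest.
  return [];
}
```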
|
||||
## Important Detail: Empty Prefix Handling
|
||||
|
||||
A practical gotcha is that:
|
||||
|
||||
```typescript
|
||||
"".trim().split(/\s+/)
|
||||
```
|
||||
|
||||
produces `['']`, not `[]`.
|
||||
|
||||
That is why the common pattern is:
|
||||
|
||||
```typescript
|
||||
const parts = prefix.trim().split(/\s+/);
|
||||
if (parts.length <= 1) {
|
||||
// complete first argument
|
||||
}
|
||||
```
|
||||
|
||||
This handles both:
|
||||
- completely empty input after the command
|
||||
- partially typed first arguments
|
||||
|
||||
## Dynamic Second-Argument Completion
|
||||
|
||||
This pattern becomes powerful when later arguments depend on the subcommand.
|
||||
|
||||
Example:
|
||||
|
||||
```typescript
|
||||
getArgumentCompletions: (prefix) => {
|
||||
const parts = prefix.trim().split(/\s+/);
|
||||
const sub = parts[0];
|
||||
|
||||
if (parts.length <= 1) {
|
||||
return ["new", "list", "delete"].map((s) => ({ value: s, label: s }));
|
||||
}
|
||||
|
||||
if (sub === "delete") {
|
||||
const items = getCurrentItemsSomehow();
|
||||
const namePrefix = parts[1] ?? "";
|
||||
return items
|
||||
.filter((item) => item.startsWith(namePrefix))
|
||||
.map((item) => ({ value: `delete ${item}`, label: item }));
|
||||
}
|
||||
|
||||
return [];
|
||||
}
|
||||
```
|
||||
|
||||
This is how `/wt switch`, `/wt merge`, and `/wt rm` can suggest current worktree names.
|
||||
|
||||
## Real Example: `/wt`
|
||||
|
||||
The worktree extension uses this exact structure in:
|
||||
|
||||
- `/Users/lexchristopherson/.gsd/agent/extensions/worktree/index.ts`
|
||||
|
||||
It defines:
|
||||
|
||||
```typescript
|
||||
const subcommands = ["new", "ls", "switch", "merge", "rm", "status"];
|
||||
```
|
||||
|
||||
Then:
|
||||
|
||||
- when the first argument is still being typed, it suggests those subcommands
|
||||
- when the first argument is `switch`, `merge`, or `rm`, it suggests matching worktree names for the second argument
|
||||
|
||||
That is why typing:
|
||||
|
||||
```text
|
||||
/wt
|
||||
```
|
||||
|
||||
shows:

```text
new
ls
switch
merge
rm
status
```

and typing:

```text
/wt switch
```

shows available worktree names.

## Parsing in the Handler

Your completion logic and your handler logic should agree on the command shape.

A common structure is:

```typescript
handler: async (args, ctx) => {
  const parts = args.trim().split(/\s+/);
  const sub = parts[0];
  const rest = parts.slice(1);

  switch (sub) {
    case "new":
      // handle /foo new
      return;
    case "list":
      // handle /foo list
      return;
    case "delete":
      // handle /foo delete <name>
      return;
    default:
      ctx.ui.notify("Usage: /foo <new|list|delete>", "info");
      return;
  }
}
```

Keep the parsing simple and mirror the same branches your completions advertise.

## When to Use This Pattern

Use a single command with subcommand-style completions when:

- the actions belong to one clear domain
- you want discoverability from one entry point
- the subcommands feel like one family of operations
- later arguments depend on the earlier choice

Examples:

- `/wt new|switch|merge|rm|status`
- `/preset save|load|delete`
- `/workflow start|list|abort`
- `/foo new|list|delete`

## When to Prefer Separate Commands

Prefer separate commands when:

- the actions are conceptually unrelated
- each command needs its own distinct description and identity
- autocomplete would become too deep or overloaded
- the combined command would become hard to remember or document

Good candidates for separate commands:

- `/deploy`
- `/rollback`
- `/handoff`

rather than forcing all of those into one umbrella command.

## UX Guidelines

A few practical rules make this pattern feel good:

- Keep top-level subcommands short and obvious.
- Use names that read naturally after the slash command.
- Keep branching shallow; one or two levels is usually enough.
- Return an empty array when no completion makes sense.
- Make your fallback usage text match your completion structure.
- If a subcommand needs required data, validate it again in the handler.

## Recommended Structure

A solid command with subcommands usually has:

- `description` showing the top-level grammar
- `getArgumentCompletions()` for first and second argument suggestions
- `handler()` that branches on the first argument
- a fallback usage message for invalid input

Example description:

```typescript
description: "Manage foo items: /foo new|list|delete [name]"
```
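
The handler above is one half of the pattern; the completion side follows the same shape. A minimal sketch of a `getArgumentCompletions`-style function for the hypothetical `/foo` command (the subcommand and item names are illustrative, not Pi's real API):

```typescript
// Sketch of first- and second-argument completions for a hypothetical /foo
// command. SUBCOMMANDS and ITEMS are illustrative; a real extension would
// load item names dynamically.
const SUBCOMMANDS = ["new", "list", "delete"];
const ITEMS = ["alpha", "beta"];

function getFooCompletions(prefix: string): string[] {
  const parts = prefix.split(/\s+/).filter(Boolean);
  const endsWithSpace = /\s$/.test(prefix);

  // Completing the first argument: filter subcommands by what's typed so far.
  if (parts.length === 0 || (parts.length === 1 && !endsWithSpace)) {
    const typed = parts[0] ?? "";
    return SUBCOMMANDS.filter((s) => s.startsWith(typed));
  }

  // Completing the second argument: only `delete` takes an item name.
  if (parts[0] === "delete" && (parts.length === 1 || (parts.length === 2 && !endsWithSpace))) {
    const typed = parts[1] ?? "";
    return ITEMS.filter((i) => i.startsWith(typed));
  }

  return []; // no sensible completion, per the UX guideline earlier
}
```

Note how the branches mirror the handler's `switch`, and how an empty array is returned whenever no completion makes sense.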

## Related Docs

Read these alongside this pattern:

- `/Users/lexchristopherson/.gsd/docs/extending-pi/11-custom-commands-user-facing-actions.md`
- `/Users/lexchristopherson/.gsd/docs/extending-pi/09-extensionapi-what-you-can-do.md`
- `/Users/lexchristopherson/.gsd/agent/extensions/worktree/index.ts`

## Summary

If you want `/foo` to behave like it has nested subcommands, do this:

1. register one slash command
2. treat the first argument as a subcommand
3. implement `getArgumentCompletions(prefix)`
4. optionally complete later arguments dynamically
5. branch in the handler based on the parsed first argument

That is the mechanism behind the `/wt` experience.

@@ -1,36 +0,0 @@
# The Complete Guide to Building Pi Extensions

> Split into individual files for easier consumption.

## Table of Contents

- [01. What Are Extensions?](./01-what-are-extensions.md)
- [02. Architecture & Mental Model](./02-architecture-mental-model.md)
- [03. Getting Started](./03-getting-started.md)
- [04. Extension Locations & Discovery](./04-extension-locations-discovery.md)
- [05. Extension Structure & Styles](./05-extension-structure-styles.md)
- [06. The Extension Lifecycle](./06-the-extension-lifecycle.md)
- [07. Events — The Nervous System](./07-events-the-nervous-system.md)
- [08. ExtensionContext — What You Can Access](./08-extensioncontext-what-you-can-access.md)
- [09. ExtensionAPI — What You Can Do](./09-extensionapi-what-you-can-do.md)
- [10. Custom Tools — Giving the LLM New Abilities](./10-custom-tools-giving-the-llm-new-abilities.md)
- [11. Custom Commands — User-Facing Actions](./11-custom-commands-user-facing-actions.md)
- [12. Custom UI — Visual Components](./12-custom-ui-visual-components.md)
- [13. State Management & Persistence](./13-state-management-persistence.md)
- [14. Custom Rendering — Controlling What the User Sees](./14-custom-rendering-controlling-what-the-user-sees.md)
- [15. System Prompt Modification](./15-system-prompt-modification.md)
- [16. Compaction & Session Control](./16-compaction-session-control.md)
- [17. Model & Provider Management](./17-model-provider-management.md)
- [18. Remote Execution & Tool Overrides](./18-remote-execution-tool-overrides.md)
- [19. Packaging & Distribution](./19-packaging-distribution.md)
- [20. Mode Behavior](./20-mode-behavior.md)
- [21. Error Handling](./21-error-handling.md)
- [22. Key Rules & Gotchas](./22-key-rules-gotchas.md)
- [23. File Reference — Documentation](./23-file-reference-documentation.md)
- [24. File Reference — Example Extensions](./24-file-reference-example-extensions.md)
- [25. Slash Command Subcommand Patterns](./25-slash-command-subcommand-patterns.md)

---

*Split into per-section files for surgical context loading.*

@@ -1,198 +0,0 @@
# Getting Started

## Install

```bash
npm install -g gsd-pi
```

Requires Node.js ≥ 22.0.0 (24 LTS recommended) and Git.

> **`command not found: gsd`?** Your shell may not have npm's global bin directory in `$PATH`. Run `npm prefix -g` to find it, then add `$(npm prefix -g)/bin` to your PATH. See [Troubleshooting](./troubleshooting.md#command-not-found-gsd-after-install) for details.

GSD checks for updates once every 24 hours. When a new version is available, you'll see an interactive prompt at startup with the option to update immediately or skip. You can also update from within a session with `/gsd update`.

### Set up API keys

If you use a non-Anthropic model, you'll need a search API key for web search. Run `/gsd config` to set keys globally — they're saved to `~/.gsd/agent/auth.json` and apply to all projects:

```bash
# Inside any GSD session:
/gsd config
```

See [Global API Keys](./configuration.md#global-api-keys-gsd-config) for details on supported keys.

### Set up custom MCP servers

If you want GSD to call local or external MCP servers, add project-local config in `.mcp.json` or `.gsd/mcp.json`.

See [Configuration → MCP Servers](./configuration.md#mcp-servers) for examples and verification steps.

### VS Code Extension

GSD is also available as a VS Code extension. Install from the marketplace (publisher: FluxLabs) or search for "GSD" in VS Code extensions. The extension provides:

- **`@gsd` chat participant** — talk to the agent in VS Code Chat
- **Sidebar dashboard** — connection status, model info, token usage, quick actions
- **Full command palette** — start/stop agent, switch models, export sessions

The CLI (`gsd-pi`) must be installed first — the extension connects to it via RPC.

### Web Interface

GSD also has a browser-based interface. Run `gsd --web` to start a local web server with a visual dashboard, real-time progress, and multi-project support. See [Web Interface](./web-interface.md) for details.

## First Launch

Run `gsd` in any directory:

```bash
gsd
```

GSD displays a welcome screen showing your version, active model, and available tool keys. Then on first launch, it runs a setup wizard:

1. **LLM Provider** — select from 20+ providers (Anthropic, OpenAI, Google, OpenRouter, GitHub Copilot, Amazon Bedrock, Azure, and more). OAuth flows handle Claude Max and Copilot subscriptions automatically; otherwise paste an API key.
2. **Tool API Keys** (optional) — Brave Search, Context7, Jina, Slack, Discord. Press Enter to skip any.

If you have an existing Pi installation, provider credentials are imported automatically.

Re-run the wizard anytime with:

```bash
gsd config
```

## Choose a Model

GSD auto-selects a default model after login. Switch later with:

```
/model
```

Or configure per-phase models in preferences — see [Configuration](./configuration.md).

## Two Ways to Work

### Step Mode — `/gsd`

Type `/gsd` inside a session. GSD executes one unit of work at a time, pausing between each with a wizard showing what completed and what's next.

- **No `.gsd/` directory** → starts a discussion flow to capture your project vision
- **Milestone exists, no roadmap** → discuss or research the milestone
- **Roadmap exists, slices pending** → plan the next slice or execute a task
- **Mid-task** → resume where you left off

Step mode is the on-ramp. You stay in the loop, reviewing output between each step.

### Auto Mode — `/gsd auto`

Type `/gsd auto` and walk away. GSD autonomously researches, plans, executes, verifies, commits, and advances through every slice until the milestone is complete.

```
/gsd auto
```

See [Auto Mode](./auto-mode.md) for full details.

## Two Terminals, One Project

The recommended workflow: auto mode in one terminal, steering from another.

**Terminal 1 — let it build:**

```bash
gsd
/gsd auto
```

**Terminal 2 — steer while it works:**

```bash
gsd
/gsd discuss   # talk through architecture decisions
/gsd status    # check progress
/gsd queue     # queue the next milestone
```

Both terminals read and write the same `.gsd/` files. Decisions in terminal 2 are picked up at the next phase boundary automatically.

## Project Structure

GSD organizes work into a hierarchy:

```
Milestone → a shippable version (4-10 slices)
  Slice → one demoable vertical capability (1-7 tasks)
    Task → one context-window-sized unit of work
```

The iron rule: **a task must fit in one context window.** If it can't, it's two tasks.

All state lives on disk in `.gsd/`:

```
.gsd/
  PROJECT.md       — what the project is right now
  REQUIREMENTS.md  — requirement contract (active/validated/deferred)
  DECISIONS.md     — append-only architectural decisions
  KNOWLEDGE.md     — cross-session rules, patterns, and lessons
  RUNTIME.md       — runtime context: API endpoints, env vars, services (v2.39)
  STATE.md         — quick-glance status
  milestones/
    M001/
      M001-ROADMAP.md — slice plan with risk levels and dependencies
      M001-CONTEXT.md — scope and goals from discussion
      slices/
        S01/
          S01-PLAN.md    — task decomposition
          S01-SUMMARY.md — what happened
          S01-UAT.md     — human test script
          tasks/
            T01-PLAN.md
            T01-SUMMARY.md
```

## Resume a Session

```bash
gsd --continue   # or gsd -c
```

Resumes the most recent session for the current directory.

To browse and pick from all saved sessions:

```bash
gsd sessions
```

Shows each session's date, message count, and first-message preview so you can choose which one to resume.

## Next Steps

- [Auto Mode](./auto-mode.md) — deep dive into autonomous execution
- [Configuration](./configuration.md) — model selection, timeouts, budgets
- [Commands Reference](./commands.md) — all commands and shortcuts

## Troubleshooting

### `gsd` command runs `git svn dcommit` instead of GSD

The [oh-my-zsh git plugin](https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/git) defines `alias gsd='git svn dcommit'`, which shadows the GSD binary.

**Option 1** — Remove the alias in your `~/.zshrc` (add after the `source $ZSH/oh-my-zsh.sh` line):

```bash
unalias gsd 2>/dev/null
```

**Option 2** — Use the alternative binary name:

```bash
gsd-cli
```

Both `gsd` and `gsd-cli` point to the same binary.

@@ -1,187 +0,0 @@
# Git Strategy

GSD uses git for milestone isolation and sequential commits within each milestone. You choose an **isolation mode** that controls where work happens. The strategy is fully automated — you don't need to manage branches manually.

## Isolation Modes

GSD supports three isolation modes, configured via the `git.isolation` preference:

| Mode | Working Directory | Branch | Best For |
|------|-------------------|--------|----------|
| `worktree` (default) | `.gsd/worktrees/<MID>/` | `milestone/<MID>` | Most projects — full file isolation between milestones |
| `branch` | Project root | `milestone/<MID>` | Submodule-heavy repos where worktrees don't work well |
| `none` | Project root | Current branch (no milestone branch) | Hot-reload workflows where file isolation breaks dev tooling |

### `worktree` Mode (Default)

Each milestone gets its own git worktree at `.gsd/worktrees/<MID>/` on a `milestone/<MID>` branch. All execution happens inside the worktree. On completion, the worktree is squash-merged to main as one clean commit. The worktree and branch are then cleaned up.

This provides full file isolation — changes in a milestone can't interfere with your main working copy.

### `branch` Mode

Work happens in the project root on a `milestone/<MID>` branch. No worktree is created. On completion, the branch is merged to main (squash or regular merge, per `merge_strategy`).

Use this when worktrees cause problems — submodule-heavy repos, repos with hardcoded paths, or environments where worktree symlinks don't behave.

### `none` Mode

Work happens directly on your current branch. No worktree, no milestone branch. GSD still commits sequentially with conventional commit messages, but there's no branch isolation.

Use this for hot-reload workflows where file isolation breaks dev tooling (e.g., file watchers that only see the project root), or for small projects where branch overhead isn't worth it.

## Branching Model (Worktree Mode)

```
main ─────────────────────────────────────────────────────────
  │                                                    ↑
  └── milestone/M001 (worktree) ───────────────────────┘
        commit: feat: core types
        commit: feat: markdown parser
        commit: feat: file writer
        commit: docs: workflow docs
        ...
        → squash-merged to main as single commit
```

In **branch mode**, the flow is the same except work happens in the project root instead of a separate worktree directory.

In **none mode**, commits land directly on the current branch — no milestone branch is created, and no merge step is needed.

### Parallel Worktrees

With [parallel orchestration](./parallel-orchestration.md) enabled, multiple milestones run in separate worktrees simultaneously:

```
main ──────────────────────────────────────────────────────────
  │                                    ↑                  ↑
  ├── milestone/M002 (worktree) ───────┘                  │
  │     commit: feat: auth types                          │
  │     commit: feat: JWT middleware                      │
  │     → squash-merged first                             │
  │                                                       │
  └── milestone/M003 (worktree) ─────────────────────────┘
        commit: feat: dashboard layout
        commit: feat: chart components
        → squash-merged second
```

Each worktree operates on its own branch with its own commit history. Merges happen sequentially to avoid conflicts.

### Key Properties

- **Sequential commits on one branch** — no per-slice branches, no merge conflicts within a milestone
- **Squash merge to main** — in worktree and branch modes, all commits are squashed into one clean commit on main (configurable via `merge_strategy`)

### Commit Format

Commits use conventional commit format with GSD metadata in trailers:

```
feat: core type definitions

GSD-Task: M001/S01/T01

feat: markdown parser for plan files

GSD-Task: M001/S01/T02
```
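
Because the trailer follows a fixed `M…/S…/T…` shape, it can be extracted mechanically, e.g. for tooling that maps commits back to tasks. A minimal sketch; the helper name is ours, not a GSD API:

```typescript
// Extract the GSD-Task trailer from a conventional commit message.
// Returns null when no trailer is present.
function parseGsdTask(message: string): { milestone: string; slice: string; task: string } | null {
  // /m makes ^ and $ match per line, so the trailer can sit anywhere in the body.
  const m = message.match(/^GSD-Task:\s*(M\d+)\/(S\d+)\/(T\d+)\s*$/m);
  if (!m) return null;
  return { milestone: m[1], slice: m[2], task: m[3] };
}
```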

## Worktree Management

These features apply only in **worktree mode**.

### Automatic (Auto Mode)

Auto mode creates and manages worktrees automatically:

1. When a milestone starts, a worktree is created at `.gsd/worktrees/<MID>/` on branch `milestone/<MID>`
2. Planning artifacts from `.gsd/milestones/` are copied into the worktree
3. All execution happens inside the worktree
4. On milestone completion, the worktree is squash-merged to the integration branch
5. The worktree and branch are removed

### Manual

Use the `/worktree` (or `/wt`) command for manual worktree management:

```
/worktree create
/worktree switch
/worktree merge
/worktree remove
```

## Workflow Modes

Instead of configuring each git setting individually, set `mode` to get sensible defaults for your workflow:

```yaml
mode: solo   # personal projects — auto-push, squash, simple IDs
mode: team   # shared repos — unique IDs, push branches, pre-merge checks
```

| Setting | `solo` | `team` |
|---|---|---|
| `git.auto_push` | `true` | `false` |
| `git.push_branches` | `false` | `true` |
| `git.pre_merge_check` | `false` | `true` |
| `git.merge_strategy` | `"squash"` | `"squash"` |
| `git.isolation` | `"worktree"` | `"worktree"` |
| `git.commit_docs` | `true` | `true` |
| `unique_milestone_ids` | `false` | `true` |

Mode defaults are the lowest priority — any explicit preference overrides them. For example, `mode: solo` with `git.auto_push: false` gives you everything from solo except auto-push.
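
That precedence amounts to a two-layer merge: mode defaults underneath, explicit keys on top. A sketch under assumed key names (the helper and shapes are illustrative, not GSD internals):

```typescript
// Layered preference resolution: explicit git preferences always win over
// the defaults implied by `mode`. Key names mirror the table above.
type GitPrefs = Record<string, boolean | string>;

const MODE_DEFAULTS: Record<string, GitPrefs> = {
  solo: { auto_push: true, push_branches: false, pre_merge_check: false, merge_strategy: "squash" },
  team: { auto_push: false, push_branches: true, pre_merge_check: true, merge_strategy: "squash" },
};

function resolveGitPrefs(mode: string | undefined, explicit: GitPrefs): GitPrefs {
  // No mode configured: inject no defaults at all, matching existing configs.
  const defaults = mode ? MODE_DEFAULTS[mode] ?? {} : {};
  // Object spread: later keys (explicit) override earlier ones (defaults).
  return { ...defaults, ...explicit };
}
```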

Existing configs without `mode` work exactly as before — no defaults are injected.

## Git Preferences

Configure git behavior in preferences:

```yaml
git:
  auto_push: false           # push after commits
  push_branches: false       # push milestone branch
  remote: origin
  snapshots: false           # WIP snapshot commits
  pre_merge_check: false     # pre-merge validation
  commit_type: feat          # override commit type prefix
  main_branch: main          # primary branch name
  commit_docs: true          # commit .gsd/ to git
  isolation: worktree        # "worktree", "branch", or "none"
  auto_pr: false             # create PR on milestone completion
  pr_target_branch: develop  # PR target branch (default: main)
```

### Automatic Pull Requests

For teams using Gitflow or branch-based workflows, GSD can automatically create a pull request when a milestone completes:

```yaml
git:
  auto_push: true
  auto_pr: true
  pr_target_branch: develop
```

This pushes the milestone branch and creates a PR targeting `develop` (or whichever branch you specify). Requires `gh` CLI installed and authenticated. See [git.auto_pr](./configuration.md#gitauto_pr) for details.

### `commit_docs: false`

When set to `false`, GSD adds `.gsd/` to `.gitignore` and keeps all planning artifacts local-only. Useful for teams where only some members use GSD, or when company policy requires a clean repository.

## Self-Healing

GSD includes automatic recovery for common git issues:

- **Detached HEAD** — automatically reattaches to the correct branch
- **Stale lock files** — removes `index.lock` files from crashed processes
- **Orphaned worktrees** — detects and offers to clean up abandoned worktrees (worktree mode only)

Run `/gsd doctor` to check git health manually.

## Native Git Operations

Since v2.16, GSD uses libgit2 via native bindings for read-heavy operations in the dispatch hot path. This eliminates ~70 process spawns per dispatch cycle, improving auto-mode throughput.

@@ -1,48 +0,0 @@
# Migration from v1

If you have projects with `.planning` directories from the original Get Shit Done (v1), you can migrate them to GSD-2's `.gsd` format.

## Running the Migration

```bash
# From within the project directory
/gsd migrate

# Or specify a path
/gsd migrate ~/projects/my-old-project
```

## What Gets Migrated

The migration tool:

- Parses your old `PROJECT.md`, `ROADMAP.md`, `REQUIREMENTS.md`, phase directories, plans, summaries, and research
- Maps phases → slices, plans → tasks, milestones → milestones
- Preserves completion state (`[x]` phases stay done, summaries carry over)
- Consolidates research files into the new structure
- Shows a preview before writing anything
- Optionally runs an agent-driven review of the output for quality assurance

## Supported Formats

The migration handles various v1 format variations:

- Milestone-sectioned roadmaps with `<details>` blocks
- Bold phase entries
- Bullet-format requirements
- Decimal phase numbering
- Duplicate phase numbers across milestones

## Requirements

Migration works best with a `ROADMAP.md` file for milestone structure. Without one, milestones are inferred from the `phases/` directory.

## Post-Migration

After migrating, verify the output with:

```
/gsd doctor
```

This checks `.gsd/` integrity and flags any structural issues.

@@ -1,75 +0,0 @@
# Pinning Node.js LTS on macOS with Homebrew

If you installed Node.js via Homebrew (`brew install node`), you're tracking the **latest current release** — which can include odd-numbered development versions (e.g. 23.x, 25.x). These aren't LTS and may have breaking changes or instability.

GSD requires Node.js **v22 or later** and works best on an **LTS (even-numbered) release**. This guide shows how to pin Node 24 LTS using Homebrew.

## Check your current version

```bash
node --version
```

If this shows an odd number (e.g. `v23.x`, `v25.x`), you're on a development release.

## Install Node 24 LTS

Homebrew provides versioned formulas for LTS releases:

```bash
# Unlink the current (possibly non-LTS) version
brew unlink node

# Install Node 24 LTS
brew install node@24

# Link it as the default
brew link --overwrite node@24
```

Verify:

```bash
node --version
# Should show v24.x.x
```

## Why pin to LTS?

- **Stability** — LTS releases receive bug fixes and security patches for 30 months
- **Compatibility** — npm packages (including GSD) test against LTS versions
- **No surprises** — `brew upgrade` won't jump you to an unstable development release

## Prevent accidental upgrades

By default, `brew upgrade` will upgrade all packages, which could move you off the pinned version. Pin the formula:

```bash
brew pin node@24
```

To unpin later:

```bash
brew unpin node@24
```

## Switching between versions

If you need multiple Node versions (e.g. 22 and 24), consider using a version manager instead:

- **[nvm](https://github.com/nvm-sh/nvm)** — `nvm install 24 && nvm use 24`
- **[fnm](https://github.com/Schniz/fnm)** — `fnm install 24 && fnm use 24` (faster, Rust-based)
- **[mise](https://mise.jdx.dev/)** — `mise use node@24` (polyglot version manager)

These let you set per-project Node versions via `.node-version` or `.nvmrc` files.

## Verify GSD works

After pinning:

```bash
node --version    # v24.x.x
npm install -g gsd-pi
gsd --version
```

@@ -1,309 +0,0 @@
# Parallel Milestone Orchestration

Run multiple milestones simultaneously in isolated git worktrees. Each milestone gets its own worker process, its own branch, and its own context window — while a coordinator tracks progress, enforces budgets, and keeps everything in sync.

> **Status:** Behind `parallel.enabled: false` by default. Opt-in only — zero impact to existing users.

## Quick Start

1. Enable parallel mode in your preferences:

   ```yaml
   ---
   parallel:
     enabled: true
     max_workers: 2
   ---
   ```

2. Start parallel execution:

   ```
   /gsd parallel start
   ```

   GSD scans your milestones, checks dependencies and file overlap, shows an eligibility report, and spawns workers for eligible milestones.

3. Monitor progress:

   ```
   /gsd parallel status
   ```

4. Stop when done:

   ```
   /gsd parallel stop
   ```

## How It Works

### Architecture

```
┌─────────────────────────────────────────────────────────┐
│ Coordinator (your GSD session)                          │
│                                                         │
│ Responsibilities:                                       │
│ - Eligibility analysis (deps + file overlap)            │
│ - Worker spawning and lifecycle                         │
│ - Budget tracking across all workers                    │
│ - Signal dispatch (pause/resume/stop)                   │
│ - Session status monitoring                             │
│ - Merge reconciliation                                  │
│                                                         │
│ ┌──────────┐   ┌──────────┐   ┌──────────┐              │
│ │ Worker 1 │   │ Worker 2 │   │ Worker 3 │  ...         │
│ │   M001   │   │   M003   │   │   M005   │              │
│ └──────────┘   └──────────┘   └──────────┘              │
│       │              │              │                   │
│       ▼              ▼              ▼                   │
│ .gsd/worktrees/ .gsd/worktrees/ .gsd/worktrees/         │
│   M001/           M003/           M005/                 │
│   (milestone/     (milestone/     (milestone/           │
│    M001 branch)    M003 branch)    M005 branch)         │
└─────────────────────────────────────────────────────────┘
```

### Worker Isolation

Each worker is a separate `gsd` process with complete isolation:

| Resource | Isolation Method |
|----------|-----------------|
| **Filesystem** | Git worktree — each worker has its own checkout |
| **Git branch** | `milestone/<MID>` — one branch per milestone |
| **State derivation** | `GSD_MILESTONE_LOCK` env var — `deriveState()` only sees the assigned milestone |
| **Context window** | Separate process — each worker has its own agent sessions |
| **Metrics** | Each worktree has its own `.gsd/metrics.json` |
| **Crash recovery** | Each worktree has its own `.gsd/auto.lock` |

### Coordination

Workers and the coordinator communicate through file-based IPC:

- **Session status files** (`.gsd/parallel/<MID>.status.json`) — workers write heartbeats, the coordinator reads them
- **Signal files** (`.gsd/parallel/<MID>.signal.json`) — coordinator writes signals, workers consume them
- **Atomic writes** — write-to-temp + rename prevents partial reads
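
The write-to-temp + rename pattern looks like this in Node; a minimal sketch, not GSD's actual implementation:

```typescript
import { writeFileSync, renameSync, readFileSync } from "node:fs";
import { join, dirname, basename } from "node:path";
import { tmpdir } from "node:os"; // used only by the usage example below

function writeJsonAtomic(path: string, value: unknown): void {
  // Write the full payload to a sibling temp file first...
  const tmp = join(dirname(path), `.${basename(path)}.${process.pid}.tmp`);
  writeFileSync(tmp, JSON.stringify(value, null, 2));
  // ...then swap it into place. rename() within one filesystem is atomic on
  // POSIX, so a concurrent reader sees either the old file or the new one,
  // never a half-written JSON document.
  renameSync(tmp, path);
}
```

Usage: `writeJsonAtomic(join(tmpdir(), "M002.status.json"), { state: "running" })`. The temp file must live in the same directory as the target, since rename is only atomic within a single filesystem.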

## Eligibility Analysis

Before starting parallel execution, GSD checks which milestones can safely run concurrently.

### Rules

1. **Not complete** — Finished milestones are skipped
2. **Dependencies satisfied** — All `dependsOn` entries must have status `complete`
3. **File overlap check** — Milestones touching the same files get a warning (but are still eligible)
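
The first two rules reduce to a small predicate; a sketch with an assumed milestone shape (GSD's real state model may differ):

```typescript
// Eligibility per the rules above: skip complete milestones, require every
// dependency to be complete. File overlap is a warning, not a blocker, so it
// is not modeled here.
interface Milestone {
  id: string;
  status: "pending" | "in_progress" | "complete";
  dependsOn: string[];
}

function isEligible(m: Milestone, all: Map<string, Milestone>): { eligible: boolean; reason: string } {
  if (m.status === "complete") return { eligible: false, reason: "Already complete." };
  const blocked = m.dependsOn.filter((id) => all.get(id)?.status !== "complete");
  if (blocked.length > 0) {
    return { eligible: false, reason: `Blocked by incomplete dependencies: ${blocked.join(", ")}.` };
  }
  return { eligible: true, reason: "All dependencies satisfied." };
}
```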

### Example Report

```
# Parallel Eligibility Report

## Eligible for Parallel Execution (2)

- **M002** — Auth System
  All dependencies satisfied.
- **M003** — Dashboard UI
  All dependencies satisfied.

## Ineligible (2)

- **M001** — Core Types
  Already complete.
- **M004** — API Integration
  Blocked by incomplete dependencies: M002.

## File Overlap Warnings (1)

- **M002** <-> **M003** — 2 shared file(s):
  - `src/types.ts`
  - `src/middleware.ts`
```

File overlaps are warnings, not blockers. Both milestones work in separate worktrees, so they won't interfere at the filesystem level. Conflicts are detected and resolved during merge.

## Configuration

Add to `~/.gsd/preferences.md` or `.gsd/preferences.md`:

```yaml
---
parallel:
  enabled: false                   # Master toggle (default: false)
  max_workers: 2                   # Concurrent workers (1-4, default: 2)
  budget_ceiling: 50.00            # Aggregate cost limit in dollars (optional)
  merge_strategy: "per-milestone"  # When to merge: "per-slice" or "per-milestone"
  auto_merge: "confirm"            # "auto", "confirm", or "manual"
---
```

### Configuration Reference

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `enabled` | boolean | `false` | Master toggle. Must be `true` for `/gsd parallel` commands to work. |
| `max_workers` | number (1-4) | `2` | Maximum concurrent worker processes. Higher values use more memory and API budget. |
| `budget_ceiling` | number | none | Aggregate cost ceiling in USD across all workers. When reached, no new units are dispatched. |
| `merge_strategy` | `"per-slice"` or `"per-milestone"` | `"per-milestone"` | When worktree changes merge back to main. Per-milestone waits for the full milestone to complete. |
| `auto_merge` | `"auto"`, `"confirm"`, `"manual"` | `"confirm"` | How merge-back is handled. `confirm` prompts before merging. `manual` requires explicit `/gsd parallel merge`. |

## Commands

| Command | Description |
|---------|-------------|
| `/gsd parallel start` | Analyze eligibility, confirm, and start workers |
| `/gsd parallel status` | Show all workers with state, units completed, and cost |
| `/gsd parallel stop` | Stop all workers (sends SIGTERM) |
| `/gsd parallel stop M002` | Stop a specific milestone's worker |
| `/gsd parallel pause` | Pause all workers (finish current unit, then wait) |
| `/gsd parallel pause M002` | Pause a specific worker |
| `/gsd parallel resume` | Resume all paused workers |
| `/gsd parallel resume M002` | Resume a specific worker |
| `/gsd parallel merge` | Merge all completed milestones back to main |
| `/gsd parallel merge M002` | Merge a specific milestone back to main |

## Signal Lifecycle

The coordinator communicates with workers through signals:

```
Coordinator                        Worker
     │                               │
     ├── sendSignal("pause") ───→    │
     │                               ├── consumeSignal()
     │                               ├── pauseAuto()
     │                               │   (finish current unit, wait)
     │                               │
     ├── sendSignal("resume") ──→    │
     │                               ├── consumeSignal()
     │                               ├── resume dispatch loop
     │                               │
     ├── sendSignal("stop") ────→    │
     │    + SIGTERM ────────────→    │
     │                               ├── consumeSignal() or SIGTERM handler
     │                               ├── stopAuto()
     │                               └── process exits
```

Workers check for signals between units (in `handleAgentEnd`). The coordinator also sends `SIGTERM` for immediate response on stop.
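
The two halves of that lifecycle can be sketched as follows. The helper names mirror the diagram, but the bodies are our illustration of the signal-file convention described above, not GSD's code:

```typescript
import { writeFileSync, renameSync, readFileSync, unlinkSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os"; // used only by the usage example below

type Signal = { type: "pause" | "resume" | "stop" };

// Coordinator side: drop the signal file atomically (temp + rename) so a
// worker polling between units never reads a half-written file.
function sendSignal(path: string, signal: Signal): void {
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, JSON.stringify(signal));
  renameSync(tmp, path);
}

// Worker side: read then delete, so each signal is consumed exactly once.
function consumeSignal(path: string): Signal | null {
  if (!existsSync(path)) return null;
  const signal = JSON.parse(readFileSync(path, "utf8")) as Signal;
  unlinkSync(path);
  return signal;
}
```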
|
||||
|
||||
## Merge Reconciliation

When milestones complete, their worktree changes need to merge back to main.

### Merge Order

- **Sequential** (default): Milestones merge in ID order (M001 before M002)
- **By-completion**: Milestones merge in the order they finish

### Conflict Handling

1. `.gsd/` state files (STATE.md, metrics.json, etc.) — **auto-resolved** by accepting the milestone branch version
2. Code conflicts — **stop and report**. The merge halts, showing which files conflict. Resolve manually and retry with `/gsd parallel merge <MID>`.

### Example

```
/gsd parallel merge

# Merge Results

- **M002** — merged successfully (pushed)
- **M003** — CONFLICT (2 file(s)):
  - `src/types.ts`
  - `src/middleware.ts`
  Resolve conflicts manually and run `/gsd parallel merge M003` to retry.
```
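The two-tier conflict policy above amounts to classifying each conflicted path before deciding whether the merge can proceed. A minimal sketch, with hypothetical names (the real logic lives in GSD's coordinator):

```typescript
// ".gsd/" state conflicts take the milestone branch's version ("theirs" from
// main's perspective); everything else halts the merge for manual resolution.
type Resolution = { strategy: "theirs" | "manual"; file: string };

function resolveConflict(file: string): Resolution {
  return file.startsWith(".gsd/")
    ? { strategy: "theirs", file } // auto-resolve: accept milestone branch version
    : { strategy: "manual", file }; // stop and report
}

function partitionConflicts(files: string[]): { auto: string[]; manual: string[] } {
  const auto = files.filter((f) => resolveConflict(f).strategy === "theirs");
  const manual = files.filter((f) => resolveConflict(f).strategy === "manual");
  return { auto, manual };
}
```

Under this policy, a merge succeeds automatically only when `manual` is empty; otherwise the coordinator reports the `manual` list, as in the example output above.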
## Budget Management

When `budget_ceiling` is set, the coordinator tracks aggregate cost across all workers:

- Cost is summed from each worker's session status
- When the ceiling is reached, the coordinator signals workers to stop
- Each worker also respects the project-level `budget_ceiling` preference independently
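The aggregate check reduces to a small pure function. This is an illustrative sketch: the function name is hypothetical, and the numeric inputs stand in for the per-worker costs read from status files.

```typescript
// Returns true when the summed cost of all workers has reached the ceiling.
// An unset ceiling means the coordinator never stops workers for cost reasons.
function shouldStopAll(workerCostsUsd: number[], budgetCeiling?: number): boolean {
  if (budgetCeiling === undefined) return false; // no ceiling configured
  const total = workerCostsUsd.reduce((sum, cost) => sum + cost, 0);
  return total >= budgetCeiling;
}
```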
## Health Monitoring

### Doctor Integration

`/gsd doctor` detects parallel session issues:

- **Stale parallel sessions** — Worker process died without cleanup. Doctor finds `.gsd/parallel/*.status.json` files with dead PIDs or expired heartbeats and removes them.

Run `/gsd doctor --fix` to clean up automatically.

### Stale Detection

Sessions are considered stale when:

- The worker PID is no longer running (checked via `process.kill(pid, 0)`)
- The last heartbeat is older than 30 seconds

The coordinator runs stale detection during `refreshWorkerStatuses()` and automatically removes dead sessions.
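Both liveness checks above can be sketched in a few lines. The 30-second threshold and the `process.kill(pid, 0)` technique come from this document; the status field names are assumptions.

```typescript
const HEARTBEAT_TTL_MS = 30_000; // heartbeats older than this mark a session stale

function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence/permission check, delivers nothing
    return true;
  } catch {
    return false;
  }
}

function isStale(status: { pid: number; lastHeartbeat: number }, now = Date.now()): boolean {
  return !pidAlive(status.pid) || now - status.lastHeartbeat > HEARTBEAT_TTL_MS;
}
```

The heartbeat check covers the case `process.kill` cannot: a worker process that is still running but wedged and no longer making progress.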
## Safety Model

| Safety Layer | Protection |
|-------------|------------|
| **Feature flag** | `parallel.enabled: false` by default — existing users unaffected |
| **Eligibility analysis** | Dependency and file overlap checks before starting |
| **Worker isolation** | Separate processes, worktrees, branches, context windows |
| **`GSD_MILESTONE_LOCK`** | Each worker only sees its milestone in state derivation |
| **`GSD_PARALLEL_WORKER`** | Workers cannot spawn nested parallel sessions |
| **Budget ceiling** | Aggregate cost enforcement across all workers |
| **Signal-based shutdown** | Graceful stop via file signals + SIGTERM |
| **Doctor integration** | Detects and cleans up orphaned sessions |
| **Conflict-aware merge** | Stops on code conflicts, auto-resolves `.gsd/` state conflicts |
## File Layout

```
.gsd/
├── parallel/                  # Coordinator ↔ worker IPC
│   ├── M002.status.json       # Worker heartbeat + progress
│   ├── M002.signal.json       # Coordinator → worker signals
│   ├── M003.status.json
│   └── M003.signal.json
├── worktrees/                 # Git worktrees (one per milestone)
│   ├── M002/                  # M002's isolated checkout
│   │   ├── .gsd/              # M002's own state files
│   │   │   ├── auto.lock
│   │   │   ├── metrics.json
│   │   │   └── milestones/
│   │   └── src/               # M002's working copy
│   └── M003/
│       └── ...
└── ...
```

Both `.gsd/parallel/` and `.gsd/worktrees/` are gitignored — they're runtime-only coordination files that never get committed.
## Troubleshooting

### "Parallel mode is not enabled"

Set `parallel.enabled: true` in your preferences file.

### "No milestones are eligible for parallel execution"

All milestones are either complete or blocked by dependencies. Check `/gsd queue` to see milestone status and dependency chains.

### Worker crashed — how to recover

Workers persist their state to disk automatically. If a worker process dies, the coordinator detects the dead PID via heartbeat expiry and marks the worker as crashed. On restart, the worker picks up from disk state — crash recovery, worktree re-entry, and completed-unit tracking carry over from the crashed session.

1. Run `/gsd doctor --fix` to clean up stale sessions
2. Run `/gsd parallel status` to see current state
3. Re-run `/gsd parallel start` to spawn new workers for remaining milestones

### Merge conflicts after parallel completion

1. Run `/gsd parallel merge` to see which milestones have conflicts
2. Resolve conflicts in the worktree at `.gsd/worktrees/<MID>/`
3. Retry with `/gsd parallel merge <MID>`

### Workers seem stuck

Check whether the budget ceiling was reached: `/gsd parallel status` shows per-worker costs. Increase `parallel.budget_ceiling` or remove it to continue.
@ -1,57 +0,0 @@

# The UI Architecture

Pi's TUI is a custom terminal rendering system. Understanding its architecture prevents most mistakes:

```
┌──────────────────────────────────────────────────────────────┐
│ Terminal Window                                              │
│                                                              │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Custom Header (ctx.ui.setHeader)                         │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │                                                          │ │
│ │ Message Area                                             │ │
│ │ - User messages                                          │ │
│ │ - Assistant responses                                    │ │
│ │ - Tool calls and results ◄── renderCall/renderResult     │ │
│ │ - Custom messages        ◄── registerMessageRenderer     │ │
│ │ - Notifications                                          │ │
│ │                                                          │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ Widgets (above editor)   ◄── ctx.ui.setWidget            │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │                                                          │ │
│ │ Editor                   ◄── Can be replaced by:         │ │
│ │   - ctx.ui.custom() (temporary full replacement)         │ │
│ │   - ctx.ui.setEditorComponent() (permanent replace)      │ │
│ │                                                          │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ Widgets (below editor)   ◄── ctx.ui.setWidget            │ │
│ ├──────────────────────────────────────────────────────────┤ │
│ │ Footer ◄── ctx.ui.setFooter / ctx.ui.setStatus           │ │
│ └──────────────────────────────────────────────────────────┘ │
│                                                              │
│ ┌────────────────────────┐                                   │
│ │ Overlay (floating)     │ ◄── ctx.ui.custom({ overlay })    │
│ │ Rendered on top of     │                                   │
│ │ everything             │                                   │
│ └────────────────────────┘                                   │
└──────────────────────────────────────────────────────────────┘
```

**Key principles:**

- Everything renders as **arrays of strings** (one per line)
- Each line **must not exceed the `width` parameter** — this is enforced
- **ANSI escape codes** are used for styling — they don't count toward visible width
- **Styles do NOT carry across lines** — the TUI resets SGR at the end of each line
- All **state changes require explicit invalidation** followed by a render request
- **Theme is always passed via callbacks** — never import it directly

### Packages

| Package | What it provides |
|---------|-----------------|
| `@mariozechner/pi-tui` | Core components (`Text`, `Box`, `Container`, `SelectList`, etc.), keyboard handling, text utilities |
| `@mariozechner/pi-coding-agent` | Higher-level components (`DynamicBorder`, `BorderedLoader`, `CustomEditor`), theming helpers, code highlighting |

---
@ -1,44 +0,0 @@

# The Component Interface — Foundation of Everything

Every visual element in Pi implements this interface:

```typescript
interface Component {
  render(width: number): string[];
  handleInput?(data: string): void;
  wantsKeyRelease?: boolean;
  invalidate(): void;
}
```

| Method | Purpose | Required? |
|--------|---------|-----------|
| `render(width)` | Return array of strings (one per line). Each line ≤ `width` visible chars. | **Yes** |
| `handleInput(data)` | Receive keyboard input when component has focus. | Optional |
| `wantsKeyRelease` | If `true`, receive key release events (Kitty protocol). | Optional, default `false` |
| `invalidate()` | Clear cached render state. Called on theme changes. | **Yes** |

### The Render Contract

```typescript
render(width: number): string[] {
  // MUST return an array of strings
  // Each string MUST NOT exceed `width` in visible characters
  // ANSI escape codes (colors, styles) don't count toward visible width
  // Styles are reset at end of each line — reapply per line
  // Return [] for a zero-height component
}
```

### The Invalidation Contract

```typescript
invalidate(): void {
  // Clear ALL cached render output
  // Clear any pre-baked themed strings
  // Call super.invalidate() if extending a built-in component
  // After invalidation, next render() must produce fresh output
}
```

---
@ -1,19 +0,0 @@

# Entry Points — How UI Gets on Screen

There are **several different ways** to put custom UI on screen, each for a different purpose:

| Method | Purpose | Blocks? | Replaces editor? |
|--------|---------|---------|-------------------|
| `ctx.ui.select/confirm/input/editor` | Quick dialogs | Yes | Temporarily |
| `ctx.ui.notify` | Toast notifications | No | No |
| `ctx.ui.setStatus` | Footer status text | No | No |
| `ctx.ui.setWidget` | Persistent widget above/below editor | No | No |
| `ctx.ui.setFooter` | Replace entire footer | No | No (replaces footer) |
| `ctx.ui.custom()` | Full custom component | Yes | Temporarily |
| `ctx.ui.custom({overlay})` | Floating overlay | Yes | No (renders on top) |
| `ctx.ui.setEditorComponent` | Replace editor permanently | No | Yes (permanently) |
| `ctx.ui.setHeader` | Custom startup header | No | No (replaces header) |
| `renderCall/renderResult` | Tool display | No | No (inline in messages) |
| `registerMessageRenderer` | Custom message display | No | No (inline in messages) |

---
@ -1,77 +0,0 @@

# Built-in Dialog Methods

The simplest UI — blocking dialogs that wait for user response:

### Selection

```typescript
const choice = await ctx.ui.select("Pick a color:", ["Red", "Green", "Blue"]);
// Returns: "Red" | "Green" | "Blue" | undefined (if cancelled)
```

### Confirmation

```typescript
const ok = await ctx.ui.confirm("Delete file?", "This action cannot be undone.");
// Returns: true | false
```

### Text Input

```typescript
const name = await ctx.ui.input("Project name:", "my-project");
// Returns: string | undefined (if cancelled)
```

### Multi-line Editor

```typescript
const text = await ctx.ui.editor("Edit the description:", "Default text here");
// Returns: string | undefined (if cancelled)
```

### Timed Dialogs (Auto-Dismiss)

Dialogs can auto-dismiss with a live countdown:

```typescript
// Shows "Confirm? (5s)" → "Confirm? (4s)" → ... → auto-dismisses
const ok = await ctx.ui.confirm(
  "Auto-proceed?",
  "Continuing in 5 seconds...",
  { timeout: 5000 }
);
// Returns false on timeout
```

**Timeout return values:**

- `select()` → `undefined`
- `confirm()` → `false`
- `input()` → `undefined`

### Manual Dismissal with AbortSignal

For more control (distinguish timeout from user cancel):

```typescript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 5000);

const ok = await ctx.ui.confirm(
  "Timed Confirm",
  "Auto-cancels in 5s",
  { signal: controller.signal }
);

clearTimeout(timeoutId);

if (ok) {
  // User confirmed
} else if (controller.signal.aborted) {
  // Timed out
} else {
  // User cancelled (Escape)
}
```

---
@ -1,120 +0,0 @@

# Persistent UI Elements

These stay on screen until explicitly cleared:

### Status (Footer)

```typescript
// Set (persists until cleared or overwritten)
ctx.ui.setStatus("my-ext", "● Active");
ctx.ui.setStatus("my-ext", ctx.ui.theme.fg("accent", "● Mode: Plan"));

// Clear
ctx.ui.setStatus("my-ext", undefined);
```

Multiple extensions can set independent status entries. They appear in the footer.

### Widgets (Above/Below Editor)

```typescript
// Simple string array (above editor, default)
ctx.ui.setWidget("my-widget", ["Line 1", "Line 2", "Line 3"]);

// Below editor
ctx.ui.setWidget("my-widget", ["Below the editor!"], { placement: "belowEditor" });

// With theme (component factory)
ctx.ui.setWidget("my-widget", (_tui, theme) => {
  const lines = items.map(item =>
    item.done
      ? theme.fg("success", "✓ ") + theme.fg("muted", theme.strikethrough(item.text))
      : theme.fg("dim", "○ ") + item.text
  );
  return {
    render: () => lines,
    invalidate: () => {},
  };
});

// Clear
ctx.ui.setWidget("my-widget", undefined);
```

### Working Message (During Streaming)

```typescript
ctx.ui.setWorkingMessage("Analyzing code structure...");
ctx.ui.setWorkingMessage(); // Restore default
```

### Custom Footer (Full Replacement)

```typescript
ctx.ui.setFooter((tui, theme, footerData) => ({
  invalidate() {},
  render(width: number): string[] {
    const branch = footerData.getGitBranch(); // Not accessible elsewhere!
    const statuses = footerData.getExtensionStatuses(); // All setStatus values
    const left = theme.fg("dim", `${ctx.model?.id || "no-model"}`);
    const right = theme.fg("dim", branch || "no git");
    const pad = " ".repeat(Math.max(1, width - visibleWidth(left) - visibleWidth(right)));
    return [truncateToWidth(left + pad + right, width)];
  },
  // Reactive: re-render when branch changes
  dispose: footerData.onBranchChange(() => tui.requestRender()),
}));

// Restore default
ctx.ui.setFooter(undefined);
```

**`footerData` provides:**

- `getGitBranch(): string | null` — current git branch (not accessible through any other API)
- `getExtensionStatuses(): ReadonlyMap<string, string>` — all `setStatus` values
- `onBranchChange(callback): () => void` — subscribe to branch changes, returns dispose function

### Custom Header

```typescript
ctx.ui.setHeader((tui, theme) => ({
  render(width: number): string[] {
    return [theme.fg("accent", theme.bold("My Custom Header"))];
  },
  invalidate() {},
}));
```

### Editor Control

```typescript
// Set editor text
ctx.ui.setEditorText("Prefilled text for the user");

// Get current editor text
const current = ctx.ui.getEditorText();

// Paste into editor (triggers paste handling, including collapse for large content)
ctx.ui.pasteToEditor("pasted content");

// Tool output expansion
const wasExpanded = ctx.ui.getToolsExpanded();
ctx.ui.setToolsExpanded(true);  // Expand all
ctx.ui.setToolsExpanded(false); // Collapse all

// Terminal title
ctx.ui.setTitle("pi - my project");
```

### Theme Management

```typescript
const themes = ctx.ui.getAllThemes();        // [{ name: "dark", path: ... }, ...]
const lightTheme = ctx.ui.getTheme("light"); // Load without switching
const result = ctx.ui.setTheme("light");     // Switch by name
if (!result.success) ctx.ui.notify(result.error!, "error");
ctx.ui.setTheme(lightTheme!);                // Switch by Theme object
ctx.ui.theme.fg("accent", "styled text");    // Access current theme
```

---
@ -1,130 +0,0 @@

# ctx.ui.custom() — Full Custom Components

This is the most powerful UI mechanism. It **temporarily replaces the editor** with your component. Returns a value when `done()` is called.

### Basic Pattern

```typescript
const result = await ctx.ui.custom<string | null>((tui, theme, keybindings, done) => {
  // tui         — TUI instance (requestRender, screen dimensions)
  // theme       — Current theme for styling
  // keybindings — App keybinding manager
  // done(value) — Call to close component and return value

  return {
    render(width: number): string[] {
      return [
        theme.fg("accent", "─".repeat(width)),
        " Press Enter to confirm, Escape to cancel",
        theme.fg("accent", "─".repeat(width)),
      ];
    },
    handleInput(data: string) {
      if (matchesKey(data, Key.enter)) done("confirmed");
      if (matchesKey(data, Key.escape)) done(null);
    },
    invalidate() {},
  };
});

if (result === "confirmed") {
  ctx.ui.notify("Confirmed!", "info");
}
```

### The Factory Callback

The factory function receives four arguments:

| Argument | Type | Purpose |
|----------|------|---------|
| `tui` | `TUI` | Screen info and render control. `tui.requestRender()` triggers re-render after state changes. |
| `theme` | `Theme` | Current theme. Use `theme.fg()`, `theme.bg()`, `theme.bold()`, etc. |
| `keybindings` | `KeybindingsManager` | App keybinding config. For checking what keys do what. |
| `done` | `(value: T) => void` | Call this to close the component and return a value to the awaiting code. |

### Using Existing Components as Children

```typescript
const result = await ctx.ui.custom<string | null>((tui, theme, _kb, done) => {
  const container = new Container();
  container.addChild(new DynamicBorder((s: string) => theme.fg("accent", s)));
  container.addChild(new Text(theme.fg("accent", theme.bold("Title")), 1, 0));

  const selectList = new SelectList(items, 10, {
    selectedPrefix: (t) => theme.fg("accent", t),
    selectedText: (t) => theme.fg("accent", t),
    description: (t) => theme.fg("muted", t),
    scrollInfo: (t) => theme.fg("dim", t),
    noMatch: (t) => theme.fg("warning", t),
  });
  selectList.onSelect = (item) => done(item.value);
  selectList.onCancel = () => done(null);
  container.addChild(selectList);

  container.addChild(new DynamicBorder((s: string) => theme.fg("accent", s)));

  return {
    render: (w) => container.render(w),
    invalidate: () => container.invalidate(),
    handleInput: (data) => { selectList.handleInput(data); tui.requestRender(); },
  };
});
```

### Using a Class

```typescript
class MyComponent {
  private selected = 0;
  private items: string[];
  private done: (value: string | null) => void;
  private tui: { requestRender: () => void };
  private cachedWidth?: number;
  private cachedLines?: string[];

  constructor(tui: TUI, items: string[], done: (value: string | null) => void) {
    this.tui = tui;
    this.items = items;
    this.done = done;
  }

  handleInput(data: string) {
    if (matchesKey(data, Key.up) && this.selected > 0) {
      this.selected--;
      this.invalidate();
      this.tui.requestRender();
    } else if (matchesKey(data, Key.down) && this.selected < this.items.length - 1) {
      this.selected++;
      this.invalidate();
      this.tui.requestRender();
    } else if (matchesKey(data, Key.enter)) {
      this.done(this.items[this.selected]);
    } else if (matchesKey(data, Key.escape)) {
      this.done(null);
    }
  }

  render(width: number): string[] {
    if (this.cachedLines && this.cachedWidth === width) return this.cachedLines;
    this.cachedLines = this.items.map((item, i) => {
      const prefix = i === this.selected ? "> " : "  ";
      return truncateToWidth(prefix + item, width);
    });
    this.cachedWidth = width;
    return this.cachedLines;
  }

  invalidate() {
    this.cachedWidth = undefined;
    this.cachedLines = undefined;
  }
}

// Usage:
const result = await ctx.ui.custom<string | null>((tui, theme, _kb, done) => {
  return new MyComponent(tui, ["Option A", "Option B", "Option C"], done);
});
```

---
@ -1,198 +0,0 @@

# Built-in Components — The Building Blocks

Import from `@mariozechner/pi-tui`:

### Text

Multi-line text with automatic word wrapping and optional background.

```typescript
import { Text } from "@mariozechner/pi-tui";

const text = new Text(
  "Hello World\nSecond line", // content (supports \n)
  1,                          // paddingX (default: 1)
  1,                          // paddingY (default: 1)
  (s) => bgGray(s)            // optional background function
);

text.setText("Updated content"); // Update text dynamically
```

**When to use:** Single or multi-line text blocks, styled labels, error messages.

### Box

Container with padding and background color. Add children inside it.

```typescript
import { Box } from "@mariozechner/pi-tui";

const box = new Box(
  1,               // paddingX
  1,               // paddingY
  (s) => bgGray(s) // background function
);
box.addChild(new Text("Content inside a box", 0, 0));
box.setBgFn((s) => bgBlue(s)); // Change background dynamically
```

**When to use:** Visually grouping content with a colored background.

### Container

Groups child components vertically (stacked). No visual styling of its own.

```typescript
import { Container } from "@mariozechner/pi-tui";

const container = new Container();
container.addChild(component1);
container.addChild(component2);
container.removeChild(component1);
container.clear(); // Remove all children
```

**When to use:** Composing complex layouts from simpler components.

### Spacer

Empty vertical space.

```typescript
import { Spacer } from "@mariozechner/pi-tui";

const spacer = new Spacer(2); // 2 empty lines
```

**When to use:** Visual separation between components.

### Markdown

Renders markdown with full formatting and syntax highlighting.

```typescript
import { Markdown } from "@mariozechner/pi-tui";
import { getMarkdownTheme } from "@mariozechner/pi-coding-agent";

const md = new Markdown(
  "# Title\n\nSome **bold** text\n\n```js\nconst x = 1;\n```",
  1,                 // paddingX
  1,                 // paddingY
  getMarkdownTheme() // MarkdownTheme (from pi-coding-agent)
);

md.setText("Updated markdown content");
```

**When to use:** Rendering documentation, help text, formatted content.

### Image

Renders images in supported terminals (Kitty, iTerm2, Ghostty, WezTerm).

```typescript
import { Image } from "@mariozechner/pi-tui";

const image = new Image(
  base64Data,  // base64-encoded image data
  "image/png", // MIME type
  theme,       // ImageTheme
  { maxWidthCells: 80, maxHeightCells: 24 } // Optional size constraints
);
```

**When to use:** Displaying generated images, screenshots, diagrams.

### SelectList

Interactive selection from a list with search, scrolling, and descriptions.

```typescript
import { SelectList, type SelectItem } from "@mariozechner/pi-tui";

const items: SelectItem[] = [
  { value: "opt1", label: "Option 1", description: "First option" },
  { value: "opt2", label: "Option 2", description: "Second option" },
  { value: "opt3", label: "Option 3" }, // description is optional
];

const selectList = new SelectList(
  items,
  10, // maxVisible (scrollable if more items)
  {
    selectedPrefix: (t) => theme.fg("accent", t),
    selectedText: (t) => theme.fg("accent", t),
    description: (t) => theme.fg("muted", t),
    scrollInfo: (t) => theme.fg("dim", t),
    noMatch: (t) => theme.fg("warning", t),
  }
);

selectList.onSelect = (item) => { /* item.value */ };
selectList.onCancel = () => { /* escape pressed */ };
```

**When to use:** Letting users pick from a list. Handles arrow keys, search filtering, scrolling.

### SettingsList

Toggle settings with left/right arrow keys.

```typescript
import { SettingsList, type SettingItem } from "@mariozechner/pi-tui";
import { getSettingsListTheme } from "@mariozechner/pi-coding-agent";

const items: SettingItem[] = [
  { id: "verbose", label: "Verbose mode", currentValue: "off", values: ["on", "off"] },
  { id: "theme", label: "Theme", currentValue: "dark", values: ["dark", "light", "auto"] },
];

const settingsList = new SettingsList(
  items,
  Math.min(items.length + 2, 15), // maxVisible
  getSettingsListTheme(),
  (id, newValue) => { /* setting changed */ },
  () => { /* close requested (escape) */ },
  { enableSearch: true }, // Optional: fuzzy search by label
);
```

**When to use:** Settings panels, toggle groups, configuration UIs.

### Input

Text input field with cursor.

```typescript
import { Input } from "@mariozechner/pi-tui";

const input = new Input();
input.setText("initial value");
// Route keyboard input via handleInput
```

### Editor

Multi-line text editor with undo, word deletion, cursor movement.

```typescript
import { Editor, type EditorTheme } from "@mariozechner/pi-tui";

const editorTheme: EditorTheme = {
  borderColor: (s) => theme.fg("accent", s),
  selectList: {
    selectedPrefix: (t) => theme.fg("accent", t),
    selectedText: (t) => theme.fg("accent", t),
    description: (t) => theme.fg("muted", t),
    scrollInfo: (t) => theme.fg("dim", t),
    noMatch: (t) => theme.fg("warning", t),
  },
};

const editor = new Editor(tui, editorTheme);
editor.setText("prefilled");
editor.onSubmit = (value) => { /* enter pressed */ };
```

---
@ -1,51 +0,0 @@

# High-Level Components from pi-coding-agent

### DynamicBorder

A horizontal border line with themed color. Use for framing dialogs.

```typescript
import { DynamicBorder } from "@mariozechner/pi-coding-agent";

// ⚠️ MUST explicitly type the parameter as string
const border = new DynamicBorder((s: string) => theme.fg("accent", s));
```

### BorderedLoader

Spinner with cancel support. Shows a message and an animated spinner while async work runs.

```typescript
import { BorderedLoader } from "@mariozechner/pi-coding-agent";

const result = await ctx.ui.custom<string | null>((tui, theme, _kb, done) => {
  const loader = new BorderedLoader(tui, theme, "Fetching data...");
  loader.onAbort = () => done(null); // Escape pressed

  // Do async work with the loader's AbortSignal
  fetchData(loader.signal)
    .then(data => done(data))
    .catch(() => done(null));

  return loader;
});
```

### CustomEditor

Base class for custom editors that replace the input. Provides app keybindings (escape to abort, ctrl+d, model switching) automatically.

```typescript
import { CustomEditor } from "@mariozechner/pi-coding-agent";

class MyEditor extends CustomEditor {
  handleInput(data: string): void {
    // Handle your keys first
    if (data === "x") { /* custom behavior */ return; }
    // Fall through to CustomEditor for app keybindings + text editing
    super.handleInput(data);
  }
}
```

---
@ -1,62 +0,0 @@

# Keyboard Input — How to Handle Keys

### matchesKey — The Key Detection Function

```typescript
import { matchesKey, Key } from "@mariozechner/pi-tui";

handleInput(data: string) {
  // Using Key constants (recommended — gives autocomplete)
  if (matchesKey(data, Key.up)) { /* arrow up */ }
  if (matchesKey(data, Key.down)) { /* arrow down */ }
  if (matchesKey(data, Key.left)) { /* arrow left */ }
  if (matchesKey(data, Key.right)) { /* arrow right */ }
  if (matchesKey(data, Key.enter)) { /* enter */ }
  if (matchesKey(data, Key.escape)) { /* escape */ }
  if (matchesKey(data, Key.tab)) { /* tab */ }
  if (matchesKey(data, Key.space)) { /* space */ }
  if (matchesKey(data, Key.backspace)) { /* backspace */ }
  if (matchesKey(data, Key.delete)) { /* delete */ }
  if (matchesKey(data, Key.home)) { /* home */ }
  if (matchesKey(data, Key.end)) { /* end */ }

  // With modifiers
  if (matchesKey(data, Key.ctrl("c"))) { /* ctrl+c */ }
  if (matchesKey(data, Key.ctrl("x"))) { /* ctrl+x */ }
  if (matchesKey(data, Key.shift("tab"))) { /* shift+tab */ }
  if (matchesKey(data, Key.alt("left"))) { /* alt+left */ }
  if (matchesKey(data, Key.ctrlShift("p"))) { /* ctrl+shift+p */ }

  // String format also works
  if (matchesKey(data, "enter")) { }
  if (matchesKey(data, "ctrl+c")) { }
  if (matchesKey(data, "shift+tab")) { }

  // Printable character detection
  if (data.length === 1 && data.charCodeAt(0) >= 32) {
    // It's a printable character (letter, number, symbol)
  }
}
```

### Key Identifier Reference

| Category | Keys |
|----------|------|
| Basic | `enter`, `escape`, `tab`, `space`, `backspace`, `delete`, `home`, `end` |
| Arrow | `up`, `down`, `left`, `right` |
| Modifiers | `ctrl("x")`, `shift("tab")`, `alt("left")`, `ctrlShift("p")` |

### The handleInput Contract

```typescript
handleInput(data: string): void {
  // 1. Check for your keys
  // 2. Update state
  // 3. Call this.invalidate() if render output changes
  // 4. Call tui.requestRender() to trigger a re-render
  //    (or if you're the top-level custom component, the TUI does this automatically)
}
```

---
@ -1,46 +0,0 @@
|
|||
# Line Width — The Cardinal Rule
|
||||
|
||||
**Every line from `render()` MUST NOT exceed the `width` parameter in visible characters.** This is the single most common source of rendering bugs.

### Utilities

```typescript
import { visibleWidth, truncateToWidth, wrapTextWithAnsi } from "@mariozechner/pi-tui";

// Get display width (ignores ANSI escape codes)
visibleWidth("\x1b[32mHello\x1b[0m"); // Returns 5, not 14

// Truncate to fit width (preserves ANSI codes)
truncateToWidth("Very long text here", 10); // "Very lo..."
truncateToWidth("Very long text here", 10, ""); // "Very long " (no ellipsis)
truncateToWidth("Very long text here", 10, "→"); // "Very long→"

// Word wrap preserving ANSI codes
wrapTextWithAnsi("\x1b[32mThis is a long green text\x1b[0m", 15);
// Returns ["This is a long", "green text"] with ANSI codes preserved per line
```
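
Conceptually, `visibleWidth` strips escape sequences before counting. The following is a simplified re-implementation for illustration only; the real utilities also handle wide characters and keep ANSI codes intact when truncating, which this sketch does not:

```typescript
// Match CSI escape sequences like "\x1b[32m" or "\x1b[0m".
const CSI = /\x1b\[[0-9;]*[A-Za-z]/g;

function visibleWidthSketch(s: string): number {
  return s.replace(CSI, "").length;
}

// Only correct for plain strings: slicing would cut ANSI codes in half.
function truncateSketch(s: string, width: number, ellipsis = "..."): string {
  if (visibleWidthSketch(s) <= width) return s;
  return s.slice(0, width - ellipsis.length) + ellipsis;
}
```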

### The Pattern

```typescript
render(width: number): string[] {
  const lines: string[] = [];

  // Always truncate any line that could exceed width
  lines.push(truncateToWidth(` ${prefix}${content}`, width));

  // For dynamic content, calculate available space
  const labelWidth = visibleWidth(label);
  const available = width - labelWidth - 4; // Leave room for padding
  const truncated = truncateToWidth(value, available);
  lines.push(` ${label}: ${truncated}`);

  return lines;
}
```

### Why This Matters

If a line exceeds `width`, the terminal wraps it, causing visual corruption — lines overlap, the cursor mispositions, and the entire TUI can become garbled. The TUI framework **cannot fix this for you** because it doesn't know how you want lines truncated.

---
@ -1,72 +0,0 @@
# Theming — Colors and Styles

### Using Theme Colors

The `theme` object is always passed via callbacks — never import it directly.

```typescript
// Foreground color
theme.fg("accent", "Highlighted text") // Apply foreground color
theme.fg("success", "✓ Passed")
theme.fg("error", "✗ Failed")
theme.fg("warning", "⚠ Warning")
theme.fg("muted", "Secondary text")
theme.fg("dim", "Tertiary text")

// Background color
theme.bg("selectedBg", "Selected item")
theme.bg("toolSuccessBg", "Success background")

// Text styles
theme.bold("Bold text")
theme.italic("Italic text")
theme.strikethrough("Struck through")

// Combination
theme.fg("accent", theme.bold("Bold and colored"))
theme.bg("selectedBg", theme.fg("text", " Selected "))
```

### All Foreground Colors

| Category | Colors |
|----------|--------|
| **General** | `text`, `accent`, `muted`, `dim` |
| **Status** | `success`, `error`, `warning` |
| **Borders** | `border`, `borderAccent`, `borderMuted` |
| **Messages** | `userMessageText`, `customMessageText`, `customMessageLabel` |
| **Tools** | `toolTitle`, `toolOutput` |
| **Diffs** | `toolDiffAdded`, `toolDiffRemoved`, `toolDiffContext` |
| **Markdown** | `mdHeading`, `mdLink`, `mdLinkUrl`, `mdCode`, `mdCodeBlock`, `mdCodeBlockBorder`, `mdQuote`, `mdQuoteBorder`, `mdHr`, `mdListBullet` |
| **Syntax** | `syntaxComment`, `syntaxKeyword`, `syntaxFunction`, `syntaxVariable`, `syntaxString`, `syntaxNumber`, `syntaxType`, `syntaxOperator`, `syntaxPunctuation` |
| **Thinking** | `thinkingOff`, `thinkingMinimal`, `thinkingLow`, `thinkingMedium`, `thinkingHigh`, `thinkingXhigh` |
| **Modes** | `bashMode` |

### All Background Colors

`selectedBg`, `userMessageBg`, `customMessageBg`, `toolPendingBg`, `toolSuccessBg`, `toolErrorBg`
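
Under the hood, these helpers wrap text in ANSI SGR sequences. Here is a stand-in theme with hard-wired codes, for illustration only; the real theme resolves color names against the user's configured palette:

```typescript
// Conceptual sketch: "accent" is hard-wired to ANSI green to show the shape.
const fakeTheme = {
  fg(color: string, text: string): string {
    const codes: Record<string, string> = { accent: "\x1b[32m", error: "\x1b[31m" };
    return `${codes[color] ?? ""}${text}\x1b[0m`;
  },
  bold(text: string): string {
    return `\x1b[1m${text}\x1b[0m`;
  },
};

// Composition nests the escape sequences:
const styled = fakeTheme.fg("accent", fakeTheme.bold("Bold and colored"));
```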

### Syntax Highlighting

```typescript
import { highlightCode, getLanguageFromPath } from "@mariozechner/pi-coding-agent";

// Highlight with explicit language
const highlighted = highlightCode("const x = 1;", "typescript", theme);

// Auto-detect from file path
const lang = getLanguageFromPath("/path/to/file.rs"); // "rust"
const highlightedAuto = highlightCode(code, lang, theme);
```

### Markdown Theme

```typescript
import { getMarkdownTheme } from "@mariozechner/pi-coding-agent";
import { Markdown } from "@mariozechner/pi-tui";

const mdTheme = getMarkdownTheme();
const md = new Markdown(content, 1, 1, mdTheme);
```

---
@ -1,115 +0,0 @@
# Overlays — Floating Modals and Panels

Overlays render **on top of existing content** without clearing the screen. Essential for dialogs, side panels, and floating UI.

### Basic Overlay

```typescript
const result = await ctx.ui.custom<string | null>(
  (tui, theme, keybindings, done) => new MyDialog({ onClose: done }),
  { overlay: true }
);
```

### Positioned Overlay

```typescript
const result = await ctx.ui.custom<string | null>(
  (tui, theme, _kb, done) => new SidePanel({ onClose: done }),
  {
    overlay: true,
    overlayOptions: {
      // Size (number = columns, string = percentage)
      width: "50%",
      minWidth: 40,
      maxHeight: "80%",

      // Position: anchor-based (9 positions)
      anchor: "right-center",
      offsetX: -2,
      offsetY: 0,

      // Or absolute/percentage positioning (instead of an anchor)
      row: "25%", // 25% from top
      col: 10, // column 10

      // Margins: a single number for all sides...
      margin: 2,
      // ...or per side (use one form or the other)
      margin: { top: 2, right: 4, bottom: 2, left: 4 },

      // Responsive: hide on narrow terminals
      visible: (termWidth, termHeight) => termWidth >= 80,
    },
  }
);
```

### Anchor Positions

```
top-left          top-center          top-right
      ┌────────────────────────────┐
      │                            │
left-center           center           right-center
      │                            │
      └────────────────────────────┘
bottom-left      bottom-center      bottom-right
```
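
The anchor arithmetic can be sketched as follows. This is a hypothetical illustration of how an anchor name plus offsets could resolve to a top-left cell; the real layout engine also applies margins, percentage sizes, and clamping to the terminal bounds:

```typescript
type Size = { width: number; height: number };

// Hypothetical sketch, not the library's actual implementation.
function anchorPosition(
  anchor: string,
  term: Size,
  box: Size,
  offsetX = 0,
  offsetY = 0,
): { row: number; col: number } {
  const col = anchor.includes("left") ? 0
    : anchor.includes("right") ? term.width - box.width
    : Math.floor((term.width - box.width) / 2);   // horizontal center
  const row = anchor.includes("top") ? 0
    : anchor.includes("bottom") ? term.height - box.height
    : Math.floor((term.height - box.height) / 2); // vertical center
  return { row: row + offsetY, col: col + offsetX };
}
```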

### Programmatic Visibility Control

```typescript
let overlayHandle: OverlayHandle | null = null;

const result = await ctx.ui.custom<string | null>(
  (tui, theme, _kb, done) => new MyPanel({ onClose: done }),
  {
    overlay: true,
    overlayOptions: { anchor: "right-center", width: "40%" },
    onHandle: (handle) => {
      overlayHandle = handle;
      // handle.setHidden(true) — temporarily hide
      // handle.setHidden(false) — show again
      // handle.hide() — permanently remove
    },
  }
);
```

### Stacked Overlays

Multiple overlays can be shown simultaneously. They stack in order (newest on top). Each one's `done()` closes only that overlay:

```typescript
// Show three stacked overlays
const p1 = ctx.ui.custom(/* ... */, { overlay: true, overlayOptions: { offsetX: -5, offsetY: -3 } });
const p2 = ctx.ui.custom(/* ... */, { overlay: true, overlayOptions: { offsetX: 0, offsetY: 0 } });
const p3 = ctx.ui.custom(/* ... */, { overlay: true, overlayOptions: { offsetX: 5, offsetY: 3 } });

// Last one shown (p3) receives keyboard input
// Closing p3 gives focus to p2, closing p2 gives focus to p1
```

### ⚠️ Overlay Lifecycle Rule

**Overlay components are disposed when closed. Never reuse references.**

```typescript
// ❌ WRONG — stale reference
let menu: MenuComponent;
await ctx.ui.custom((_, __, ___, done) => {
  menu = new MenuComponent(done);
  return menu;
}, { overlay: true });
menu.doSomething(); // DISPOSED — will fail

// ✅ CORRECT — re-call the factory
const showMenu = () => ctx.ui.custom(
  (_, __, ___, done) => new MenuComponent(done),
  { overlay: true }
);
await showMenu(); // First show
await showMenu(); // Re-show with fresh instance
```

---

@ -1,72 +0,0 @@
# Custom Editors — Replacing the Input

Replace the main input editor with a custom implementation. The editor persists until explicitly removed.

### The Pattern

```typescript
import { CustomEditor, type ExtensionAPI } from "@mariozechner/pi-coding-agent";
import { matchesKey, truncateToWidth } from "@mariozechner/pi-tui";

class VimEditor extends CustomEditor {
  private mode: "normal" | "insert" = "insert";

  handleInput(data: string): void {
    // Escape in insert mode → switch to normal
    if (matchesKey(data, "escape") && this.mode === "insert") {
      this.mode = "normal";
      return;
    }

    // Insert mode: pass everything to CustomEditor for text editing + app keybindings
    if (this.mode === "insert") {
      super.handleInput(data);
      return;
    }

    // Normal mode: vim keys
    switch (data) {
      case "i": this.mode = "insert"; return;
      case "h": super.handleInput("\x1b[D"); return; // Left arrow
      case "j": super.handleInput("\x1b[B"); return; // Down arrow
      case "k": super.handleInput("\x1b[A"); return; // Up arrow
      case "l": super.handleInput("\x1b[C"); return; // Right arrow
    }

    // Filter printable chars in normal mode (don't insert them)
    if (data.length === 1 && data.charCodeAt(0) >= 32) return;

    // Pass unhandled to super (ctrl+c, ctrl+d, etc.)
    super.handleInput(data);
  }

  render(width: number): string[] {
    const lines = super.render(width);
    // Add mode indicator to last line
    if (lines.length > 0) {
      const label = this.mode === "normal" ? " NORMAL " : " INSERT ";
      const lastLine = lines[lines.length - 1]!;
      lines[lines.length - 1] = truncateToWidth(lastLine, width - label.length, "") + label;
    }
    return lines;
  }
}

// Register it:
export default function (pi: ExtensionAPI) {
  pi.on("session_start", (_event, ctx) => {
    ctx.ui.setEditorComponent((_tui, theme, keybindings) =>
      new VimEditor(theme, keybindings)
    );
  });
}
```

### Critical Rules

1. **Extend `CustomEditor`**, not `Editor`. `CustomEditor` provides app keybindings (escape to abort, ctrl+d to exit, model switching) that must not be lost.
2. **Call `super.handleInput(data)`** for any key you don't handle.
3. **Use the factory pattern**: `setEditorComponent` receives a factory `(tui, theme, keybindings) => CustomEditor`.
4. **Pass `undefined` to restore default**: `ctx.ui.setEditorComponent(undefined)`.

---
@ -1,95 +0,0 @@
# Tool Rendering — Custom Tool Display

Tools can control how their calls and results appear in the message area.

### renderCall — How the Tool Call Looks

```typescript
import { Text } from "@mariozechner/pi-tui";

pi.registerTool({
  name: "my_tool",
  // ...

  renderCall(args, theme) {
    // args = the tool call arguments
    let text = theme.fg("toolTitle", theme.bold("my_tool "));
    text += theme.fg("muted", args.action);
    if (args.text) text += " " + theme.fg("dim", `"${args.text}"`);
    return new Text(text, 0, 0); // 0,0 padding — the wrapping Box handles padding
  },
});
```

### renderResult — How the Tool Result Looks

```typescript
import { Text } from "@mariozechner/pi-tui";
import { keyHint } from "@mariozechner/pi-coding-agent";

pi.registerTool({
  name: "my_tool",
  // ...

  renderResult(result, { expanded, isPartial }, theme) {
    // result.content — the content array sent to the LLM
    // result.details — your custom details object
    // expanded — whether user toggled expand (Ctrl+O)
    // isPartial — streaming in progress (onUpdate was called)

    // Handle streaming state
    if (isPartial) {
      return new Text(theme.fg("warning", "Processing..."), 0, 0);
    }

    // Handle errors
    if (result.details?.error) {
      return new Text(theme.fg("error", `Error: ${result.details.error}`), 0, 0);
    }

    // Default view (collapsed)
    let text = theme.fg("success", "✓ Done");
    if (!expanded) {
      text += ` (${keyHint("expandTools", "to expand")})`;
    }

    // Expanded view — show details
    if (expanded && result.details?.items) {
      for (const item of result.details.items) {
        text += "\n  " + theme.fg("dim", item);
      }
    }

    return new Text(text, 0, 0);
  },
});
```

### Key Hints for Keybindings

```typescript
import { keyHint, appKeyHint, editorKey, rawKeyHint } from "@mariozechner/pi-coding-agent";

// Editor action hint (respects user's keybinding config)
keyHint("expandTools", "to expand") // e.g., "Ctrl+O to expand"
keyHint("selectConfirm", "to select") // e.g., "Enter to select"

// Raw key hint
rawKeyHint("Ctrl+O", "to expand") // Always shows "Ctrl+O to expand"
```

### Fallback Behavior

If `renderCall` or `renderResult` is not defined or throws:

- `renderCall` → shows tool name
- `renderResult` → shows raw text from `content`

### Best Practices

- Return `Text` with padding `(0, 0)` — the wrapping `Box` handles padding
- Support `expanded` for detail on demand
- Handle `isPartial` for streaming progress
- Keep the default (collapsed) view compact
- Use `\n` for multi-line content within a single `Text`

---
@ -1,33 +0,0 @@
# Message Rendering — Custom Message Display

Register a renderer for messages with your `customType`:

```typescript
import { Text } from "@mariozechner/pi-tui";

pi.registerMessageRenderer("my-extension", (message, options, theme) => {
  const { expanded } = options;

  let text = theme.fg("accent", `[${message.customType}] `);
  text += message.content;

  if (expanded && message.details) {
    text += "\n" + theme.fg("dim", JSON.stringify(message.details, null, 2));
  }

  return new Text(text, 0, 0);
});
```

Send messages that use this renderer:

```typescript
pi.sendMessage({
  customType: "my-extension", // Must match registerMessageRenderer
  content: "Status update",
  display: true, // Show in TUI
  details: { progress: 50 }, // Available in renderer, NOT sent to LLM
});
```

---
@ -1,78 +0,0 @@
# Performance — Caching and Invalidation

### The Caching Pattern

Always cache `render()` output and recompute only when state changes:

```typescript
class CachedComponent {
  private cachedWidth?: number;
  private cachedLines?: string[];

  render(width: number): string[] {
    if (this.cachedLines && this.cachedWidth === width) {
      return this.cachedLines;
    }

    // Expensive computation here
    const lines = this.computeLines(width);

    this.cachedWidth = width;
    this.cachedLines = lines;
    return lines;
  }

  invalidate(): void {
    this.cachedWidth = undefined;
    this.cachedLines = undefined;
  }
}
```

### The Update Cycle

```
State changes → invalidate() → tui.requestRender() → render(width) called
```

1. Something changes your component's state (user input, timer, async result)
2. Call `this.invalidate()` to clear caches
3. Call `tui.requestRender()` to schedule a re-render
4. The TUI calls `render(width)` on the next frame
5. Your component recomputes its output (since cache was cleared)

### Game Loop Pattern (Real-Time Updates)

```typescript
class GameComponent {
  private interval: ReturnType<typeof setInterval> | null = null;
  private version = 0;
  private cachedVersion = -1;
  private cachedWidth = -1;
  private cachedLines: string[] = [];

  constructor(private tui: { requestRender: () => void }) {
    this.interval = setInterval(() => {
      this.tick();
      this.version++;
      this.tui.requestRender();
    }, 100); // 10 FPS
  }

  render(width: number): string[] {
    if (this.cachedVersion === this.version && this.cachedWidth === width) {
      return this.cachedLines;
    }
    const lines: string[] = [];
    // ... render the current frame into lines ...
    this.cachedVersion = this.version;
    this.cachedWidth = width;
    this.cachedLines = lines;
    return lines;
  }

  dispose(): void {
    if (this.interval) {
      clearInterval(this.interval);
      this.interval = null;
    }
  }
}
```

---

@ -1,49 +0,0 @@
# Theme Changes and Invalidation

When the user switches themes, the TUI calls `invalidate()` on all components. If your component pre-bakes theme colors, you must rebuild them.

### ❌ Wrong — Theme Colors Won't Update

```typescript
class BadComponent extends Container {
  constructor(message: string, theme: Theme) {
    super();
    // Pre-baked theme colors — stuck with old theme forever!
    this.addChild(new Text(theme.fg("accent", message), 1, 0));
  }
}
```

### ✅ Correct — Rebuild on Invalidate

```typescript
class GoodComponent extends Container {
  private message: string;
  private theme: Theme;

  constructor(message: string, theme: Theme) {
    super();
    this.message = message;
    this.theme = theme;
    this.rebuild();
  }

  private rebuild(): void {
    this.clear(); // Remove all children
    this.addChild(new Text(this.theme.fg("accent", this.message), 1, 0));
  }

  override invalidate(): void {
    super.invalidate();
    this.rebuild(); // Rebuild with current theme
  }
}
```

### When You Need This Pattern

**NEED to rebuild:** Pre-baked `theme.fg()`/`theme.bg()` strings, `highlightCode()` results, complex child trees with embedded colors.

**DON'T need to rebuild:** Theme callbacks `(text) => theme.fg("accent", text)`, stateless renders that compute fresh each time, simple containers without themed content.
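
As the second list suggests, a component that bakes colors only inside `render()` needs no rebuild at all. Here is a sketch with a stand-in `MiniTheme` type (not the library's `Theme`): after a theme switch, the next render picks up the new colors automatically because nothing themed was stored:

```typescript
type MiniTheme = { fg(color: string, text: string): string };

class CallbackComponent {
  constructor(private message: string, private theme: MiniTheme) {}

  setTheme(theme: MiniTheme): void { this.theme = theme; }

  render(_width: number): string[] {
    // Colors computed fresh on every render — nothing pre-baked
    return [this.theme.fg("accent", this.message)];
  }
}

const green: MiniTheme = { fg: (_c, t) => `\x1b[32m${t}\x1b[0m` };
const blue: MiniTheme = { fg: (_c, t) => `\x1b[34m${t}\x1b[0m` };
const comp = new CallbackComponent("hello", green);
```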

---