Self-contained extension at src/resources/extensions/ollama/ that
auto-detects a running Ollama instance, discovers locally pulled models,
and registers them as a first-class provider with zero configuration.
Features:
- Auto-discovery of local models via /api/tags on session_start
- Capability detection (vision, reasoning, context window) for 40+ model families
- /ollama slash command with status, list, pull, remove, ps subcommands
- ollama_manage LLM-callable tool for agent-driven model operations
- Onboarding flow with auto-detect (no API key required)
- Non-blocking async probe — doesn't delay TUI paint
- Respects OLLAMA_HOST env var for non-default endpoints
Core changes (minimal):
- Add "ollama" to KnownProvider in pi-ai types
- Add "ollama" key resolution in env-api-keys.ts
- Add "ollama" default model in model-resolver.ts
- Add "Ollama (Local)" to onboarding wizard with probe flow
- Add extension-manifest.ts and extension-sort.ts to pi-coding-agent
  with manifest reading and Kahn's BFS topological sort algorithm
- Add extensionPathsTransform hook to DefaultResourceLoader that runs
  between path merging and loadExtensions(), enabling pre-load
  filtering and reordering without modifying pi internals
- Wire GSD's buildResourceLoader() to provide a transform that:
  1. Filters ALL extensions (including community) through the GSD registry
  2. Sorts in topological dependency order via sortExtensionPaths()
- Mark discoverAndLoadExtensions() as @deprecated (dead code path)
- Add 16 tests covering manifest reading, dependency sorting, cycles,
  missing deps, and non-array deps
Previously, dependencies.extensions in manifests was decorative (the sort
existed but was never called), and `gsd extensions disable` only worked
for bundled extensions. Community extensions in ~/.gsd/agent/extensions/
bypassed the registry entirely.
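The dependency ordering described above can be sketched as follows. This is a minimal illustration of Kahn's BFS algorithm, not the code in extension-sort.ts: the `ExtensionManifest` shape, the `sortByDependencies` name, and the choices to ignore missing or non-array deps and to append cycle members in input order are all assumptions made for the example.

```typescript
// Hypothetical manifest shape; the real one lives in extension-manifest.ts.
interface ExtensionManifest {
  id: string;
  dependencies?: { extensions?: unknown };
}

// Kahn's algorithm (BFS): repeatedly emit nodes whose dependencies have all been emitted.
// Assumes manifest ids are unique.
function sortByDependencies(manifests: ExtensionManifest[]): string[] {
  const ids = manifests.map((m) => m.id);
  const known = new Set(ids);
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of ids) {
    inDegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const m of manifests) {
    const raw = m.dependencies?.extensions;
    // Assumption: a non-array value is treated as "no dependencies".
    const deps = Array.isArray(raw)
      ? raw.filter((d): d is string => typeof d === "string")
      : [];
    for (const dep of Array.from(new Set(deps))) {
      if (!known.has(dep)) continue; // Assumption: missing deps are skipped.
      inDegree.set(m.id, (inDegree.get(m.id) ?? 0) + 1);
      dependents.get(dep)?.push(m.id);
    }
  }
  // Start with every extension that depends on nothing.
  const queue = ids.filter((id) => inDegree.get(id) === 0);
  const sorted: string[] = [];
  for (const id of queue) {
    // The array iterator also visits items pushed during the loop.
    sorted.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Cycle members never reach in-degree 0; append them in input order.
  for (const id of ids) if (!sorted.includes(id)) sorted.push(id);
  return sorted;
}
```

A real loader might prefer to warn about cycles and missing deps rather than silently tolerate them; the sketch only shows the ordering step.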
All other .gsd/ state files use uppercase naming (DECISIONS.md,
REQUIREMENTS.md, PROJECT.md, etc). This renames the canonical
preferences file to PREFERENCES.md while keeping a migration
fallback — the loader checks PREFERENCES.md first, then falls
back to lowercase preferences.md for existing installations.
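A minimal sketch of the lookup order described above. The `resolvePreferencesPath` name and the injected `exists` callback are hypothetical; the actual loader presumably checks the file system directly.

```typescript
// `exists` is injected so the sketch stays independent of the file system.
type Exists = (path: string) => boolean;

// Prefer the canonical uppercase file, then fall back to the legacy lowercase name.
function resolvePreferencesPath(gsdDir: string, exists: Exists): string | null {
  const canonical = `${gsdDir}/PREFERENCES.md`;
  const legacy = `${gsdDir}/preferences.md`;
  if (exists(canonical)) return canonical;
  if (exists(legacy)) return legacy; // existing installations
  return null;
}
```

On a case-insensitive file system both names resolve to the same file, so the first check already succeeds for legacy installations.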
Closes #2700
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Three work streams bundled into one phase to close the behavioral control
gaps identified in the v2 handler audit:
Stream 1 — State machine guards on all 8 tool handlers:
- Entity existence checks before mutations (milestone, slice, task)
- Valid status transition enforcement (can't double-complete, can't re-plan
closed work, can't complete inside a closed parent)
- depends_on validation for plan-milestone (deps must exist + be complete)
- blockerTaskId verification in replan-slice (must exist + be complete)
- Deep task check in complete-milestone (all tasks, not just slice status)
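The guard pattern from Stream 1 might look roughly like this for a task completion. The `Entity` type, the `guardCompleteTask` name, the two-value status set, and the error strings are illustrative assumptions, not the handler's real API.

```typescript
// Simplified status model for illustration; the real schema may have more states.
type Status = "pending" | "complete";

interface Entity {
  id: string;
  status: Status;
}

// Returns an error message when the mutation must be rejected, or null when it may proceed.
function guardCompleteTask(
  task: Entity | undefined,
  parentSlice: Entity | undefined,
): string | null {
  if (!task) return "task not found"; // entity existence check
  if (!parentSlice) return "slice not found";
  if (parentSlice.status === "complete") {
    return "cannot complete a task inside a closed slice"; // closed parent
  }
  if (task.status === "complete") return "task is already complete"; // no double-complete
  return null;
}
```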
Stream 2 — Actor identity + persistent audit log:
- WorkflowEvent extended with actor_name, trigger_reason, session_id
- Engine-generated UUID session_id stable per process lifetime
- All 8 handlers accept optional actorName/triggerReason and pass through
- workflow-logger now flushes to .gsd/audit-log.jsonl (survives context resets)
- New setLogBasePath() and readAuditLog() API
Stream 3 — Reversibility + unit ownership:
- New gsd_task_reopen handler (reset task to pending with full guards)
- New gsd_slice_reopen handler (reset slice + all tasks with transaction)
- Opt-in unit ownership via .gsd/unit-claims.json (claim/release/check)
- Ownership enforced in complete-task and complete-slice when claims exist
- insertReplanHistory converted to upsert via schema v11 unique index
Bug fixes (pre-existing):
- renderPlanContent checkbox: compared status to "done", but tasks use "complete"
- renderRoadmapContent: same "done" vs "complete" mismatch
- renderPlanContent format: the **T01:** title output didn't match the parsePlan regex
- Tests updated to seed DB entities and match projection output format
* feat: add workflow templates — named workflow shapes for different types of work
Introduces `/gsd start <template>` and `/gsd templates` commands with 8 built-in
workflow templates: bugfix, small-feature, spike, hotfix, refactor, security-audit,
dep-upgrade, and full-project. Each template defines purpose-specific phases so
work gets the right level of ceremony instead of forcing everything through the
full milestone pipeline or /gsd quick.
Includes auto-detection from natural language, --dry-run preview, state tracking
for resume support, git branch management, and artifact directory organization.
* fix: guard workflow templates against concurrent auto-mode sessions
Block /gsd start when auto-mode is active to prevent git branch conflicts
and competing message dispatch. When auto-mode is paused, allow templates
to run with an informational notice.
* feat: add workflow resume and in-progress detection
- /gsd start resume — resumes the most recent in-progress workflow
- /gsd start (no args) — shows in-progress workflow if one exists
- STATE.json tracks artifactDir and completedAt for lifecycle management
- Scans .gsd/workflows/*/STATE.json to find unfinished workflows
* chore: remove copyright headers per project conventions
* feat: add comprehensive API key manager (/gsd keys)
Add /gsd keys command with 6 subcommands for full API key lifecycle
management: list, add, remove, test, rotate, and doctor.
- list/status: Dashboard grouped by category (LLM, search, tool, remote)
  with masked key previews, OAuth expiry, env var source detection
- add: Interactive provider picker with OAuth vs API key choice,
  prefix validation, and env var activation
- remove: Multi-key support with individual or bulk removal
- test: Lightweight API validation per provider with latency reporting
  and error classification (401/429/5xx/timeout)
- rotate: Remove-and-replace flow with optional pre-save validation
- doctor: Health checks for expired OAuth, empty keys, duplicates,
  env var conflicts, file permissions, missing LLM provider
Includes unified provider registry (22 providers), tab completions,
and redirect from /gsd setup keys. 44 unit tests.
* fix: convert key-manager tests from vitest to node:test for CI typecheck
Extension tests use node:test + node:assert/strict (not vitest) since
tsconfig.extensions.json includes test files and vitest types are not
available in the CI typecheck step.
* feat(gsd): add directory safeguards to prevent running in system/home paths
GSD previously had no protection against being launched from dangerous
directories like $HOME, /, /usr, or /etc. This adds layered validation:
- Blocked system paths (hard stop): /, /usr, /etc, /var, $HOME, tmpdir, etc.
- High entry count heuristic (>200 entries triggers confirmation dialog)
- Symlink resolution via realpathSync to prevent bypass
- Integrated at three chokepoints: projectRoot(), showSmartEntry(), bootstrapGsdDirectory()
Includes 19 tests covering all blocked categories, boundary conditions, and
the assertSafeDirectory throw/return behavior.
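The layered check can be sketched as below, for POSIX paths only. The `checkDirectory` name, the `Verdict` type, the shortened blocked list, and exact-match comparison are assumptions for illustration. The caller is expected to pass a path already resolved through realpathSync, so symlinks cannot bypass the list.

```typescript
// Shortened, hypothetical blocked list; the real one also covers $HOME, tmpdir, and more.
const BLOCKED_PATHS: readonly string[] = ["/", "/usr", "/etc", "/var"];
const ENTRY_COUNT_THRESHOLD = 200;

type Verdict = "blocked" | "confirm" | "ok";

// Strip trailing slashes so "/usr/" and "/usr" compare equal; keep "/" as the root.
function normalizePath(path: string): string {
  const stripped = path.replace(/\/+$/, "");
  return stripped === "" ? "/" : stripped;
}

// `resolvedPath` should already be symlink-resolved by the caller.
function checkDirectory(resolvedPath: string, entryCount: number): Verdict {
  if (BLOCKED_PATHS.includes(normalizePath(resolvedPath))) return "blocked"; // hard stop
  if (entryCount > ENTRY_COUNT_THRESHOLD) return "confirm"; // heuristic: ask first
  return "ok";
}
```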
* fix: make directory safeguard tests cross-platform (Windows CI)
- Skip Unix-specific blocked path tests on Windows (/, /usr, /etc, etc.)
- Add Windows-specific blocked path tests (C:\, C:\Windows)
- Use platform-appropriate path separator in trailing slash test
- Fix root path normalization for Windows drive letters (C:\ not C:)
Introduces six new modules that work together to reduce token usage across
the dispatch pipeline while preserving semantic content quality:
- Provider-aware token counting with per-provider char/token ratios
- Prompt cache optimizer for maximizing Anthropic/OpenAI cache hit rates
- Structured data formatter (compact notation for decisions/requirements/tasks)
- Deterministic prompt compressor (light/moderate/aggressive levels)
- Semantic chunker with TF-IDF relevance scoring for context selection
- Summary distiller for condensed dependency summaries
Integration points:
- inlineDependencySummaries uses distillation before truncation (3+ deps)
- inlineDecisionsFromDb/inlineRequirementsFromDb use compact format at non-full levels
- buildExecuteTaskPrompt compresses carry-forward when it exceeds 40% of budget
- context-budget.reduceToFit combines compression with section-boundary truncation
- computeBudgets accepts optional provider for accurate char/token ratios
All existing 1475 unit tests + 30 integration tests pass with zero regressions.
157 new tests cover all optimization modules.
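As a rough illustration of provider-aware counting and the 40% carry-forward trigger, consider the sketch below. The ratio values are placeholders, not the numbers GSD ships, and `estimateTokens` and `shouldCompressCarryForward` are hypothetical names. It also assumes the 40% threshold is measured in tokens.

```typescript
// Placeholder chars-per-token ratios for illustration only.
const CHARS_PER_TOKEN: Record<string, number> = { anthropic: 3.5, openai: 4.0 };
const DEFAULT_CHARS_PER_TOKEN = 4.0;

// Estimate tokens from character length using a per-provider ratio.
function estimateTokens(text: string, provider?: string): number {
  const ratio =
    (provider !== undefined ? CHARS_PER_TOKEN[provider] : undefined) ??
    DEFAULT_CHARS_PER_TOKEN;
  return Math.ceil(text.length / ratio);
}

// Compress the carry-forward section only when it exceeds 40% of the token budget.
function shouldCompressCarryForward(
  carryForward: string,
  budgetTokens: number,
  provider?: string,
): boolean {
  return estimateTokens(carryForward, provider) > budgetTokens * 0.4;
}
```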
- Add argument completions for /thinking command with all 6 levels
(off, minimal, low, medium, high, xhigh) and descriptions
- Add descriptions to all GSD 2nd-level subcommand completions across
14 subcommand groups (auto, mode, parallel, setup, prefs, remote,
next, history, undo, export, cleanup, knowledge, doctor, dispatch)
- Add 35 new tests for autocomplete and fuzzy matching systems
* refactor: TUI dashboard cleanup, dedup, and feature improvements
- Extract shared format-utils.ts: formatDuration, padRight, joinColumns,
centerLine, fitColumns, sparkline, stripAnsi — eliminating 3× duplication
across dashboard-overlay, visualizer-views, and auto-dashboard
- Use shared STATUS_GLYPH/STATUS_COLOR from ui.ts consistently across all
overlay and view files instead of hardcoded Unicode glyphs
- Fix redundant dynamic import('node:fs') in visualizer-data.ts (statSync
already imported at top level)
- Replace (entry as any) casts with proper SessionMessageEntry type narrowing
- Add mtime-based file content cache for visualizer data loader to avoid
re-parsing unchanged roadmap/plan files on every refresh
- Increase visualizer refresh interval from 2s to 5s (with mtime cache,
unchanged files are effectively free)
- Fix sparkline to use loop-based max instead of Math.max(...values) to
avoid stack overflow on large arrays
- Add ETA/time-remaining estimate to progress widget and dashboard overlay
based on average unit duration from metrics ledger
- Show warning glyph for budget-pressured units in completed units list
(continueHereFired units now show ⚠ instead of ✓)
- Add terminal resize (SIGWINCH) handling to both overlays — invalidates
cache and re-renders on window size change
- Fix dispose race in dashboard overlay close path — now calls dispose()
before onClose() to prevent timer callbacks firing after teardown
- Add 23 unit tests for format-utils.ts (including 100k-element sparkline)
- Add 2 tests for estimateTimeRemaining
- Add source-contract tests for resize handler and shared imports
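The sparkline fix above comes down to replacing a spread call with a loop. A minimal sketch follows; the `maxOf` helper and this particular glyph scaling are assumptions, and the real format-utils.ts may differ.

```typescript
// Loop-based max: no spread, so array length never becomes an argument count.
function maxOf(values: readonly number[]): number {
  let max = -Infinity;
  for (const v of values) if (v > max) max = v;
  return max;
}

const BARS = "▁▂▃▄▅▆▇█";

// Map each value to one of eight bar glyphs, scaled against the largest value.
function sparkline(values: readonly number[]): string {
  if (values.length === 0) return "";
  const max = maxOf(values);
  if (max <= 0) return BARS.charAt(0).repeat(values.length);
  let out = "";
  for (const v of values) {
    const scaled = Math.round((v / max) * (BARS.length - 1));
    out += BARS.charAt(Math.min(BARS.length - 1, Math.max(0, scaled)));
  }
  return out;
}
```

Spreading a very large array into Math.max passes every element as a separate argument, which can exceed the engine's limits; the loop has no such limit.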
* fix: use STATUS_GLYPH.warning instead of STATUS_GLYPH.statusWarning
STATUS_GLYPH is keyed by ProgressStatus ("warning"), not by GLYPH
property name ("statusWarning"). Fixes typecheck failure in CI.
* fix: reduce CPU usage on long auto-mode sessions
Seven targeted fixes for compounding process/timer/I/O issues that cause
high CPU during multi-hour /gsd auto sessions:
1. Wrap idle watchdog and hard timeout async callbacks in try-catch to
   prevent unhandled rejections from orphaning intervals
2. Cache nativeHasChanges fallback (10s TTL) to avoid spawning a new
   git process every 15 seconds when native module is unavailable
3. Call clearUnitTimeout() before dispatchNextUnit() in all recovery
   paths to prevent stale idle watchdog from firing alongside new timers
4. Add 10-second timeout to subagent worktree cleanup to prevent hangs
   when git worktree remove blocks indefinitely
5. Prune dead bg-shell processes after each unit completion to free
   retained output buffers (~500KB-1MB per dead process)
6. Throttle STATE.md rebuilds to at most once per 30 seconds (was every
   unit completion at 100-400ms each)
7. Increase progress widget refresh interval from 5s to 15s to reduce
   synchronous file I/O on the hot path
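Fix 2 is a time-to-live cache around an expensive check. The sketch below shows the general pattern with an injectable clock and an explicit reset; `cachedWithTtl` is a hypothetical helper, not the function used in the codebase.

```typescript
// Wrap an expensive computation so it reruns at most once per `ttlMs`.
function cachedWithTtl<T>(
  compute: () => T,
  ttlMs: number,
  now: () => number = Date.now, // injectable clock for tests
) {
  let cached: { value: T; at: number } | null = null;
  return {
    get(): T {
      const t = now();
      let entry = cached;
      if (entry === null || t - entry.at >= ttlMs) {
        entry = { value: compute(), at: t };
        cached = entry;
      }
      return entry.value;
    },
    // Drop the cached value so the next get() recomputes immediately.
    reset(): void {
      cached = null;
    },
  };
}
```

A reset hook like this matters for tests that change the repository inside the cache window and need the next check to see fresh state.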
* fix: reset nativeHasChanges cache in worktree test
The 10s TTL cache on nativeHasChanges was causing the worktree test
to return stale "no changes" when checking a freshly dirtied repo
within the cache window. Reset the cache before the dirty-repo
assertion so the test correctly detects new changes.
* feat: dynamic model routing for token consumption optimization (#575)
Add complexity-based model routing that classifies units into light/standard/heavy
tiers and routes to cheaper models when appropriate. Reduces token consumption
by 20-50% for users on capped plans.
- Complexity classifier with heuristic-based tier assignment (no LLM call)
- Model router with downgrade-only semantics (user's config is ceiling)
- Budget-pressure-aware routing (more aggressive as budget fills)
- Cross-provider cost comparison via bundled cost table
- Hook classification support
- Escalation on failure (light → standard → heavy)
- Full preference validation and merge support
- Metrics tracking with tier and downgrade fields
- 40 new tests (classifier, router, cost table)
Closes #575
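Downgrade-only routing with escalation on failure can be sketched as below. Treating the user's configured model as a tier ceiling is how this example models "user's config is ceiling"; the `routeTier` and `escalate` names are hypothetical.

```typescript
type Tier = "light" | "standard" | "heavy";
const TIER_ORDER: readonly Tier[] = ["light", "standard", "heavy"];

// Never route above the ceiling implied by the user's configured model.
function routeTier(classified: Tier, ceiling: Tier): Tier {
  return TIER_ORDER.indexOf(classified) <= TIER_ORDER.indexOf(ceiling)
    ? classified
    : ceiling;
}

// After a failure, step up one tier (light → standard → heavy), still capped by the ceiling.
function escalate(current: Tier, ceiling: Tier): Tier {
  const nextIndex = Math.min(TIER_ORDER.indexOf(current) + 1, TIER_ORDER.length - 1);
  return routeTier(TIER_ORDER[nextIndex] ?? current, ceiling);
}
```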
* feat: phases 2-4 — dashboard, adaptive learning, task introspection
Phase 2 — Observability & Dashboard:
- Tier badge [L]/[S]/[H] displayed in progress widget next to phase label
- Dynamic routing savings summary shown in footer when units have been downgraded
- Tier and modelDowngraded fields passed through snapshotUnitMetrics
Phase 3 — Adaptive Learning:
- New routing-history.ts: tracks success/failure per tier per unit-type pattern
- Rolling window of 50 entries per pattern to prevent stale data
- User feedback support (over/under/ok) with 2x weight vs automatic
- Failure rate >20% auto-bumps tier for that pattern
- Tag-specific patterns (e.g. execute-task:docs) for granular learning
- History persists to .gsd/routing-history.json
- Classifier consults adaptive history before finalizing tier
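A simplified sketch of the rolling-window idea in Phase 3. It leaves out user feedback weighting and persistence, and the `RoutingHistory` class, its method names, and the key format are assumptions rather than the API of routing-history.ts.

```typescript
type Tier = "light" | "standard" | "heavy";
const TIERS: readonly Tier[] = ["light", "standard", "heavy"];
const WINDOW_SIZE = 50; // rolling window per pattern
const FAILURE_THRESHOLD = 0.2; // bump the tier above a 20% failure rate

class RoutingHistory {
  // Key is "<pattern>@<tier>"; each entry is true for success, false for failure.
  private outcomes = new Map<string, boolean[]>();

  record(pattern: string, tier: Tier, success: boolean): void {
    const key = `${pattern}@${tier}`;
    const list = this.outcomes.get(key) ?? [];
    list.push(success);
    if (list.length > WINDOW_SIZE) list.shift(); // drop the oldest entry
    this.outcomes.set(key, list);
  }

  failureRate(pattern: string, tier: Tier): number {
    const list = this.outcomes.get(`${pattern}@${tier}`) ?? [];
    if (list.length === 0) return 0;
    return list.filter((ok) => !ok).length / list.length;
  }

  // Return the tier to use, bumped by one when this pattern fails too often.
  adjust(pattern: string, tier: Tier): Tier {
    if (this.failureRate(pattern, tier) <= FAILURE_THRESHOLD) return tier;
    const next = TIERS[Math.min(TIERS.indexOf(tier) + 1, TIERS.length - 1)];
    return next ?? tier;
  }
}
```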
Phase 4 — Task Plan Introspection:
- Code block counting in task plans (5+ blocks → heavy)
- Complexity keyword detection: migration, architecture, security,
performance, concurrency, compatibility
- Multiple complexity keywords (2+) → heavy, single → standard
- New codeBlockCount and complexityKeywords fields in TaskMetadata
Tests: 16 new tests (routing history + introspection), 419 total passing
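The Phase 4 rules can be expressed as a small pure function. This sketch assumes code blocks are delimited by triple-backtick fence lines and that keywords are matched case-insensitively as substrings. The `introspectPlan` name and the `minimumTier` field are hypothetical; `codeBlockCount` and `complexityKeywords` mirror the TaskMetadata fields named above.

```typescript
type Tier = "light" | "standard" | "heavy";

const COMPLEXITY_KEYWORDS: readonly string[] = [
  "migration", "architecture", "security",
  "performance", "concurrency", "compatibility",
];

// Built with repeat() so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

interface PlanSignals {
  codeBlockCount: number;
  complexityKeywords: string[];
  minimumTier: Tier | null; // null: introspection has no opinion
}

function introspectPlan(plan: string): PlanSignals {
  // Each code block contributes an opening and a closing fence line.
  const fenceLines = plan
    .split("\n")
    .filter((line) => line.trim().startsWith(FENCE)).length;
  const codeBlockCount = Math.floor(fenceLines / 2);
  const lower = plan.toLowerCase();
  const complexityKeywords = COMPLEXITY_KEYWORDS.filter((k) => lower.includes(k));
  let minimumTier: Tier | null = null;
  if (codeBlockCount >= 5 || complexityKeywords.length >= 2) minimumTier = "heavy";
  else if (complexityKeywords.length === 1) minimumTier = "standard";
  return { codeBlockCount, complexityKeywords, minimumTier };
}
```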
* docs: add startup performance analysis and optimization plan
Profiled GSD CLI startup: 2.2s for --version and ~3.8s for
interactive mode. Identified 5 root causes with measured timings and
created a phased optimization plan targeting <0.2s for --version
and ~0.8s for interactive startup.
* perf: speed up GSD startup with lazy loading and fast paths
- Fast-path --version/-v and --help/-h in loader.ts before importing
any heavy dependencies (2.2s → 0.15s, 14x faster)
- Lazy-load undici (~200ms) only when HTTP_PROXY env vars are set
- Skip initResources cpSync when managed-resources.json version
matches current GSD version (~128ms saved per launch)
- Lazy-load Mistral SDK (~369ms) on first API call instead of startup
- Lazy-load Google GenAI SDK (~186ms) on first API call instead of
startup
- Parallelize extension loading with Promise.all() instead of
sequential for-loop
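The lazy-loading items above share one pattern: defer the expensive load until first use, then reuse the result. Below is a generic sketch with a hypothetical `lazy` helper. In the real code the loader would presumably be a dynamic import(), in which case the cached value is the promise.

```typescript
// Run `load` on the first call only; later calls return the cached result.
function lazy<T>(load: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: load() };
    return cached.value;
  };
}

// Stand-in for an expensive SDK import; `loadCount` only exists to show the effect.
let loadCount = 0;
const getSdk = lazy(() => {
  loadCount += 1;
  return { name: "hypothetical-sdk" };
});
```

Nothing is loaded at startup here; the cost is paid by whichever call needs the SDK first.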
---------
Co-authored-by: TÂCHES <afromanguy@me.com>
When all credentials for a provider are exhausted, the system now
automatically falls back to the next available provider in a
user-configured fallback chain. Higher-priority providers are
restored automatically when their backoff expires.
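One way to picture the fallback chain: providers are kept in priority order, each with a backoff deadline, and selection always scans from the top. The `ProviderState` shape and the `pickProvider` name are assumptions for illustration.

```typescript
interface ProviderState {
  name: string;
  backoffUntil: number; // epoch ms; 0 means the provider is available now
}

// `chain` is ordered by priority. Scanning from the top each time means a
// higher-priority provider is picked again as soon as its backoff expires.
function pickProvider(chain: readonly ProviderState[], now: number): string | null {
  for (const provider of chain) {
    if (provider.backoffUntil <= now) return provider.name;
  }
  return null; // every provider is currently backing off
}
```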
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>