SF Agent Mode System
Status: Draft specification. Promoted from copilot-thoughts.md research notes.
Scope: TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.
Decision authority: Product + architecture review required before implementation.
1. Problem Statement
SF's old command surface treated mode switching as separate commands rather than persistent states. There was no visible indicator of the current mode, and the /sf prefix positioned SF as a plugin rather than the system itself.
Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.
Goal: Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.
2. Orthogonal State Axes
SF state is five independent axes, not one overloaded "mode."
workMode: chat | plan | build | review | repair | research
runControl: manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode: fast | smart | deep
surface: tui | web | headless | rpc
2.1 Axis Definitions
| Axis | Question It Answers | Values |
|---|---|---|
| workMode | What kind of work is SF doing? | chat, plan, build, review, repair, research |
| runControl | Who advances the loop? | manual (user), assisted (one unit then pause), autonomous (continuous) |
| permissionProfile | What may proceed without approval? | restricted, normal, trusted, unrestricted |
| modelMode | Speed/cost/reasoning posture? | fast (cheap), smart (balanced), deep (reasoning) |
| surface | How is the user connected? | tui, web, headless, rpc |
2.2 Example Combinations
plan | manual | normal | deep → user plans with reasoning model
build | autonomous | trusted | smart → continuous implementation
repair | assisted | normal | smart → one repair unit at a time
research | autonomous | restricted | deep → continuous research, read-only
review | manual | restricted | deep → user reviews with reasoning model
2.3 Rules
- permissionProfile never implies runControl. Autonomous run with restricted permissions is valid.
- runControl never implies permissionProfile. Manual run with unrestricted permissions is valid.
- Denylists and safety gates override permissionProfile regardless of value.
- Every risk decision logs all five axis values.
sandboxProfile may become a sixth axis later. It is separate from permissionProfile: sandboxing controls process/filesystem/network containment, while permission profile controls what SF may approve.
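The rules above can be sketched as independent per-axis validation plus a log entry that always carries all five axes. This is a minimal illustration, not SF's actual code: the axis names and values come from section 2, while `AXES`, `validateState`, and `riskLogEntry` are hypothetical names.

```javascript
// Axis names and values are taken from section 2; everything else is illustrative.
const AXES = Object.freeze({
  workMode: ["chat", "plan", "build", "review", "repair", "research"],
  runControl: ["manual", "assisted", "autonomous"],
  permissionProfile: ["restricted", "normal", "trusted", "unrestricted"],
  modelMode: ["fast", "smart", "deep"],
  surface: ["tui", "web", "headless", "rpc"],
});

// Validate each axis on its own: no axis value constrains another axis.
function validateState(state) {
  const errors = [];
  for (const [axis, allowed] of Object.entries(AXES)) {
    if (!allowed.includes(state[axis])) {
      errors.push(`${axis}: invalid value ${JSON.stringify(state[axis])}`);
    }
  }
  return errors;
}

// Build a risk-decision log entry that carries all five axis values.
function riskLogEntry(state, decision) {
  const errors = validateState(state);
  if (errors.length > 0) throw new Error(errors.join("; "));
  const axes = Object.fromEntries(Object.keys(AXES).map((a) => [a, state[a]]));
  return { ...axes, decision };
}
```

Because validation never cross-checks axes, a combination such as autonomous with restricted passes without a special case.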
3. Work Modes
3.1 chat
Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.
3.2 plan
Research, clarify, write/update specs, derive tasks, and produce an explicit acceptance point before implementation. The primary user journey starts here.
Plan → Build handoff:
plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart
Surfaces:
- TUI: plan acceptance prompt includes "run autonomously" button
- Web: plan acceptance button includes "run autonomously"
- Headless: --autonomous chains into direct /autonomous
- RPC: machine event records transition explicitly
3.3 build
Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.
3.4 review
Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (deep).
3.5 repair
Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies, and failed gates.
Doctor is the diagnostic engine, not the mode. /doctor inspects. /repair switches work mode.
repair is a workMode, not a separate subsystem.
Commands:
/doctor → inspect and report
/doctor fix → deterministic auto-fix
/doctor heal → LLM-assisted deep healing
/repair → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit
Auto-transition to repair:
build | autonomous | trusted | smart
→ repair | autonomous | normal | smart
Allowed when:
- pre-dispatch health gate fails
- installed runtime drift detected
- SF cannot dispatch safely
- repo workflow state corrupted
Policy: configurable per project. Options: auto, ask, log-only.
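A minimal sketch of how the three policy options could gate the auto-transition. The trigger labels, the function name, and the rule that the permission profile is capped at normal are assumptions drawn only from the single example above; the spec does not define them.

```javascript
// Hypothetical labels for the four "Allowed when" conditions.
const REPAIR_TRIGGERS = new Set([
  "health-gate-failed",
  "runtime-drift",
  "unsafe-dispatch",
  "workflow-state-corrupted",
]);

const PROFILE_ORDER = ["restricted", "normal", "trusted", "unrestricted"];

// Returns the action to take and the resulting state; never mutates the input.
function decideRepairTransition(state, trigger, policy) {
  if (!REPAIR_TRIGGERS.has(trigger)) return { action: "none", state };
  if (policy === "log-only") return { action: "log", state };
  if (policy === "ask") return { action: "prompt-user", state };
  if (policy === "auto") {
    // Assumption: mirror the example (trusted -> normal) by capping at normal.
    const capped =
      PROFILE_ORDER.indexOf(state.permissionProfile) > PROFILE_ORDER.indexOf("normal")
        ? "normal"
        : state.permissionProfile;
    return {
      action: "transition",
      state: { ...state, workMode: "repair", permissionProfile: capped },
    };
  }
  throw new Error(`unknown policy: ${policy}`);
}
```

With policy auto, a build | autonomous | trusted | smart state yields repair | autonomous | normal | smart, matching the example transition.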
3.6 research
Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.
4. Run Control
| Value | Behavior |
|---|---|
| manual | User drives every step. Tool calls require approval. |
| assisted | SF executes one unit, then pauses for user review. |
| autonomous | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |
4.1 Commands
/control manual
/control assisted
/control autonomous
/autonomous → alias for /control autonomous
/next → alias for /control assisted (one unit)
/pause → pause autonomous, preserve state
/stop → stop autonomous, clear state
4.2 Transition Scopes
| Scope | Behavior |
|---|---|
| now | Apply immediately if no tool active. Abort current tool if policy allows. |
| after-current-tool | Finish active tool, then switch. |
| after-current-unit | Finish current SF unit, then switch. |
| next-milestone | Switch after current milestone completes. |
Changes requested during an autonomous run affect future decisions only; they never mutate an active tool call mid-execution.
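The scope table can be read as a small decision function. The sketch below is illustrative only: the `runtime` flags and the returned action strings are hypothetical, and deferring a `now` request when policy forbids aborting the tool is an assumption the spec does not state.

```javascript
// `runtime` is a hypothetical snapshot of what is currently in flight.
function resolveScope(scope, runtime) {
  switch (scope) {
    case "now":
      if (!runtime.toolActive) return "apply";
      // Assumption: if the active tool may not be aborted, wait for it.
      return runtime.policyAllowsAbort ? "abort-tool-then-apply" : "defer";
    case "after-current-tool":
      return runtime.toolActive ? "defer" : "apply";
    case "after-current-unit":
      return runtime.unitActive ? "defer" : "apply";
    case "next-milestone":
      return runtime.milestoneActive ? "defer" : "apply";
    default:
      throw new Error(`unknown scope: ${scope}`);
  }
}
```

A deferred request would be re-evaluated when the tool, unit, or milestone finishes, so the switch never touches a tool call that is already executing.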
4.3 Transition Logging
Every transition persists:
{
"timestamp": "2026-05-08T10:00:00Z",
"from": {"workMode": "build", "runControl": "autonomous"},
"to": {"workMode": "repair", "runControl": "autonomous"},
"reason": "pre-dispatch health gate failed",
"scope": "after-current-unit",
"sessionId": "..."
}
5. Permission Profiles
| Profile | Description |
|---|---|
| restricted | Read-only and explicitly allowlisted actions. |
| normal | Safe edits, non-destructive local commands. |
| trusted | Build/test/install/local commits and bounded repo automation. |
| unrestricted | High-risk orchestration only in intentionally trusted environments. |
5.1 Enforcement
Permission profile is enforced at three layers:
- Tool registry: Each tool declares its required profile. Tools whose required profile exceeds the active profile are hidden from the model.
- Execution gate: Each tool call re-checks the profile at invocation. A violation is an error.
- Safety harness: Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.
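The three layers can be sketched as follows. This assumes the four profiles form a strict order (restricted < normal < trusted < unrestricted), which the spec implies but does not state; the tool names and the `destructive` flag are hypothetical.

```javascript
// Assumption: profiles are totally ordered from least to most permissive.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// Hypothetical registry entries, for illustration only.
const TOOLS = [
  { name: "read_file", requiredProfile: "restricted", destructive: false },
  { name: "edit_file", requiredProfile: "normal", destructive: false },
  { name: "run_tests", requiredProfile: "trusted", destructive: false },
  { name: "delete_path", requiredProfile: "trusted", destructive: true },
];

function allows(activeProfile, requiredProfile) {
  return PROFILE_RANK[activeProfile] >= PROFILE_RANK[requiredProfile];
}

// Layer 1, tool registry: only expose tools the active profile permits.
function visibleTools(activeProfile) {
  return TOOLS.filter((t) => allows(activeProfile, t.requiredProfile)).map((t) => t.name);
}

// Layers 2 and 3: execution gate, then safety harness.
function invoke(toolName, activeProfile, { confirmed = false } = {}) {
  const tool = TOOLS.find((t) => t.name === toolName);
  if (!tool) throw new Error(`unknown tool: ${toolName}`);
  if (!allows(activeProfile, tool.requiredProfile)) {
    throw new Error(`${toolName} requires ${tool.requiredProfile}, active is ${activeProfile}`);
  }
  // Destructive operations need explicit confirmation regardless of profile.
  if (tool.destructive && !confirmed) return { status: "needs-confirmation", tool: tool.name };
  return { status: "allowed", tool: tool.name };
}
```

The execution gate repeats the registry check on purpose: a tool hidden from the model can still be named in a stale or injected call, so the check at invocation is the one that must hold.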
5.2 Commands
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
6. Model Modes
| Mode | Use Case | Routing Hint |
|---|---|---|
| fast | Small bounded tasks | Cheapest available model |
| smart | Default balanced work | Default routing table |
| deep | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |
modelMode guides routing. It does not replace explicit /model selection.
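A minimal sketch of that precedence: an explicit model choice wins, and modelMode only supplies a hint when none is set. The routing table entries are placeholders, not real model identifiers, and `resolveModel` is a hypothetical name.

```javascript
// Placeholder identifiers; real routing would consult SF's routing table.
const ROUTING_HINTS = {
  fast: "cheapest-available",
  smart: "default-route",
  deep: "reasoning-model",
};

function resolveModel({ modelMode, explicitModel }) {
  // An explicit /model selection is never overridden by modelMode.
  if (explicitModel) return { model: explicitModel, source: "explicit" };
  const model = ROUTING_HINTS[modelMode];
  if (!model) throw new Error(`unknown modelMode: ${modelMode}`);
  return { model, source: "modelMode" };
}
```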
7. Mode Switching UX
7.1 Direct Commands
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
7.2 Combined Forms
/mode repair --autonomous --permission-profile normal
/mode build --autonomous --permission-profile trusted
/mode research --autonomous --permission-profile restricted --model-mode deep
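A combined form is one command that sets several axes at once. The parser below is a sketch under that reading: it handles only the flags shown above, does not validate axis values, and `parseModeCommand` is a hypothetical name.

```javascript
// Parse "/mode <workMode> [--autonomous] [--permission-profile X] [--model-mode Y]"
// into a partial state change. Value validation is left to the caller.
function parseModeCommand(input) {
  const tokens = input.trim().split(/\s+/);
  if (tokens[0] !== "/mode" || tokens.length < 2) {
    throw new Error("expected: /mode <workMode> [flags]");
  }
  const change = { workMode: tokens[1] };
  for (let i = 2; i < tokens.length; i++) {
    const flag = tokens[i];
    if (flag === "--autonomous") {
      change.runControl = "autonomous";
    } else if (flag === "--permission-profile" || flag === "--model-mode") {
      const value = tokens[++i];
      if (value === undefined) throw new Error(`${flag} requires a value`);
      const axis = flag === "--permission-profile" ? "permissionProfile" : "modelMode";
      change[axis] = value;
    } else {
      throw new Error(`unknown flag: ${flag}`);
    }
  }
  return change;
}
```

Axes that are not mentioned stay absent from the result, so the caller can merge the change into the current state without resetting anything else.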
7.3 Autonomous Steering
/steer mode repair
/steer mode review after-current-unit
/steer permission-profile restricted now
/steer model-mode deep for-next-unit
7.4 Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+M | Cycle workMode: chat → plan → build → review → repair → research → chat |
| Ctrl+Shift+A | Set runControl to autonomous |
| Ctrl+Shift+S | Set runControl to assisted (step) |
| Ctrl+Shift+I | Set runControl to manual (interactive) |
| Ctrl+Shift+R | Set workMode to repair |
| Ctrl+Shift+P | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |
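The two cycling shortcuts share one wrap-around step. A minimal sketch, with `cycle` as a hypothetical helper name and the value orders taken from the table:

```javascript
const WORK_MODES = ["chat", "plan", "build", "review", "repair", "research"];
const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];

// Advance to the next value, wrapping from the last entry back to the first.
function cycle(values, current) {
  const i = values.indexOf(current);
  if (i === -1) throw new Error(`unknown value: ${current}`);
  return values[(i + 1) % values.length];
}
```

For example, `cycle(WORK_MODES, "research")` wraps to chat and `cycle(PERMISSION_PROFILES, "trusted")` advances to unrestricted.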
8. Status and Mode Badge
8.1 Full Status Line
SF build | autonomous | trusted | smart
8.2 Compact Badge Form
For narrow terminals (< 80 cols):
[B][A][T][S]
8.3 Critical State Labels
When workMode is repair or review, show full labels regardless of width:
repair | autonomous | normal | smart
review | assisted | normal | deep
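Sections 8.1 to 8.3 combine into one rendering rule, sketched below. Two assumptions are not in the spec: each compact letter is the uppercased first character of the value, and the "SF" prefix is drawn separately by the header. `renderBadge` is a hypothetical name.

```javascript
// Render the four visible axes as a full status line or a compact badge.
function renderBadge(state, columns) {
  const parts = [state.workMode, state.runControl, state.permissionProfile, state.modelMode];
  // repair and review always show full labels, regardless of terminal width.
  const critical = state.workMode === "repair" || state.workMode === "review";
  if (columns >= 80 || critical) return parts.join(" | ");
  // Assumption: compact letters are the first character of each value.
  return parts.map((p) => `[${p[0].toUpperCase()}]`).join("");
}
```

Under the first-letter assumption, review, repair, and research would all collapse to [R]. The critical-state rule keeps review and repair spelled out, so only research ever renders as [R] in compact form.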
8.4 Badge Placement
| Surface | Placement |
|---|---|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build |
| Web | Top bar, color-coded chip |
8.5 Badge During Auto Mode
Current code hides header/footer during auto mode (if (isAutoActive()) return []). This must change:
- Show minimal header during auto: badge + project name only
- Or show badge in dedicated status bar separate from header/footer
- Badge color pulses slowly during autonomous execution (subtle animation)
8.6 Badge Colors
| Axis | Value | Color |
|---|---|---|
| workMode | chat | dim |
| workMode | plan | accent |
| workMode | build | success |
| workMode | review | warning |
| workMode | repair | error |
| workMode | research | info |
| runControl | manual | dim |
| runControl | assisted | warning |
| runControl | autonomous | success (pulsing) |
| permissionProfile | restricted | success |
| permissionProfile | normal | dim |
| permissionProfile | trusted | warning |
| permissionProfile | unrestricted | error |
9. Background Work Surface (/tasks)
Unified view of all background work. Replaces scattered /status, /queue, /parallel status for work inspection.
9.1 What /tasks Shows
- autonomous task lifecycle rows
- parallel workers
- scheduled autonomous dispatches and queued scheduler rows
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action
9.2 Data Model
SQLite tables:
-- Durable task state
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
work_mode TEXT NOT NULL,
run_control TEXT NOT NULL,
permission_profile TEXT NOT NULL,
model_mode TEXT NOT NULL,
status TEXT NOT NULL, -- todo | running | verifying | reviewing | done | blocked | paused | failed | cancelled | retrying
dependency_blockers TEXT, -- JSON array of task IDs
retry_count INTEGER DEFAULT 0,
max_retries INTEGER DEFAULT 3,
checkpoint_ref TEXT, -- git ref or patch file
cost_budget REAL,
cost_spent REAL DEFAULT 0,
created_at TEXT, -- Temporal.Instant ISO
started_at TEXT,
completed_at TEXT,
next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);
-- Scheduler state is separate from task lifecycle state
CREATE TABLE task_scheduler (
task_id TEXT PRIMARY KEY REFERENCES tasks(id),
status TEXT NOT NULL, -- queued | due | claimed | dispatched | consumed | expired
due_at TEXT,
claimed_by TEXT,
dispatched_at TEXT,
consumed_at TEXT,
expires_at TEXT
);
-- Ephemeral running state
CREATE TABLE task_runtime (
task_id TEXT PRIMARY KEY REFERENCES tasks(id),
process_pid INTEGER,
worktree_path TEXT,
current_model TEXT,
context_usage_percent REAL,
last_heartbeat_at TEXT -- Temporal.Instant
);
-- Transition log
CREATE TABLE task_transitions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_id TEXT NOT NULL,
from_status TEXT NOT NULL,
to_status TEXT NOT NULL,
reason TEXT,
scope TEXT,
timestamp TEXT NOT NULL -- Temporal.Instant
);
Parallel workers must stay worktree-isolated and report heartbeat/status into
.sf state. Scheduler rows may use queued; task lifecycle rows use todo.
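One way /tasks could surface stuck sessions is to join running lifecycle rows against their runtime heartbeat. The sketch below works on plain row objects shaped like the tasks and task_runtime tables. The staleness threshold and the function name are assumptions, and it parses timestamps with `Date.parse` rather than Temporal to stay runtime-neutral.

```javascript
// Return ids of running tasks with a missing or stale heartbeat.
// `staleAfterMs` is a hypothetical threshold; the spec does not define one.
function findStuckTasks(taskRows, runtimeRows, nowMs, staleAfterMs) {
  const runtimeById = new Map(runtimeRows.map((r) => [r.task_id, r]));
  return taskRows
    .filter((t) => t.status === "running")
    .filter((t) => {
      const rt = runtimeById.get(t.id);
      // A running task with no runtime row or no heartbeat counts as stuck.
      if (!rt || !rt.last_heartbeat_at) return true;
      return nowMs - Date.parse(rt.last_heartbeat_at) > staleAfterMs;
    })
    .map((t) => t.id);
}
```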
9.3 Complementary Commands
/tasks does not replace:
- /status → project health dashboard
- /queue → milestone/slice dispatch order
- /parallel status → parallel orchestrator detail
- /session-report → cost/token summary
- /logs → activity logs
- /forensics → execution forensics
10. Skills System
10.1 Directory Structure
.agents/skills/<skill-name>/
SKILL.md -- skill definition with YAML frontmatter
scripts/ -- supporting scripts
schemas/ -- JSON schemas for inputs/outputs
checklists/ -- verification checklists
mcp.json -- MCP server config if applicable
10.2 Skill Frontmatter
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
Fields:
- name: unique identifier
- description: when to use this skill
- user-invocable: can user explicitly invoke?
- model-invocable: can model auto-invoke when relevant?
- side-effects: none, code-edits, production-mutation, etc.
- permission-profile: minimum profile required
10.3 Skill Categories
| Type | Example | model-invocable |
|---|---|---|
| Background knowledge | forge-autonomous-runtime | true |
| User tool | production-deploy | false |
| Shared capability | forge-command-surface | true |
Dangerous skills (production-mutation) are never model-invoked by default.
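The frontmatter fields and the dangerous-skill rule suggest a guard like the one below. It takes already-parsed frontmatter as a plain object. The linear profile ordering and the function name are assumptions.

```javascript
// Assumption: profiles are ordered from least to most permissive.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// Decide whether the model may auto-invoke a skill under the active profile.
function canModelInvoke(skill, activeProfile) {
  if (skill["model-invocable"] !== true) return false;
  // Dangerous skills are never model-invoked by default.
  if (skill["side-effects"] === "production-mutation") return false;
  return PROFILE_RANK[activeProfile] >= PROFILE_RANK[skill["permission-profile"]];
}
```

The side-effects check runs even when model-invocable is true, so a mislabeled frontmatter block cannot open a production-mutating skill to the model.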
10.4 Auto-Creation Flow
- Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
- Propose skill in manual/restricted contexts
- Generate/update automatically only when policy allows
- Record repeated source evidence in .sf state
- Keep narrow, linted, and evaled like code
- Commit with repo when accepted
10.5 Skill Eval Cases
Every auto-created skill needs eval cases:
.agents/skills/<skill-name>/evals/
case-1/
task.md -- user-like prompt
grader.js -- deterministic checker
hidden/ -- reference answers (not visible to agent)
work/ -- agent workspace
Graders inspect: files, artifacts, answer.json, trace.jsonl, result state.
Failed trials preserve workspace for debugging.
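A deterministic grader might look like the sketch below. It is written as a pure function over file contents so it stays testable. The fields it reads (`result`, `type`, `tool`) and the shape of `expected` are hypothetical; the spec does not define the answer.json or trace.jsonl schemas.

```javascript
// Parse a JSON Lines document, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Grade one trial from the raw contents of answer.json and trace.jsonl.
// All field names here are illustrative assumptions.
function grade({ answerJson, traceJsonl, expected }) {
  const reasons = [];
  const answer = JSON.parse(answerJson);
  if (answer.result !== expected.result) {
    reasons.push(`result was ${JSON.stringify(answer.result)}`);
  }
  const events = parseJsonl(traceJsonl);
  const usedTool = events.some((e) => e.type === "tool_call" && e.tool === expected.requiredTool);
  if (!usedTool) reasons.push(`trace has no call to ${expected.requiredTool}`);
  return { pass: reasons.length === 0, reasons };
}
```

Returning the reasons alongside the verdict pairs well with preserving the workspace, since the failure message points at what to inspect.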
11. Command Surfaces
11.1 Human Slash Commands
SF registers direct command roots only:
/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks
/sf is not a command root. TUI and browser command parity tests reject it so
compatibility shims do not grow back.
/remote is a full-session steering surface. Remote answers may change
workMode, runControl, permissionProfile, and modelMode; they are not
limited to question delivery.
11.2 Shell Surface
Machine surface remains prefixed:
sf headless autonomous
sf headless --autonomous ...
The shell prefix is the executable name, not an interactive slash-command namespace.
12. Runtime Target: Node 26
12.1 Policy
current compatibility floor: Node 26.1+
internal target runtime: Node 26.1
canonical baseline: Node 26.1
Node 25: skip except quick probes
12.2 Why Node 26
- Temporal enabled by default
- V8 14.6 baseline
- Undici 8 HTTP/fetch baseline
- Removes legacy APIs, hardens against old assumptions
12.3 Temporal Adoption
Store semantic type, not just formatted string:
| Concept | Temporal Type | Use Case |
|---|---|---|
| Exact instant | Temporal.Instant | Journal events, checkpoints, lock leases |
| Local time | Temporal.ZonedDateTime | Reminders, schedules, audits |
| Calendar date | Temporal.PlainDate | Daily reports, milestone reviews |
| Wall-clock time | Temporal.PlainTime | Recurring policies |
| Time amount | Temporal.Duration | Budgets, leases, cooldowns, retry delays |
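Storing the semantic type could mean persisting a small tagged record instead of a bare string. The sketch below is one possible encoding, not a decided format: the `{ type, value }` shape and the function names are assumptions. Decoding revives a Temporal object through the type's `from()` method when the global `Temporal` exists, and otherwise returns the ISO string unchanged.

```javascript
const TEMPORAL_TYPES = new Set(["Instant", "ZonedDateTime", "PlainDate", "PlainTime", "Duration"]);

// Tag an ISO 8601 string with the Temporal type it represents.
function encodeTemporal(typeName, isoString) {
  if (!TEMPORAL_TYPES.has(typeName)) throw new Error(`unsupported type: ${typeName}`);
  return { type: `Temporal.${typeName}`, value: isoString };
}

// Revive a tagged record; falls back to the raw string without Temporal.
function decodeTemporal(record) {
  const typeName = record.type.replace(/^Temporal\./, "");
  if (!TEMPORAL_TYPES.has(typeName)) throw new Error(`unsupported type: ${record.type}`);
  if (typeof globalThis.Temporal === "undefined") return record.value;
  return globalThis.Temporal[typeName].from(record.value);
}
```

With the tag stored next to the value, a reader can tell a retry delay such as PT30S from an exact instant without guessing from the string's shape.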
12.4 Adoption Priority
- sf schedule: highest user-visible impact
- Lock/lease: highest operational correctness
- Journals/traces: highest debugging impact
- Session reports: nice to have
- Background tasks: future work
12.5 Gate
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
13. Implementation Pull-Through
13.1 Already Directionally Right
- UOK lifecycle records carry runControl
- UOK lifecycle records carry permissionProfile
- Schedule command state uses autonomous_dispatch
- DB-backed state, recovery, verification, scheduling, captures, forensics
- Skills and project-specific skill paths exist
- Parallel orchestration and remote-question infrastructure
13.2 Still Needed
| Priority | Item | Effort |
|---|---|---|
| P2 | Decide whether sandboxProfile becomes a sixth persisted axis | Medium |
| P2 | Remove /sf from docs/web/tests (Phase 2 deprecation) | Small |
13.3 Completed
| Priority | Item | Status |
|---|---|---|
| P0 | Make mode state durable in SQLite | ✓ |
| P0 | Add direct /mode, /control, /permission-profile, /model-mode commands | ✓ |
| P0 | Add visible mode badge to TUI header/status bar | ✓ |
| P1 | Make --autonomous chain into direct /autonomous | ✓ |
| P1 | Expose autonomous continuation limits in settings and status | ✓ |
| P1 | Add /tasks backed by DB execution graph state | ✓ |
| P1 | Make repair first-class workflow over doctor | ✓ |
| P1 | Enhanced /steer with mode/permission-profile/model-mode transitions | ✓ |
| P1 | TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P) | ✓ |
| P1 | Minimal auto-mode header/footer (badge visible during autonomy) | ✓ |
| P1 | Remove /sf namespace registration and parity-test against fallback | ✓ |
| P1 | Parallel worker intent/claim registry backed by UOK SQLite coordination | ✓ |
| P1 | Skill eval harness foundation | ✓ |
| P1 | Terminal title mode indicator | ✓ |
| P2 | Policy-aware project skill suggestion/generation with DB cooldown | ✓ |
| P2 | Schema-backed task frontmatter (risk, mutation, verification, approval) | ✓ |
| P2 | Subagent provider/model/permission inheritance audit and guard | ✓ |
| P2 | Remote steering as full-session surface from remote answers | ✓ |
13.4 Recently Completed (This Session)
| Priority | Item | Status |
|---|---|---|
| P1 | Centralized metrics system (metrics-central.js) | ✓ |
| P1 | Cost command (/cost) with DB + ledger queries | ✓ |
| P1 | Explicit stage commands (/research, /plan, /implement) | ✓ |
| P2 | Reasoning assist foundation (reasoning-assist.js) | ✓ |
| P2 | Self-feedback → workMode auto-transition | ✓ |
| P2 | UOK events carry workMode + modelMode | ✓ |
| P2 | /sf prefix deprecation warning (Phase 1) | ✓ |
14. Open Questions
- Should plan mode show badge [P] or plan text in full?
- Should paused autonomous show previous badge dimmed, or [P] for paused?
- Should mode be per-session or per-project? (Current: per-session)
- Should badge appear in tmux/terminal window titles?
- Should mode transitions have sound/notification?
- Should repair auto-transition be ask by default for new projects?
- Should skill eval cases run in CI or only on-demand?
- Should /tasks be a TUI overlay or a separate scrollable panel?
- Should reasoning assist call a fast model automatically, or only prepare prompts for now?
15. References
- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot
- Factory Droid, "Autonomy Level" — https://docs.factory.ai/cli/user-guides/auto-run
- Amp manual — https://ampcode.com/manual
- Smelt (mode cycling) — https://github.com/leonardcser/smelt
- ORCH (task state machine) — https://github.com/oxgeneral/ORCH
- AgentPlane (schema-first tasks) — https://github.com/basilisk-labs/agentplane
- Relay (channels and tickets) — https://github.com/jcast90/relay
- Sage (runtime-neutral orchestration) — https://github.com/youwangd/SageCLI
- Wit (symbol-level locks) — https://github.com/amaar-mc/wit