SF Agent Mode System
Status: Draft specification. Promoted from copilot-thoughts.md research notes.
Scope: TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.
Decision authority: Product + architecture review required before implementation.
1. Problem Statement
SF's current command surface (/sf autonomous, /sf next, /sf pause, /sf stop) treats mode switching as separate commands rather than persistent states. There is no visible indicator of the current mode, and the /sf prefix positions SF as a plugin rather than the system itself.
Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.
Goal: Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.
2. Orthogonal State Axes
SF state is five independent axes, not one overloaded "mode."
workMode: chat | plan | build | review | repair | research
runControl: manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode: fast | smart | deep
surface: tui | web | headless | rpc
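As a rough illustration, the five axes could be modeled as independent union types. This is a sketch rather than SF's actual type definitions: the type names, `SfAgentState`, and `formatCombination` are hypothetical, and only the value sets come from the list above.

```typescript
// Hypothetical type names; the value sets are taken from the axis list above.
type WorkMode = "chat" | "plan" | "build" | "review" | "repair" | "research";
type RunControl = "manual" | "assisted" | "autonomous";
type PermissionProfile = "restricted" | "normal" | "trusted" | "unrestricted";
type ModelMode = "fast" | "smart" | "deep";
type ConnectionSurface = "tui" | "web" | "headless" | "rpc";

// One field per axis; no field is derived from another.
interface SfAgentState {
  workMode: WorkMode;
  runControl: RunControl;
  permissionProfile: PermissionProfile;
  modelMode: ModelMode;
  surface: ConnectionSurface;
}

// Render the four user-facing axes in the "a | b | c | d" form used in this spec.
function formatCombination(state: SfAgentState): string {
  return [state.workMode, state.runControl, state.permissionProfile, state.modelMode].join(" | ");
}

const exampleState: SfAgentState = {
  workMode: "build",
  runControl: "autonomous",
  permissionProfile: "trusted",
  modelMode: "smart",
  surface: "tui",
};
```

Keeping each axis as its own field means a combination such as autonomous plus restricted needs no special case.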
2.1 Axis Definitions
| Axis | Question It Answers | Values |
|---|---|---|
| workMode | What kind of work is SF doing? | chat, plan, build, review, repair, research |
| runControl | Who advances the loop? | manual (user), assisted (one unit then pause), autonomous (continuous) |
| permissionProfile | What may proceed without approval? | restricted, normal, trusted, unrestricted |
| modelMode | Speed/cost/reasoning posture? | fast (cheap), smart (balanced), deep (reasoning) |
| surface | How is the user connected? | tui, web, headless, rpc |
2.2 Example Combinations
plan | manual | normal | deep → user plans with reasoning model
build | autonomous | trusted | smart → continuous implementation
repair | assisted | normal | smart → one repair unit at a time
research | autonomous | restricted | deep → continuous research, read-only
review | manual | restricted | deep → user reviews with reasoning model
2.3 Rules
- permissionProfile never implies runControl. Autonomous run with restricted permissions is valid.
- runControl never implies permissionProfile. Manual run with unrestricted permissions is valid.
- Denylists and safety gates override permissionProfile regardless of value.
- Every risk decision logs all five axis values.
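The rules above can be sketched in a few lines. This is an illustration under assumptions, not SF's implementation: `AxisValues`, `withAxis`, `isPermitted`, and `riskLogEntry` are hypothetical names, and the denylist is modeled as a plain list of action strings.

```typescript
// Hypothetical shape: one string field per axis from this section.
interface AxisValues {
  workMode: string;
  runControl: string;
  permissionProfile: string;
  modelMode: string;
  surface: string;
}

// Changing one axis copies the state and replaces only that field,
// so runControl and permissionProfile never change as a side effect of each other.
function withAxis(state: AxisValues, axis: keyof AxisValues, value: string): AxisValues {
  return { ...state, [axis]: value };
}

// Denylisted actions are refused whatever the permission profile would allow.
function isPermitted(action: string, profileAllows: boolean, denylist: string[]): boolean {
  if (denylist.indexOf(action) !== -1) return false;
  return profileAllows;
}

// A risk decision record carries all five axis values alongside the decision.
function riskLogEntry(state: AxisValues, action: string, allowed: boolean) {
  return {
    action,
    allowed,
    workMode: state.workMode,
    runControl: state.runControl,
    permissionProfile: state.permissionProfile,
    modelMode: state.modelMode,
    surface: state.surface,
  };
}
```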
3. Work Modes
3.1 chat
Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.
3.2 plan
Research, clarify, write or update specs, derive tasks, and produce an explicit acceptance point before implementation. The primary user journey starts here.
Plan → Build handoff:
plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart
Surfaces:
- TUI: plan acceptance prompt includes "run autonomously" button
- Web: plan acceptance button includes "run autonomously"
- Headless: --autonomous chains into direct /autonomous
- RPC: machine event records transition explicitly
3.3 build
Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.
3.4 review
Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (deep).
3.5 repair
Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies.
Doctor is the diagnostic engine, not the mode. /doctor inspects. /repair switches work mode.
Commands:
/doctor → inspect and report
/doctor fix → deterministic auto-fix
/doctor heal → LLM-assisted deep healing
/repair → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit
Auto-transition to repair:
build | autonomous | trusted | smart
→ repair | autonomous | normal | smart
Allowed when:
- pre-dispatch health gate fails
- installed runtime drift detected
- SF cannot dispatch safely
- repo workflow state corrupted
Policy: configurable per project. Options: auto, ask, log-only.
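One possible reading of the per-project policy is sketched below. The spec only names the options (auto, ask, log-only); the behavior attached to each option here is an assumption, and `decideRepairAction` and the trigger identifiers are hypothetical names derived from the "Allowed when" list.

```typescript
type RepairPolicy = "auto" | "ask" | "log-only";

// Hypothetical identifiers for the four conditions listed above.
type RepairTrigger =
  | "pre-dispatch-health-gate-failed"
  | "runtime-drift-detected"
  | "cannot-dispatch-safely"
  | "workflow-state-corrupted";

// Assumed meaning of each policy option; the spec does not define them.
type RepairAction = "switch-to-repair" | "prompt-user" | "record-only";

function decideRepairAction(
  policy: RepairPolicy,
  trigger: RepairTrigger
): { action: RepairAction; reason: string } {
  const actions: Record<RepairPolicy, RepairAction> = {
    auto: "switch-to-repair", // transition without asking
    ask: "prompt-user", // wait for the user to confirm the transition
    "log-only": "record-only", // note the condition, stay in the current mode
  };
  return { action: actions[policy], reason: trigger };
}
```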
3.6 research
Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.
4. Run Control
| Value | Behavior |
|---|---|
| manual | User drives every step. Tool calls require approval. |
| assisted | SF executes one unit, then pauses for user review. |
| autonomous | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |
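The difference between the three values can be sketched as a loop over queued units. This is a simplified model, not SF's dispatcher: `runUnits`, the stop reasons, and the `maxUnits` limit are hypothetical, and blocked, interrupted, and budget-hit stops are left out.

```typescript
type RunControlValue = "manual" | "assisted" | "autonomous";
type StopReason = "awaiting-user" | "paused-after-unit" | "limit-hit" | "done";

// Returns which queued units would run before the loop stops, and why it stopped.
function runUnits(
  control: RunControlValue,
  queued: string[],
  maxUnits: number
): { executed: string[]; stop: StopReason } {
  if (control === "manual") {
    // The user drives every step, so the loop itself advances nothing.
    return { executed: [], stop: "awaiting-user" };
  }
  // assisted runs exactly one unit; autonomous runs up to the configured limit.
  const limit = control === "assisted" ? 1 : Math.max(0, maxUnits);
  const executed = queued.slice(0, limit);
  if (executed.length === queued.length) {
    return { executed, stop: "done" };
  }
  return { executed, stop: control === "assisted" ? "paused-after-unit" : "limit-hit" };
}
```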
4.1 Commands
/control manual
/control assisted
/control autonomous
/autonomous → alias for /control autonomous
/next → alias for /control assisted (one unit)
/pause → pause autonomous, preserve state
/stop → stop autonomous, clear state
4.2 Transition Scopes
| Scope | Behavior |
|---|---|
| now | Apply immediately if no tool active. Abort current tool if policy allows. |
| after-current-tool | Finish active tool, then switch. |
| after-current-unit | Finish current SF unit, then switch. |
| next-milestone | Switch after current milestone completes. |
Changes requested during an autonomous run affect future decisions only; they never mutate active tool calls mid-execution.
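A pending switch can be held until the loop reaches the boundary its scope names. The sketch below assumes the boundaries nest (a finished milestone implies a finished unit, which implies a finished tool); `LoopBoundary`, `SCOPE_BOUNDARY`, and `isDue` are hypothetical names.

```typescript
type TransitionScope = "now" | "after-current-tool" | "after-current-unit" | "next-milestone";

// Hypothetical loop events at which a queued switch may be applied.
type LoopBoundary = "immediate" | "tool-finished" | "unit-finished" | "milestone-finished";

const BOUNDARY_ORDER: LoopBoundary[] = [
  "immediate",
  "tool-finished",
  "unit-finished",
  "milestone-finished",
];

const SCOPE_BOUNDARY: Record<TransitionScope, LoopBoundary> = {
  now: "immediate",
  "after-current-tool": "tool-finished",
  "after-current-unit": "unit-finished",
  "next-milestone": "milestone-finished",
};

// A queued switch is due once the loop reaches the boundary named by its scope,
// or any later boundary (assumption: boundaries nest in BOUNDARY_ORDER).
function isDue(scope: TransitionScope, reached: LoopBoundary): boolean {
  return BOUNDARY_ORDER.indexOf(reached) >= BOUNDARY_ORDER.indexOf(SCOPE_BOUNDARY[scope]);
}
```

Because the check runs only at boundaries, an in-flight tool call is never altered; the switch waits for the next event.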
4.3 Transition Logging
Every transition persists:
{
"timestamp": "2026-05-08T10:00:00Z",
"from": {"workMode": "build", "runControl": "autonomous"},
"to": {"workMode": "repair", "runControl": "autonomous"},
"reason": "pre-dispatch health gate failed",
"scope": "after-current-unit",
"sessionId": "..."
}
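A record of this shape could be built as follows. This is a sketch: the interface and function names are hypothetical, and writing one JSON object per line is an assumption about storage. Note that `Date.prototype.toISOString` includes milliseconds, so its output differs slightly from the timestamp format in the example above.

```typescript
interface AxisSnapshot {
  workMode: string;
  runControl: string;
}

// Field names mirror the JSON example above.
interface ModeTransitionRecord {
  timestamp: string;
  from: AxisSnapshot;
  to: AxisSnapshot;
  reason: string;
  scope: string;
  sessionId: string;
}

// The clock is passed in so the record can be built deterministically in tests.
function buildTransitionRecord(
  from: AxisSnapshot,
  to: AxisSnapshot,
  reason: string,
  scope: string,
  sessionId: string,
  now: Date
): ModeTransitionRecord {
  return { timestamp: now.toISOString(), from, to, reason, scope, sessionId };
}

// Assumed persistence format: one JSON object per line.
function toJournalLine(record: ModeTransitionRecord): string {
  return JSON.stringify(record);
}
```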
5. Permission Profiles
| Profile | Description |
|---|---|
| restricted | Read-only and explicitly allowlisted actions. |
| normal | Safe edits, non-destructive local commands. |
| trusted | Build/test/install/local commits and bounded repo automation. |
| unrestricted | High-risk orchestration only in intentionally trusted environments. |
5.1 Enforcement
Permission profile is enforced at three layers:
- Tool registry: Each tool declares its required profile. Tools that require a higher profile than the active one are hidden from the model.
- Execution gate: Each tool call checks profile at invocation. Violation = error.
- Safety harness: Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.
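The three layers can be sketched as two small functions. This assumes the profiles form a strict order (restricted < normal < trusted < unrestricted), which the spec implies but does not state; `ToolSpec`, `visibleTools`, and `checkInvocation` are hypothetical names, and a real execution gate would raise an error where this sketch returns "deny".

```typescript
type TrustProfile = "restricted" | "normal" | "trusted" | "unrestricted";

// Assumed ordering of profiles from least to most permissive.
const PROFILE_RANK: Record<TrustProfile, number> = {
  restricted: 0,
  normal: 1,
  trusted: 2,
  unrestricted: 3,
};

interface ToolSpec {
  id: string;
  requiredProfile: TrustProfile;
  destructive: boolean;
}

// Layer 1, tool registry: only tools the active profile satisfies are offered to the model.
function visibleTools(tools: ToolSpec[], active: TrustProfile): ToolSpec[] {
  return tools.filter((tool) => PROFILE_RANK[tool.requiredProfile] <= PROFILE_RANK[active]);
}

// Layers 2 and 3: re-check the profile at invocation, then require explicit
// confirmation for destructive operations regardless of profile.
function checkInvocation(
  tool: ToolSpec,
  active: TrustProfile,
  confirmed: boolean
): "allow" | "deny" | "needs-confirmation" {
  if (PROFILE_RANK[tool.requiredProfile] > PROFILE_RANK[active]) return "deny";
  if (tool.destructive && !confirmed) return "needs-confirmation";
  return "allow";
}
```

Checking the profile again at invocation covers the case where the profile is lowered after the tool list was already sent to the model.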
5.2 Commands
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
6. Model Modes
| Mode | Use Case | Routing Hint |
|---|---|---|
| fast | Small bounded tasks | Cheapest available model |
| smart | Default balanced work | Default routing table |
| deep | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |
modelMode guides routing. It does not replace explicit /model selection.
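The precedence between the two can be sketched as a single lookup. The routing targets below are placeholder strings, not real model identifiers or entries from SF's routing table, and `resolveModel` is a hypothetical name.

```typescript
type ModelModeValue = "fast" | "smart" | "deep";

// Placeholder routing hints; real targets would come from the routing table.
const ROUTING_HINT: Record<ModelModeValue, string> = {
  fast: "cheapest-available",
  smart: "default-routing",
  deep: "reasoning-model",
};

// An explicit /model selection wins; modelMode is only consulted when none is set.
function resolveModel(mode: ModelModeValue, explicitModel?: string): string {
  return explicitModel !== undefined ? explicitModel : ROUTING_HINT[mode];
}
```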
7. Mode Switching UX
7.1 Direct Commands
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
7.2 Combined Forms
/mode repair --autonomous --trust normal
/mode build --autonomous --trust trusted
/mode research --autonomous --trust restricted --model-mode deep
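Parsing the combined form amounts to splitting one command into per-axis requests. The sketch below handles only the flags shown above and does not validate values; `ModeRequest` and `parseModeCommand` are hypothetical names.

```typescript
interface ModeRequest {
  workMode: string;
  runControl?: string;
  permissionProfile?: string;
  modelMode?: string;
}

// Splits "/mode <workMode> [--autonomous] [--trust X] [--model-mode Y]" into axis requests.
function parseModeCommand(input: string): ModeRequest {
  const tokens = input.trim().split(/\s+/);
  if (tokens[0] !== "/mode" || tokens.length < 2) {
    throw new Error("expected: /mode <workMode> [flags]");
  }
  const request: ModeRequest = { workMode: tokens[1] };
  for (let i = 2; i < tokens.length; i++) {
    const flag = tokens[i];
    if (flag === "--autonomous") {
      request.runControl = "autonomous";
    } else if (flag === "--trust") {
      request.permissionProfile = tokens[++i]; // consume the flag's value
    } else if (flag === "--model-mode") {
      request.modelMode = tokens[++i]; // consume the flag's value
    } else {
      throw new Error("unknown flag: " + flag);
    }
  }
  return request;
}
```

Axes that are not mentioned stay undefined, so the caller can leave them unchanged.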
7.3 Autonomous Steering
/steer mode repair
/steer mode review after-current-unit
/steer trust restricted now
/steer model-mode deep for-next-unit
7.4 Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+Shift+M | Cycle workMode: chat → plan → build → review → repair → research → chat |
| Ctrl+Shift+A | Set runControl to autonomous |
| Ctrl+Shift+S | Set runControl to assisted (step) |
| Ctrl+Shift+I | Set runControl to manual (interactive) |
| Ctrl+Shift+R | Set workMode to repair |
| Ctrl+Shift+P | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |
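The two cycling shortcuts reduce to a wrap-around lookup over an ordered list. In this sketch the orders are taken from the table above; `cycleNext` and the constant names are hypothetical.

```typescript
// Cycle orders as listed in the shortcut table.
const WORK_MODE_CYCLE = ["chat", "plan", "build", "review", "repair", "research"];
const PERMISSION_CYCLE = ["restricted", "normal", "trusted", "unrestricted"];

// Returns the value after `current`, wrapping from the last entry to the first.
function cycleNext(order: string[], current: string): string {
  const index = order.indexOf(current);
  if (index === -1) throw new Error("unknown value: " + current);
  return order[(index + 1) % order.length];
}
```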
8. Status and Mode Badge
8.1 Full Status Line
SF build | autonomous | trusted | smart
8.2 Compact Badge Form
For narrow terminals (< 80 cols):
[B][A][T][S]
8.3 Critical State Labels
When workMode is repair or review, show full labels regardless of width:
repair | autonomous | normal | smart
review | assisted | normal | deep
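The width rule from 8.2 and the override from 8.3 can be combined in one function. This sketch assumes the compact letter is the uppercased first character of each value, which matches `[B][A][T][S]` but is not stated as a rule in the spec; `renderBadge` is a hypothetical name and the "SF" prefix from 8.1 is omitted.

```typescript
interface BadgeState {
  workMode: string;
  runControl: string;
  permissionProfile: string;
  modelMode: string;
}

// Full labels at 80+ columns, or whenever workMode is repair or review;
// otherwise one bracketed letter per axis.
function renderBadge(state: BadgeState, columns: number): string {
  const values = [state.workMode, state.runControl, state.permissionProfile, state.modelMode];
  const critical = state.workMode === "repair" || state.workMode === "review";
  if (columns >= 80 || critical) {
    return values.join(" | ");
  }
  // Assumption: the compact letter is the first character, uppercased.
  return values.map((value) => "[" + value.charAt(0).toUpperCase() + "]").join("");
}
```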
8.4 Badge Placement
| Surface | Placement |
|---|---|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build |
| Web | Top bar, color-coded chip |
8.5 Badge During Auto Mode
Current code hides header/footer during auto mode (if (isAutoActive()) return []). This must change:
- Show minimal header during auto: badge + project name only
- Or show badge in dedicated status bar separate from header/footer
- Badge color pulses slowly during autonomous execution (subtle animation)
8.6 Badge Colors
| Axis | Value | Color |
|---|---|---|
| workMode | chat | dim |
| workMode | plan | accent |
| workMode | build | success |
| workMode | review | warning |
| workMode | repair | error |
| workMode | research | info |
| runControl | manual | dim |
| runControl | assisted | warning |
| runControl | autonomous | success (pulsing) |
| permissionProfile | restricted | success |
| permissionProfile | normal | dim |
| permissionProfile | trusted | warning |
| permissionProfile | unrestricted | error |
9. Background Work Surface (/tasks)
Unified view of all background work. For work inspection, it replaces the scattered use of /status, /queue, and /parallel status; those commands remain available for their own purposes (see 9.3).
9.1 What /tasks Shows
- autonomous units (current + queued)
- parallel workers
- scheduled autonomous dispatches
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action
9.2 Data Model
SQLite tables:
-- Durable task state
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
work_mode TEXT NOT NULL,
run_control TEXT NOT NULL,
permission_profile TEXT NOT NULL,
model_mode TEXT NOT NULL,
status TEXT NOT NULL, -- pending | running | review | done | retrying | failed | cancelled
dependency_blockers TEXT, -- JSON array of task IDs
retry_count INTEGER DEFAULT 0,
max_retries INTEGER DEFAULT 3,
checkpoint_ref TEXT, -- git ref or patch file
cost_budget REAL,
cost_spent REAL DEFAULT 0,
created_at TEXT, -- Temporal.Instant ISO
started_at TEXT,
completed_at TEXT,
next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);
-- Ephemeral running state
CREATE TABLE task_runtime (
task_id TEXT PRIMARY KEY REFERENCES tasks(id),
process_pid INTEGER,
worktree_path TEXT,
current_model TEXT,
context_usage_percent REAL,
last_heartbeat_at TEXT -- Temporal.Instant
);
-- Transition log
CREATE TABLE task_transitions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_id TEXT NOT NULL,
from_status TEXT NOT NULL,
to_status TEXT NOT NULL,
reason TEXT,
scope TEXT,
timestamp TEXT NOT NULL -- Temporal.Instant
);
9.3 Complementary Commands
/tasks does not replace:
- /status → project health dashboard
- /queue → milestone/slice dispatch order
- /parallel status → parallel orchestrator detail
- /session-report → cost/token summary
- /logs → activity logs
- /forensics → execution forensics
10. Skills System
10.1 Directory Structure
.agents/skills/<skill-name>/
  SKILL.md      -- skill definition with YAML frontmatter
  scripts/      -- supporting scripts
  schemas/      -- JSON schemas for inputs/outputs
  checklists/   -- verification checklists
  mcp.json      -- MCP server config if applicable
10.2 Skill Frontmatter
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
Fields:
- name: unique identifier
- description: when to use this skill
- user-invocable: can user explicitly invoke?
- model-invocable: can model auto-invoke when relevant?
- side-effects: none, code-edits, production-mutation, etc.
- permission-profile: minimum profile required
10.3 Skill Categories
| Type | Example | model-invocable |
|---|---|---|
| Background knowledge | forge-autonomous-runtime | true |
| User tool | production-deploy | false |
| Shared capability | forge-command-surface | true |
Dangerous skills (production-mutation) are never model-invoked by default.
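The frontmatter fields are enough to decide whether the model may invoke a skill on its own. The sketch below assumes the profiles are ordered from restricted to unrestricted and that the default for production-mutation skills is a hard refusal; `SkillMeta` and `canModelInvoke` are hypothetical names.

```typescript
// Camel-cased mirror of the frontmatter fields in 10.2.
interface SkillMeta {
  name: string;
  userInvocable: boolean;
  modelInvocable: boolean;
  sideEffects: string;
  permissionProfile: string;
}

// Assumed ordering of profiles from least to most permissive.
const SKILL_PROFILE_ORDER = ["restricted", "normal", "trusted", "unrestricted"];

function canModelInvoke(skill: SkillMeta, activeProfile: string): boolean {
  if (!skill.modelInvocable) return false;
  // Dangerous skills are never model-invoked by default.
  if (skill.sideEffects === "production-mutation") return false;
  // The active profile must meet the skill's minimum profile.
  return (
    SKILL_PROFILE_ORDER.indexOf(activeProfile) >=
    SKILL_PROFILE_ORDER.indexOf(skill.permissionProfile)
  );
}
```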
10.4 Auto-Creation Flow
- Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
- Propose skill in manual/restricted contexts
- Generate/update automatically only when policy allows
- Record source evidence in .sf state
- Keep narrow and testable
- Commit with repo when accepted
10.5 Skill Eval Cases
Every auto-created skill needs eval cases:
.agents/skills/<skill-name>/evals/
  case-1/
    task.md     -- user-like prompt
    grader.js   -- deterministic checker
    hidden/     -- reference answers (not visible to agent)
    work/       -- agent workspace
Graders inspect: files, artifacts, answer.json, trace.jsonl, result state.
Failed trials preserve workspace for debugging.
11. Migration from /sf Commands
11.1 Command Mapping
| Old | New | Status |
|---|---|---|
| /sf | /next | migrate |
| /sf autonomous | /autonomous | migrate |
| /sf next | /next | migrate |
| /sf stop | /stop | migrate |
| /sf pause | /pause | migrate |
| /sf status | /status | migrate |
| /sf doctor | /doctor | migrate |
| /sf rate | /rate | migrate |
| /sf session-report | /session-report | migrate |
| /sf parallel | /parallel | migrate |
| /sf remote | /remote | migrate |
| /sf tasks | /tasks | new |
11.2 Migration Timeline
| Phase | Action |
|---|---|
| Phase 1 (now) | Accept both /sf X and /X. Log deprecation warning for /sf. |
| Phase 2 (2 releases) | /sf X shows warning: "Use /X instead. /sf will be removed." |
| Phase 3 (4 releases) | /sf X errors: "Unknown command. Did you mean /X?" |
| Phase 4 (6 releases) | Remove /sf handler entirely. |
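The first three phases could share one handler whose behavior depends on the current phase. This is a sketch under assumptions: `handleLegacySf` and `LegacyOutcome` are hypothetical names, the phase 1 log wording is invented for illustration, and phase 2 is assumed to still dispatch the command after warning. Phase 4 has no handler, so it is not modeled. The input is assumed to start with /sf.

```typescript
type MigrationPhase = 1 | 2 | 3;

interface LegacyOutcome {
  dispatch: string | null; // replacement command to run, or null when rejected
  notice: string; // text to log (phase 1) or show to the user (phases 2 and 3)
}

function handleLegacySf(input: string, phase: MigrationPhase): LegacyOutcome {
  // Strip the "/sf" prefix; bare "/sf" maps to "/next" per the mapping table.
  const rest = input.trim().replace(/^\/sf(\s+|$)/, "");
  const replacement = rest === "" ? "/next" : "/" + rest;
  if (phase === 1) {
    // Hypothetical log wording; the spec only says a deprecation warning is logged.
    return { dispatch: replacement, notice: "deprecated: use " + replacement + " instead of /sf" };
  }
  if (phase === 2) {
    return { dispatch: replacement, notice: "Use " + replacement + " instead. /sf will be removed." };
  }
  return { dispatch: null, notice: "Unknown command. Did you mean " + replacement + "?" };
}
```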
11.3 Shell Surface
Machine surface remains prefixed:
sf headless autonomous
sf headless --autonomous ...
12. Runtime Target: Node 26
12.1 Policy
current compatibility floor: Node 26.1+
internal target runtime: Node 26.1
canonical baseline: Node 26.1
Node 25: skip except quick probes
12.2 Why Node 26
- Temporal enabled by default
- V8 14.6 baseline
- Undici 8 HTTP/fetch baseline
- Removes legacy APIs, hardens against old assumptions
12.3 Temporal Adoption
Store semantic type, not just formatted string:
| Concept | Temporal Type | Use Case |
|---|---|---|
| Exact instant | Temporal.Instant | Journal events, checkpoints, lock leases |
| Local time | Temporal.ZonedDateTime | Reminders, schedules, audits |
| Calendar date | Temporal.PlainDate | Daily reports, milestone reviews |
| Wall-clock time | Temporal.PlainTime | Recurring policies |
| Time amount | Temporal.Duration | Budgets, leases, cooldowns, retry delays |
12.4 Adoption Priority
- sf schedule: highest user-visible impact
- Lock/lease: highest operational correctness
- Journals/traces: highest debugging impact
- Session reports: nice to have
- Background tasks: future work
12.5 Gate
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
13. Implementation Pull-Through
13.1 Already Directionally Right
- UOK lifecycle records carry runControl
- UOK lifecycle records carry permissionProfile
- Schedule command state uses autonomous_dispatch
- DB-backed state, recovery, verification, scheduling, captures, forensics
- Skills and project-specific skill paths exist
- Parallel orchestration and remote-question infrastructure
13.2 Still Needed
| Priority | Item | Effort |
|---|---|---|
| P0 | Remove /sf internal dispatch, docs, tests, help text | Medium |
| P0 | Make workMode durable state (SQLite + .sf/) | Medium |
| P0 | Add direct /mode, /control, /trust, /model-mode commands | Medium |
| P0 | Add visible mode badge to TUI header/status bar | Small |
| P1 | Make --autonomous chain into direct /autonomous | Small |
| P1 | Expose autonomous continuation limits in settings and status | Small |
| P1 | Add /tasks with durable + ephemeral state | Large |
| P1 | Make repair first-class workflow over doctor | Medium |
| P2 | Policy-aware project skill suggestion/generation | Large |
| P2 | Skill eval cases for generated skills | Large |
| P2 | Schema-backed task frontmatter (risk, mutation, verification) | Medium |
| P2 | Intent/claim records for parallel workers | Medium |
| P2 | Audit subagent provider/model/permission inheritance | Medium |
| P2 | Audit remote steering as full-session surface | Medium |
14. Open Questions
- Should plan mode show badge [P] or plan text in full?
- Should paused autonomous show previous badge dimmed, or [P] for paused?
- Should mode be per-session or per-project? (Current: per-session)
- Should badge appear in tmux/terminal window titles?
- Should mode transitions have sound/notification?
- Should repair auto-transition be ask by default for new projects?
- Should skill eval cases run in CI or only on-demand?
- Should /tasks be a TUI overlay or a separate scrollable panel?
15. References
- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot
- Factory Droid, "Autonomy Level" — https://docs.factory.ai/cli/user-guides/auto-run
- Amp manual — https://ampcode.com/manual
- Smelt (mode cycling) — https://github.com/leonardcser/smelt
- ORCH (task state machine) — https://github.com/oxgeneral/ORCH
- AgentPlane (schema-first tasks) — https://github.com/basilisk-labs/agentplane
- Relay (channels and tickets) — https://github.com/jcast90/relay
- Sage (runtime-neutral orchestration) — https://github.com/youwangd/SageCLI
- Wit (symbol-level locks) — https://github.com/amaar-mc/wit