SF Agent Mode System

Status: Draft specification. Promoted from copilot-thoughts.md research notes.

Scope: TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target.

Decision authority: Product + architecture review required before implementation.


1. Problem Statement

SF's old command surface treated mode switching as separate commands rather than persistent states. There was no visible indicator of the current mode, and the /sf prefix positioned SF as a plugin rather than the system itself.

Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.

Goal: Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.


2. Orthogonal State Axes

SF state is five independent axes, not one overloaded "mode."

workMode:          chat | plan | build | review | repair | research
runControl:        manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode:         fast | smart | deep
surface:           tui | web | headless | rpc

2.1 Axis Definitions

| Axis | Question It Answers | Values |
|---|---|---|
| workMode | What kind of work is SF doing? | chat, plan, build, review, repair, research |
| runControl | Who advances the loop? | manual (user), assisted (one unit then pause), autonomous (continuous) |
| permissionProfile | What may proceed without approval? | restricted, normal, trusted, unrestricted |
| modelMode | Speed/cost/reasoning posture? | fast (cheap), smart (balanced), deep (reasoning) |
| surface | How is the user connected? | tui, web, headless, rpc |

2.2 Example Combinations

plan     | manual     | normal     | deep     → user plans with reasoning model
build    | autonomous | trusted    | smart    → continuous implementation
repair   | assisted   | normal     | smart    → one repair unit at a time
research | autonomous | restricted | deep     → continuous research, read-only
review   | manual     | restricted | deep     → user reviews with reasoning model

2.3 Rules

  • permissionProfile never implies runControl. Autonomous run with restricted permissions is valid.
  • runControl never implies permissionProfile. Manual run with unrestricted permissions is valid.
  • Denylists and safety gates override permissionProfile regardless of value.
  • Every risk decision logs all five axis values.
  • sandboxProfile may become a sixth axis later. It is separate from permissionProfile: sandboxing controls process/filesystem/network containment, while permission profile controls what SF may approve.
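
The axis list and rules above can be sketched as a small validator. This is a minimal illustration, not SF's implementation: `AXES`, `validateState`, and `riskLogEntry` are hypothetical names, and the log entry shape is an assumption. The point it shows is that each axis is checked on its own, so combinations such as autonomous + restricted pass.

```javascript
// Allowed values per axis, copied from the axis list above.
const AXES = Object.freeze({
  workMode: ["chat", "plan", "build", "review", "repair", "research"],
  runControl: ["manual", "assisted", "autonomous"],
  permissionProfile: ["restricted", "normal", "trusted", "unrestricted"],
  modelMode: ["fast", "smart", "deep"],
  surface: ["tui", "web", "headless", "rpc"],
});

// Hypothetical helper: returns a list of problems, empty when the state is valid.
// Every axis is validated independently; no axis value constrains another.
function validateState(state) {
  const problems = [];
  for (const [axis, allowed] of Object.entries(AXES)) {
    if (!allowed.includes(state[axis])) {
      problems.push(`${axis}: invalid value ${JSON.stringify(state[axis])}`);
    }
  }
  return problems;
}

// Hypothetical helper: a risk-decision log entry always carries all five axes.
function riskLogEntry(state, decision) {
  const problems = validateState(state);
  if (problems.length > 0) throw new Error(problems.join("; "));
  const { workMode, runControl, permissionProfile, modelMode, surface } = state;
  return { decision, workMode, runControl, permissionProfile, modelMode, surface };
}
```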

3. Work Modes

3.1 chat

Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.

3.2 plan

Research, clarify, write or update specs, derive tasks, and produce an explicit acceptance point before implementation. The primary user journey starts here.

Plan → Build handoff:

plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart

Surfaces:

  • TUI: plan acceptance prompt includes "run autonomously" button
  • Web: plan acceptance button includes "run autonomously"
  • Headless: --autonomous chains into direct /autonomous
  • RPC: machine event records transition explicitly

3.3 build

Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.

3.4 review

Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (deep).

3.5 repair

Fix SF health, repo health, generated/runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies, and failed gates.

Doctor is the diagnostic engine, not the mode. /doctor inspects. /repair switches work mode. repair is a workMode, not a separate subsystem.

Commands:

/doctor          → inspect and report
/doctor fix      → deterministic auto-fix
/doctor heal     → LLM-assisted deep healing
/repair          → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit

Auto-transition to repair:

build | autonomous | trusted | smart
→ repair | autonomous | normal | smart

Allowed when:

  • pre-dispatch health gate fails
  • installed runtime drift detected
  • SF cannot dispatch safely
  • repo workflow state corrupted

Policy: configurable per project. Options: auto, ask, log-only.
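
A minimal sketch of how the auto-transition decision could be computed. The trigger names, `planRepairTransition`, and the return shape are hypothetical. Two behaviors are assumptions rather than stated rules: the permission profile is capped at normal (generalizing the trusted → normal example above, and never raising a lower profile), and an unrecognized policy value falls back to ask.

```javascript
// Hypothetical string names for the four conditions listed above.
const REPAIR_TRIGGERS = new Set([
  "health-gate-failed",
  "runtime-drift",
  "unsafe-dispatch",
  "workflow-state-corrupted",
]);

const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// policy is the per-project setting: "auto" | "ask" | "log-only".
function planRepairTransition(state, trigger, policy) {
  if (!REPAIR_TRIGGERS.has(trigger)) return { action: "none" };

  // Assumption: cap the profile at normal, but never raise a lower one.
  const capped =
    PROFILE_RANK[state.permissionProfile] > PROFILE_RANK.normal
      ? "normal"
      : state.permissionProfile;

  // runControl and modelMode are carried over unchanged, as in the example.
  const to = { ...state, workMode: "repair", permissionProfile: capped };

  // Assumption: anything other than auto or log-only is treated as ask.
  const action =
    policy === "auto" ? "switch" : policy === "log-only" ? "log" : "ask";

  return { action, from: state, to, reason: trigger };
}
```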

3.6 research

Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.


4. Run Control

| Value | Behavior |
|---|---|
| manual | User drives every step. Tool calls require approval. |
| assisted | SF executes one unit, then pauses for user review. |
| autonomous | SF continues until done, blocked, interrupted, budget-hit, or limit-hit. |

4.1 Commands

/control manual
/control assisted
/control autonomous
/autonomous    → alias for /control autonomous
/next          → alias for /control assisted (one unit)
/pause         → pause autonomous, preserve state
/stop          → stop autonomous, clear state

4.2 Transition Scopes

| Scope | Behavior |
|---|---|
| now | Apply immediately if no tool active. Abort current tool if policy allows. |
| after-current-tool | Finish active tool, then switch. |
| after-current-unit | Finish current SF unit, then switch. |
| next-milestone | Switch after current milestone completes. |

Transitions requested during an autonomous run affect future decisions only; they never mutate an active tool call mid-execution.
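
The four scopes can be read as a single decision function. The sketch below is illustrative: `transitionAction` and the `runtime` shape are hypothetical, and deferring a `now` request when the abort policy forbids interrupting the tool is an assumption the scope list does not spell out.

```javascript
// runtime (hypothetical shape):
//   toolActive      - a tool call is currently executing
//   unitActive      - an SF unit is currently in progress
//   milestoneActive - the current milestone has not completed
//   abortAllowed    - policy permits aborting the active tool
function transitionAction(scope, runtime) {
  switch (scope) {
    case "now":
      if (!runtime.toolActive) return "apply";
      // Assumption: if aborting is not allowed, wait for the tool to finish.
      return runtime.abortAllowed ? "abort-tool-then-apply" : "defer";
    case "after-current-tool":
      return runtime.toolActive ? "defer" : "apply";
    case "after-current-unit":
      return runtime.unitActive ? "defer" : "apply";
    case "next-milestone":
      return runtime.milestoneActive ? "defer" : "apply";
    default:
      throw new Error(`unknown scope: ${scope}`);
  }
}
```

A caller would re-evaluate a deferred transition whenever a tool, unit, or milestone finishes, so the active tool call itself is never modified.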

4.3 Transition Logging

Every transition persists:

{
  "timestamp": "2026-05-08T10:00:00Z",
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit",
  "sessionId": "..."
}

5. Permission Profiles

| Profile | Description |
|---|---|
| restricted | Read-only and explicitly allowlisted actions. |
| normal | Safe edits, non-destructive local commands. |
| trusted | Build/test/install/local commits and bounded repo automation. |
| unrestricted | High-risk orchestration only in intentionally trusted environments. |

5.1 Enforcement

Permission profile is enforced at three layers:

  1. Tool registry: Each tool declares its required profile. Tools whose required profile exceeds the active profile are hidden from the model.
  2. Execution gate: Each tool call checks profile at invocation. Violation = error.
  3. Safety harness: Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.
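
The three layers can be sketched as follows. This assumes the profiles form a strict order (restricted < normal < trusted < unrestricted), which the spec implies but does not state outright. The tool entries, field names, and function names are all hypothetical.

```javascript
// Assumption: profiles are totally ordered from least to most permissive.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };

// Hypothetical registry entries; real tool names may differ.
const TOOLS = [
  { name: "read_file", requiredProfile: "restricted", destructive: false },
  { name: "edit_file", requiredProfile: "normal", destructive: false },
  { name: "npm_install", requiredProfile: "trusted", destructive: false },
  { name: "delete_branch", requiredProfile: "trusted", destructive: true },
];

function allows(activeProfile, requiredProfile) {
  return PROFILE_RANK[activeProfile] >= PROFILE_RANK[requiredProfile];
}

// Layer 1: the model only sees tools the active profile permits.
function visibleTools(tools, activeProfile) {
  return tools.filter((t) => allows(activeProfile, t.requiredProfile));
}

// Layers 2 and 3: re-check at invocation, then apply the safety harness.
function gateInvocation(tool, activeProfile, confirmed) {
  if (!allows(activeProfile, tool.requiredProfile)) {
    throw new Error(`${tool.name} requires ${tool.requiredProfile}`);
  }
  // Destructive operations need confirmation regardless of profile.
  if (tool.destructive && !confirmed) return "needs-confirmation";
  return "allowed";
}
```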

5.2 Commands

/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted

6. Model Modes

| Mode | Use Case | Routing Hint |
|---|---|---|
| fast | Small bounded tasks | Cheapest available model |
| smart | Default balanced work | Default routing table |
| deep | Planning, debugging, research, review | Reasoning model (o1, Claude Opus, etc.) |

modelMode guides routing. It does not replace explicit /model selection.
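
As a rough sketch of that precedence, an explicit selection wins and modelMode is only consulted as a fallback. `ROUTING`, `resolveModel`, and the model identifiers are placeholders, not real routing entries.

```javascript
// Hypothetical routing table; the model identifiers are placeholders.
const ROUTING = {
  fast: "cheapest-available-model",
  smart: "default-routed-model",
  deep: "reasoning-model",
};

// explicitModel is whatever the user picked via /model, if anything.
function resolveModel(modelMode, explicitModel) {
  if (explicitModel) return explicitModel; // explicit selection always wins
  const routed = ROUTING[modelMode];
  if (!routed) throw new Error(`unknown modelMode: ${modelMode}`);
  return routed;
}
```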


7. Mode Switching UX

7.1 Direct Commands

/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/permission-profile restricted
/permission-profile normal
/permission-profile trusted
/permission-profile unrestricted
/model-mode fast
/model-mode smart
/model-mode deep

7.2 Combined Forms

/mode repair --autonomous --permission-profile normal
/mode build --autonomous --permission-profile trusted
/mode research --autonomous --permission-profile restricted --model-mode deep
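
A combined form is shorthand for several axis updates applied together. The parser below is a minimal sketch that handles only the flags shown above; `parseModeCommand` is a hypothetical name, and value validation is left to whatever consumes the result.

```javascript
// Parses "/mode <workMode> [--autonomous] [--permission-profile X] [--model-mode Y]"
// into a partial state update. Only the flags shown in the examples are handled.
function parseModeCommand(line) {
  const tokens = line.trim().split(/\s+/);
  if (tokens[0] !== "/mode" || tokens.length < 2) {
    throw new Error("expected: /mode <workMode> [flags]");
  }
  const update = { workMode: tokens[1] };
  for (let i = 2; i < tokens.length; i++) {
    const flag = tokens[i];
    if (flag === "--autonomous") {
      update.runControl = "autonomous";
    } else if (flag === "--permission-profile" || flag === "--model-mode") {
      const value = tokens[++i]; // consume the flag's argument
      if (value === undefined) throw new Error(`${flag} needs a value`);
      const axis = flag === "--permission-profile" ? "permissionProfile" : "modelMode";
      update[axis] = value;
    } else {
      throw new Error(`unknown flag: ${flag}`);
    }
  }
  return update;
}
```

Axes that are not mentioned stay out of the result, so the caller can merge the update over the current state without touching them.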

7.3 Autonomous Steering

/steer mode repair
/steer mode review after-current-unit
/steer permission-profile restricted now
/steer model-mode deep for-next-unit

7.4 Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| Ctrl+Shift+M | Cycle workMode: chat → plan → build → review → repair → research → chat |
| Ctrl+Shift+A | Set runControl to autonomous |
| Ctrl+Shift+S | Set runControl to assisted (step) |
| Ctrl+Shift+I | Set runControl to manual (interactive) |
| Ctrl+Shift+R | Set workMode to repair |
| Ctrl+Shift+P | Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted |
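
The shortcuts above reduce to two wrap-around cycles and four direct sets. A minimal sketch follows; `cycle` and `applyShortcut` are hypothetical helpers, and the key names are used as plain strings for illustration only.

```javascript
const WORK_MODES = ["chat", "plan", "build", "review", "repair", "research"];
const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];

// Returns the value after `current`, wrapping to the first at the end.
// An unknown `current` yields the first value.
function cycle(values, current) {
  const i = values.indexOf(current);
  return values[(i + 1) % values.length];
}

// Returns the next state for a shortcut, or the same state if the key is unbound.
function applyShortcut(state, key) {
  switch (key) {
    case "Ctrl+Shift+M":
      return { ...state, workMode: cycle(WORK_MODES, state.workMode) };
    case "Ctrl+Shift+A":
      return { ...state, runControl: "autonomous" };
    case "Ctrl+Shift+S":
      return { ...state, runControl: "assisted" };
    case "Ctrl+Shift+I":
      return { ...state, runControl: "manual" };
    case "Ctrl+Shift+R":
      return { ...state, workMode: "repair" };
    case "Ctrl+Shift+P":
      return { ...state, permissionProfile: cycle(PERMISSION_PROFILES, state.permissionProfile) };
    default:
      return state;
  }
}
```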

8. Status and Mode Badge

8.1 Full Status Line

SF  build | autonomous | trusted | smart

8.2 Compact Badge Form

For narrow terminals (< 80 cols):

[B][A][T][S]

8.3 Critical State Labels

When workMode is repair or review, show full labels regardless of width:

repair | autonomous | normal | smart
review | assisted | normal | deep
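
The width and critical-state rules from 8.1 to 8.3 can be combined in one renderer. This sketch assumes the compact form is the uppercase first letter of each value, inferred from the `[B][A][T][S]` example; `renderBadge` is a hypothetical name and the "SF" prefix is left to the caller.

```javascript
// Renders the four displayed axes for a terminal that is `cols` columns wide.
function renderBadge(state, cols) {
  const parts = [
    state.workMode,
    state.runControl,
    state.permissionProfile,
    state.modelMode,
  ];
  // repair and review always show full labels, regardless of width.
  const critical = state.workMode === "repair" || state.workMode === "review";
  if (cols >= 80 || critical) return parts.join(" | ");
  // Assumption: the compact badge uses the uppercase first letter of each value.
  return parts.map((p) => `[${p[0].toUpperCase()}]`).join("");
}
```

Under this first-letter assumption several values share a letter (for example plan and paused, or review, repair, and research), which is one reason the full-label rule for critical states matters.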

8.4 Badge Placement

| Surface | Placement |
|---|---|
| TUI header | Left side, after "SF" logo |
| TUI status bar | Bottom line when header hidden |
| tmux/terminal title | `SF[build |
| Web | Top bar, color-coded chip |

8.5 Badge During Auto Mode

Current code hides header/footer during auto mode (if (isAutoActive()) return []). This must change:

  • Show minimal header during auto: badge + project name only
  • Or show badge in dedicated status bar separate from header/footer
  • Badge color pulses slowly during autonomous execution (subtle animation)

8.6 Badge Colors

| Axis | Value | Color |
|---|---|---|
| workMode | chat | dim |
| workMode | plan | accent |
| workMode | build | success |
| workMode | review | warning |
| workMode | repair | error |
| workMode | research | info |
| runControl | manual | dim |
| runControl | assisted | warning |
| runControl | autonomous | success (pulsing) |
| permissionProfile | restricted | success |
| permissionProfile | normal | dim |
| permissionProfile | trusted | warning |
| permissionProfile | unrestricted | error |

9. Background Work Surface (/tasks)

Unified view of all background work. Replaces scattered /status, /queue, /parallel status for work inspection.

9.1 What /tasks Shows

  • autonomous task lifecycle rows
  • parallel workers
  • scheduled autonomous dispatches and queued scheduler rows
  • background shell sessions
  • stuck or resumable sessions
  • remote questions waiting for answers
  • current cost/budget state
  • last checkpoint and next action

9.2 Data Model

SQLite tables:

-- Durable task state
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  work_mode TEXT NOT NULL,
  run_control TEXT NOT NULL,
  permission_profile TEXT NOT NULL,
  model_mode TEXT NOT NULL,
  status TEXT NOT NULL, -- todo | running | verifying | reviewing | done | blocked | paused | failed | cancelled | retrying
  dependency_blockers TEXT, -- JSON array of task IDs
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  checkpoint_ref TEXT, -- git ref or patch file
  cost_budget REAL,
  cost_spent REAL DEFAULT 0,
  created_at TEXT, -- Temporal.Instant ISO
  started_at TEXT,
  completed_at TEXT,
  next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
  intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);

-- Scheduler state is separate from task lifecycle state
CREATE TABLE task_scheduler (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  status TEXT NOT NULL, -- queued | due | claimed | dispatched | consumed | expired
  due_at TEXT,
  claimed_by TEXT,
  dispatched_at TEXT,
  consumed_at TEXT,
  expires_at TEXT
);

-- Ephemeral running state
CREATE TABLE task_runtime (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  process_pid INTEGER,
  worktree_path TEXT,
  current_model TEXT,
  context_usage_percent REAL,
  last_heartbeat_at TEXT -- Temporal.Instant
);

-- Transition log
CREATE TABLE task_transitions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  from_status TEXT NOT NULL,
  to_status TEXT NOT NULL,
  reason TEXT,
  scope TEXT,
  timestamp TEXT NOT NULL -- Temporal.Instant
);

Parallel workers must stay worktree-isolated and report heartbeat/status into .sf state. Scheduler rows may use queued; task lifecycle rows use todo.
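
Because the two status vocabularies differ, a guard at the write path can keep scheduler values out of lifecycle rows. The sketch below copies the vocabularies from the schema comments. `assertStatus` and `transitionRow` are hypothetical helpers, and treating task_transitions as a log of lifecycle statuses only is an assumption.

```javascript
// Status vocabularies copied from the schema comments above.
const TASK_STATUSES = new Set([
  "todo", "running", "verifying", "reviewing", "done",
  "blocked", "paused", "failed", "cancelled", "retrying",
]);
const SCHEDULER_STATUSES = new Set([
  "queued", "due", "claimed", "dispatched", "consumed", "expired",
]);

// Throws if `status` is not valid for the given table.
function assertStatus(table, status) {
  const allowed =
    table === "tasks" ? TASK_STATUSES
    : table === "task_scheduler" ? SCHEDULER_STATUSES
    : null;
  if (!allowed) throw new Error(`unknown table: ${table}`);
  if (!allowed.has(status)) throw new Error(`${table}.status cannot be "${status}"`);
  return status;
}

// Builds a task_transitions row; id is left to AUTOINCREMENT.
// Assumption: transitions record task lifecycle statuses, not scheduler ones.
function transitionRow(taskId, fromStatus, toStatus, reason, scope, timestamp) {
  assertStatus("tasks", fromStatus);
  assertStatus("tasks", toStatus);
  return {
    task_id: taskId,
    from_status: fromStatus,
    to_status: toStatus,
    reason: reason ?? null,
    scope: scope ?? null,
    timestamp,
  };
}
```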

9.3 Complementary Commands

/tasks does not replace:

  • /status → project health dashboard
  • /queue → milestone/slice dispatch order
  • /parallel status → parallel orchestrator detail
  • /session-report → cost/token summary
  • /logs → activity logs
  • /forensics → execution forensics

10. Skills System

10.1 Directory Structure

.agents/skills/<skill-name>/
  SKILL.md          -- skill definition with YAML frontmatter
  scripts/          -- supporting scripts
  schemas/          -- JSON schemas for inputs/outputs
  checklists/       -- verification checklists
  mcp.json          -- MCP server config if applicable

10.2 Skill Frontmatter

---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---

Fields:

  • name: unique identifier
  • description: when to use this skill
  • user-invocable: can user explicitly invoke?
  • model-invocable: can model auto-invoke when relevant?
  • side-effects: none, code-edits, production-mutation, etc.
  • permission-profile: minimum profile required

10.3 Skill Categories

| Type | Example | model-invocable |
|---|---|---|
| Background knowledge | forge-autonomous-runtime | true |
| User tool | production-deploy | false |
| Shared capability | forge-command-surface | true |

Dangerous skills (production-mutation) are never model-invoked by default.
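
A minimal sketch of reading the frontmatter and applying the default rule above. The parser handles only flat `key: value` lines and is not a general YAML parser. `parseFrontmatter` and `canModelInvoke` are hypothetical names, and the sketch assumes there is no override for production-mutation skills.

```javascript
// Reads the block between the first two "---" lines of a SKILL.md file.
// Handles flat "key: value" pairs only; "true"/"false" become booleans.
function parseFrontmatter(text) {
  const lines = text.split("\n");
  if (lines[0].trim() !== "---") throw new Error("missing frontmatter");
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) throw new Error("unterminated frontmatter");
  const meta = {};
  for (const line of lines.slice(1, end)) {
    const idx = line.indexOf(":"); // split on the first colon only
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    meta[key] = raw === "true" ? true : raw === "false" ? false : raw;
  }
  return meta;
}

// Default rule: production-mutation skills are never model-invoked,
// even if the frontmatter sets model-invocable to true.
function canModelInvoke(meta) {
  return (
    meta["model-invocable"] === true &&
    meta["side-effects"] !== "production-mutation"
  );
}
```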

10.4 Auto-Creation Flow

  1. Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
  2. Propose skill in manual/restricted contexts
  3. Generate/update automatically only when policy allows
  4. Record repeated source evidence in .sf state
  5. Keep narrow, linted, and evaled like code
  6. Commit with repo when accepted

10.5 Skill Eval Cases

Every auto-created skill needs eval cases:

.agents/skills/<skill-name>/evals/
  case-1/
    task.md          -- user-like prompt
    grader.js        -- deterministic checker
    hidden/          -- reference answers (not visible to agent)
    work/            -- agent workspace

Graders inspect: files, artifacts, answer.json, trace.jsonl, result state. Failed trials preserve workspace for debugging.
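
To illustrate what "deterministic checker" could mean, here is a sketch of a grader core written as a pure function so it produces the same result on every run. The `grade` signature, the idea of comparing selected keys against a hidden reference, and the result shape are all assumptions; a real grader.js would also read the files from the case directory.

```javascript
// Hypothetical grader core.
//   answerText - contents of answer.json
//   traceText  - contents of trace.jsonl (one JSON value per line)
//   expected   - key/value pairs taken from the hidden reference answer
function grade(answerText, traceText, expected) {
  const reasons = [];

  let answer;
  try {
    answer = JSON.parse(answerText);
  } catch {
    answer = undefined;
  }
  if (answer === null || typeof answer !== "object") {
    reasons.push("answer.json must contain a JSON object");
  } else {
    for (const [key, want] of Object.entries(expected)) {
      if (JSON.stringify(answer[key]) !== JSON.stringify(want)) {
        reasons.push(`answer.${key} does not match reference`);
      }
    }
  }

  // Every non-empty trace line must parse on its own.
  const lines = traceText.split("\n").filter((l) => l.trim() !== "");
  lines.forEach((line, i) => {
    try {
      JSON.parse(line);
    } catch {
      reasons.push(`trace.jsonl line ${i + 1} is not valid JSON`);
    }
  });

  return { pass: reasons.length === 0, reasons };
}
```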


11. Command Surfaces

11.1 Human Slash Commands

SF registers direct command roots only:

/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks

/sf is not a command root. TUI and browser command parity tests reject it so compatibility shims do not grow back.

/remote is a full-session steering surface. Remote answers may change workMode, runControl, permissionProfile, and modelMode; they are not limited to question delivery.

11.2 Shell Surface

Machine surface remains prefixed:

sf headless autonomous
sf headless --autonomous ...

The shell prefix is the executable name, not an interactive slash-command namespace.


12. Runtime Target: Node 26

12.1 Policy

current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1
canonical baseline:          Node 26.1
Node 25:                     skip except quick probes

12.2 Why Node 26

  • Temporal enabled by default
  • V8 14.6 baseline
  • Undici 8 HTTP/fetch baseline
  • Removes legacy APIs, hardens against old assumptions

12.3 Temporal Adoption

Store semantic type, not just formatted string:

| Concept | Temporal Type | Use Case |
|---|---|---|
| Exact instant | Temporal.Instant | Journal events, checkpoints, lock leases |
| Local time | Temporal.ZonedDateTime | Reminders, schedules, audits |
| Calendar date | Temporal.PlainDate | Daily reports, milestone reviews |
| Wall-clock time | Temporal.PlainTime | Recurring policies |
| Time amount | Temporal.Duration | Budgets, leases, cooldowns, retry delays |
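
One way to "store semantic type, not just formatted string" is to persist a small tagged value next to the ISO text. The sketch below is an assumption about the storage shape, not SF's format: `tagTemporal` and `reviveTemporal` are hypothetical helpers. Revival uses the standard `Temporal.<Type>.from()` constructors when the `Temporal` global exists and otherwise returns the tagged object unchanged.

```javascript
// The five Temporal types listed above.
const TEMPORAL_KINDS = ["Instant", "ZonedDateTime", "PlainDate", "PlainTime", "Duration"];

// Serializes an ISO string together with its semantic type.
function tagTemporal(kind, iso) {
  if (!TEMPORAL_KINDS.includes(kind)) {
    throw new Error(`unsupported Temporal type: ${kind}`);
  }
  return JSON.stringify({ type: `Temporal.${kind}`, value: iso });
}

// Restores a tagged value. On runtimes without a Temporal global,
// the plain { type, value } object is returned instead.
function reviveTemporal(json) {
  const { type, value } = JSON.parse(json);
  const kind = String(type).replace(/^Temporal\./, "");
  if (!TEMPORAL_KINDS.includes(kind)) {
    throw new Error(`unsupported Temporal type: ${type}`);
  }
  if (typeof Temporal === "undefined") return { type, value };
  return Temporal[kind].from(value);
}
```

With the tag in place, a lease stored as a Duration cannot be mistaken for a timestamp, and a PlainDate is never silently shifted by a time zone conversion.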

12.4 Adoption Priority

  1. sf schedule — highest user-visible impact
  2. Lock/lease — highest operational correctness
  3. Journals/traces — highest debugging impact
  4. Session reports — nice to have
  5. Background tasks — future work

12.5 Gate

node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"

13. Implementation Pull-Through

13.1 Already Directionally Right

  • UOK lifecycle records carry runControl
  • UOK lifecycle records carry permissionProfile
  • Schedule command state uses autonomous_dispatch
  • DB-backed state, recovery, verification, scheduling, captures, forensics
  • Skills and project-specific skill paths exist
  • Parallel orchestration and remote-question infrastructure

13.2 Still Needed

| Priority | Item | Effort |
|---|---|---|
| P2 | Decide whether sandboxProfile becomes a sixth persisted axis | Medium |
| P2 | Remove /sf from docs/web/tests (Phase 2 deprecation) | Small |

13.3 Recently Completed (This Session)

| Priority | Item | Status |
|---|---|---|
| P1 | Centralized metrics system (metrics-central.js) | |
| P1 | Cost command (/cost) with DB + ledger queries | |
| P1 | Explicit stage commands (/research, /plan, /implement) | |
| P2 | Reasoning assist foundation (reasoning-assist.js) | |
| P2 | Self-feedback → workMode auto-transition | |
| P2 | UOK events carry workMode + modelMode | |
| P2 | /sf prefix deprecation warning (Phase 1) | |

13.4 Completed

| Priority | Item | Status |
|---|---|---|
| P0 | Make mode state durable in SQLite | |
| P0 | Add direct /mode, /control, /permission-profile, /model-mode commands | |
| P0 | Add visible mode badge to TUI header/status bar | |
| P1 | Make --autonomous chain into direct /autonomous | |
| P1 | Expose autonomous continuation limits in settings and status | |
| P1 | Add /tasks backed by DB execution graph state | |
| P1 | Make repair first-class workflow over doctor | |
| P1 | Enhanced /steer with mode/permission-profile/model-mode transitions | |
| P1 | TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P) | |
| P1 | Minimal auto-mode header/footer (badge visible during autonomy) | |
| P1 | Remove /sf namespace registration and parity-test against fallback | |
| P1 | Parallel worker intent/claim registry backed by UOK SQLite coordination | |
| P1 | Skill eval harness foundation | |
| P1 | Terminal title mode indicator | |
| P2 | Policy-aware project skill suggestion/generation with DB cooldown | |
| P2 | Schema-backed task frontmatter (risk, mutation, verification, approval) | |
| P2 | Subagent provider/model/permission inheritance audit and guard | |
| P2 | Remote steering as full-session surface from remote answers | |

14. Open Questions

  1. Should plan mode show badge [P] or plan text in full?
  2. Should paused autonomous show previous badge dimmed, or [P] for paused?
  3. Should mode be per-session or per-project? (Current: per-session)
  4. Should badge appear in tmux/terminal window titles?
  5. Should mode transitions have sound/notification?
  6. Should repair auto-transition be ask by default for new projects?
  7. Should skill eval cases run in CI or only on-demand?
  8. Should /tasks be a TUI overlay or a separate scrollable panel?
  9. Should reasoning assist call a fast model automatically, or only prepare prompts for now?

15. References