Mikael Hugo 19bfc3d3f6 feat(sf): align node sqlite uok runtime

2026-05-08 03:01:20 +02:00

19 KiB

Raw Blame History

SF Agent Mode System

Status: Draft specification. Promoted from copilot-thoughts.md research notes. Scope: TUI mode surface, command structure, orthogonal state axes, skills, background work, runtime target. Decision authority: Product + architecture review required before implementation.

1. Problem Statement

SF's current command surface (/sf autonomous, /sf next, /sf pause, /sf stop) treats mode switching as separate commands rather than persistent states. There is no visible indicator of the current mode, and the /sf prefix positions SF as a plugin rather than the system itself.

Competitors (Copilot CLI, Factory Droid, Amp) have cleaner mode surfaces with visible state and orthogonal controls. SF has deeper autonomous machinery but weaker presentation.

Goal: Make SF's mode system as obvious as Vim's insert/normal mode indicator, with the control depth of Factory Droid's autonomy levels and the skill system of Amp.

2. Orthogonal State Axes

SF state is five independent axes, not one overloaded "mode."

workMode:          chat | plan | build | review | repair | research
runControl:        manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode:         fast | smart | deep
surface:           tui | web | headless | rpc

2.1 Axis Definitions

Axis	Question It Answers	Values
`workMode`	What kind of work is SF doing?	`chat`, `plan`, `build`, `review`, `repair`, `research`
`runControl`	Who advances the loop?	`manual` (user), `assisted` (one unit then pause), `autonomous` (continuous)
`permissionProfile`	What may proceed without approval?	`restricted`, `normal`, `trusted`, `unrestricted`
`modelMode`	Speed/cost/reasoning posture?	`fast` (cheap), `smart` (balanced), `deep` (reasoning)
`surface`	How is the user connected?	`tui`, `web`, `headless`, `rpc`

2.2 Example Combinations

plan     | manual     | normal     | deep     → user plans with reasoning model
build    | autonomous | trusted    | smart    → continuous implementation
repair   | assisted   | normal     | smart    → one repair unit at a time
research | autonomous | restricted | deep     → continuous research, read-only
review   | manual     | restricted | deep     → user reviews with reasoning model

2.3 Rules

permissionProfile never implies runControl. Autonomous run with restricted permissions is valid.
runControl never implies permissionProfile. Manual run with unrestricted permissions is valid.
Denylists and safety gates override permissionProfile regardless of value.
Every risk decision logs all five axis values.

3. Work Modes

3.1 `chat`

Default conversational mode. Questions, explanations, low-commitment exploration. No durable artifacts created without explicit user request.

3.2 `plan`

Research, clarify, write/update specs, derive tasks, produce explicit acceptance point before implementation. Primary user journey starts here.

Plan → Build handoff:

plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart

Surfaces:

TUI: plan acceptance prompt includes "run autonomously" button
Web: plan acceptance button includes "run autonomously"
Headless: --autonomous chains into direct /autonomous
RPC: machine event records transition explicitly

3.3 `build`

Implement, test, lint, typecheck, verify, prepare commit-ready changes. The autonomous default.

3.4 `review`

Inspect diffs, tests, risks, regressions, security issues, missing evidence. Requires reasoning model (deep).

3.5 `repair`

Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, broken installed runtime copies.

Doctor is the diagnostic engine, not the mode. /doctor inspects. /repair switches work mode.

Commands:

/doctor          → inspect and report
/doctor fix      → deterministic auto-fix
/doctor heal     → LLM-assisted deep healing
/repair          → switch workMode to repair
/repair --autonomous → repair until clean, blocked, or limit-hit

Auto-transition to repair:

build | autonomous | trusted | smart
→ repair | autonomous | normal | smart

Allowed when:

pre-dispatch health gate fails
installed runtime drift detected
SF cannot dispatch safely
repo workflow state corrupted

Policy: configurable per project. Options: auto, ask, log-only.

3.6 `research`

Longer-form codebase, competitor, design, API, or dependency research. Uses web search, local code exploration, cross-repo research, helper agents.

4. Run Control

Value	Behavior
`manual`	User drives every step. Tool calls require approval.
`assisted`	SF executes one unit, then pauses for user review.
`autonomous`	SF continues until done, blocked, interrupted, budget-hit, or limit-hit.

4.1 Commands

/control manual
/control assisted
/control autonomous
/autonomous    → alias for /control autonomous
/next          → alias for /control assisted (one unit)
/pause         → pause autonomous, preserve state
/stop          → stop autonomous, clear state

4.2 Transition Scopes

Scope	Behavior
`now`	Apply immediately if no tool active. Abort current tool if policy allows.
`after-current-tool`	Finish active tool, then switch.
`after-current-unit`	Finish current SF unit, then switch.
`next-milestone`	Switch after current milestone completes.

Autonomous changes affect future decisions, never mutate active tool calls mid-execution.

4.3 Transition Logging

Every transition persists:

{
  "timestamp": "2026-05-08T10:00:00Z",
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit",
  "sessionId": "..."
}

5. Permission Profiles

Profile	Description
`restricted`	Read-only and explicitly allowlisted actions.
`normal`	Safe edits, non-destructive local commands.
`trusted`	Build/test/install/local commits and bounded repo automation.
`unrestricted`	High-risk orchestration only in intentionally trusted environments.

5.1 Enforcement

Permission profile is enforced at three layers:

Tool registry: Each tool declares required profile. Tools below profile are hidden from model.
Execution gate: Each tool call checks profile at invocation. Violation = error.
Safety harness: Destructive operations (delete, push to production, etc.) require explicit confirmation regardless of profile.

5.2 Commands

/trust restricted
/trust normal
/trust trusted
/trust unrestricted

6. Model Modes

Mode	Use Case	Routing Hint
`fast`	Small bounded tasks	Cheapest available model
`smart`	Default balanced work	Default routing table
`deep`	Planning, debugging, research, review	Reasoning model (o1, Claude Opus, etc.)

modelMode guides routing. It does not replace explicit /model selection.

7. Mode Switching UX

7.1 Direct Commands

/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
/model-mode fast
/model-mode smart
/model-mode deep

7.2 Combined Forms

/mode repair --autonomous --trust normal
/mode build --autonomous --trust trusted
/mode research --autonomous --trust restricted --model-mode deep

7.3 Autonomous Steering

/steer mode repair
/steer mode review after-current-unit
/steer trust restricted now
/steer model-mode deep for-next-unit

7.4 Keyboard Shortcuts

Shortcut	Action
`Ctrl+Shift+M`	Cycle workMode: chat → plan → build → review → repair → research → chat
`Ctrl+Shift+A`	Set runControl to autonomous
`Ctrl+Shift+S`	Set runControl to assisted (step)
`Ctrl+Shift+I`	Set runControl to manual (interactive)
`Ctrl+Shift+R`	Set workMode to repair
`Ctrl+Shift+P`	Cycle permissionProfile: restricted → normal → trusted → unrestricted → restricted

8. Status and Mode Badge

8.1 Full Status Line

SF  build | autonomous | trusted | smart

8.2 Compact Badge Form

For narrow terminals (< 80 cols):

[B][A][T][S]

8.3 Critical State Labels

When workMode is repair or review, show full labels regardless of width:

repair | autonomous | normal | smart
review | assisted | normal | deep

8.4 Badge Placement

Surface	Placement
TUI header	Left side, after "SF" logo
TUI status bar	Bottom line when header hidden
tmux/terminal title	`SF[build
Web	Top bar, color-coded chip

8.5 Badge During Auto Mode

Current code hides header/footer during auto mode (if (isAutoActive()) return []). This must change:

Show minimal header during auto: badge + project name only
Or show badge in dedicated status bar separate from header/footer
Badge color pulses slowly during autonomous execution (subtle animation)

8.6 Badge Colors

Axis	Value	Color
workMode	`chat`	dim
workMode	`plan`	accent
workMode	`build`	success
workMode	`review`	warning
workMode	`repair`	error
workMode	`research`	info
runControl	`manual`	dim
runControl	`assisted`	warning
runControl	`autonomous`	success (pulsing)
permissionProfile	`restricted`	success
permissionProfile	`normal`	dim
permissionProfile	`trusted`	warning
permissionProfile	`unrestricted`	error

9. Background Work Surface (`/tasks`)

Unified view of all background work. Replaces scattered /status, /queue, /parallel status for work inspection.

9.1 What `/tasks` Shows

autonomous units (current + queued)
parallel workers
scheduled autonomous dispatches
background shell sessions
stuck or resumable sessions
remote questions waiting for answers
current cost/budget state
last checkpoint and next action

9.2 Data Model

SQLite tables:

-- Durable task state
CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  work_mode TEXT NOT NULL,
  run_control TEXT NOT NULL,
  permission_profile TEXT NOT NULL,
  model_mode TEXT NOT NULL,
  status TEXT NOT NULL, -- pending | running | review | done | retrying | failed | cancelled
  dependency_blockers TEXT, -- JSON array of task IDs
  retry_count INTEGER DEFAULT 0,
  max_retries INTEGER DEFAULT 3,
  checkpoint_ref TEXT, -- git ref or patch file
  cost_budget REAL,
  cost_spent REAL DEFAULT 0,
  created_at TEXT, -- Temporal.Instant ISO
  started_at TEXT,
  completed_at TEXT,
  next_action_at TEXT, -- Temporal.ZonedDateTime for scheduled
  intent_claim TEXT -- for parallel workers: "I will edit src/foo.ts lines 10-50"
);

-- Ephemeral running state
CREATE TABLE task_runtime (
  task_id TEXT PRIMARY KEY REFERENCES tasks(id),
  process_pid INTEGER,
  worktree_path TEXT,
  current_model TEXT,
  context_usage_percent REAL,
  last_heartbeat_at TEXT, -- Temporal.Instant
  FOREIGN KEY (task_id) REFERENCES tasks(id)
);

-- Transition log
CREATE TABLE task_transitions (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,
  from_status TEXT NOT NULL,
  to_status TEXT NOT NULL,
  reason TEXT,
  scope TEXT,
  timestamp TEXT NOT NULL -- Temporal.Instant
);

9.3 Complementary Commands

/tasks does not replace:

/status → project health dashboard
/queue → milestone/slice dispatch order
/parallel status → parallel orchestrator detail
/session-report → cost/token summary
/logs → activity logs
/forensics → execution forensics

10. Skills System

10.1 Directory Structure

.agents/skills/<skill-name>/
  SKILL.md          -- skill definition with YAML frontmatter
  scripts/          -- supporting scripts
  schemas/          -- JSON schemas for inputs/outputs
  checklists/       -- verification checklists
  mcp.json          -- MCP server config if applicable

10.2 Skill Frontmatter

---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---

Fields:

name: unique identifier
description: when to use this skill
user-invocable: can user explicitly invoke?
model-invocable: can model auto-invoke when relevant?
side-effects: none, code-edits, production-mutation, etc.
permission-profile: minimum profile required

10.3 Skill Categories

Type	Example	`model-invocable`
Background knowledge	`forge-autonomous-runtime`	true
User tool	`production-deploy`	false
Shared capability	`forge-command-surface`	true

Dangerous skills (production-mutation) are never model-invoked by default.

10.4 Auto-Creation Flow

Detect repeated repo-specific evidence (same files, commands, failure modes, rules)
Propose skill in manual/restricted contexts
Generate/update automatically only when policy allows
Record source evidence in .sf state
Keep narrow and testable
Commit with repo when accepted

10.5 Skill Eval Cases

Every auto-created skill needs eval cases:

.agents/skills/<skill-name>/evals/
  case-1/
    task.md          -- user-like prompt
    grader.js        -- deterministic checker
    hidden/          -- reference answers (not visible to agent)
    work/            -- agent workspace

Graders inspect: files, artifacts, answer.json, trace.jsonl, result state. Failed trials preserve workspace for debugging.

11. Migration from `/sf` Commands

11.1 Command Mapping

Old	New	Status
`/sf`	`/next`	migrate
`/sf autonomous`	`/autonomous`	migrate
`/sf next`	`/next`	migrate
`/sf stop`	`/stop`	migrate
`/sf pause`	`/pause`	migrate
`/sf status`	`/status`	migrate
`/sf doctor`	`/doctor`	migrate
`/sf rate`	`/rate`	migrate
`/sf session-report`	`/session-report`	migrate
`/sf parallel`	`/parallel`	migrate
`/sf remote`	`/remote`	migrate
`/sf tasks`	`/tasks`	new

11.2 Migration Timeline

Phase	Action
Phase 1 (now)	Accept both `/sf X` and `/X`. Log deprecation warning for `/sf`.
Phase 2 (2 releases)	`/sf X` shows warning: "Use /X instead. /sf will be removed."
Phase 3 (4 releases)	`/sf X` errors: "Unknown command. Did you mean /X?"
Phase 4 (6 releases)	Remove `/sf` handler entirely.

11.3 Shell Surface

Machine surface remains prefixed:

sf headless autonomous
sf headless --autonomous ...

12. Runtime Target: Node 26

12.1 Policy

current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1
canonical baseline:          Node 26.1
Node 25:                     skip except quick probes

12.2 Why Node 26

Temporal enabled by default
V8 14.6 baseline
Undici 8 HTTP/fetch baseline
Removes legacy APIs, hardens against old assumptions

12.3 Temporal Adoption

Store semantic type, not just formatted string:

Concept	Temporal Type	Use Case
Exact instant	`Temporal.Instant`	Journal events, checkpoints, lock leases
Local time	`Temporal.ZonedDateTime`	Reminders, schedules, audits
Calendar date	`Temporal.PlainDate`	Daily reports, milestone reviews
Wall-clock time	`Temporal.PlainTime`	Recurring policies
Time amount	`Temporal.Duration`	Budgets, leases, cooldowns, retry delays

12.4 Adoption Priority

sf schedule — highest user-visible impact
Lock/lease — highest operational correctness
Journals/traces — highest debugging impact
Session reports — nice to have
Background tasks — future work

12.5 Gate

node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"

13. Implementation Pull-Through

13.1 Already Directionally Right

UOK lifecycle records carry runControl
UOK lifecycle records carry permissionProfile
Schedule command state uses autonomous_dispatch
DB-backed state, recovery, verification, scheduling, captures, forensics
Skills and project-specific skill paths exist
Parallel orchestration and remote-question infrastructure

13.2 Still Needed

Priority	Item	Effort
P0	Remove `/sf` internal dispatch, docs, tests, help text	Medium
P0	Make `workMode` durable state (SQLite + `.sf/`)	Medium
P0	Add direct `/mode`, `/control`, `/trust`, `/model-mode` commands	Medium
P0	Add visible mode badge to TUI header/status bar	Small
P1	Make `--autonomous` chain into direct `/autonomous`	Small
P1	Expose autonomous continuation limits in settings and status	Small
P1	Add `/tasks` with durable + ephemeral state	Large
P1	Make `repair` first-class workflow over `doctor`	Medium
P2	Policy-aware project skill suggestion/generation	Large
P2	Skill eval cases for generated skills	Large
P2	Schema-backed task frontmatter (risk, mutation, verification)	Medium
P2	Intent/claim records for parallel workers	Medium
P2	Audit subagent provider/model/permission inheritance	Medium
P2	Audit remote steering as full-session surface	Medium

14. Open Questions

Should plan mode show badge [P] or plan text in full?
Should paused autonomous show previous badge dimmed, or [P] for paused?
Should mode be per-session or per-project? (Current: per-session)
Should badge appear in tmux/terminal window titles?
Should mode transitions have sound/notification?
Should repair auto-transition be ask by default for new projects?
Should skill eval cases run in CI or only on-demand?
Should /tasks be a TUI overlay or a separate scrollable panel?

15. References

GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" — https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot
Factory Droid, "Autonomy Level" — https://docs.factory.ai/cli/user-guides/auto-run
Amp manual — https://ampcode.com/manual
Smelt (mode cycling) — https://github.com/leonardcser/smelt
ORCH (task state machine) — https://github.com/oxgeneral/ORCH
AgentPlane (schema-first tasks) — https://github.com/basilisk-labs/agentplane
Relay (channels and tickets) — https://github.com/jcast90/relay
Sage (runtime-neutral orchestration) — https://github.com/youwangd/SageCLI
Wit (symbol-level locks) — https://github.com/amaar-mc/wit

19 KiB Raw Blame History