# Agent Mode And Skills Notes For SF

Sources checked 2026-05-08:

- GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously"
  <https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot>
- GitHub Docs, "GitHub Copilot CLI command reference"
  <https://docs.github.com/en/copilot/reference/copilot-cli-reference/cli-command-reference>
- GitHub Copilot CLI product page
  <https://github.com/features/copilot/cli>
- GitHub Changelog, "GitHub Copilot CLI is now generally available"
  <https://github.blog/changelog/2026-02-25-github-copilot-cli-is-now-generally-available/>
- GitHub Changelog, "Copilot CLI now supports BYOK and local models"
  <https://github.blog/changelog/2026-04-07-copilot-cli-now-supports-byok-and-local-models/>
- Factory Droid, "Autonomy Level"
  <https://docs.factory.ai/cli/user-guides/auto-run>
- Factory Droid, "CLI Reference"
  <https://docs.factory.ai/reference/cli-reference>
- Factory Droid, "Skills"
  <https://docs.factory.ai/cli/configuration/skills>
- Amp manual
  <https://ampcode.com/manual>
- Amp, "Agent Skills"
  <https://ampcode.com/news/agent-skills>

## Competitive Patterns

### Copilot CLI

Copilot CLI has the cleanest public "continue work" story:

- plan first, then accept the plan and continue without step-by-step approval
- continuation is separate from permission expansion
- question suppression is separate from continuation
- a runaway continuation cap is explicit
- `/fleet` parallelizes with subagents
- `/remote` steers a running session from another device
- `/tasks` exposes background work
- `/session` exposes session info, checkpoints, files, plans, cleanup, and
  pruning
- `/skills`, `/plugin`, `/mcp`, and `/agent` are visible control surfaces
- BYOK, local-model, and offline provider configuration are first-class

Useful shape:

```bash
copilot --autopilot --yolo --max-autopilot-continues 10 -p "YOUR PROMPT HERE"
```

SF should copy the separation, not the names:

- continuation is run control
- permission expansion is permission profile
- question behavior is an escalation policy
- runaway caps are explicit autonomous limits

### Factory Droid

Factory Droid makes the most important distinction explicit: Autonomy Level is
separate from interaction mode.

- interaction mode: Auto vs Spec Mode
- autonomy level: Off, Low, Medium, High
- execution surface: interactive `droid` vs headless `droid exec`
- tools and commands carry risk levels
- command allowlists and denylists layer on top of autonomy
- Spec Mode plans first; after approval, Droid exits Spec Mode and implements
  with the chosen autonomy level
- `droid exec` is read-only by default and raises permissions with
  `--auto low|medium|high`
- custom droids/subagents can have their own model/tool/autonomy policies
- skills are reusable capabilities that can be user-invoked or invoked by the
  Droid when relevant

This validates SF's split between work mode, run control, and permission
profile.

### Amp

Amp is useful for the agent-shape and skills model:

- modes are model/capability presets: `smart`, `rush`, `deep`
- skills live in `.agents/skills/` and user-level skill directories
- skill content is lazily loaded only when relevant
- skills can package instructions, scripts, resources, and tool/MCP config
- subagents can be spawned automatically for isolated work, but they have
  isolated context and return only final summaries
- Oracle and Librarian are specialized helper agents for second opinion and
  cross-repository research

Amp validates `.agents/skills/` as the preferred repo-local skill path.

## SF Model

SF should represent agent state as orthogonal axes, not one overloaded mode.

```text
workMode:          chat | plan | build | review | repair | research
runControl:        manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode:         fast | smart | deep
surface:           tui | web | headless | rpc
```

Note: `repair` is a `workMode`, not a separate subsystem. The `/doctor` command is the diagnostic engine; `/repair` switches `workMode` to `repair`.

Examples:

```text
plan | manual | normal | deep
build | autonomous | trusted | smart
repair | assisted | normal | smart
research | autonomous | restricted | deep
review | manual | restricted | deep
```

Definitions:

- `workMode` describes what kind of work SF is doing.
- `runControl` describes who advances the loop.
- `permissionProfile` describes what tool/file/network actions may proceed
  without approval.
- `modelMode` describes speed/cost/reasoning posture.
- `surface` describes how the user or automation is connected.

`autonomous` is not the whole mode. It is a run-control value.
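
The orthogonal axes can be sketched as a small validated state object. This is only an illustration: the axis names and values come from the table above, while `createAgentState` and `withAxis` are hypothetical helper names, not SF's actual API.

```javascript
// Hypothetical sketch: axis values are taken from the notes; the helper
// names are illustrative only.
const AXES = Object.freeze({
  workMode: ["chat", "plan", "build", "review", "repair", "research"],
  runControl: ["manual", "assisted", "autonomous"],
  permissionProfile: ["restricted", "normal", "trusted", "unrestricted"],
  modelMode: ["fast", "smart", "deep"],
  surface: ["tui", "web", "headless", "rpc"],
});

// Build a validated state object; every axis must be set independently.
function createAgentState(values) {
  const state = {};
  for (const [axis, allowed] of Object.entries(AXES)) {
    const value = values[axis];
    if (!allowed.includes(value)) {
      throw new RangeError(`invalid ${axis}: ${String(value)}`);
    }
    state[axis] = value;
  }
  return Object.freeze(state);
}

// Changing one axis returns a new state and leaves the other axes untouched.
function withAxis(state, axis, value) {
  return createAgentState({ ...state, [axis]: value });
}

const planning = createAgentState({
  workMode: "plan",
  runControl: "manual",
  permissionProfile: "normal",
  modelMode: "deep",
  surface: "tui",
});
const running = withAxis(planning, "runControl", "autonomous");
console.log(running.workMode, running.runControl); // plan autonomous
```

Switching `runControl` to `autonomous` leaves `workMode` and `permissionProfile` exactly as they were, which is the point of keeping the axes separate.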

## Work Modes

### `chat`

Default conversational mode for questions, explanations, and low-commitment
exploration.

### `plan`

Research, clarify, write/update specs, derive tasks, and produce an explicit
acceptance point before implementation.

### `build`

Implement, test, lint, typecheck, verify, and prepare commit-ready changes.

### `review`

Inspect diffs, tests, risks, regressions, security issues, and missing evidence.

### `repair`

Fix SF health, repo health, runtime drift, broken generated state, bad command
surfaces, failing workflow infrastructure, stale locks, and broken installed
runtime copies.

Doctor is not a permanent mode. Doctor is the diagnostic engine used by
`repair`.

### `research`

Longer-form codebase, competitor, design, API, or dependency research. This can
use web search, local code exploration, cross-repo research, and helper agents.

## Run Control

```text
manual      user drives every step
assisted    SF executes one unit, then pauses
autonomous  SF continues until done, blocked, interrupted, budget-hit, or limit-hit
```

Transitions:

```text
/control manual
/control assisted
/control autonomous
/autonomous
/next
/pause
/stop
```

`/autonomous` is a direct command. Do not route through `/sf autonomous`.
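
A minimal sketch of how the three run-control values could change a dispatch loop. The `units` array and the `maxUnits` limit are illustrative assumptions, and interruption and budget checks are left out to keep the example short.

```javascript
// Hypothetical sketch: each unit is a function returning "ok" or "blocked".
function runLoop(runControl, units, limits = { maxUnits: 10 }) {
  const log = [];
  for (const [index, unit] of units.entries()) {
    if (runControl === "manual") {
      // The user drives every step, so the loop never advances on its own.
      return { stopped: "awaiting-user", log };
    }
    if (index >= limits.maxUnits) {
      return { stopped: "limit-hit", log };
    }
    const outcome = unit();
    log.push(outcome);
    if (outcome === "blocked") {
      return { stopped: "blocked", log };
    }
    if (runControl === "assisted") {
      // One unit, then pause until the user asks for the next one.
      return { stopped: "paused", log };
    }
  }
  return { stopped: "done", log };
}
```

Every exit path names its stop reason, so a status surface can report why the loop stopped rather than only that it stopped.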

## Permission Profiles

```text
restricted    read-only and explicitly allowlisted actions
normal        safe edits, non-destructive local commands
trusted       build/test/install/local commits and bounded repo automation
unrestricted  high-risk orchestration only in intentionally trusted environments
```

This is SF's equivalent of Droid autonomy levels and Copilot permission
expansion, but the names are SF-native and policy-oriented.

Rules:

- Permission profile never implies autonomous continuation.
- Autonomous continuation never implies broader permissions.
- Denylists and safety gates override permission profile.
- Risk decisions must be logged with the active work mode, run control, and
  permission profile.
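
The rules above can be sketched as one decision function. The action names, the minimum-profile table, and the denylist entry are made-up placeholders, not SF's real policy tables; only the ordering of the checks is the point.

```javascript
// Hypothetical sketch: actions, tiers, and the denylist are placeholders.
const PROFILE_RANK = { restricted: 0, normal: 1, trusted: 2, unrestricted: 3 };
const ACTION_MIN_PROFILE = {
  "read-file": "restricted",
  "edit-file": "normal",
  "install-deps": "trusted",
  "force-push": "unrestricted",
};
const DENYLIST = new Set(["force-push"]);

const riskLog = [];

function decide(action, state) {
  let decision;
  if (DENYLIST.has(action)) {
    decision = "deny"; // denylists override every profile
  } else if (!Object.hasOwn(ACTION_MIN_PROFILE, action)) {
    decision = "ask"; // unknown actions always need approval
  } else {
    const needed = PROFILE_RANK[ACTION_MIN_PROFILE[action]];
    decision = PROFILE_RANK[state.permissionProfile] >= needed ? "allow" : "ask";
  }
  // runControl is logged, but it is never consulted for the decision itself.
  riskLog.push({
    action,
    decision,
    workMode: state.workMode,
    runControl: state.runControl,
    permissionProfile: state.permissionProfile,
  });
  return decision;
}
```

An autonomous run under `normal` still has to ask before installing dependencies, and a denylisted action stays denied even under `unrestricted`.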

## Model Modes

```text
fast   cheap/quick routing for small bounded tasks
smart  default balanced routing
deep   high-reasoning routing for planning, debugging, research, and review
```

This is SF's equivalent of Amp's `rush`, `smart`, and `deep`, but with names
that match SF's tone and routing layer.

`modelMode` should guide routing; it should not replace explicit model
selection.
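
As a rough illustration of "guide, not replace": an explicit model choice wins, and `modelMode` only picks a default tier. The model names below are placeholders and `resolveModel` is a hypothetical helper.

```javascript
// Hypothetical sketch: the tier table and model names are placeholders.
const TIER_DEFAULTS = {
  fast: "model-small",
  smart: "model-medium",
  deep: "model-large",
};

function resolveModel({ modelMode = "smart", explicitModel } = {}) {
  if (explicitModel) {
    // An explicit selection is never overridden by the routing posture.
    return { model: explicitModel, source: "explicit" };
  }
  if (!Object.hasOwn(TIER_DEFAULTS, modelMode)) {
    throw new RangeError(`unknown modelMode: ${modelMode}`);
  }
  return { model: TIER_DEFAULTS[modelMode], source: `modelMode:${modelMode}` };
}
```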

## Mode Switching

Mode switching must be first-class and visible.

Direct commands:

```text
/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
/model-mode fast
/model-mode smart
/model-mode deep
```

Combined forms:

```text
/mode repair --autonomous --trust normal
/mode build --autonomous --trust trusted
/mode research --autonomous --trust restricted --model-mode deep
```
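
A small parser sketch for the combined forms above. It assumes exactly the three flags shown there (`--autonomous`, `--trust`, `--model-mode`); `parseModeCommand` is a hypothetical name, and value validation against the axis tables is left to the caller.

```javascript
// Hypothetical parser sketch for "/mode <workMode> [flags]".
function parseModeCommand(line) {
  const tokens = line.trim().split(/\s+/);
  if (tokens[0] !== "/mode" || tokens.length < 2) {
    throw new SyntaxError("expected: /mode <workMode> [flags]");
  }
  const result = { workMode: tokens[1] };
  for (let i = 2; i < tokens.length; i += 1) {
    const flag = tokens[i];
    if (flag === "--autonomous") {
      result.runControl = "autonomous";
    } else if (flag === "--trust" || flag === "--model-mode") {
      const value = tokens[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new SyntaxError(`${flag} needs a value`);
      }
      result[flag === "--trust" ? "permissionProfile" : "modelMode"] = value;
      i += 1; // skip the value we just consumed
    } else {
      throw new SyntaxError(`unknown flag: ${flag}`);
    }
  }
  return result;
}
```

Axes that are not mentioned stay absent from the result, so a caller can merge it over the current state without resetting anything.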

Autonomous steering:

```text
/steer mode repair
/steer mode review after-current-unit
/steer trust restricted now
/steer model-mode deep for-next-unit
```

Transition scopes:

- `now`: apply before the next dispatch point if no tool is active
- `after-current-tool`: finish the active tool, then switch
- `after-current-unit`: finish the current SF unit, then switch
- `next-milestone`: switch after the current milestone completes

Autonomous mode changes should affect future decisions, not mutate an active
tool call midway through execution.

Every transition should be logged:

```json
{
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit"
}
```
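
One way to honor the scopes is to queue requested changes and apply them only when the run loop reaches the matching boundary. The sketch below assumes a hypothetical mapping from scope names to boundary names; the log entries follow the shape of the JSON record above.

```javascript
// Hypothetical sketch: boundary names are assumptions; scope names are from the notes.
const SCOPE_BOUNDARY = {
  "now": "dispatch",
  "after-current-tool": "tool-end",
  "after-current-unit": "unit-end",
  "next-milestone": "milestone-end",
};

function createTransitionQueue(initialState) {
  let state = { ...initialState };
  const pending = [];
  const log = [];
  return {
    request(change, scope, reason) {
      if (!Object.hasOwn(SCOPE_BOUNDARY, scope)) {
        throw new RangeError(`unknown scope: ${scope}`);
      }
      pending.push({ change, scope, reason });
    },
    // Called by the run loop when it reaches a boundary, never mid-tool.
    reach(boundary) {
      for (let i = 0; i < pending.length; ) {
        const { change, scope, reason } = pending[i];
        if (SCOPE_BOUNDARY[scope] !== boundary) {
          i += 1;
          continue;
        }
        const next = { ...state, ...change };
        log.push({ from: state, to: next, reason, scope });
        state = next;
        pending.splice(i, 1);
      }
      return state;
    },
    get log() {
      return log;
    },
  };
}
```

Because changes are applied only inside `reach`, an active tool call always finishes under the state it started with.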

## Plan To Autonomous Handoff

The primary user journey should be:

```text
plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart
```

Required surfaces:

- TUI: plan acceptance prompt includes "run autonomously"
- Web: plan acceptance button includes "run autonomously"
- Headless: `--autonomous` chains into direct `/autonomous`
- RPC: machine event records the transition explicitly

This should not use `/sf`.

## Repair Work Mode

`repair` is a `workMode`, not a separate subsystem.

Commands:

```text
/doctor
/doctor fix
/doctor heal
/repair
/repair --autonomous
```

Semantics:

- `/doctor` inspects health and reports.
- `/doctor fix` applies deterministic repairs.
- `/doctor heal` uses an LLM-assisted diagnostic flow for deeper issues.
- `/repair` switches work mode to `repair`.
- `/repair --autonomous` keeps repairing until clean, blocked, or limit-hit.

Automatic transitions:

```text
build | autonomous | trusted | smart
  -> repair | autonomous | normal | smart
```

This is allowed when health gates fail, installed runtime drift is detected, SF
cannot dispatch safely, or repo workflow state is corrupted.
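
The automatic transition could look roughly like the sketch below. The signal names paraphrase the four conditions above, and capping the permission profile at `normal` is inferred from the single example transition (`trusted` to `normal`); the notes do not state it as a general rule.

```javascript
// Hypothetical sketch: signal names and the profile cap are assumptions.
const REPAIR_TRIGGERS = new Set([
  "health-gate-failed",
  "runtime-drift-detected",
  "unsafe-dispatch",
  "workflow-state-corrupted",
]);
const PROFILE_ORDER = ["restricted", "normal", "trusted", "unrestricted"];

function maybeEnterRepair(state, signal) {
  if (!REPAIR_TRIGGERS.has(signal) || state.workMode === "repair") {
    return state;
  }
  // Cap the profile at "normal" as in the example; never raise it.
  const capped = Math.min(
    PROFILE_ORDER.indexOf(state.permissionProfile),
    PROFILE_ORDER.indexOf("normal"),
  );
  return {
    ...state,
    workMode: "repair",
    permissionProfile: PROFILE_ORDER[capped],
  };
}
```

`runControl` and `modelMode` pass through unchanged, matching the example where both stay `autonomous` and `smart`.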

## Skills

SF should use `.agents/skills/` for repo-local skills.

```text
.agents/skills/<skill-name>/
  SKILL.md
  scripts/
  schemas/
  checklists/
  mcp.json
```

Skill behavior should match the best Factory/Amp pattern:

- skills are narrow reusable capabilities
- users can invoke a skill directly when it is user-invocable
- SF can lazily load a skill when relevant if model invocation is allowed
- supporting files live beside the skill
- dangerous skills are never model-invoked by default
- project skills are committed with the repo
- user skills live in user-level skill directories

Recommended frontmatter:

```yaml
---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---
```

Dangerous workflow:

```yaml
---
name: production-deploy
description: Deploy production services after release gates pass.
user-invocable: true
model-invocable: false
side-effects: production-mutation
permission-profile: trusted
---
```

Background knowledge:

```yaml
---
name: forge-autonomous-runtime
description: Explains SF autonomous loop, UOK gates, installed-runtime drift, and recovery paths.
user-invocable: false
model-invocable: true
side-effects: none
permission-profile: restricted
---
```
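
The frontmatter fields above lend themselves to a load-time check. The validator below is a sketch under assumptions: it takes already-parsed frontmatter as a plain object, treats only `production-mutation` as dangerous, and `validateSkillFrontmatter` is a hypothetical name.

```javascript
// Hypothetical validator: the notes only show three side-effect values
// ("none", "code-edits", "production-mutation"); the dangerous set is assumed.
const DANGEROUS_SIDE_EFFECTS = new Set(["production-mutation"]);
const PROFILES = new Set(["restricted", "normal", "trusted", "unrestricted"]);

function validateSkillFrontmatter(meta) {
  const errors = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(meta.name ?? "")) {
    errors.push("name must be kebab-case");
  }
  if (!meta.description) {
    errors.push("description is required");
  }
  for (const key of ["user-invocable", "model-invocable"]) {
    if (typeof meta[key] !== "boolean") {
      errors.push(`${key} must be a boolean`);
    }
  }
  if (!PROFILES.has(meta["permission-profile"])) {
    errors.push("unknown permission-profile");
  }
  if (DANGEROUS_SIDE_EFFECTS.has(meta["side-effects"]) && meta["model-invocable"] === true) {
    errors.push("dangerous skills must not be model-invocable");
  }
  if (meta["user-invocable"] === false && meta["model-invocable"] === false) {
    errors.push("skill is unreachable: neither user- nor model-invocable");
  }
  return errors;
}
```

Returning a list of errors rather than throwing lets a `/skills` surface show every problem with a skill at once.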

## Automatic Skill Creation

SF should add repo-specific skills when it repeatedly rediscovers a useful
pattern.

Flow:

1. Detect repeated repo-specific evidence: same files, same commands, same
   failure mode, same architectural rule, same verification path.
2. Propose a skill in manual/restricted contexts.
3. Generate or update a project skill automatically only when policy allows it.
4. Record source evidence in `.sf` state.
5. Keep the skill narrow and testable.
6. Commit the skill with the repo when accepted.

Examples for Forge:

- `forge-command-surface`
- `forge-web-mode`
- `forge-autonomous-runtime`
- `forge-release-verification`
- `forge-installed-runtime-drift`

Examples for DR:

- `dr-agent-windows`
- `dr-portal-ui-and-handlers`
- `dr-production-readiness`
- `dr-systematic-debugging`

This is the Hermes-agent direction: reusable operational knowledge becomes
repo-local skills plus `.sf` evidence, not scattered markdown.
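
The detection and proposal steps of the flow can be sketched as a counter over repeated evidence. The threshold, the `kind:subject` key format, and the `allowAutoGenerate` policy flag are all illustrative assumptions.

```javascript
// Hypothetical sketch: threshold, key format, and policy flag are assumptions.
function createSkillSuggester({ threshold = 3 } = {}) {
  const evidence = new Map(); // pattern key -> list of source records

  return {
    // Record one sighting of a repo-specific pattern and where it was seen.
    observe(kind, subject, source) {
      const key = `${kind}:${subject}`;
      const sources = evidence.get(key) ?? [];
      sources.push(source);
      evidence.set(key, sources);
    },
    // Patterns seen often enough become proposals; automatic generation is
    // chosen only when policy explicitly allows it.
    proposals({ allowAutoGenerate = false } = {}) {
      return [...evidence]
        .filter(([, sources]) => sources.length >= threshold)
        .map(([pattern, sources]) => ({
          pattern,
          evidence: sources,
          action: allowAutoGenerate ? "generate" : "propose",
        }));
    },
  };
}
```

Each proposal carries its source evidence, so whatever is written to `.sf` state can point back to the units that justified the skill.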

## Background Work Surface

SF needs one coherent background work surface.

Direct command:

```text
/tasks
```

It should show:

- autonomous units (durable state: todo | in_progress | review | done | retrying | failed | cancelled)
- parallel workers
- scheduled autonomous dispatches
- background shell sessions
- stuck or resumable sessions
- remote questions waiting for answers
- current cost/budget state
- last checkpoint and next action

Task lifecycle uses ORCH-style states. `todo` means ready to run, not "queued."

This complements, not replaces:

- `/status`
- `/queue` (milestone dispatch order, not task state)
- `/parallel status`
- `/session-report`
- `/logs`
- `/forensics`

Copilot's `/tasks` and `/session` are less powerful internally, but clearer as
control surfaces. SF should keep its deeper state and expose it better.

## Actual Source Pass: Awesome CLI Agent Repos

Checked locally under `/tmp/sf-agent-research`:

- `bradAGI/awesome-cli-coding-agents`
- `plandex-ai/plandex`
- `leonardcser/smelt`
- `mikeyobrien/ralph-orchestrator`
- `subsy/ralph-tui`
- `oxgeneral/ORCH`
- `LucasDuys/forge`
- `ramarlina/agx`
- `youwangd/SageCLI`
- `jcast90/relay`
- `basilisk-labs/agentplane`
- `amaar-mc/wit`
- `fastxyz/skill-optimizer`
- `0xmariowu/AgentLint`
- `ZENG3LD/gate4agent`

`arosstale/pi-builder` was listed but the GitHub repository was not found when
cloned on 2026-05-08.

### Smelt

Smelt's source has four modes:

```text
normal -> plan -> apply -> yolo
```

It also has separate reasoning effort:

```text
off | low | medium | high | max
```

Useful:

- mode cycling is explicit and configurable
- permissions differ by mode
- read-only commands are allowed, writes usually ask, deny wins
- approval scopes are explicit: once, session, workspace
- workspace approvals persist under a workspace hash

Do not copy:

- `yolo` as a name
- putting work kind and trust level into one mode axis

SF should keep Smelt's visible cycling and approval scopes, but preserve SF's
separate axes: `workMode`, `runControl`, `permissionProfile`, and `modelMode`.

### ORCH

ORCH has the cleanest small task state machine:

```text
todo -> in_progress -> review -> done
              \-> retrying -> in_progress
              \-> failed
review -> todo
* -> cancelled
```

It also keeps runtime state separately:

- `running`
- `claimed`
- `retry_queue`
- total run/task/token/runtime stats

Useful for SF:

- `/tasks` should show both durable task status and ephemeral running state
- successful completion should pass through review, even when auto-approved
- dependency blockers should be computed, not implied from ordering
- retrying should be an explicit state, not hidden inside logs
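
The state machine drawn above fits in a small transition table. This sketch reads `* -> cancelled` literally as "any state except `cancelled` itself", which is an interpretation; `canTransition` and `advance` are hypothetical names, not ORCH's or SF's API.

```javascript
// Sketch of the durable task states; helper names are hypothetical.
const TRANSITIONS = {
  todo: ["in_progress"],
  in_progress: ["review", "retrying", "failed"],
  retrying: ["in_progress"],
  review: ["done", "todo"],
  done: [],
  failed: [],
  cancelled: [],
};

function canTransition(from, to) {
  if (!Object.hasOwn(TRANSITIONS, from) || !Object.hasOwn(TRANSITIONS, to)) {
    return false;
  }
  if (to === "cancelled") {
    return from !== "cancelled"; // "* -> cancelled", read literally
  }
  return TRANSITIONS[from].includes(to);
}

// Return an updated copy of the task, or throw on an illegal move.
function advance(task, to) {
  if (!canTransition(task.status, to)) {
    throw new Error(`illegal transition: ${task.status} -> ${to}`);
  }
  return { ...task, status: to };
}
```

With this table, `in_progress -> done` is rejected, which enforces the rule that successful completion passes through `review`.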

### AgentPlane

AgentPlane's strongest idea is schema-first task artifacts. Task README
frontmatter includes:

- `risk_level`
- `status`
- `depends_on`
- `task_kind`
- `mutation_scope`
- `risk_flags`
- `blueprint_request`
- `verify`
- `plan_approval`
- `verification`
- `runner`

Its workflow file also makes operational policy explicit:

- workflow mode
- status commit policy
- workspace isolation
- retry policy
- scheduler concurrency
- required evaluator checks
- event log location

Useful for SF:

- task artifacts should have schema-backed frontmatter, not loose markdown
- plan approval and verification state deserve durable fields
- mutation scope and risk flags should feed `permissionProfile`
- workflow policy should be inspectable by `/status` and `/tasks`

### Relay

Relay's useful concepts:

- a channel is the workspace for one piece of work
- tickets are parallelizable units with dependency DAGs, retry budgets,
  specialty tags, optional repo routing, and verification commands
- decisions are first-class durable records
- crosslink lets agents discover and message other sessions
- complexity tiers drive approval behavior
- CLI/TUI/GUI all read the same state

Useful for SF:

- keep decisions as first-class records, not buried in summaries
- remote steering should become full-session steering and cross-session
  messaging, not only remote questions
- multi-repo work needs explicit repo routing on tasks
- one state store should power TUI, web, headless, and RPC

### Ralph

Ralph's hat system is useful as a coordination topology:

- hats declare triggers, publishes, instructions, backend overrides, max
  activations, and disallowed tools
- events flow through a bus
- scope violations are detected when hats publish undeclared topics
- exhaustion emits explicit events

Useful for SF:

- specialized helpers should declare trigger/publish contracts
- helper activation should have max activation limits
- helper output should be checked against declared output topics
- mode transitions can be modeled as events, not ad hoc flags
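
A sketch of what a trigger/publish contract with an activation cap might look like for SF helpers. The topic names and the `helper.exhausted` and `helper.scope_violation` events are invented for illustration; this is not Ralph's actual API.

```javascript
// Hypothetical sketch of a helper contract; topic names are made up.
function createHelper({ name, triggers, publishes, maxActivations }) {
  let activations = 0;
  return {
    name,
    handles: (topic) => triggers.includes(topic),
    activate(handler, event) {
      if (activations >= maxActivations) {
        // Exhaustion is an explicit event, not a silent no-op.
        return [{ topic: "helper.exhausted", helper: name }];
      }
      activations += 1;
      const out = handler(event);
      const undeclared = out
        .map((e) => e.topic)
        .filter((topic) => !publishes.includes(topic));
      if (undeclared.length > 0) {
        // Output outside the declared contract is replaced by a violation event.
        return [{ topic: "helper.scope_violation", helper: name, undeclared }];
      }
      return out;
    },
  };
}
```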

### Sage

Sage's real value is runtime-neutral orchestration:

- agents are processes
- messages are files
- tasks are templates with frontmatter
- plans decompose into dependency waves
- tasks in a wave execute in parallel
- resume skips done tasks and resets stale running tasks
- runtime fallback is explicit
- bench-as-code compares actual agent CLIs on actual tasks

Useful for SF:

- `/tasks` should be file/DB-backed enough that headless tools can read it
  without attaching to a live TUI
- dependency waves should be visible in planning output
- stale running work should be reset or surfaced clearly on resume
- model/provider benchmarking should use actual SF workflows, not isolated
  model prompts
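
Dependency waves can be computed by repeatedly peeling off the tasks whose dependencies are all satisfied (Kahn-style layering). The task shape `{ id, dependsOn }` and the `done` parameter are assumptions made for this sketch; it does not reproduce Sage's implementation.

```javascript
// Sketch of wave decomposition; the task shape is hypothetical.
function planWaves(tasks, done = []) {
  const finished = new Set(done);
  // Resume support: drop finished tasks and dependencies on them.
  const remaining = new Map(
    tasks
      .filter((task) => !finished.has(task.id))
      .map((task) => [
        task.id,
        new Set((task.dependsOn ?? []).filter((dep) => !finished.has(dep))),
      ]),
  );
  const waves = [];
  while (remaining.size > 0) {
    const ready = [...remaining]
      .filter(([, deps]) => deps.size === 0)
      .map(([id]) => id);
    if (ready.length === 0) {
      throw new Error("dependency cycle or missing task");
    }
    waves.push(ready); // everything in one wave can run in parallel
    for (const id of ready) remaining.delete(id);
    for (const deps of remaining.values()) {
      for (const id of ready) deps.delete(id);
    }
  }
  return waves;
}
```

Printing the returned array in planning output makes the waves visible: each inner array is one parallel batch.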

### AGX

AGX has useful low-level patterns:

- graph scheduler with hard, soft, failure, and always dependency conditions
- max concurrent work slots
- checkpoints with patch files and bounded history
- deterministic verify gate before LLM fallback
- repeated verification failure count that forces action

Useful for SF:

- dependency edges should support more than "depends on success"
- checkpoints should store patch references and bounded summaries
- deterministic verification should always run before semantic/LLM review
- repeated verify failures should force a mode transition to `repair` or
  `review`, not keep retrying indefinitely
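
The last two points can be combined into one gate: deterministic checks run first, semantic review only runs once they pass, and repeated failures escalate. The check objects, the `semanticReview` callback, and the `maxFailures` cap are hypothetical, and escalating to `repair` rather than `review` is an arbitrary choice for the example.

```javascript
// Hypothetical sketch: checks, reviewer callback, and failure cap are assumptions.
function createVerifyGate({ checks, semanticReview, maxFailures = 3 }) {
  let consecutiveFailures = 0;
  return function verify(change) {
    const failed = checks
      .filter((check) => !check.run(change))
      .map((check) => check.name);
    if (failed.length > 0) {
      consecutiveFailures += 1;
      const escalate = consecutiveFailures >= maxFailures;
      // After too many failures, ask for a mode transition instead of a retry.
      return { ok: false, failed, nextWorkMode: escalate ? "repair" : null };
    }
    consecutiveFailures = 0;
    // Semantic review runs only after every deterministic check has passed.
    return { ok: Boolean(semanticReview(change)), failed: [], nextWorkMode: null };
  };
}
```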

### Wit

Wit is the strongest coordination pattern for parallel edits:

- agents declare intent before editing
- agents acquire symbol-level locks
- conflicts are warnings, not always hard blocks
- contracts can be enforced by git hooks
- Tree-sitter provides symbol ranges and call edges
- a `coordinate` skill auto-loads when `.wit/` exists

Useful for SF:

- parallel SF workers should declare intent before editing
- conflict detection should eventually be symbol-aware, not only file-aware
- warnings can steer agents away from collisions without freezing work
- accepted interface contracts should be enforceable before commit
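
A minimal in-memory sketch of an intent registry with warnings instead of hard locks. The method names echo the ones these notes list for `parallel-intent.js`, but this is not that module's code; the TTL default, the injected clock, and the symbol strings are assumptions.

```javascript
// Hypothetical sketch; not the contents of `parallel-intent.js`.
function createIntentRegistry({ ttlMs = 60_000, now = () => Date.now() } = {}) {
  const intents = new Map(); // workerId -> { symbols, expiresAt }

  // Intents past their TTL are ignored rather than blocking others forever.
  const active = () =>
    [...intents].filter(([, intent]) => intent.expiresAt > now());

  return {
    declareIntent(workerId, symbols) {
      // Conflicts come back as warnings; the declaration still succeeds.
      const warnings = this.checkIntentConflicts(workerId, symbols);
      intents.set(workerId, {
        symbols: new Set(symbols),
        expiresAt: now() + ttlMs,
      });
      return warnings;
    },
    checkIntentConflicts(workerId, symbols) {
      const warnings = [];
      for (const [otherId, intent] of active()) {
        if (otherId === workerId) continue;
        for (const symbol of symbols) {
          if (intent.symbols.has(symbol)) {
            warnings.push({ symbol, heldBy: otherId });
          }
        }
      }
      return warnings;
    },
    releaseIntent(workerId) {
      return intents.delete(workerId);
    },
    getActiveIntents() {
      return active().map(([id]) => id);
    },
  };
}
```

Passing the clock in as `now` keeps expiry testable without waiting for real time to elapse.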

### skill-optimizer

Skill optimizer has the best pattern for making skills real:

- a case is a user-like task plus deterministic graders
- a suite is a case/model matrix
- references are copied into `/work`
- the agent sees only `/work`, not graders or hidden answers
- graders inspect files, artifacts, `answer.json`, `trace.jsonl`, and result
  state
- failed trials preserve workspace for debugging

Useful for SF:

- auto-created skills need eval cases
- skill acceptance should be grader-backed, not vibes-backed
- negative cases should check that irrelevant skills were not loaded
- skill optimization should test across model modes/providers
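
An eval case with deterministic graders can be sketched without any real agent. Here the workspace is a plain object standing in for `/work`, the trace is an array of strings, and `runEvalCase` is a hypothetical name; this mirrors the pattern described above, not skill-optimizer's or SF's actual harness API.

```javascript
// Hypothetical sketch; an in-memory object stands in for the /work directory.
function runEvalCase(evalCase, agent) {
  // The agent gets a copy of the references and never sees the graders.
  const work = structuredClone(evalCase.references);
  const trace = [];
  agent({ task: evalCase.task, work, trace });
  const results = evalCase.graders.map((grader) => ({
    name: grader.name,
    pass: Boolean(grader.check({ work, trace })),
  }));
  // The workspace is returned so a failed trial can be inspected afterwards.
  return { pass: results.every((r) => r.pass), results, work };
}
```

A negative case is just another grader, for example one that fails when the trace shows an irrelevant skill being loaded.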

### Plandex And Forge Loop

Plandex reinforces:

- chat/tell split
- configurable autonomy levels
- cumulative diff sandbox before applying changes
- model packs for planning vs execution

Forge Loop reinforces:

- R-numbered acceptance criteria
- task DAGs with tiered parallelism
- per-task worktrees
- per-task and session token budgets
- structural completion markers
- backpropagation from runtime failure to spec gap
- state on disk as the recovery source

SF already has many of these ideas. The part to tighten is the explicit product
surface: direct commands, visible modes, `/tasks`, schema-backed state, and
skill evals.

## Status And Mode Badge

The active state should always be visible, especially during full autonomy.

Recommended status line:

```text
SF build | autonomous | trusted | smart
```

Compact badge form:

```text
[B][A][T][S]
```

Preferred full labels in critical states:

```text
repair | autonomous | normal | smart
review | assisted | normal | deep
```

Do not use "autopilot" in SF UI. It may appear only as competitor context in
this research note.
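
Both forms can be derived from the same state object. The sketch assumes the compact badge uses the upper-cased first letter of each axis value, which matches `[B][A][T][S]` for the example above but is otherwise a guess at the encoding.

```javascript
// Hypothetical formatters; the first-letter badge encoding is an assumption.
function formatStatusLine(state) {
  return `SF ${state.workMode} | ${state.runControl} | ${state.permissionProfile} | ${state.modelMode}`;
}

function formatBadge(state) {
  return [state.workMode, state.runControl, state.permissionProfile, state.modelMode]
    .map((value) => `[${value[0].toUpperCase()}]`)
    .join("");
}
```

Under that encoding `repair`, `review`, and `research` all collapse to `[R]`, which is one concrete reason to prefer full labels in critical states.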

## Implementation Pull-Through

Already directionally right:

- UOK lifecycle records carry `runControl`.
- UOK lifecycle records and execution-policy decisions carry
  `permissionProfile`.
- Schedule command state uses `autonomous_dispatch`.
- SF has DB-backed state, recovery, verification, scheduling, captures,
  forensics, projections, and self-reporting.
- SF has skills and project-specific skill paths.
- SF has parallel orchestration and remote-question infrastructure.

Still needed:

- ~~Remove `/sf` from docs/web/tests (Phase 2 deprecation)~~ ✓ Complete

Completed ✓ (RA.Aid Patterns, Phase 2):

- structured memory repositories (`memory-repository.js`: SQLite-backed key facts,
  snippets, research notes, human inputs, work logs, decisions; content hash
  deduplication; auto-summarization; prompt formatting; 11 tests pass)
- trajectory recording (`trajectory-recorder.js`: per-step tool/LLM/error
  execution trace with costs, tokens, errors; session+unit scoped; exportable;
  10 tests pass)
- trajectory command (`/trajectory`: step-by-step trace with `--all`, `--errors`,
  `--tools`, `--llm`, `--limit=N` flags; wired into `commands/handlers/ops.js`)
- reasoning assist + memory integration (`reasoning-assist.js` loads key facts,
  snippets, research notes from memory repository into pre-stage consultation prompt)
- compaction fix (`register-hooks.js`: never cancel compaction; provide custom
  compaction summary with work state preservation instead)

Completed ✓ (Additional):

- schema-backed task/frontmatter fields (`task-frontmatter.js`: risk levels,
  mutation scopes, verification types, plan approval states, task/scheduler
  statuses; wired into `sf-db.js` `insertTaskSpecIfAbsent()`)
- subagent provider/model/permission inheritance audit
  (`subagent-inheritance.js`: blocked providers, fast-mode heavy model blocking,
  restricted destructive tool blocking; wired into `subagent/index.js`)
- remote steering as full-session steering surface (`remote-steering.js`:
  parse/apply/format directives with 5s cooldown throttle)
- parallel worker intent/claim registry (`parallel-intent.js`: declareIntent,
  checkIntentConflicts, releaseIntent, getActiveIntents with TTL)
- skill eval harness foundation (`skills/eval-harness.js`: createEvalCase,
  runGrader with 30s timeout, runSkillEvals)
- terminal title mode indicator (`auto/session.js`: OSC escape sequence +
  `process.title`, format: `SF[workMode|runControl|permissionProfile|modelMode]`)
- self-feedback → workMode auto-transition (`self-feedback-drain.js`:
  high/critical feedback dispatches auto-switch to `repair` with reason
  `"self-feedback-drain"`)
- UOK events carry workMode + modelMode (`uok/kernel.js`: lifecycleFlags include
  both; audit envelope payload includes both)
- enhanced `/steer` with mode transitions (`/steer mode <m> [scope]`,
  `/steer trust <p> [scope]`, `/steer model-mode <m> [scope]`)
- `/sf` prefix deprecation warning (Phase 1: accept both forms, warn once per
  session)
- centralized metrics system (`metrics-central.js`: Prometheus-compatible
  Counter/Gauge/Histogram with session scoping, DB persistence, retry logic,
  cost/token tracking; wired into subagent-inheritance + mode transitions)
- explicit stage commands (`/research`, `/plan`, `/implement`: set workMode and
  dispatch corresponding phase)
- cost command (`/cost`: queries metrics-central DB + legacy ledger)
- reasoning assist foundation (`reasoning-assist.js`: pre-stage expert
  consultation prompt builder, context loading, guidance injection; wired into
  `auto/phases.js` dispatch path)

Completed ✓:

- make `workMode` durable state (SQLite session_mode_state table + AutoSession persistence)
- add direct mode/control/trust/model-mode commands
- make `--autonomous` chain into direct `/autonomous`
- add visible mode/status surface for TUI and web (header badge + /status)
- expose autonomous continuation limits in settings and status (mode badge shows runControl)
- add `/tasks` as the unified background work surface with durable task state,
  ephemeral running state, retries, blockers, checkpoints, budget, and steering
- make `repair` a first-class workflow over doctor
- add policy-aware project skill suggestion/generation (auto-create flow)
- enhanced `/steer` with mode/trust/model-mode transitions
- TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P)
- minimal auto-mode header/footer (badge visible during autonomy)
- `/sf` namespace removed from command registration; direct command roots only
- parallel worker intent/claim registry (declareIntent, checkIntentConflicts, releaseIntent)
- skill eval harness foundation (createEvalCase, runGrader, runSkillEvals)
- terminal title mode indicator (tmux/terminal tab visibility)

## Direct Command Decision

SF is the system, not a plugin namespace.

Use:

```text
/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks
```

`/sf` is not registered in the TUI or browser command surface.

Shell machine surface remains:

```text
sf headless autonomous
sf headless --autonomous ...
```

The target model is simple: direct commands for humans, headless commands for
machines, durable state for autonomous execution, and explicit axes for mode,
control, trust, model posture, and surface.

## Runtime Target: Node 26

SF treats Node 26.1+ as the runtime baseline. There is no compatibility path
for older Node versions in SF-owned runtime code.

Source notes checked 2026-05-08:

- Node 25 is a short-lived current line. It is useful as a compatibility probe,
  but not a target.
- Node 26 is current now, LTS-bound, and useful for SF's own runtime model.
- Bun is closer to Node every release and supports many Node APIs plus
  Node-API, but its compatibility target and partial API areas do not match
  SF's risk surface yet.
- Deno supports Node/npm compatibility, package.json, local node_modules, and
  Node-API addons with FFI permission, but that means SF would still be running
  a Node-compatibility workload.
- LLRT is experimental and serverless-oriented, not a local CLI/runtime fit.

### Why Node 26 Makes SF Stronger

Node 26 is not just "newer Node." It gives SF a better platform for long-running
agent work:

- `Temporal` is enabled by default.
- V8 14.6 is the JavaScript engine baseline.
- Undici 8 is the HTTP/fetch baseline.
- Node 26 removes and deprecates more legacy APIs, so it hardens SF against old
  loader, stream, HTTP, crypto, and dependency assumptions.

### Temporal Is More Than Better Dates

Temporal gives SF the vocabulary it already needs for durable autonomous work.

Important Temporal concepts:

- `Temporal.Instant`: an exact point in history. Use for journal events,
  checkpoint timestamps, lock leases, provider call start/end, and trace order.
- `Temporal.ZonedDateTime`: an exact instant plus time zone and calendar. Use
  for reminders, schedules, adoption reviews, audits, and "run this at local
  business time" semantics.
- `Temporal.PlainDate`: a calendar date without time or time zone. Use for
  daily reports, milestone review dates, and human-facing due dates.
- `Temporal.PlainTime`: a wall-clock time without date or zone. Use for
  recurring "at 09:00" style policies.
- `Temporal.PlainDateTime`: a date and wall-clock time before binding it to a
  zone. Use only when the zone is deliberately chosen later.
- `Temporal.Duration`: a typed amount of time. Use for budgets, leases,
  cooldowns, retry delays, schedule offsets, and age checks.

That split matters because SF currently has many different meanings hidden
behind timestamps and strings:

- exact event ordering
- local user reminders
- project schedule dates
- lease expiry
- retry backoff
- adoption review windows
- elapsed runtime
- "next business day" style planning

`Date` collapses those into one weak type. Temporal lets SF store and validate
the real intent.

### SF Runtime Places That Should Use Temporal

Use Temporal first in the areas where wrong time semantics create real
operational mistakes:

- `sf schedule`: due dates, relative offsets, local-time reminders, audit
  windows, and recurrence-ready storage.
- autonomous locks and leases: exact `Instant` plus typed `Duration`, not
  implicit millisecond math scattered through code.
- journals and traces: exact event instants with stable ordering and explicit
  serialization.
- session reports: elapsed durations and grouped daily summaries without local
  timezone drift.
- adoption reviews and decision audits: calendar dates and wall-clock reminders
  that survive DST and timezone changes.
- background work surface: task age, stale-running detection, retry-after, and
  next-action time should be typed.

**Implementation Status:** `temporal-foundation.js` is a native-only Node 26
wrapper with safe constructors (`instantFromISO`, `durationFromObject`,
`plainDateFromISO`), serialization, deserialization, and validation. It throws
clearly when native Temporal is unavailable instead of using compatibility
shims.
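
As a minimal sketch of the "exact `Instant` plus typed `Duration`" rule for leases, assuming native Temporal is present: the helper names `requireTemporal`, `leaseExpiry`, and `isLeaseExpired` are hypothetical and are not the API of `temporal-foundation.js`.

```javascript
// Hypothetical lease helpers; these names are illustrative and are not taken
// from temporal-foundation.js.
function requireTemporal() {
  // Mirror the native-only rule: fail loudly instead of loading a shim.
  if (typeof globalThis.Temporal === "undefined") {
    throw new Error("Native Temporal is unavailable; Node 26+ is required");
  }
  return globalThis.Temporal;
}

// A lease is an exact Instant (acquiredAt) plus a typed Duration (ttl).
function leaseExpiry(acquiredAtISO, ttl) {
  const T = requireTemporal();
  const acquiredAt = T.Instant.from(acquiredAtISO);
  return acquiredAt.add(T.Duration.from(ttl));
}

function isLeaseExpired(acquiredAtISO, ttl, nowISO) {
  const T = requireTemporal();
  const now = T.Instant.from(nowISO);
  // compare() returns -1, 0, or 1; the lease is expired once now >= expiry.
  return T.Instant.compare(now, leaseExpiry(acquiredAtISO, ttl)) >= 0;
}
```

A call such as `isLeaseExpired("2026-05-08T10:00:00Z", { minutes: 30 }, nowISO)` keeps the unit in the data, so no caller has to remember whether a bare number meant seconds or milliseconds.
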

### Temporal Design Rule For SF

Store the semantic type, not just the formatted string:

```text
event happened exactly now       -> Instant
run at 09:00 in Europe/Oslo      -> ZonedDateTime or PlainTime + timeZone
review on 2026-06-01             -> PlainDate
retry after 30 minutes           -> Duration
lease expires at exact timestamp -> Instant
```

Serialization should stay explicit and boring:

- store ISO strings plus a field that says which Temporal type they represent
- include timezone when wall-clock semantics matter
- do not infer local timezone at read time unless the record explicitly asks
  for it
- validate schedule and lease records at DB boundaries
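
The serialization rules above can be sketched as a tagged record with a boundary validator. The record shape `{ type, value, timeZone }` and both function names are assumptions for illustration; the check is shape-only and does not parse the ISO string.

```javascript
// Hypothetical record shape: { type, value, timeZone? }.
const TEMPORAL_RECORD_TYPES = [
  "Instant", "ZonedDateTime", "PlainDate", "PlainTime", "PlainDateTime", "Duration",
];
// Types whose meaning depends on wall-clock time in a specific zone.
const NEEDS_TIME_ZONE = new Set(["ZonedDateTime"]);

function validateTemporalRecord(record) {
  if (!record || !TEMPORAL_RECORD_TYPES.includes(record.type)) {
    throw new TypeError(`Unknown Temporal type: ${record && record.type}`);
  }
  if (typeof record.value !== "string" || record.value.length === 0) {
    throw new TypeError("Temporal record value must be a non-empty ISO string");
  }
  if (NEEDS_TIME_ZONE.has(record.type) && typeof record.timeZone !== "string") {
    // Never fall back to the reader's local zone.
    throw new TypeError(`${record.type} records must carry an explicit timeZone`);
  }
  return record;
}

function toTemporalRecord(type, value, timeZone) {
  const record = timeZone === undefined ? { type, value } : { type, value, timeZone };
  return validateTemporalRecord(record);
}
```

Running the same validator on write and on read is one way to enforce the "validate at DB boundaries" rule without trusting whatever is already stored.
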

### Node 26 Adoption Path

Target policy:

```text
current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1+
canonical baseline:          Node 26.1+
Node 25:                     skip except quick probes
```
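
One way to enforce the compatibility floor at startup is a small version guard. This is a sketch only: `NODE_FLOOR`, `meetsNodeFloor`, and `describeRuntime` are hypothetical names, and the real check may rely on `engines.node` alone.

```javascript
// Hypothetical startup guard; the floor mirrors the "Node 26.1+" policy above.
const NODE_FLOOR = { major: 26, minor: 1 };

function meetsNodeFloor(version, floor = NODE_FLOOR) {
  // Accepts "26.1.0" or "v26.1.0"; the patch component is ignored.
  const [major, minor] = version.replace(/^v/, "").split(".").map(Number);
  if (!Number.isInteger(major) || !Number.isInteger(minor)) {
    throw new TypeError(`Unparseable Node version: ${version}`);
  }
  return major > floor.major || (major === floor.major && minor >= floor.minor);
}

function describeRuntime(version = process.versions.node) {
  return meetsNodeFloor(version)
    ? `Node ${version} satisfies the 26.1+ floor`
    : `Node ${version} is below the 26.1+ floor`;
}
```
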

### Runtime Alternatives

Other JavaScript runtimes are useful comparators, but none should replace Node
as SF's primary runtime right now.

SF's current runtime shape is Node-native:

- npm workspaces and `package-lock.json`
- Next.js standalone web host
- Vitest and Node test-runner compatibility scripts
- Rust N-API `.node` addons
- `node-pty` native assets in the web host
- `node:` built-ins across CLI, scripts, packages, and web services
- child process, TTY, stream, module loader, and extension-loader behavior
- installed runtime sync into `~/.sf/agent`

#### Bun

Bun is the strongest speed and developer-experience competitor.

Useful:

- fast package install and script startup
- broad Node API compatibility
- built-in TypeScript, test runner, shell, SQLite, YAML, TOML, JSONL, and other
  convenience APIs
- Node-API support is substantial enough to use as a compatibility probe

Not primary for SF:

- Bun's own docs say compatibility reflects Node v23, while SF is targeting
  Node 26.
- Some core APIs are partial or behaviorally different: `child_process`, module
  loader hooks, `node:v8`, `node:test`, `node:sqlite`, `worker_threads`, and
  inspector/debugger areas are not exact Node.
- SF's highest-risk paths are exactly the places where "almost Node" can hurt:
  TTY, child processes, native addons, Next standalone output, loaders, and
  extension runtime.

Decision: use Bun only for optional speed probes or isolated tooling. Do not
make it the SF runtime until full `npm test`, web build, native build, smoke
tests, and installed extension runtime all pass under Bun without special
cases.

#### Deno

Deno has the best security and integrated-toolchain story.

Useful:

- explicit permissions model
- first-class TypeScript and web standards
- npm/package.json compatibility
- Node-API support when local `node_modules` and FFI permission are enabled
- good target for thinking about sandboxing and permission profiles

Not primary for SF:

- For a repo like SF, Deno would run almost entirely in its Node-compatibility
  mode.
- Deno docs recommend local `node_modules` for frameworks like Next.js and for
  Node-API addons, which means SF would keep most Node/npm complexity anyway.
- Native addons require local `node_modules` plus `--allow-ffi`.
- The value would be security posture and packaging experiments, not simpler
  runtime execution.

Decision: study Deno for permission-profile design and maybe future packaged
headless workers. Do not switch the core SF runtime to Deno.

#### LLRT, WinterJS, Edge Runtimes

These are not fits for SF's primary runtime.

Useful:

- serverless cold-start research
- constrained worker/edge execution ideas
- tiny isolated helper tasks

Not primary for SF:

- SF is a long-running local CLI/runtime, not a small stateless Lambda handler.
- SF needs native addons, process control, TTY, filesystem state, git, shell,
  Next web host, and Node-compatible package behavior.
- LLRT is explicitly experimental and evaluation-oriented.

Decision: ignore as primary runtime. Only revisit for isolated future worker
surfaces.

### Runtime Decision

Node 26 is the target because SF is a Node-native agent runtime, not a generic
JavaScript app.

Use alternatives this way:

```text
Node 26 -> primary runtime and baseline
Bun     -> speed/compatibility probe, not runtime
Deno    -> permission/sandbox design reference, not runtime
LLRT    -> ignore except tiny serverless worker research
```

The rule is simple: if a runtime cannot run the exact SF stack without special
cases, it is not stronger for SF. Node 26 makes the existing SF stack stronger;
alternative runtimes mostly make a different stack.

Required Node 26 gate:

```text
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
```

SF already requires Node 26.1+ in `engines.node`; the remaining work is to keep
the gates green under Node 26 and replace fragile `Date`/millisecond logic with
Temporal in the schedule, lease, journal, and background task surfaces.

---

## Appendix A: Related Source Files

This section maps the concepts in this document to actual code in the repo.

### A.1 Operating Model (Already Exists)

**File:** `src/resources/extensions/sf/operating-model.js`

Already exports canonical vocabulary:

```js
export const RUN_CONTROL_MODES = ["manual", "assisted", "autonomous"];
export const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];
```

Tests: `src/resources/extensions/sf/tests/operating-model.test.mjs`

**Status:** The original gap (no `workMode` or `modelMode` constants) is closed: Appendix B marks both as added to this file.

### A.2 Execution Policy (Already Exists)

**File:** `src/resources/extensions/sf/execution-policy.js`

Maps permission profiles to concrete tool restrictions:

```js
const EXECUTION_POLICY_PROFILES = {
  restricted: { filesystem: "read-mostly", network: "read-only", git: "read-only", mutation: "planning-artifacts-only" },
  normal: { filesystem: "workspace-write", network: "allowed", git: "normal", mutation: "workspace" },
  trusted: { filesystem: "workspace-write", network: "allowed", git: "normal", mutation: "workspace" },
  unrestricted: { filesystem: "danger-full-access", network: "allowed", git: "dangerous", mutation: "host" }
};
```

**Status:** Wired to tool-call boundaries via the `tool_call` hook in `bootstrap/register-hooks.js`. `classifyExecutionPolicyCall()` reads `session.permissionProfile` and blocks destructive commands when the profile is `restricted` or `normal`. Enforcement is unified at the hook level.
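
The blocking rule described in the status note can be sketched as follows. This is illustrative only: the real signature of `classifyExecutionPolicyCall()` is not shown in this document, and `classifyToolCall`, `DESTRUCTIVE_PATTERNS`, and the pattern list are assumptions.

```javascript
// Illustrative only: the real classifyExecutionPolicyCall() signature and
// its pattern list are not shown in this document.
const DESTRUCTIVE_PATTERNS = [
  /\brm\s+-(?:rf|fr)\b/, // recursive forced delete
  /\bgit\s+push\s+.*--force\b/, // history rewrite on a remote
  /\bgit\s+reset\s+--hard\b/, // discards local changes
];
const BLOCKING_PROFILES = new Set(["restricted", "normal"]);

function classifyToolCall(session, command) {
  const destructive = DESTRUCTIVE_PATTERNS.some((re) => re.test(command));
  if (destructive && BLOCKING_PROFILES.has(session.permissionProfile)) {
    return {
      allowed: false,
      reason: `destructive command blocked under "${session.permissionProfile}"`,
    };
  }
  return { allowed: true, reason: "allowed" };
}
```

Keeping the decision in one pure function is what lets a single hook enforce it for every tool, rather than each tool re-implementing its own check.
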

### A.3 Auto Session State (Already Exists)

**File:** `src/resources/extensions/sf/auto/session.js`

`AutoSession` class holds:

- `active`, `paused`, `stepMode`, `canAskUser`
- `currentUnit`, `currentMilestoneId`
- `autoModeStartModel`, `currentUnitModel`

**Status:** `workMode`, `runControl`, `permissionProfile`, `modelMode`, `surface`, and `modeUpdatedAt` are all durable properties on `AutoSession`. Persisted to SQLite `session_mode_state` table on every transition. Loaded from DB on construction.

### A.4 Command Registration (Already Exists)

**File:** `src/resources/extensions/sf/commands/index.js`

Registers direct commands via `pi.registerCommand()`:

```js
for (const command of DIRECT_SF_COMMANDS) {
  pi.registerCommand(command.cmd, { ... });
}
```

**File:** `src/resources/extensions/sf/commands/catalog.js`

Defines `TOP_LEVEL_SUBCOMMANDS` and `DIRECT_SF_COMMANDS`.

**Status:** Direct commands implemented (`/mode`, `/control`, `/trust`, `/model-mode`, `/repair`, `/tasks`, `/skills`). `/sf` is not registered; the shell executable remains `sf`.

### A.5 TUI Extension (Already Exists)

**File:** `src/resources/extensions/sf-tui/index.js`

Registers shortcuts:

- `Ctrl+Alt+H`: prompt history
- `Ctrl+Shift+H`: prompt history fallback
- `Ctrl+Alt+M`: marketplace

**File:** `src/resources/extensions/sf-tui/header.js`

Renders the header with project name, branch, and model.

**File:** `src/resources/extensions/sf-tui/footer.js`

Renders the footer with git status, cost, and context usage.

**File:** `src/resources/extensions/sf-tui/extension-manifest.json`

Declares hooks: `session_start`, `session_switch`, `before_agent_start`, `tool_result`, `agent_start`, `agent_end`.

**Status:** Mode badge implemented in TUI header and footer. Compact `[B∞TS]` form at <80 cols, full `build · autonomous · trusted · smart` at ≥80 cols. Paused state dims all badge parts and shows a `P!` (compact) or `paused ·` (full) prefix. `renderModeBadge` is exported from `header.js` and shared with the footer via the `FOOTER_THEME` adapter. `getMode()` surfaces `session.paused` on the returned mode object.
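
The badge rules in the status note can be sketched like this. It is not the actual `renderModeBadge` from `header.js`: the function name `renderBadge`, the glyph table, the first-letter fallback, and the ANSI dim codes are all assumptions. Only the glyphs B, ∞, A, T, and S appear in this document.

```javascript
// Illustrative sketch; not the actual renderModeBadge from header.js.
// Only B, ∞, A, T, and S appear in this document; other glyphs are guesses.
const GLYPHS = { build: "B", autonomous: "∞", assisted: "A", trusted: "T", smart: "S" };
const DIM = "\x1b[2m";
const RESET = "\x1b[0m";

function glyphFor(value) {
  // Assumed fallback: first letter, upper-cased.
  return GLYPHS[value] ?? value.charAt(0).toUpperCase();
}

function renderBadge(mode, cols) {
  const parts = [mode.workMode, mode.runControl, mode.permissionProfile, mode.modelMode];
  const text = cols < 80
    ? `[${mode.paused ? "P!" : ""}${parts.map(glyphFor).join("")}]`
    : `${mode.paused ? "paused · " : ""}${parts.join(" · ")}`;
  // Paused state dims the whole badge.
  return mode.paused ? `${DIM}${text}${RESET}` : text;
}
```
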

### A.6 UOK Parity Report (Already Uses runControl)

**File:** `src/resources/extensions/sf/tests/uok-parity-report.test.mjs`

Tests verify `runControl` and `permissionProfile` in UOK events:

```js
assert.equal(events[0].runControl, "autonomous");
assert.equal(events[0].permissionProfile, "normal");
```

**Status:** `workMode` and `modelMode` added to AutoSession. Journal logging emits `mode-transition` events. UOK kernel includes both in `lifecycleFlags` and audit envelope payload.

### A.7 Routing History (Already Exists)

**File:** `src/resources/extensions/sf/routing-history.js`

Tracks model tier success/failure per task pattern.

**Status:** Connected. `modelModeToTier()` / `tierToModelMode()` bridge in `operating-model.js`. `classifyUnitComplexity()` signature includes `modelMode`. `deep` floors at `heavy`, `fast` caps at `light`.
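
The floor and cap behaviour can be sketched as a clamp over an ordered tier list. The tier names `light` and `heavy` come from the status note; the middle tier `standard` and the function name `applyModelMode` are assumptions, and this is not the actual `modelModeToTier()` implementation.

```javascript
// Tier names "light" and "heavy" come from the text above; "standard" and
// the function name are assumptions for illustration.
const TIER_ORDER = ["light", "standard", "heavy"];

function applyModelMode(tier, modelMode) {
  const rank = TIER_ORDER.indexOf(tier);
  if (rank === -1) throw new RangeError(`Unknown tier: ${tier}`);
  if (modelMode === "deep") {
    // Floor: never route below heavy.
    return TIER_ORDER[Math.max(rank, TIER_ORDER.indexOf("heavy"))];
  }
  if (modelMode === "fast") {
    // Cap: never route above light.
    return TIER_ORDER[Math.min(rank, TIER_ORDER.indexOf("light"))];
  }
  return tier; // other modes leave the classified tier unchanged
}
```

Expressing the modes as a floor and a cap rather than as fixed tiers means the same function keeps working if more tiers are added between `light` and `heavy`.
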

### A.8 Doctor System (Already Exists)

**Files:**

- `src/resources/extensions/sf/doctor.js`
- `src/resources/extensions/sf/doctor-proactive.js`
- `src/resources/extensions/sf/doctor-checks.js`

Health checks, auto-fix, proactive monitoring.

**Status:** `/repair` command switches to `repair` work mode and runs doctor fix. Auto-transitions to repair allowed when health gates fail.

### A.9 Self-Feedback (Already Exists)

**File:** `src/resources/extensions/sf/self-feedback.js`

Records anomalies, blocking entries, version-bump resolution.

**Status:** Connected. `self-feedback-drain.js` auto-transitions to `repair` workMode when high/critical self-feedback is dispatched for inline-fix. Reason: `"self-feedback-drain"`.

### A.10 Skills (Partially Exists)

**Files:**

- `src/resources/extensions/sf/skill-discovery.js`
- `src/resources/extensions/sf/skill-health.js`
- `src/resources/extensions/sf/skill-telemetry.js`

Skill loading, health monitoring, telemetry.

**Status:** `.agents/skills/` directory structure implemented with YAML frontmatter parser, validation, skill loader, and auto-creation flow. Auto-creation detects patterns from activity logs (≥3 occurrences) and generates skills with a SQLite-backed cooldown. Sample skills created: `forge-command-surface`, `forge-autonomous-runtime`.
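
For orientation, a frontmatter parser of the kind mentioned in the status note might look like the sketch below. It handles only flat `key: value` pairs rather than full YAML, and the function name `parseSkillFrontmatter` and the required fields `name` and `description` are assumptions, not the contents of `skills/frontmatter.js`.

```javascript
// Minimal sketch: handles only flat `key: value` pairs, not full YAML.
// The required fields (name, description) are assumptions.
function parseSkillFrontmatter(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new SyntaxError("Missing YAML frontmatter block");
  const meta = {};
  for (const line of match[1].split(/\r?\n/)) {
    if (line.trim() === "" || line.trim().startsWith("#")) continue;
    const colon = line.indexOf(":");
    if (colon === -1) throw new SyntaxError(`Invalid frontmatter line: ${line}`);
    const key = line.slice(0, colon).trim();
    // Strip one pair of surrounding quotes if present.
    meta[key] = line.slice(colon + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
  }
  for (const field of ["name", "description"]) {
    if (!meta[field]) throw new TypeError(`Skill frontmatter requires "${field}"`);
  }
  return { meta, body: match[2] };
}
```
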

---

## Appendix B: Implementation Priority

| Priority | Item | Files to Touch | Effort (✓ = done) |
|----------|------|----------------|-------------------|
| P0 | Add `workMode` + `modelMode` to `operating-model.js` | `operating-model.js`, `operating-model.test.mjs` | Small ✓ |
| P0 | Add `workMode` to `AutoSession` | `auto/session.js`, `auto.js` | Small ✓ |
| P0 | Add mode badge to TUI header | `sf-tui/header.js`, `sf-tui/index.js` | Small ✓ |
| P0 | Add mode-switching shortcuts | `sf-tui/index.js`, `extension-manifest.json` | Small ✓ |
| P0 | Remove `/sf` namespace registration | `commands/catalog.js`, `commands/index.js` | Medium ✓ |
| P1 | Add `/mode`, `/control`, `/trust`, `/model-mode` commands | `commands/handlers/*.js`, `commands/catalog.js` | Medium ✓ |
| P1 | Wire `execution-policy.js` to tool boundaries | `execution-policy.js`, `bootstrap/register-hooks.js` | Medium ✓ |
| P1 | Add `/tasks` background work surface | `commands/handlers/tasks.js` | Medium ✓ |
| P1 | Make `repair` first-class work mode | `commands/handlers/ops.js`, `commands/handlers/core.js` | Medium ✓ |
| P2 | Add `.agents/skills/` structure | `skills/*.js`, `.agents/skills/` | Medium ✓ |
| P2 | Add skill YAML frontmatter parser | `skills/frontmatter.js` | Small ✓ |
| P2 | Add skill eval harness | `skills/eval-harness.js`, eval templates | Medium ✓ |
| P2 | Adopt Temporal in `sf schedule` | `temporal-foundation.js` | Medium ✓ |
| P2 | Node 26 baseline | `temporal-foundation.js` native Temporal wrapper | Medium ✓ |

---

## Appendix C: Open Questions — Resolved

1. **Paused badge** → `[P!BATS]` in compact form; `paused · build · assisted · trusted · smart` in full form. All parts dim when paused. Implemented in `renderModeBadge`.
2. **Mode per-session or per-project?** → Per-session. Mode is a runtime posture for the current work, not a project-level config.
3. **Badge in tmux/terminal window titles?** → The terminal title is already handled via an OSC escape in `auto/session.js`. Tmux requires users to set `set-option -g set-titles on`; SF does not inject tmux config.
4. **Mode transitions with sound/notification?** → No. A terminal tool has no appropriate sound channel. The badge is the sole visibility mechanism.
5. **`repair` auto-transition: ask by default for new projects?** → No. Auto-transition is correct behavior for autonomous runs. The transition is blocked only when `runControl` is `manual` or `assisted`.
6. **Skill eval cases: run in CI or on-demand?** → On-demand only. Gate with the `SF_SKILL_EVALS=1` env var. CI is too slow and model-dependent for skill evals.
7. **`/tasks`: TUI overlay or separate scrollable panel?** → Inline output (current). A full panel requires `pi-tui` overlay support that is not yet built.
8. **`modelMode` replace or supplement tier system?** → Supplement via the `modelModeToTier()` bridge. Direct model selection overrides `modelMode`; `modelMode` guides routing when no explicit model is set.
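
The decisions in items 5 and 6 reduce to two small predicates, sketched here under assumptions: both function names are hypothetical, and the severity check borrows the "high/critical" trigger from A.9 rather than from the actual transition code.

```javascript
// Hypothetical helper names; the rules come from items 5 and 6 above and
// from the self-feedback drain described in A.9.
function canAutoTransitionToRepair(session, severity) {
  // Manual and assisted runs keep the operator in control of mode changes.
  if (session.runControl !== "autonomous") return false;
  return severity === "high" || severity === "critical";
}

function skillEvalsEnabled(env = process.env) {
  // On-demand only: evals run when SF_SKILL_EVALS is exactly "1".
  return env.SF_SKILL_EVALS === "1";
}
```
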