Mikael Hugo c6ee7701b2 autoresearch: auto-fix format + organizeImports

Result: {"status": "keep", "diagnostics": 11, "errors": 0, "warnings": 11}

2026-05-08 14:28:22 +02:00

Agent Mode And Skills Notes For SF

Sources checked 2026-05-08:

GitHub Docs, "Allowing GitHub Copilot CLI to work autonomously" https://docs.github.com/en/copilot/concepts/agents/copilot-cli/autopilot
GitHub Docs, "GitHub Copilot CLI command reference" https://docs.github.com/en/copilot/reference/copilot-cli-reference/cli-command-reference
GitHub Copilot CLI product page https://github.com/features/copilot/cli
GitHub Changelog, "GitHub Copilot CLI is now generally available" https://github.blog/changelog/2026-02-25-github-copilot-cli-is-now-generally-available/
GitHub Changelog, "Copilot CLI now supports BYOK and local models" https://github.blog/changelog/2026-04-07-copilot-cli-now-supports-byok-and-local-models/
Factory Droid, "Autonomy Level" https://docs.factory.ai/cli/user-guides/auto-run
Factory Droid, "CLI Reference" https://docs.factory.ai/reference/cli-reference
Factory Droid, "Skills" https://docs.factory.ai/cli/configuration/skills
Amp manual https://ampcode.com/manual
Amp, "Agent Skills" https://ampcode.com/news/agent-skills

Competitive Patterns

Copilot CLI

Copilot CLI has the cleanest public "continue work" story:

plan first, then accept the plan and continue without step-by-step approval
continuation is separate from permission expansion
question suppression is separate from continuation
a runaway continuation cap is explicit
/fleet parallelizes with subagents
/remote steers a running session from another device
/tasks exposes background work
/session exposes session info, checkpoints, files, plans, cleanup, and pruning
/skills, /plugin, /mcp, and /agent are visible control surfaces
BYOK, local-model, and offline provider configuration are first-class

Useful shape:

copilot --autopilot --yolo --max-autopilot-continues 10 -p "YOUR PROMPT HERE"

SF should copy the separation, not the names:

continuation is run control
permission expansion is permission profile
question behavior is an escalation policy
runaway caps are explicit autonomous limits

Factory Droid

Factory Droid makes the most important distinction explicit: Autonomy Level is separate from interaction mode.

interaction mode: Auto vs Spec Mode
autonomy level: Off, Low, Medium, High
execution surface: interactive droid vs headless droid exec
tools and commands carry risk levels
command allowlists and denylists layer on top of autonomy
Spec Mode plans first; after approval, Droid exits Spec Mode and implements with the chosen autonomy level
droid exec is read-only by default and raises permissions with --auto low|medium|high
custom droids/subagents can have their own model/tool/autonomy policies
skills are reusable capabilities that can be user-invoked or invoked by the Droid when relevant

This validates SF's split between work mode, run control, and permission profile.

Amp

Amp is useful for the agent-shape and skills model:

modes are model/capability presets: smart, rush, deep
skills live in .agents/skills/ and user-level skill directories
skill content is lazily loaded only when relevant
skills can package instructions, scripts, resources, and tool/MCP config
subagents can be spawned automatically for isolated work, but they have isolated context and return only final summaries
Oracle and Librarian are specialized helper agents for second opinion and cross-repository research

Amp validates .agents/skills/ as the preferred repo-local skill path.

SF Model

SF should represent agent state as orthogonal axes, not one overloaded mode.

workMode: chat | plan | build | review | repair | research
runControl: manual | assisted | autonomous
permissionProfile: restricted | normal | trusted | unrestricted
modelMode: fast | smart | deep
surface: tui | web | headless | rpc

Note: repair is a workMode, not a separate subsystem. The /doctor command is the diagnostic engine; /repair switches workMode to repair.

Examples:

workMode | runControl | permissionProfile | modelMode
plan     | manual     | normal            | deep
build    | autonomous | trusted           | smart
repair   | assisted   | normal            | smart
research | autonomous | restricted        | deep
review   | manual     | restricted        | deep

Definitions:

workMode describes what kind of work SF is doing.
runControl describes who advances the loop.
permissionProfile describes what tool/file/network actions may proceed without approval.
modelMode describes speed/cost/reasoning posture.
surface describes how the user or automation is connected.

autonomous is not a mode by itself; it is one value of the runControl axis.

Work Modes

`chat`

Default conversational mode for questions, explanations, and low-commitment exploration.

`plan`

Research, clarify, write/update specs, derive tasks, and produce an explicit acceptance point before implementation.

`build`

Implement, test, lint, typecheck, verify, and prepare commit-ready changes.

`review`

Inspect diffs, tests, risks, regressions, security issues, and missing evidence.

`repair`

Fix SF health, repo health, runtime drift, broken generated state, bad command surfaces, failing workflow infrastructure, stale locks, and broken installed runtime copies.

Doctor is not a permanent mode. Doctor is the diagnostic engine used by repair.

`research`

Longer-form codebase, competitor, design, API, or dependency research. This can use web search, local code exploration, cross-repo research, and helper agents.

Run Control

manual     user drives every step
assisted   SF executes one unit, then pauses
autonomous SF continues until done, blocked, interrupted, budget-hit, or limit-hit
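A minimal sketch of how the three runControl values could decide what happens after a unit finishes. The outcome names, the `maxUnits` limit, and the `budgetUsd` budget are hypothetical stand-ins for SF's real limits, not its actual API.

```javascript
// Hypothetical sketch: what the loop does after a unit finishes, per runControl value.
// Outcome names and the limit/budget fields are illustrative, not SF's real API.
const STOP_OUTCOMES = new Set(["done", "blocked", "interrupted"]);

function nextStep(runControl, { outcome, unitsRun, maxUnits, spentUsd, budgetUsd }) {
  if (runControl === "manual") return { action: "wait-for-user", reason: "manual" };
  if (runControl === "assisted") return { action: "pause", reason: "assisted" };
  if (runControl !== "autonomous") throw new Error(`unknown runControl: ${runControl}`);
  // Autonomous keeps going until one of the stop conditions is hit.
  if (STOP_OUTCOMES.has(outcome)) return { action: "stop", reason: outcome };
  if (spentUsd >= budgetUsd) return { action: "stop", reason: "budget-hit" };
  if (unitsRun >= maxUnits) return { action: "stop", reason: "limit-hit" };
  return { action: "continue", reason: "autonomous" };
}
```

runControl only decides whether the loop advances; nothing here widens what a unit is allowed to do.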

Transitions:

/control manual
/control assisted
/control autonomous
/autonomous
/next
/pause
/stop

/autonomous is a direct command. Do not route through /sf autonomous.

Permission Profiles

restricted   read-only and explicitly allowlisted actions
normal       safe edits, non-destructive local commands
trusted      build/test/install/local commits and bounded repo automation
unrestricted high-risk orchestration only in intentionally trusted environments

This is SF's equivalent of Droid autonomy levels and Copilot permission expansion, but the names are SF-native and policy-oriented.

Rules:

Permission profile never implies autonomous continuation.
Autonomous continuation never implies broader permissions.
Denylists and safety gates override permission profile.
Risk decisions must be logged with the active work mode, run control, and permission profile.
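The four rules can be read as one decision function. This is a sketch under assumptions: the risk tiers (`read`, `edit`, `build`, `high`), the per-profile ceilings, and substring denylist matching are all hypothetical, not SF's execution-policy code.

```javascript
// Hypothetical risk tiers, ordered from least to most dangerous.
const RISK_ORDER = ["read", "edit", "build", "high"];
// Highest tier each profile may run without asking (assumed mapping).
const PROFILE_CEILING = { restricted: "read", normal: "edit", trusted: "build", unrestricted: "high" };

function decide(action, state, denylist = []) {
  let verdict;
  if (denylist.some((pattern) => action.command.includes(pattern))) {
    verdict = "deny"; // denylists override every profile
  } else {
    const risk = RISK_ORDER.indexOf(action.risk);
    const ceiling = RISK_ORDER.indexOf(PROFILE_CEILING[state.permissionProfile]);
    // Unknown risks or profiles fail closed to "ask".
    verdict = risk !== -1 && ceiling !== -1 && risk <= ceiling ? "allow" : "ask";
  }
  // runControl is recorded but never consulted: continuation does not widen permissions.
  const log = {
    command: action.command,
    risk: action.risk,
    verdict,
    workMode: state.workMode,
    runControl: state.runControl,
    permissionProfile: state.permissionProfile,
  };
  return { verdict, log };
}
```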

Model Modes

fast   cheap/quick routing for small bounded tasks
smart  default balanced routing
deep   high-reasoning routing for planning, debugging, research, and review

This is SF's equivalent of Amp's rush, smart, and deep, but with names that match SF's tone and routing layer.

modelMode should guide routing; it should not replace explicit model selection.

Mode Switching

Mode switching must be first-class and visible.

Direct commands:

/mode chat
/mode plan
/mode build
/mode review
/mode repair
/mode research
/control manual
/control assisted
/control autonomous
/trust restricted
/trust normal
/trust trusted
/trust unrestricted
/model-mode fast
/model-mode smart
/model-mode deep

Combined forms:

/mode repair --autonomous --trust normal
/mode build --autonomous --trust trusted
/mode research --autonomous --trust restricted --model-mode deep
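A sketch of parsing the combined form, assuming whitespace-separated tokens and the flag spellings shown above. The function names are hypothetical. The parser returns only the axes the command names, so `--autonomous` changes runControl and nothing else.

```javascript
// Hypothetical parser for "/mode <workMode> [--autonomous] [--trust <profile>] [--model-mode <mode>]".
const WORK_MODES = ["chat", "plan", "build", "review", "repair", "research"];
const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];
const MODEL_MODES = ["fast", "smart", "deep"];

function expectOneOf(value, allowed, flag) {
  if (!allowed.includes(value)) {
    throw new Error(`${flag} expects one of ${allowed.join(", ")}; got ${value}`);
  }
  return value;
}

// Returns only the axes the command names; every other axis keeps its current value.
function parseModeCommand(line) {
  const [command, workMode, ...rest] = line.trim().split(/\s+/);
  if (command !== "/mode") throw new Error(`not a /mode command: ${command}`);
  const change = { workMode: expectOneOf(workMode, WORK_MODES, "/mode") };
  for (let i = 0; i < rest.length; i++) {
    const flag = rest[i];
    if (flag === "--autonomous") change.runControl = "autonomous";
    else if (flag === "--trust") change.permissionProfile = expectOneOf(rest[++i], PERMISSION_PROFILES, flag);
    else if (flag === "--model-mode") change.modelMode = expectOneOf(rest[++i], MODEL_MODES, flag);
    else throw new Error(`unknown flag: ${flag}`);
  }
  return change;
}
```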

Autonomous steering:

/steer mode repair
/steer mode review after-current-unit
/steer trust restricted now
/steer model-mode deep for-next-unit

Transition scopes:

now: apply before the next dispatch point if no tool is active
after-current-tool: finish the active tool, then switch
after-current-unit: finish the current SF unit, then switch
next-milestone: switch after the current milestone completes

Autonomous mode changes should affect future decisions, not mutate an active tool call midway through execution.

Every transition should be logged:

{
  "from": {"workMode": "build", "runControl": "autonomous"},
  "to": {"workMode": "repair", "runControl": "autonomous"},
  "reason": "pre-dispatch health gate failed",
  "scope": "after-current-unit"
}
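The scope rules and the log record can be sketched as a queue that holds each requested change until the loop reports the matching boundary. The boundary names (`dispatch`, `tool-end`, `unit-end`, `milestone-end`) and the `TransitionQueue` class are assumptions for illustration.

```javascript
// Hypothetical boundary names the loop would report; one per transition scope.
const SCOPE_BOUNDARY = {
  "now": "dispatch",
  "after-current-tool": "tool-end",
  "after-current-unit": "unit-end",
  "next-milestone": "milestone-end",
};

class TransitionQueue {
  constructor(state) {
    this.state = { ...state };
    this.pending = [];
    this.log = [];
  }

  request(change, scope, reason) {
    if (!(scope in SCOPE_BOUNDARY)) throw new Error(`unknown scope: ${scope}`);
    this.pending.push({ change, scope, reason });
  }

  // The loop calls this at each boundary; nothing is applied while a tool is running.
  onBoundary(boundary, { toolActive = false } = {}) {
    if (toolActive) return this.state;
    const ready = this.pending.filter((t) => SCOPE_BOUNDARY[t.scope] === boundary);
    this.pending = this.pending.filter((t) => !ready.includes(t));
    for (const t of ready) {
      const from = { ...this.state };
      this.state = { ...this.state, ...t.change };
      this.log.push({ from, to: { ...this.state }, reason: t.reason, scope: t.scope });
    }
    return this.state;
  }
}
```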

Plan To Autonomous Handoff

The primary user journey should be:

plan | manual | normal | deep
accept plan
build | autonomous | selected-permission-profile | smart

Required surfaces:

TUI: plan acceptance prompt includes "run autonomously"
Web: plan acceptance button includes "run autonomously"
Headless: --autonomous chains into direct /autonomous
RPC: machine event records the transition explicitly

This should not use /sf.
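A sketch of the handoff as a single pure state change that every surface could share. `acceptPlan` and the `plan-accepted` event type are hypothetical names; the point is that runControl and permissionProfile change only through the two explicit user choices.

```javascript
// Hypothetical sketch: accepting a plan is one explicit state change plus one event.
function acceptPlan(state, { runAutonomously = false, permissionProfile } = {}) {
  if (state.workMode !== "plan") throw new Error(`cannot accept a plan from ${state.workMode}`);
  const next = {
    ...state,
    workMode: "build",
    runControl: runAutonomously ? "autonomous" : state.runControl,
    // Autonomy never implies trust: the profile changes only if the user picked one.
    permissionProfile: permissionProfile ?? state.permissionProfile,
    modelMode: "smart",
  };
  // The same event could be written to the RPC stream and shown in TUI/web.
  const event = { type: "plan-accepted", from: state, to: next };
  return { state: next, event };
}
```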

Repair Work Mode

repair is a workMode, not a separate subsystem.

Commands:

/doctor
/doctor fix
/doctor heal
/repair
/repair --autonomous

Semantics:

/doctor inspects health and reports.
/doctor fix applies deterministic repairs.
/doctor heal uses an LLM-assisted diagnostic flow for deeper issues.
/repair switches work mode to repair.
/repair --autonomous keeps repairing until clean, blocked, or limit-hit.

Automatic transitions:

build | autonomous | trusted | smart
-> repair | autonomous | normal | smart

This is allowed when health gates fail, installed runtime drift is detected, SF cannot dispatch safely, or repo workflow state is corrupted.
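A sketch of the automatic build-to-repair transition. The signal names are hypothetical labels for the four triggers above, and capping repair at `normal` (keeping `restricted` sessions restricted) generalizes the single example given, so treat it as an assumption.

```javascript
// Hypothetical signal names for the triggers listed above.
const REPAIR_TRIGGERS = new Set([
  "health-gate-failed",
  "runtime-drift",
  "unsafe-dispatch",
  "workflow-state-corrupt",
]);

function maybeEnterRepair(state, signals) {
  const trigger = signals.find((signal) => REPAIR_TRIGGERS.has(signal));
  if (!trigger || state.workMode === "repair") return { state, transition: null };
  const next = {
    ...state,
    workMode: "repair",
    // Assumption: repair caps trust at "normal"; restricted sessions stay restricted.
    permissionProfile: state.permissionProfile === "restricted" ? "restricted" : "normal",
  };
  return { state: next, transition: { from: state, to: next, reason: trigger } };
}
```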

Skills

SF should use .agents/skills/ for repo-local skills.

.agents/skills/<skill-name>/
  SKILL.md
  scripts/
  schemas/
  checklists/
  mcp.json

Skill behavior should match the best Factory/Amp pattern:

skills are narrow reusable capabilities
users can invoke a skill directly when it is user-invocable
SF can lazily load a skill when relevant if model invocation is allowed
supporting files live beside the skill
dangerous skills are never model-invoked by default
project skills are committed with the repo
user skills live in user-level skill directories

Recommended frontmatter:

---
name: forge-command-surface
description: Use when changing SF slash commands, browser command parity, or headless command dispatch.
user-invocable: true
model-invocable: true
side-effects: code-edits
permission-profile: normal
---

Dangerous workflow:

---
name: production-deploy
description: Deploy production services after release gates pass.
user-invocable: true
model-invocable: false
side-effects: production-mutation
permission-profile: trusted
---

Background knowledge:

---
name: forge-autonomous-runtime
description: Explains SF autonomous loop, UOK gates, installed-runtime drift, and recovery paths.
user-invocable: false
model-invocable: true
side-effects: none
permission-profile: restricted
---
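A sketch of checking the frontmatter fields above. The reader handles only flat `key: value` lines, not full YAML, and treating `production-mutation` as the dangerous side-effect class is an assumption drawn from the example.

```javascript
// Minimal frontmatter reader: flat "key: value" lines only, not full YAML.
function parseSkillFrontmatter(text) {
  const match = /^---\n([\s\S]*?)\n---/.exec(text);
  if (!match) throw new Error("missing frontmatter");
  const fields = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    fields[key] = raw === "true" ? true : raw === "false" ? false : raw;
  }
  return fields;
}

// Hypothetical set of side-effect classes treated as dangerous.
const DANGEROUS_SIDE_EFFECTS = new Set(["production-mutation"]);

function skillPolicyErrors(fields) {
  const errors = [];
  for (const key of ["name", "description", "side-effects", "permission-profile"]) {
    if (!(key in fields)) errors.push(`missing ${key}`);
  }
  // Dangerous skills are never model-invoked: the flag must be explicitly false.
  if (DANGEROUS_SIDE_EFFECTS.has(fields["side-effects"]) && fields["model-invocable"] !== false) {
    errors.push("dangerous skills must set model-invocable: false");
  }
  return errors;
}
```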

Automatic Skill Creation

SF should add repo-specific skills when it repeatedly rediscovers a useful pattern.

Flow:

Detect repeated repo-specific evidence: same files, same commands, same failure mode, same architectural rule, same verification path.
Propose a skill in manual/restricted contexts.
Generate or update a project skill automatically only when policy allows it.
Record source evidence in .sf state.
Keep the skill narrow and testable.
Commit the skill with the repo when accepted.
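The first two steps of the flow can be sketched as counting repeated evidence across sessions and then choosing between proposing and generating. The evidence record shape, the threshold of 3 distinct sessions, and the `autoCreateAllowed` policy flag are all assumptions.

```javascript
// Hypothetical evidence records: { kind, value, sessionId }, e.g. kind "command" or "failure".
function skillCandidates(observations, { threshold = 3 } = {}) {
  const bySignature = new Map();
  for (const obs of observations) {
    const key = `${obs.kind}:${obs.value}`;
    const entry = bySignature.get(key) ?? { kind: obs.kind, value: obs.value, sessions: new Set() };
    entry.sessions.add(obs.sessionId); // repeats inside one session count once
    bySignature.set(key, entry);
  }
  return [...bySignature.values()]
    .filter((entry) => entry.sessions.size >= threshold)
    .map((entry) => ({ kind: entry.kind, value: entry.value, sessions: [...entry.sessions].sort() }));
}

// Manual or restricted contexts only propose; generation also needs an explicit policy flag.
function skillAction({ runControl, permissionProfile, autoCreateAllowed }) {
  if (runControl === "manual" || permissionProfile === "restricted") return "propose";
  return autoCreateAllowed ? "generate" : "propose";
}
```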

Examples for Forge:

forge-command-surface
forge-web-mode
forge-autonomous-runtime
forge-release-verification
forge-installed-runtime-drift

Examples for DR:

dr-agent-windows
dr-portal-ui-and-handlers
dr-production-readiness
dr-systematic-debugging

This is the Hermes-agent direction: reusable operational knowledge becomes repo-local skills plus .sf evidence, not scattered markdown.

Background Work Surface

SF needs one coherent background work surface.

Direct command:

/tasks

It should show:

autonomous units (durable state: todo | in_progress | review | done | retrying | failed | cancelled)
parallel workers
scheduled autonomous dispatches
background shell sessions
stuck or resumable sessions
remote questions waiting for answers
current cost/budget state
last checkpoint and next action

Task lifecycle uses ORCH-style states. todo means ready to run, not "queued."

/tasks complements, rather than replaces, these commands:

/status
/queue (milestone dispatch order, not task state)
/parallel status
/session-report
/logs
/forensics

Copilot's /tasks and /session are less powerful internally, but clearer as control surfaces. SF should keep its deeper state and expose it better.

Actual Source Pass: Awesome CLI Agent Repos

Checked locally under /tmp/sf-agent-research:

bradAGI/awesome-cli-coding-agents
plandex-ai/plandex
leonardcser/smelt
mikeyobrien/ralph-orchestrator
subsy/ralph-tui
oxgeneral/ORCH
LucasDuys/forge
ramarlina/agx
youwangd/SageCLI
jcast90/relay
basilisk-labs/agentplane
amaar-mc/wit
fastxyz/skill-optimizer
0xmariowu/AgentLint
ZENG3LD/gate4agent

arosstale/pi-builder was listed but the GitHub repository was not found when cloned on 2026-05-08.

Smelt

Smelt's source has four modes:

normal -> plan -> apply -> yolo

It also has separate reasoning effort:

off | low | medium | high | max

Useful:

mode cycling is explicit and configurable
permissions differ by mode
read-only commands are allowed, writes usually ask, deny wins
approval scopes are explicit: once, session, workspace
workspace approvals persist under a workspace hash

Do not copy:

yolo as a name
putting work kind and trust level into one mode axis

SF should keep Smelt's visible cycling and approval scopes, but preserve SF's separate axes: workMode, runControl, permissionProfile, and modelMode.
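A sketch of Smelt-style approval scopes in SF terms. Keying workspace approvals by a truncated SHA-256 of the resolved path is an assumption about what "workspace hash" means, and the in-memory `Map` stands in for whatever on-disk store would make workspace approvals persist.

```javascript
const { createHash } = require("node:crypto");
const path = require("node:path");

// Assumed scheme: key workspace approvals by a hash of the resolved workspace path.
function workspaceHash(workspaceDir) {
  return createHash("sha256").update(path.resolve(workspaceDir)).digest("hex").slice(0, 16);
}

class ApprovalStore {
  constructor() {
    this.session = new Set();
    this.workspace = new Map(); // a real store would persist this to disk
  }

  grant(rule, scope, workspaceDir) {
    if (scope === "once") return; // covers the current call only; nothing is stored
    if (scope === "session") this.session.add(rule);
    else if (scope === "workspace") {
      const key = workspaceHash(workspaceDir);
      if (!this.workspace.has(key)) this.workspace.set(key, new Set());
      this.workspace.get(key).add(rule);
    } else throw new Error(`unknown scope: ${scope}`);
  }

  isApproved(rule, workspaceDir, denyRules = []) {
    if (denyRules.includes(rule)) return false; // deny wins over any approval
    const forWorkspace = this.workspace.get(workspaceHash(workspaceDir));
    return this.session.has(rule) || (forWorkspace?.has(rule) ?? false);
  }
}
```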

ORCH

ORCH has the cleanest small task state machine:

todo -> in_progress -> review -> done
                  \-> retrying -> in_progress
                  \-> failed
review -> todo
* -> cancelled
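The diagram translates directly into a transition table. This is a sketch, not ORCH's code; `transitionTask` and the `history` field are hypothetical. Cancellation is allowed from any state, as the `* -> cancelled` line says.

```javascript
// Transition table transcribed from the diagram; "cancelled" is reachable from any state.
const TASK_TRANSITIONS = {
  todo: ["in_progress"],
  in_progress: ["review", "retrying", "failed"],
  retrying: ["in_progress"],
  review: ["done", "todo"],
  done: [],
  failed: [],
  cancelled: [],
};

function transitionTask(task, to) {
  const allowed = to === "cancelled" || (TASK_TRANSITIONS[task.status] ?? []).includes(to);
  if (!allowed) throw new Error(`illegal transition ${task.status} -> ${to}`);
  // Keep a small audit trail so retries stay visible instead of hiding in logs.
  return { ...task, status: to, history: [...(task.history ?? []), `${task.status}->${to}`] };
}
```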

It also keeps runtime state separately:

running
claimed
retry_queue
total run/task/token/runtime stats

Useful for SF:

/tasks should show both durable task status and ephemeral running state
successful completion should pass through review, even when auto-approved
dependency blockers should be computed, not implied from ordering
retrying should be an explicit state, not hidden inside logs

AgentPlane

AgentPlane's strongest idea is schema-first task artifacts. Task README frontmatter includes:

risk_level
status
depends_on
task_kind
mutation_scope
risk_flags
blueprint_request
verify
plan_approval
verification
runner

Its workflow file also makes operational policy explicit:

workflow mode
status commit policy
workspace isolation
retry policy
scheduler concurrency
required evaluator checks
event log location

Useful for SF:

task artifacts should have schema-backed frontmatter, not loose markdown
plan approval and verification state deserve durable fields
mutation scope and risk flags should feed permissionProfile
workflow policy should be inspectable by /status and /tasks
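A sketch of feeding task mutation scope and risk flags into permissionProfile. The scope values reuse the `mutation` values from EXECUTION_POLICY_PROFILES; the scope-to-profile mapping and the risk-flag names are assumptions, not AgentPlane's or SF's schema.

```javascript
const PROFILE_RANK = ["restricted", "normal", "trusted", "unrestricted"];
// Assumed mapping from mutation scope to the lowest profile that can run it unattended.
const MIN_PROFILE_FOR_SCOPE = {
  "planning-artifacts-only": "restricted",
  workspace: "normal",
  host: "unrestricted",
};
// Hypothetical risk flags that raise the floor.
const FLAG_FLOOR = { "installs-dependencies": "trusted", "creates-commits": "trusted" };

function requiredProfile(task) {
  // Unknown scopes fail closed to the highest profile.
  let rank = PROFILE_RANK.indexOf(MIN_PROFILE_FOR_SCOPE[task.mutation_scope] ?? "unrestricted");
  for (const flag of task.risk_flags ?? []) {
    rank = Math.max(rank, PROFILE_RANK.indexOf(FLAG_FLOOR[flag] ?? "restricted"));
  }
  return PROFILE_RANK[rank];
}

function canRunUnattended(task, activeProfile) {
  return PROFILE_RANK.indexOf(activeProfile) >= PROFILE_RANK.indexOf(requiredProfile(task));
}
```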

Relay

Relay's useful concepts:

a channel is the workspace for one piece of work
tickets are parallelizable units with dependency DAGs, retry budgets, specialty tags, optional repo routing, and verification commands
decisions are first-class durable records
crosslink lets agents discover and message other sessions
complexity tiers drive approval behavior
CLI/TUI/GUI all read the same state

Useful for SF:

keep decisions as first-class records, not buried in summaries
remote steering should become full-session steering and cross-session messaging, not only remote questions
multi-repo work needs explicit repo routing on tasks
one state store should power TUI, web, headless, and RPC

Ralph

Ralph's hat system is useful as a coordination topology:

hats declare triggers, publishes, instructions, backend overrides, max activations, and disallowed tools
events flow through a bus
scope violations are detected when hats publish undeclared topics
exhaustion emits explicit events

Useful for SF:

specialized helpers should declare trigger/publish contracts
helper activation should have max activation limits
helper output should be checked against declared output topics
mode transitions can be modeled as events, not ad hoc flags
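A sketch of a helper contract check in the spirit of Ralph's hats. The `HatRegistry` class, its field shapes, and the `hat.exhausted` / `hat.scope_violation` topic names are hypothetical; the behavior follows the list above: activation limits, explicit exhaustion events, and scope violations for undeclared topics.

```javascript
class HatRegistry {
  constructor(hats) {
    this.hats = new Map(hats.map((hat) => [hat.name, { ...hat, activations: 0 }]));
    this.events = [];
  }

  get(name) {
    const hat = this.hats.get(name);
    if (!hat) throw new Error(`unknown hat: ${name}`);
    return hat;
  }

  // Returns false and emits an explicit exhaustion event once the limit is reached.
  activate(name) {
    const hat = this.get(name);
    if (hat.activations >= hat.maxActivations) {
      this.events.push({ topic: "hat.exhausted", hat: name });
      return false;
    }
    hat.activations += 1;
    return true;
  }

  // Publishing an undeclared topic is recorded as a scope violation instead of delivered.
  publish(name, topic, payload) {
    const hat = this.get(name);
    if (!hat.publishes.includes(topic)) {
      this.events.push({ topic: "hat.scope_violation", hat: name, attempted: topic });
      return false;
    }
    this.events.push({ topic, hat: name, payload });
    return true;
  }
}
```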

Sage

Sage's real value is runtime-neutral orchestration:

agents are processes
messages are files
tasks are templates with frontmatter
plans decompose into dependency waves
tasks in a wave execute in parallel
resume skips done tasks and resets stale running tasks
runtime fallback is explicit
bench-as-code compares actual agent CLIs on actual tasks

Useful for SF:

/tasks should be file/DB-backed enough that headless tools can read it without attaching to a live TUI
dependency waves should be visible in planning output
stale running work should be reset or surfaced clearly on resume
model/provider benchmarking should use actual SF workflows, not isolated model prompts
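Dependency waves are a level-by-level topological sort (Kahn's algorithm). Every task in one wave has all of its dependencies in earlier waves, so the wave can run in parallel. This is a generic sketch, not Sage's code; the `{ id, dependsOn }` task shape is assumed.

```javascript
// Group tasks into dependency waves; tasks within a wave can run in parallel.
function dependencyWaves(tasks) {
  const remaining = new Map(tasks.map((t) => [t.id, new Set(t.dependsOn ?? [])]));
  for (const [id, deps] of remaining) {
    for (const dep of deps) {
      if (!remaining.has(dep)) throw new Error(`${id} depends on unknown task ${dep}`);
    }
  }
  const waves = [];
  while (remaining.size > 0) {
    // Everything with no unmet dependencies forms the next wave.
    const wave = [...remaining].filter(([, deps]) => deps.size === 0).map(([id]) => id).sort();
    if (wave.length === 0) throw new Error(`dependency cycle among: ${[...remaining.keys()].join(", ")}`);
    for (const id of wave) remaining.delete(id);
    for (const deps of remaining.values()) for (const id of wave) deps.delete(id);
    waves.push(wave);
  }
  return waves;
}
```

Reporting a cycle as an error, rather than silently dropping tasks, keeps the planning output honest about why work cannot start.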

AGX

AGX has useful low-level patterns:

graph scheduler with hard, soft, failure, and always dependency conditions
max concurrent work slots
checkpoints with patch files and bounded history
deterministic verify gate before LLM fallback
repeated verification failure count that forces action

Useful for SF:

dependency edges should support more than "depends on success"
checkpoints should store patch references and bounded summaries
deterministic verification should always run before semantic/LLM review
repeated verify failures should force a mode transition to repair or review, not keep retrying indefinitely
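A sketch of the last two points: deterministic verification gates semantic review, and consecutive failures force a mode change instead of endless retries. The threshold of 3 and the `failureKind` values that split repair from review are assumptions.

```javascript
// Hypothetical sketch: deterministic verification gates semantic review, and repeated
// failures force a work-mode transition. Threshold and failureKind values are assumptions.
function afterVerify(unit, { deterministicPassed, failureKind = "code", maxFailures = 3 }) {
  if (deterministicPassed) {
    // Only a clean deterministic run moves on to semantic/LLM review.
    return { unit: { ...unit, verifyFailures: 0 }, next: "semantic-review" };
  }
  const verifyFailures = (unit.verifyFailures ?? 0) + 1;
  const updated = { ...unit, verifyFailures };
  if (verifyFailures < maxFailures) return { unit: updated, next: "retry" };
  // Environment/tooling failures go to repair; code failures go to review.
  const workMode = failureKind === "environment" ? "repair" : "review";
  return { unit: updated, next: "transition", workMode };
}
```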

Wit

Wit has the strongest coordination pattern for parallel edits:

agents declare intent before editing
agents acquire symbol-level locks
conflicts are warnings, not always hard blocks
contracts can be enforced by git hooks
Tree-sitter provides symbol ranges and call edges
a coordinate skill auto-loads when .wit/ exists

Useful for SF:

parallel SF workers should declare intent before editing
conflict detection should eventually be symbol-aware, not only file-aware
warnings can steer agents away from collisions without freezing work
accepted interface contracts should be enforceable before commit
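A sketch of symbol-aware intent declaration in which overlaps produce warnings rather than blocks. `IntentBoard` is a hypothetical name and is not SF's parallel-intent.js. Line ranges stand in for the symbol ranges Tree-sitter would provide.

```javascript
// Hypothetical intent board. Ranges stand in for Tree-sitter symbol ranges (start/end lines).
class IntentBoard {
  constructor() {
    this.intents = [];
  }

  // Records the intent and returns warnings for overlapping claims by other workers.
  declare(worker, file, { symbol, start, end }) {
    const warnings = this.intents
      .filter((i) => i.worker !== worker && i.file === file && i.start <= end && start <= i.end)
      .map((i) => `${worker} overlaps ${i.worker} on ${file}#${i.symbol}`);
    this.intents.push({ worker, file, symbol, start, end });
    return warnings;
  }

  release(worker) {
    this.intents = this.intents.filter((i) => i.worker !== worker);
  }
}
```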

skill-optimizer

skill-optimizer has the best pattern for proving that skills actually work:

a case is a user-like task plus deterministic graders
a suite is a case/model matrix
references are copied into /work
the agent sees only /work, not graders or hidden answers
graders inspect files, artifacts, answer.json, trace.jsonl, and result state
failed trials preserve workspace for debugging

Useful for SF:

auto-created skills need eval cases
skill acceptance should be grader-backed, not vibes-backed
negative cases should check that irrelevant skills were not loaded
skill optimization should test across model modes/providers
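A sketch of grader-backed skill evals over a case/model matrix, including a negative case. The trial shape (`changedFiles`, `loadedSkills`) and the grader helpers are hypothetical. A missing trial counts as a failure rather than a pass.

```javascript
// Graders are plain predicates over a trial result (assumed shape).
function runCase(evalCase, trial) {
  if (!trial) return { case: evalCase.name, pass: false, results: [] };
  const results = evalCase.graders.map((g) => ({ name: g.name, pass: Boolean(g.check(trial)) }));
  return { case: evalCase.name, pass: results.every((r) => r.pass), results };
}

const mustEdit = (file) => ({ name: `edited ${file}`, check: (t) => t.changedFiles.includes(file) });
// Negative grader: an irrelevant skill must not be loaded.
const skillNotLoaded = (skill) => ({ name: `did not load ${skill}`, check: (t) => !t.loadedSkills.includes(skill) });

// A suite is a case/model matrix: trialsByModel[model][caseName] -> trial.
function runSuite(cases, trialsByModel) {
  const matrix = {};
  for (const [model, trials] of Object.entries(trialsByModel)) {
    matrix[model] = Object.fromEntries(cases.map((c) => [c.name, runCase(c, trials[c.name]).pass]));
  }
  return matrix;
}
```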

Plandex And Forge Loop

Plandex reinforces:

chat/tell split
configurable autonomy levels
cumulative diff sandbox before applying changes
model packs for planning vs execution

Forge Loop reinforces:

R-numbered acceptance criteria
task DAGs with tiered parallelism
per-task worktrees
per-task and session token budgets
structural completion markers
backpropagation from runtime failure to spec gap
state on disk as the recovery source

SF already has many of these ideas. The part to tighten is the explicit product surface: direct commands, visible modes, /tasks, schema-backed state, and skill evals.

Status And Mode Badge

The active state should always be visible, especially during full autonomy.

Recommended status line:

SF  build | autonomous | trusted | smart

Compact badge form:

[B][A][T][S]

Preferred full labels in critical states:

repair | autonomous | normal | smart
review | assisted | normal | deep
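A sketch of rendering both badge forms from the four axes. The function names are hypothetical. One-letter badges collide (repair, review, research, and restricted all start with R), which is one reason full labels are preferred in critical states.

```javascript
const AXES = ["workMode", "runControl", "permissionProfile", "modelMode"];

// Full status line, e.g. "SF  build | autonomous | trusted | smart".
function statusLine(state) {
  return `SF  ${AXES.map((axis) => state[axis]).join(" | ")}`;
}

// Compact badge: the first letter of each axis value, uppercased.
function compactBadge(state) {
  return AXES.map((axis) => `[${state[axis][0].toUpperCase()}]`).join("");
}
```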

Do not use "autopilot" in SF UI. It may appear only as competitor context in this research note.

Implementation Pull-Through

Already directionally right:

UOK lifecycle records carry runControl.
UOK lifecycle records and execution-policy decisions carry permissionProfile.
Schedule command state uses autonomous_dispatch.
SF has DB-backed state, recovery, verification, scheduling, captures, forensics, projections, and self-reporting.
SF has skills and project-specific skill paths.
SF has parallel orchestration and remote-question infrastructure.

Still needed:

Remove /sf from docs/web/tests (Phase 2 deprecation)

Completed ✓ (RA.Aid Patterns — Phase 2):

structured memory repositories (memory-repository.js — SQLite-backed key facts, snippets, research notes, human inputs, work logs, decisions; content hash deduplication; auto-summarization; prompt formatting; 11 tests pass)
trajectory recording (trajectory-recorder.js — per-step tool/LLM/error execution trace with costs, tokens, errors; session+unit scoped; exportable; 10 tests pass)
trajectory command (/trajectory — step-by-step trace with --all, --errors, --tools, --llm, --limit=N flags; wired into commands/handlers/ops.js)
reasoning assist + memory integration (reasoning-assist.js loads key facts, snippets, research notes from memory repository into pre-stage consultation prompt)
compaction fix (register-hooks.js — never cancel compaction; provide custom compaction summary with work state preservation instead)

Completed ✓ (Additional):

schema-backed task/frontmatter fields (task-frontmatter.js — risk levels, mutation scopes, verification types, plan approval states, task/scheduler statuses; wired into sf-db.js insertTaskSpecIfAbsent())
subagent provider/model/permission inheritance audit (subagent-inheritance.js — blocked providers, fast-mode heavy model blocking, restricted destructive tool blocking; wired into subagent/index.js)
remote steering as full-session steering surface (remote-steering.js — parse/apply/format directives with 5s cooldown throttle)
parallel worker intent/claim registry (parallel-intent.js — declareIntent, checkIntentConflicts, releaseIntent, getActiveIntents with TTL)
skill eval harness foundation (skills/eval-harness.js — createEvalCase, runGrader with 30s timeout, runSkillEvals)
terminal title mode indicator (auto/session.js — OSC escape sequence + process.title, format: SF[workMode|runControl|permissionProfile|modelMode])
self-feedback → workMode auto-transition (self-feedback-drain.js — high/critical feedback dispatches auto-switch to repair with reason "self-feedback-drain")
UOK events carry workMode + modelMode (uok/kernel.js — lifecycleFlags include both; audit envelope payload includes both)
enhanced /steer with mode transitions (/steer mode <m> [scope], /steer trust <p> [scope], /steer model-mode <m> [scope])
/sf prefix deprecation warning (Phase 1 — accept both forms, warn once per session)
centralized metrics system (metrics-central.js — Prometheus-compatible Counter/Gauge/Histogram with session scoping, DB persistence, retry logic, cost/token tracking; wired into subagent-inheritance + mode transitions)
explicit stage commands (/research, /plan, /implement — set workMode and dispatch corresponding phase)
cost command (/cost — queries metrics-central DB + legacy ledger)
reasoning assist foundation (reasoning-assist.js — pre-stage expert consultation prompt builder, context loading, guidance injection; wired into auto/phases.js dispatch path)

Completed ✓:

make workMode durable state (SQLite session_mode_state table + AutoSession persistence)
add direct mode/control/trust/model-mode commands
make --autonomous chain into direct /autonomous
add visible mode/status surface for TUI and web (header badge + /status)
expose autonomous continuation limits in settings and status (mode badge shows runControl)
add /tasks as the unified background work surface with durable task state, ephemeral running state, retries, blockers, checkpoints, budget, and steering
make repair a first-class workflow over doctor
add policy-aware project skill suggestion/generation (auto-create flow)
enhanced /steer with mode/trust/model-mode transitions
TUI keyboard shortcuts for mode cycling (Ctrl+Shift+M/R/A/S/P)
minimal auto-mode header/footer (badge visible during autonomy)
/sf namespace removed from command registration; direct command roots only
parallel worker intent/claim registry (declareIntent, checkIntentConflicts, releaseIntent)
skill eval harness foundation (createEvalCase, runGrader, runSkillEvals)
terminal title mode indicator (tmux/terminal tab visibility)

Direct Command Decision

SF is the system, not a plugin namespace.

Use:

/status
/autonomous
/doctor
/rate
/session-report
/parallel
/remote
/tasks

/sf is not registered in the TUI or browser command surface.

Shell machine surface remains:

sf headless autonomous
sf headless --autonomous ...

The target model is simple: direct commands for humans, headless commands for machines, durable state for autonomous execution, and explicit axes for mode, control, trust, model posture, and surface.

Runtime Target: Node 26

SF treats Node 26.1+ as the runtime baseline. There is no compatibility path for older Node versions in SF-owned runtime code.

Source notes checked 2026-05-08:

Node 25 is a short-lived current line. It is useful as a compatibility probe, but not a target.
Node 26 is current now, LTS-bound, and useful for SF's own runtime model.
Bun gets closer to Node with every release and supports many Node APIs plus Node-API, but its compatibility target and partially implemented API areas do not yet match SF's risk surface.
Deno supports Node/npm compatibility, package.json, local node_modules, and Node-API addons with FFI permission, but that means SF would still be running a Node-compatibility workload.
LLRT is experimental and serverless-oriented, not a local CLI/runtime fit.

Why Node 26 Makes SF Stronger

Node 26 is not just "newer Node." It gives SF a better platform for long-running agent work:

Temporal is enabled by default.
V8 14.6 is the JavaScript engine baseline.
Undici 8 is the HTTP/fetch baseline.
Node 26 removes and deprecates more legacy APIs, so it hardens SF against old loader, stream, HTTP, crypto, and dependency assumptions.

Temporal Is More Than Better Dates

Temporal gives SF the vocabulary it already needs for durable autonomous work.

Important Temporal concepts:

Temporal.Instant: an exact point in history. Use for journal events, checkpoint timestamps, lock leases, provider call start/end, and trace order.
Temporal.ZonedDateTime: an exact instant plus time zone and calendar. Use for reminders, schedules, adoption reviews, audits, and "run this at local business time" semantics.
Temporal.PlainDate: a calendar date without time or time zone. Use for daily reports, milestone review dates, and human-facing due dates.
Temporal.PlainTime: a wall-clock time without date or zone. Use for recurring "at 09:00" style policies.
Temporal.PlainDateTime: a date and wall-clock time before binding it to a zone. Use only when the zone is deliberately chosen later.
Temporal.Duration: a typed amount of time. Use for budgets, leases, cooldowns, retry delays, schedule offsets, and age checks.

That split matters because SF currently has many different meanings hidden behind timestamps and strings:

exact event ordering
local user reminders
project schedule dates
lease expiry
retry backoff
adoption review windows
elapsed runtime
"next business day" style planning

Date collapses those into one weak type. Temporal lets SF store and validate the real intent.

SF Runtime Places That Should Use Temporal

Use Temporal first in the areas where wrong time semantics create real operational mistakes:

sf schedule: due dates, relative offsets, local-time reminders, audit windows, and recurrence-ready storage.
autonomous locks and leases: exact Instant plus typed Duration, not implicit millisecond math scattered through code.
journals and traces: exact event instants with stable ordering and explicit serialization.
session reports: elapsed durations and grouped daily summaries without local timezone drift.
adoption reviews and decision audits: calendar dates and wall-clock reminders that survive DST and timezone changes.
background work surface: task age, stale-running detection, retry-after, and next-action time should be typed.

Implementation Status: temporal-foundation.js is a native-only Node 26 wrapper with safe constructors (instantFromISO, durationFromObject, plainDateFromISO), serialization, deserialization, and validation. It throws clearly when native Temporal is unavailable instead of using compatibility shims.

Temporal Design Rule For SF

Store the semantic type, not just the formatted string:

event happened exactly now       -> Instant
run at 09:00 in Europe/Oslo      -> ZonedDateTime or PlainTime + timeZone
review on 2026-06-01             -> PlainDate
retry after 30 minutes           -> Duration
lease expires at exact timestamp -> Instant

Serialization should stay explicit and boring:

store ISO strings plus a field that says which Temporal type they represent
include timezone when wall-clock semantics matter
do not infer local timezone at read time unless the record explicitly asks for it
validate schedule and lease records at DB boundaries
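A sketch of boundary validation for tagged time records, stored as `{ type, value, timeZone? }`. The record shape is an assumption. The regexes only check ISO shape; with native Temporal, calling `Temporal[type].from(value)` would do full parsing. This sketch avoids Temporal so it also runs on older Node.

```javascript
// Shape checks only; with native Temporal, Temporal[type].from(value) could replace these.
const TIME_SHAPES = {
  Instant: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/,
  ZonedDateTime: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?([+-]\d{2}:\d{2})?\[[^\]]+\]$/,
  PlainDate: /^\d{4}-\d{2}-\d{2}$/,
  PlainTime: /^\d{2}:\d{2}(:\d{2}(\.\d+)?)?$/,
  PlainDateTime: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?$/,
  Duration: /^-?P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$/,
};
// Wall-clock types that need an explicit zone when they drive a schedule.
const NEEDS_ZONE_FOR_SCHEDULE = new Set(["PlainTime", "PlainDateTime"]);

function validateTimeRecord(record, { schedule = false } = {}) {
  if (!Object.hasOwn(TIME_SHAPES, record.type)) return [`unknown time type: ${record.type}`];
  const errors = [];
  if (typeof record.value !== "string" || !TIME_SHAPES[record.type].test(record.value)) {
    errors.push(`${record.type} value is not ISO-shaped: ${record.value}`);
  }
  // Never infer the local zone at read time: scheduled wall-clock values carry their zone.
  if (schedule && NEEDS_ZONE_FOR_SCHEDULE.has(record.type) && !record.timeZone) {
    errors.push(`${record.type} used for scheduling must carry an explicit timeZone`);
  }
  return errors;
}
```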

Node 26 Adoption Path

Target policy:

current compatibility floor: Node 26.1+
internal target runtime:     Node 26.1+
canonical baseline:          Node 26.1+
Node 25:                     skip except quick probes
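A sketch of a startup guard for the 26.1+ floor, using `process.versions.node`. The function names are hypothetical, and the comparison is a plain numeric major/minor/patch check.

```javascript
// Floor taken from the target policy above.
const NODE_FLOOR = [26, 1, 0];

function meetsFloor(version, floor = NODE_FLOOR) {
  const parts = version.replace(/^v/, "").split(".").map((p) => parseInt(p, 10));
  for (let i = 0; i < floor.length; i++) {
    const part = parts[i] ?? 0;
    // The first differing component decides; NaN compares false and fails the check.
    if (part !== floor[i]) return part > floor[i];
  }
  return true;
}

function assertRuntime(version = process.versions.node) {
  if (!meetsFloor(version)) {
    throw new Error(`SF requires Node ${NODE_FLOOR.join(".")}+; found ${version}`);
  }
}
```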

Runtime Alternatives

Other JavaScript runtimes are useful comparators, but none should replace Node as SF's primary runtime right now.

SF's current runtime shape is Node-native:

npm workspaces and package-lock.json
Next.js standalone web host
Vitest and Node test-runner compatibility scripts
Rust N-API .node addons
node-pty native assets in the web host
node: built-ins across CLI, scripts, packages, and web services
child process, TTY, stream, module loader, and extension-loader behavior
installed runtime sync into ~/.sf/agent

Bun

Bun is the strongest speed and developer-experience competitor.

Useful:

fast package install and script startup
broad Node API compatibility
built-in TypeScript, test runner, shell, SQLite, YAML, TOML, JSONL, and other convenience APIs
Node-API support is substantial enough to use as a compatibility probe

Not primary for SF:

Bun's own docs say compatibility reflects Node v23, while SF is targeting Node 26.
Some core APIs are partial or behaviorally different: child_process, module loader hooks, node:v8, node:test, node:sqlite, worker_threads, and inspector/debugger areas are not exact Node.
SF's highest-risk paths are exactly the places where "almost Node" can hurt: TTY, child processes, native addons, Next standalone output, loaders, and extension runtime.

Decision: use Bun only for optional speed probes or isolated tooling. Do not make it the SF runtime until full npm test, web build, native build, smoke tests, and installed extension runtime all pass under Bun without special cases.

Deno

Deno has the best security and integrated-toolchain story.

Useful:

explicit permissions model
first-class TypeScript and web standards
npm/package.json compatibility
Node-API support when local node_modules and FFI permission are enabled
good target for thinking about sandboxing and permission profiles

Not primary for SF:

Deno still becomes a Node-compatibility mode for a repo like SF.
Deno docs recommend local node_modules for frameworks like Next.js and for Node-API addons, which means SF would keep most Node/npm complexity anyway.
Native addons require local node_modules plus --allow-ffi.
The value would be security posture and packaging experiments, not simpler runtime execution.

Decision: study Deno for permission-profile design and maybe future packaged headless workers. Do not switch the core SF runtime to Deno.

LLRT, WinterJS, Edge Runtimes

These are not fits for SF's primary runtime.

Useful:

serverless cold-start research
constrained worker/edge execution ideas
tiny isolated helper tasks

Not primary for SF:

SF is a long-running local CLI/runtime, not a small stateless Lambda handler.
SF needs native addons, process control, TTY, filesystem state, git, shell, Next web host, and Node-compatible package behavior.
LLRT is explicitly experimental and evaluation-oriented.

Decision: ignore as primary runtime. Only revisit for isolated future worker surfaces.

Runtime Decision

Node 26 is the target because SF is a Node-native agent runtime, not a generic JavaScript app.

Use alternatives this way:

Node 26 -> primary runtime and baseline
Bun     -> speed/compatibility probe, not runtime
Deno    -> permission/sandbox design reference, not runtime
LLRT    -> ignore except tiny serverless worker research

The rule is simple: if a runtime cannot run the exact SF stack without special cases, it is not stronger for SF. Node 26 makes the existing SF stack stronger; alternative runtimes mostly make a different stack.

Required Node 26 gate:

node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
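The gate can be scripted so that it fails fast at the first broken step. This is a minimal sketch that assumes a POSIX environment where npm and sf resolve on PATH without a shell (on Windows, npm is a .cmd shim and would need shell: true). The step list is copied from the gate above; NODE26_GATE, runGate, and spawnStep are hypothetical names, not existing SF code.

```javascript
import { spawnSync } from "node:child_process";

// Steps copied from the Node 26 gate; each entry is [command, args].
export const NODE26_GATE = [
  ["node@26", ["--version"]],
  ["npm", ["run", "lint"]],
  ["npm", ["run", "typecheck:extensions"]],
  ["npm", ["run", "build"]],
  ["npm", ["test"]],
  ["sf", ["--version"]],
  ["sf", ["--help"]],
  ["sf", ["--print", "ping"]],
];

// Default runner: inherit stdio so gate output streams to the terminal.
function spawnStep(cmd, args) {
  const result = spawnSync(cmd, args, { stdio: "inherit" });
  // `error` is set when the binary cannot be spawned (e.g. ENOENT);
  // `status` is null when the child was killed by a signal.
  if (result.error) return 127;
  return result.status ?? 1;
}

// Run steps in order and stop at the first non-zero exit.
export function runGate(steps = NODE26_GATE, run = spawnStep) {
  for (const [cmd, args] of steps) {
    const status = run(cmd, args);
    if (status !== 0) {
      return { ok: false, failed: [cmd, ...args].join(" "), status };
    }
  }
  return { ok: true };
}
```

Injecting the runner keeps the ordering and fail-fast logic testable without spawning processes. A CI script could call runGate() and set process.exitCode from the returned status.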

SF already requires Node 26.1+ in engines.node; the remaining work is to keep the gates green under Node 26 and replace fragile Date/millisecond logic with Temporal in the schedule, lease, journal, and background task surfaces.

Appendix A: Related Source Files

This section maps the concepts in this document to actual code in the repo.

A.1 Operating Model (Already Exists)

File: src/resources/extensions/sf/operating-model.js

Already exports canonical vocabulary:

export const RUN_CONTROL_MODES = ["manual", "assisted", "autonomous"];
export const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];

Tests: src/resources/extensions/sf/tests/operating-model.test.mjs

Status: workMode and modelMode constants have been added to this file (marked done in Appendix B), alongside the modelModeToTier() / tierToModelMode() bridge.
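Here is a minimal sketch of how the workMode and modelMode vocabulary could sit beside the existing constants. WORK_MODES uses the work modes named in these notes. MODEL_MODES is an assumption built from the fast, smart, and deep modes mentioned elsewhere in this document. assertModeState() is a hypothetical validator, not an existing export.

```javascript
// Existing vocabulary from operating-model.js.
export const RUN_CONTROL_MODES = ["manual", "assisted", "autonomous"];
export const PERMISSION_PROFILES = ["restricted", "normal", "trusted", "unrestricted"];

// Work modes as named in these notes.
export const WORK_MODES = ["chat", "plan", "build", "review", "repair", "research"];

// Hypothetical: "fast", "smart" and "deep" appear in these notes,
// but the exact set in the repo may differ.
export const MODEL_MODES = ["fast", "smart", "deep"];

const VOCABULARY = {
  workMode: WORK_MODES,
  runControl: RUN_CONTROL_MODES,
  permissionProfile: PERMISSION_PROFILES,
  modelMode: MODEL_MODES,
};

// Hypothetical helper: reject any axis value outside the canonical vocabulary.
export function assertModeState(state) {
  for (const [axis, allowed] of Object.entries(VOCABULARY)) {
    if (!allowed.includes(state[axis])) {
      throw new RangeError(`unknown ${axis}: ${String(state[axis])}`);
    }
  }
  return state;
}
```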

A.2 Execution Policy (Already Exists)

File: src/resources/extensions/sf/execution-policy.js

Maps permission profiles to concrete tool restrictions:

const EXECUTION_POLICY_PROFILES = {
  restricted:   { filesystem: "read-mostly", network: "read-only", git: "read-only", mutation: "planning-artifacts-only" },
  normal:       { filesystem: "workspace-write", network: "allowed", git: "normal", mutation: "workspace" },
  trusted:      { filesystem: "workspace-write", network: "allowed", git: "normal", mutation: "workspace" },
  unrestricted: { filesystem: "danger-full-access", network: "allowed", git: "dangerous", mutation: "host" },
};

Status: Wired to tool-call boundaries through the tool_call hook in bootstrap/register-hooks.js. classifyExecutionPolicyCall() reads session.permissionProfile and blocks destructive commands when the profile is restricted or normal. Enforcement is unified at the hook level.
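To illustrate hook-level enforcement, here is a sketch under stated assumptions. The destructive-command patterns, the classifyShellCall() and onToolCall() names, and the event shape (toolName, input.command) are all hypothetical and do not reproduce the real classifyExecutionPolicyCall() signature. The sketch encodes only the rule described above: destructive commands are blocked when the profile is restricted or normal.

```javascript
// Profiles that still block destructive commands, per the status note.
const BLOCKING_PROFILES = new Set(["restricted", "normal"]);

// Hypothetical patterns; the real classifier's rules are not documented here.
const DESTRUCTIVE_PATTERNS = [
  /\brm\s+-rf\b/,
  /\bgit\s+push\b.*--force/,
  /\bgit\s+reset\s+--hard\b/,
];

// Classify one shell command against the session's permission profile.
export function classifyShellCall(session, command) {
  const destructive = DESTRUCTIVE_PATTERNS.some((re) => re.test(command));
  if (destructive && BLOCKING_PROFILES.has(session.permissionProfile)) {
    return { block: true, reason: `destructive command blocked under ${session.permissionProfile}` };
  }
  return { block: false };
}

// Single enforcement point: every tool call passes through this hook.
export function onToolCall(session, event) {
  if (event.toolName !== "bash") return { block: false };
  return classifyShellCall(session, event.input.command);
}
```

Keeping the decision in one hook means every surface (TUI, headless, autonomous) sees the same policy, which matches the "unified at the hook level" note.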

A.3 Auto Session State (Already Exists)

File: src/resources/extensions/sf/auto/session.js

AutoSession class holds:

active, paused, stepMode, canAskUser
currentUnit, currentMilestoneId
autoModeStartModel, currentUnitModel

Status: workMode, runControl, permissionProfile, modelMode, surface, and modeUpdatedAt are all durable properties on AutoSession. They are persisted to the SQLite session_mode_state table on every transition and loaded from the database on construction.
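The persist-on-every-transition behavior can be sketched with an in-memory store standing in for the session_mode_state table. ModeState, InMemoryModeStore, the default values, the "tui" surface, and the event fields are hypothetical. The clock is injected, so a Temporal-based clock could replace the Date default. Each transition also returns a mode-transition event of the kind the journal emits.

```javascript
// Hypothetical stand-in for the SQLite session_mode_state table.
export class InMemoryModeStore {
  constructor() { this.rows = new Map(); }
  load(sessionId) { return this.rows.get(sessionId) ?? null; }
  save(sessionId, row) { this.rows.set(sessionId, { ...row }); }
}

// Hypothetical defaults for a fresh session.
const DEFAULT_MODE = {
  workMode: "chat",
  runControl: "manual",
  permissionProfile: "normal",
  modelMode: "smart",
  surface: "tui",
  modeUpdatedAt: null,
};

export class ModeState {
  constructor(sessionId, store, clock = () => new Date().toISOString()) {
    this.sessionId = sessionId;
    this.store = store;
    this.clock = clock;
    // Load the persisted row on construction, falling back to defaults.
    Object.assign(this, DEFAULT_MODE, store.load(sessionId) ?? {});
  }

  // Apply a partial update, stamp modeUpdatedAt, and persist immediately.
  transition(patch, reason) {
    const from = this.snapshot();
    Object.assign(this, patch, { modeUpdatedAt: this.clock() });
    const to = this.snapshot();
    this.store.save(this.sessionId, to);
    return { type: "mode-transition", reason, from, to };
  }

  snapshot() {
    const { workMode, runControl, permissionProfile, modelMode, surface, modeUpdatedAt } = this;
    return { workMode, runControl, permissionProfile, modelMode, surface, modeUpdatedAt };
  }
}
```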

A.4 Command Registration (Already Exists)

File: src/resources/extensions/sf/commands/index.js

Registers direct commands via pi.registerCommand():

for (const command of DIRECT_SF_COMMANDS) {
  pi.registerCommand(command.cmd, { ... });
}

File: src/resources/extensions/sf/commands/catalog.js

Defines TOP_LEVEL_SUBCOMMANDS and DIRECT_SF_COMMANDS.

Status: Direct commands implemented (/mode, /control, /trust, /model-mode, /repair, /tasks, /skills). /sf is not registered; the shell executable remains sf.
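Here is a sketch of catalog-driven registration with a guard that keeps /sf out of the slash-command namespace. The catalog descriptions, the handler map, and the options object passed to pi.registerCommand() are assumptions. Only the command names come from the status line above.

```javascript
// Hypothetical catalog entries mirroring the direct commands listed above.
export const DIRECT_SF_COMMANDS = [
  { cmd: "mode", description: "Switch work mode" },
  { cmd: "control", description: "Switch run control" },
  { cmd: "trust", description: "Switch permission profile" },
  { cmd: "model-mode", description: "Switch model mode" },
  { cmd: "repair", description: "Enter repair mode and run doctor fix" },
  { cmd: "tasks", description: "Show background work" },
  { cmd: "skills", description: "List and manage skills" },
];

// "sf" stays the shell executable and must never become a slash command.
const RESERVED = new Set(["sf"]);

export function registerDirectCommands(pi, commands, handlers) {
  for (const command of commands) {
    if (RESERVED.has(command.cmd)) {
      throw new Error(`/${command.cmd} is reserved and must not be registered`);
    }
    const handler = handlers[command.cmd];
    if (typeof handler !== "function") {
      throw new Error(`missing handler for /${command.cmd}`);
    }
    // The options shape is an assumption, not the documented pi API.
    pi.registerCommand(command.cmd, { description: command.description, handler });
  }
}
```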

A.5 TUI Extension (Already Exists)

File: src/resources/extensions/sf-tui/index.js

Registers shortcuts:

Ctrl+Alt+H — prompt history
Ctrl+Shift+H — prompt history fallback
Ctrl+Alt+M — marketplace

File: src/resources/extensions/sf-tui/header.js

Renders the header with project name, branch, and model.

File: src/resources/extensions/sf-tui/footer.js

Renders footer with git status, cost, context usage. No mode badge yet.

File: src/resources/extensions/sf-tui/extension-manifest.json

Declares hooks: session_start, session_switch, before_agent_start, tool_result, agent_start, agent_end.

Status: Mode badge implemented in the TUI header, with the compact [B∞TS] form below 80 columns and the full build · autonomous · trusted · smart form at 80 columns or wider.
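The two badge forms can be sketched as a pure function of mode state and terminal width. Only the B, ∞, T, and S glyphs come from the example above; every other glyph, and renderModeBadge() itself, is a hypothetical placeholder.

```javascript
// Only B, ∞, T and S come from the notes; every other glyph is a placeholder.
const GLYPHS = {
  workMode: { chat: "C", plan: "P", build: "B", review: "V", repair: "X", research: "R" },
  runControl: { manual: "M", assisted: "A", autonomous: "∞" },
  permissionProfile: { restricted: "R", normal: "N", trusted: "T", unrestricted: "U" },
  modelMode: { fast: "F", smart: "S", deep: "D" },
};
const AXES = ["workMode", "runControl", "permissionProfile", "modelMode"];
const FULL_BADGE_MIN_COLUMNS = 80;

// Full form at 80+ columns, compact bracketed glyphs below that.
export function renderModeBadge(state, columns) {
  if (columns >= FULL_BADGE_MIN_COLUMNS) {
    return AXES.map((axis) => state[axis]).join(" · ");
  }
  // Unknown values render as "?" instead of throwing inside a render path.
  const glyphs = AXES.map((axis) => GLYPHS[axis][state[axis]] ?? "?");
  return `[${glyphs.join("")}]`;
}
```

Keeping the renderer pure makes it easy to reuse for other surfaces, such as terminal window titles, if that open question is resolved in favor.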

A.6 UOK Parity Report (Already Uses runControl)

File: src/resources/extensions/sf/tests/uok-parity-report.test.mjs

Tests verify runControl and permissionProfile in UOK events:

assert.equal(events[0].runControl, "autonomous");
assert.equal(events[0].permissionProfile, "normal");

Status: workMode and modelMode added to AutoSession. Journal logging emits mode-transition events. UOK kernel includes both in lifecycleFlags and audit envelope payload.

A.7 Routing History (Already Exists)

File: src/resources/extensions/sf/routing-history.js

Tracks model tier success/failure per task pattern.

Status: Connected. The modelModeToTier() / tierToModelMode() bridge lives in operating-model.js, and the classifyUnitComplexity() signature includes modelMode. The deep model mode floors routing at the heavy tier; fast caps it at the light tier.
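Here is a sketch of the modelMode bridge and the floor/cap rule, assuming modelMode values fast, smart, and deep and the existing light/standard/heavy tiers. The smart-to-standard mapping, MODEL_MODE_BOUNDS, and applyModelMode() are hypothetical; the real functions in operating-model.js may differ.

```javascript
const TIERS = ["light", "standard", "heavy"];

// Hypothetical one-to-one bridge between model modes and tiers.
export function modelModeToTier(modelMode) {
  return { fast: "light", smart: "standard", deep: "heavy" }[modelMode];
}
export function tierToModelMode(tier) {
  return { light: "fast", standard: "smart", heavy: "deep" }[tier];
}

// deep floors at heavy, fast caps at light, smart leaves routing unclamped.
const MODEL_MODE_BOUNDS = {
  fast: { floor: "light", cap: "light" },
  smart: { floor: "light", cap: "heavy" },
  deep: { floor: "heavy", cap: "heavy" },
};

// Clamp the complexity-routed tier into the model mode's [floor, cap] range.
export function applyModelMode(routedTier, modelMode) {
  const bounds = MODEL_MODE_BOUNDS[modelMode];
  if (!bounds) return routedTier;
  const i = TIERS.indexOf(routedTier);
  if (i === -1) throw new RangeError(`unknown tier: ${routedTier}`);
  const lo = TIERS.indexOf(bounds.floor);
  const hi = TIERS.indexOf(bounds.cap);
  return TIERS[Math.min(Math.max(i, lo), hi)];
}
```

Expressing each mode as a floor and a cap keeps the rule extensible: a future mode that only floors at standard would need one new table entry rather than a new branch.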

A.8 Doctor System (Already Exists)

File: src/resources/extensions/sf/doctor.js
File: src/resources/extensions/sf/doctor-proactive.js
File: src/resources/extensions/sf/doctor-checks.js

Health checks, auto-fix, proactive monitoring.

Status: The /repair command switches to the repair work mode and runs doctor fix. Auto-transitions to repair are allowed when health gates fail.

A.9 Self-Feedback (Already Exists)

File: src/resources/extensions/sf/self-feedback.js

Records anomalies, blocking entries, version-bump resolution.

Status: Connected. self-feedback-drain.js auto-transitions to repair workMode when high/critical self-feedback is dispatched for inline-fix. Reason: "self-feedback-drain".
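Both repair triggers can be expressed as one decision function. This is a sketch: the entry fields (severity, dispatch), the "health-gate-failure" reason string, and decideRepairTransition() are hypothetical. Only the high/critical severities, the inline-fix dispatch, and the "self-feedback-drain" reason come from the notes above.

```javascript
const URGENT_SEVERITIES = new Set(["high", "critical"]);

// Return a transition to repair, or null when no trigger fires.
export function decideRepairTransition({ healthGatesFailed = false, dispatched = [] } = {}) {
  const urgentInlineFix = dispatched.some(
    (entry) => URGENT_SEVERITIES.has(entry.severity) && entry.dispatch === "inline-fix",
  );
  if (urgentInlineFix) {
    return { to: "repair", reason: "self-feedback-drain" };
  }
  if (healthGatesFailed) {
    return { to: "repair", reason: "health-gate-failure" };
  }
  return null;
}
```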

A.10 Skills (Partially Exists)

File: src/resources/extensions/sf/skill-discovery.js
File: src/resources/extensions/sf/skill-health.js
File: src/resources/extensions/sf/skill-telemetry.js

Skill loading, health monitoring, telemetry.

Status: .agents/skills/ directory structure implemented with YAML frontmatter parser, validation, skill loader, and auto-creation flow. Auto-creation detects patterns from activity logs (≥3 occurrences) and generates skills with a SQLite-backed cooldown. Sample skills created: forge-command-surface, forge-autonomous-runtime.
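The auto-creation rule (a pattern seen at least three times, then a cooldown) can be sketched as follows. A Map stands in for the SQLite-backed cooldown table. The 24-hour window, the pattern field on activity entries, and both function names are assumptions, and timestamps are epoch milliseconds for brevity.

```javascript
export const MIN_OCCURRENCES = 3; // threshold stated in the notes
export const COOLDOWN_MS = 24 * 60 * 60 * 1000; // hypothetical cooldown window

// `cooldowns` maps pattern -> last skill-creation time (epoch ms).
export function findSkillCandidates(activity, cooldowns, nowMs) {
  const counts = new Map();
  for (const entry of activity) {
    counts.set(entry.pattern, (counts.get(entry.pattern) ?? 0) + 1);
  }
  const candidates = [];
  for (const [pattern, count] of counts) {
    const last = cooldowns.get(pattern);
    const coolingDown = last !== undefined && nowMs - last < COOLDOWN_MS;
    if (count >= MIN_OCCURRENCES && !coolingDown) {
      candidates.push({ pattern, count });
    }
  }
  return candidates;
}

// Record a creation so the pattern is skipped until the cooldown expires.
export function markSkillCreated(cooldowns, pattern, nowMs) {
  cooldowns.set(pattern, nowMs);
}
```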

Appendix B: Implementation Priority

Priority	Item	Files to Touch	Effort
P0	Add `workMode` + `modelMode` to `operating-model.js`	`operating-model.js`, `operating-model.test.mjs`	Small ✓
P0	Add `workMode` to `AutoSession`	`auto/session.js`, `auto.js`	Small ✓
P0	Add mode badge to TUI header	`sf-tui/header.js`, `sf-tui/index.js`	Small ✓
P0	Add mode-switching shortcuts	`sf-tui/index.js`, `extension-manifest.json`	Small ✓
P0	Remove `/sf` namespace registration	`commands/catalog.js`, `commands/index.js`	Medium ✓
P1	Add `/mode`, `/control`, `/trust`, `/model-mode` commands	`commands/handlers/*.js`, `commands/catalog.js`	Medium ✓
P1	Wire `execution-policy.js` to tool boundaries	`execution-policy.js`, `bootstrap/register-hooks.js`	Medium ✓
P1	Add `/tasks` background work surface	`commands/handlers/tasks.js`	Medium ✓
P1	Make `repair` first-class work mode	`commands/handlers/ops.js`, `commands/handlers/core.js`	Medium ✓
P2	Add `.agents/skills/` structure	`skills/*.js`, `.agents/skills/`	Medium ✓
P2	Add skill YAML frontmatter parser	`skills/frontmatter.js`	Small ✓
P2	Add skill eval harness	`skills/eval-harness.js`, eval templates	Medium ✓
P2	Adopt Temporal in `sf schedule`	`temporal-foundation.js`	Medium ✓
P2	Node 26 baseline	`temporal-foundation.js` native Temporal wrapper	Medium ✓

Appendix C: Open Questions

Should a paused autonomous session show its previous badge dimmed, or [P] for paused?
Should mode be per-session or per-project? (Current: per-session)
Should the badge appear in tmux/terminal window titles?
Should mode transitions trigger a sound or notification?
Should the repair auto-transition ask first by default for new projects?
Should skill eval cases run in CI or only on-demand?
Should /tasks be a TUI overlay or a separate scrollable panel?
Should modelMode replace or supplement the existing tier system (light/standard/heavy)? (Current: modelMode supplements tiers via modelModeToTier() bridge)
