docs(sf): record node 26 runtime target

Mikael Hugo 2026-05-08 01:56:55 +02:00
parent d640aa0949
commit 760564dbfb

@ -455,6 +455,256 @@ This complements, not replaces:
Copilot's `/tasks` and `/session` are less powerful internally, but clearer as
control surfaces. SF should keep its deeper state and expose it better.
## Actual Source Pass: Awesome CLI Agent Repos
Checked locally under `/tmp/sf-agent-research`:
- `bradAGI/awesome-cli-coding-agents`
- `plandex-ai/plandex`
- `leonardcser/smelt`
- `mikeyobrien/ralph-orchestrator`
- `subsy/ralph-tui`
- `oxgeneral/ORCH`
- `LucasDuys/forge`
- `ramarlina/agx`
- `youwangd/SageCLI`
- `jcast90/relay`
- `basilisk-labs/agentplane`
- `amaar-mc/wit`
- `fastxyz/skill-optimizer`
- `0xmariowu/AgentLint`
- `ZENG3LD/gate4agent`
`arosstale/pi-builder` was listed, but the GitHub repository could not be found
when a clone was attempted on 2026-05-08.
### Smelt
Smelt's source has four modes:
```text
normal -> plan -> apply -> yolo
```
It also has a separate reasoning-effort setting:
```text
off | low | medium | high | max
```
Useful:
- mode cycling is explicit and configurable
- permissions differ by mode
- read-only commands are allowed, writes usually ask for approval, and a deny
rule wins over any allow
- approval scopes are explicit: once, session, workspace
- workspace approvals persist under a workspace hash
Do not copy:
- `yolo` as a name
- putting work kind and trust level into one mode axis
SF should keep Smelt's visible cycling and approval scopes, but preserve SF's
separate axes: `workMode`, `runControl`, `permissionProfile`, and `modelMode`.
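
The split between work kind and trust level can be sketched in types. This is a minimal sketch, not SF's implementation: only the four axis names and the three approval scopes come from the notes above. The value sets for each axis, the `CommandRule` shape, and `decideCommand` are hypothetical names introduced for illustration.

```typescript
// Hypothetical value sets; only the four axis names come from the notes.
type WorkMode = "chat" | "plan" | "build" | "repair";
type RunControl = "manual" | "assisted" | "autonomous";
type PermissionProfile = "read-only" | "ask-on-write" | "trusted";
type ModelMode = "fast" | "balanced" | "deep";

// Each axis changes independently, so "what kind of work" never implies
// "how much trust".
interface SfModeState {
  workMode: WorkMode;
  runControl: RunControl;
  permissionProfile: PermissionProfile;
  modelMode: ModelMode;
}

// Approval scopes as listed for Smelt.
type ApprovalScope = "once" | "session" | "workspace";

type RuleEffect = "allow" | "ask" | "deny";

interface CommandRule {
  pattern: RegExp;
  effect: RuleEffect;
}

// Deny wins: a matching deny rule overrides any matching allow or ask rule.
function decideCommand(command: string, rules: CommandRule[]): RuleEffect {
  const matched = rules
    .filter((rule) => rule.pattern.test(command))
    .map((rule) => rule.effect);
  if (matched.indexOf("deny") !== -1) return "deny";
  if (matched.indexOf("ask") !== -1) return "ask";
  if (matched.indexOf("allow") !== -1) return "allow";
  return "ask"; // assumption: unmatched commands fall back to asking
}

const exampleState: SfModeState = {
  workMode: "plan",
  runControl: "manual",
  permissionProfile: "ask-on-write",
  modelMode: "balanced",
};
const exampleScope: ApprovalScope = "session";
```

Keeping `permissionProfile` out of `workMode` means a cycle through work modes never silently widens what the agent is allowed to do.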
### ORCH
ORCH has the cleanest small task state machine:
```text
todo -> in_progress -> review -> done
        \-> retrying -> in_progress
        \-> failed
review -> todo
* -> cancelled
```
It also keeps runtime state separately:
- `running`
- `claimed`
- `retry_queue`
- total run/task/token/runtime stats
Useful for SF:
- `/tasks` should show both durable task status and ephemeral running state
- successful completion should pass through review, even when auto-approved
- dependency blockers should be computed, not implied from ordering
- retrying should be an explicit state, not hidden inside logs
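
The state machine above can be expressed as a transition table. This is a sketch under two assumptions that the notes do not confirm: the `retrying` and `failed` branches leave from `in_progress`, and `* -> cancelled` applies to every state except `cancelled` itself. `OrchTaskStatus`, `ALLOWED_TRANSITIONS`, and `canTransition` are illustrative names, not ORCH's API.

```typescript
type OrchTaskStatus =
  | "todo"
  | "in_progress"
  | "retrying"
  | "review"
  | "done"
  | "failed"
  | "cancelled";

// Assumption: retrying and failed branch off in_progress.
const ALLOWED_TRANSITIONS: Record<OrchTaskStatus, OrchTaskStatus[]> = {
  todo: ["in_progress"],
  in_progress: ["review", "retrying", "failed"],
  retrying: ["in_progress"],
  review: ["done", "todo"],
  done: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: OrchTaskStatus, to: OrchTaskStatus): boolean {
  // "* -> cancelled": assumed to hold for any state that is not already cancelled.
  if (to === "cancelled") return from !== "cancelled";
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Because `done` is reachable only from `review`, a successful run cannot skip review; an auto-approval would simply perform the `review -> done` step without a human.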
### AgentPlane
AgentPlane's strongest idea is schema-first task artifacts. Task README
frontmatter includes:
- `risk_level`
- `status`
- `depends_on`
- `task_kind`
- `mutation_scope`
- `risk_flags`
- `blueprint_request`
- `verify`
- `plan_approval`
- `verification`
- `runner`
Its workflow file also makes operational policy explicit:
- workflow mode
- status commit policy
- workspace isolation
- retry policy
- scheduler concurrency
- required evaluator checks
- event log location
Useful for SF:
- task artifacts should have schema-backed frontmatter, not loose markdown
- plan approval and verification state deserve durable fields
- mutation scope and risk flags should feed `permissionProfile`
- workflow policy should be inspectable by `/status` and `/tasks`
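
A schema-backed frontmatter check might look like the sketch below. The field names come from the AgentPlane list above; the allowed values for `risk_level` and `plan_approval`, the choice of which fields to check, and the function names are assumptions made for illustration.

```typescript
// Hypothetical value sets; AgentPlane's real enums may differ.
const RISK_LEVELS = ["low", "medium", "high"];
const PLAN_APPROVALS = ["pending", "approved", "rejected"];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a list of problems; an empty list means the frontmatter is valid.
function validateTaskFrontmatter(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const status = raw["status"];
  const riskLevel = raw["risk_level"];
  const planApproval = raw["plan_approval"];

  if (typeof status !== "string" || status === "") {
    errors.push("status must be a non-empty string");
  }
  if (typeof riskLevel !== "string" || RISK_LEVELS.indexOf(riskLevel) === -1) {
    errors.push("risk_level must be one of " + RISK_LEVELS.join(", "));
  }
  if (typeof planApproval !== "string" || PLAN_APPROVALS.indexOf(planApproval) === -1) {
    errors.push("plan_approval must be one of " + PLAN_APPROVALS.join(", "));
  }
  ["depends_on", "mutation_scope", "risk_flags"].forEach((field) => {
    if (!isStringArray(raw[field])) {
      errors.push(field + " must be a list of strings");
    }
  });
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, lets `/tasks` or `/status` show a complete diagnosis for a malformed task file.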
### Relay
Relay's useful concepts:
- a channel is the workspace for one piece of work
- tickets are parallelizable units with dependency DAGs, retry budgets,
specialty tags, optional repo routing, and verification commands
- decisions are first-class durable records
- crosslink lets agents discover and message other sessions
- complexity tiers drive approval behavior
- CLI/TUI/GUI all read the same state
Useful for SF:
- keep decisions as first-class records, not buried in summaries
- remote steering should become full-session steering and cross-session
messaging, not only remote questions
- multi-repo work needs explicit repo routing on tasks
- one state store should power TUI, web, headless, and RPC
### Ralph
Ralph's hat system is useful as a coordination topology:
- hats declare triggers, publishes, instructions, backend overrides, max
activations, and disallowed tools
- events flow through a bus
- scope violations are detected when hats publish undeclared topics
- exhaustion emits explicit events
Useful for SF:
- specialized helpers should declare trigger/publish contracts
- helper activation should have max activation limits
- helper output should be checked against declared output topics
- mode transitions can be modeled as events, not ad hoc flags
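
The trigger/publish contract idea can be sketched as two small checks. `HelperContract`, `decideActivation`, and `findScopeViolations` are hypothetical names for this illustration; they are not Ralph's or SF's API.

```typescript
interface HelperContract {
  name: string;
  triggers: string[]; // topics that may wake this helper
  publishes: string[]; // topics this helper is allowed to emit
  maxActivations: number;
}

type ActivationDecision = "activate" | "not_triggered" | "exhausted";

// Decide whether a helper may run for an incoming topic.
function decideActivation(
  contract: HelperContract,
  topic: string,
  activationsSoFar: number,
): ActivationDecision {
  if (contract.triggers.indexOf(topic) === -1) return "not_triggered";
  if (activationsSoFar >= contract.maxActivations) return "exhausted";
  return "activate";
}

// Return every published topic the contract did not declare.
function findScopeViolations(contract: HelperContract, publishedTopics: string[]): string[] {
  return publishedTopics.filter((topic) => contract.publishes.indexOf(topic) === -1);
}
```

An `exhausted` decision is a natural place to emit an explicit event instead of silently dropping the trigger.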
### Sage
Sage's real value is runtime-neutral orchestration:
- agents are processes
- messages are files
- tasks are templates with frontmatter
- plans decompose into dependency waves
- tasks in a wave execute in parallel
- resume skips done tasks and resets stale running tasks
- runtime fallback is explicit
- bench-as-code compares actual agent CLIs on actual tasks
Useful for SF:
- `/tasks` should be file/DB-backed enough that headless tools can read it
without attaching to a live TUI
- dependency waves should be visible in planning output
- stale running work should be reset or surfaced clearly on resume
- model/provider benchmarking should use actual SF workflows, not isolated
model prompts
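
Dependency waves can be computed by repeatedly taking every task whose dependencies are already done. The sketch below is a plain layered topological sort, not Sage's code; `WaveTask` and `planWaves` are names introduced here, and the `completed` parameter models the "resume skips done tasks" behavior.

```typescript
interface WaveTask {
  id: string;
  dependsOn: string[];
}

// Group tasks into waves; all tasks in one wave can run in parallel.
// `completed` lists task ids finished in an earlier run (resume case).
function planWaves(tasks: WaveTask[], completed: string[] = []): string[][] {
  const done: Record<string, boolean> = {};
  completed.forEach((id) => {
    done[id] = true;
  });

  let remaining = tasks.filter((task) => done[task.id] !== true);
  const waves: string[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter((task) =>
      task.dependsOn.every((dep) => done[dep] === true),
    );
    if (ready.length === 0) {
      const stuck = remaining.map((task) => task.id).join(", ");
      throw new Error("dependency cycle or unknown dependency among: " + stuck);
    }
    const wave = ready.map((task) => task.id);
    wave.forEach((id) => {
      done[id] = true;
    });
    remaining = remaining.filter((task) => done[task.id] !== true);
    waves.push(wave);
  }
  return waves;
}
```

For tasks `a`, `b` (needs `a`), `c` (needs `a`), and `d` (needs `b` and `c`), this yields three waves: `a`, then `b` and `c` together, then `d`. Printing that grouping in planning output makes the available parallelism visible before anything runs.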
### AGX
AGX has useful low-level patterns:
- graph scheduler with hard, soft, failure, and always dependency conditions
- max concurrent work slots
- checkpoints with patch files and bounded history
- deterministic verify gate before LLM fallback
- repeated verification failure count that forces action
Useful for SF:
- dependency edges should support more than "depends on success"
- checkpoints should store patch references and bounded summaries
- deterministic verification should always run before semantic/LLM review
- repeated verify failures should force a mode transition to `repair` or
`review`, not keep retrying indefinitely
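
The verify-gate rule can be sketched as a small pure function. This is an illustration, not AGX's implementation: `DeterministicCheck`, `GateState`, and `runVerifyGate` are hypothetical names, and the sketch assumes that passing the deterministic checks hands off to semantic review while repeated failures escalate to `repair`.

```typescript
type DeterministicCheck = () => boolean;
type GateResult = "semantic_review" | "retry" | "repair";

interface GateState {
  consecutiveFailures: number;
}

// Deterministic checks always run first; semantic/LLM review is reached only
// after they pass. Failures are counted so retries cannot continue forever.
function runVerifyGate(
  checks: DeterministicCheck[],
  state: GateState,
  maxFailures: number,
): { result: GateResult; state: GateState } {
  const passed = checks.every((check) => check());
  if (passed) {
    return { result: "semantic_review", state: { consecutiveFailures: 0 } };
  }
  const failures = state.consecutiveFailures + 1;
  return {
    result: failures >= maxFailures ? "repair" : "retry",
    state: { consecutiveFailures: failures },
  };
}
```

Returning the updated state instead of mutating it keeps the failure count easy to persist in a checkpoint between attempts.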
### Wit
Wit is the strongest coordination pattern for parallel edits:
- agents declare intent before editing
- agents acquire symbol-level locks
- conflicts are warnings, not always hard blocks
- contracts can be enforced by git hooks
- Tree-sitter provides symbol ranges and call edges
- a `coordinate` skill auto-loads when `.wit/` exists
Useful for SF:
- parallel SF workers should declare intent before editing
- conflict detection should eventually be symbol-aware, not only file-aware
- warnings can steer agents away from collisions without freezing work
- accepted interface contracts should be enforceable before commit
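
Intent declaration with soft conflicts could look like the following sketch. It assumes symbol identity is simply a file path plus a symbol name; `EditIntent` and `declareEditIntent` are hypothetical names, and real symbol ranges would come from a parser such as Tree-sitter rather than from strings.

```typescript
interface EditIntent {
  worker: string;
  file: string;
  symbol: string;
}

interface IntentResult {
  intents: EditIntent[];
  warnings: string[];
}

// Record the new intent and report overlaps as warnings, not hard blocks.
function declareEditIntent(active: EditIntent[], incoming: EditIntent): IntentResult {
  const warnings = active
    .filter(
      (intent) =>
        intent.worker !== incoming.worker &&
        intent.file === incoming.file &&
        intent.symbol === incoming.symbol,
    )
    .map(
      (intent) =>
        intent.worker + " already declared intent on " + intent.file + "#" + intent.symbol,
    );
  return { intents: active.concat([incoming]), warnings };
}
```

The incoming intent is recorded even when it overlaps, so the second worker is steered by the warning rather than frozen by a lock.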
### skill-optimizer
skill-optimizer has the best pattern for proving that a skill actually works:
- a case is a user-like task plus deterministic graders
- a suite is a case/model matrix
- references are copied into `/work`
- the agent sees only `/work`, not graders or hidden answers
- graders inspect files, artifacts, `answer.json`, `trace.jsonl`, and result
state
- failed trials preserve workspace for debugging
Useful for SF:
- auto-created skills need eval cases
- skill acceptance should be grader-backed, not vibes-backed
- negative cases should check that irrelevant skills were not loaded
- skill optimization should test across model modes/providers
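
A grader-backed eval case can be sketched as data plus deterministic checks. The shapes below (`TrialOutput`, `NamedGrader`, `SkillEvalCase`, `gradeTrial`) and the example skill name `db-migration` are hypothetical; they illustrate the pattern and are not skill-optimizer's actual format.

```typescript
interface TrialOutput {
  files: Record<string, string>; // file path -> content left in the workspace
  loadedSkills: string[]; // skills the agent loaded during the trial
}

interface NamedGrader {
  label: string;
  check: (output: TrialOutput) => boolean;
}

interface SkillEvalCase {
  task: string; // user-like task text; graders stay hidden from the agent
  graders: NamedGrader[];
}

function gradeTrial(
  evalCase: SkillEvalCase,
  output: TrialOutput,
): { passed: boolean; failed: string[] } {
  const failed = evalCase.graders
    .filter((grader) => !grader.check(output))
    .map((grader) => grader.label);
  return { passed: failed.length === 0, failed };
}

const changelogCase: SkillEvalCase = {
  task: "Add a changelog entry for the latest release",
  graders: [
    {
      label: "changelog updated",
      check: (output) => (output.files["CHANGELOG.md"] || "").indexOf("## ") !== -1,
    },
    {
      // Negative case: an unrelated skill must not have been loaded.
      label: "irrelevant skill not loaded",
      check: (output) => output.loadedSkills.indexOf("db-migration") === -1,
    },
  ],
};
```

Reporting the failed grader labels, not just a pass/fail bit, makes a preserved failed workspace much easier to debug.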
### Plandex And Forge Loop
Plandex reinforces:
- chat/tell split
- configurable autonomy levels
- cumulative diff sandbox before applying changes
- model packs for planning vs execution
Forge Loop reinforces:
- R-numbered acceptance criteria
- task DAGs with tiered parallelism
- per-task worktrees
- per-task and session token budgets
- structural completion markers
- backpropagation from runtime failure to spec gap
- state on disk as the recovery source
SF already has many of these ideas. The part to tighten is the explicit product
surface: direct commands, visible modes, `/tasks`, schema-backed state, and
skill evals.
## Status And Mode Badge
The active state should always be visible, especially during full autonomy.
@ -502,9 +752,14 @@ Still needed:
- make `--autonomous` chain into direct `/autonomous`
- add visible mode/status surface for TUI and web
- expose autonomous continuation limits in settings and status
- add `/tasks` as the unified background work surface with durable task state,
ephemeral running state, retries, blockers, checkpoints, budget, and steering
- make `repair` a first-class workflow over doctor
- add policy-aware project skill suggestion/generation
- add skill eval cases for generated project skills
- add schema-backed task/frontmatter fields for risk, mutation scope,
verification, plan approval, and runner status
- add intent/claim records for parallel workers before editing
- audit subagent provider/model/permission inheritance
- audit remote steering as a full-session steering surface, not only remote
question delivery
@ -546,3 +801,242 @@ sf headless --autonomous ...
The target model is simple: direct commands for humans, headless commands for
machines, durable state for autonomous execution, and explicit axes for mode,
control, trust, model posture, and surface.
## Runtime Target: Node 26
SF should treat Node 26 as the target runtime, with Node 24 kept as the current
compatibility floor until the Node 26 lane is proven clean.
Source notes checked 2026-05-08:
- Node 24 is the current LTS line and this repo already requires `>=24.15.0`.
- Node 25 is a short-lived current line. It is useful as a compatibility probe,
but not a target.
- Node 26 is the next meaningful target: current now, LTS-bound, and useful for
SF's own runtime model.
- Bun gets closer to Node with every release and supports many Node APIs plus
Node-API, but its compatibility target and partially implemented APIs do not
yet cover SF's risk surface.
- Deno supports Node/npm compatibility, package.json, local node_modules, and
Node-API addons with FFI permission, but that means SF would still be running
a Node-compatibility workload.
- LLRT is experimental and serverless-oriented, not a local CLI/runtime fit.
### Why Node 26 Makes SF Stronger
Node 26 is not just "newer Node." It gives SF a better platform for long-running
agent work:
- `Temporal` is enabled by default.
- V8 14.6 is the JavaScript engine baseline.
- Undici 8 is the HTTP/fetch baseline.
- Node 26 removes and deprecates more legacy APIs, so running on it forces SF to
drop outdated loader, stream, HTTP, crypto, and dependency assumptions.
### Temporal Is More Than Better Dates
Temporal gives SF the vocabulary it already needs for durable autonomous work.
Important Temporal concepts:
- `Temporal.Instant`: an exact point in history. Use for journal events,
checkpoint timestamps, lock leases, provider call start/end, and trace order.
- `Temporal.ZonedDateTime`: an exact instant plus time zone and calendar. Use
for reminders, schedules, adoption reviews, audits, and "run this at local
business time" semantics.
- `Temporal.PlainDate`: a calendar date without time or time zone. Use for
daily reports, milestone review dates, and human-facing due dates.
- `Temporal.PlainTime`: a wall-clock time without date or zone. Use for
recurring "at 09:00" style policies.
- `Temporal.PlainDateTime`: a date and wall-clock time before binding it to a
zone. Use only when the zone is deliberately chosen later.
- `Temporal.Duration`: a typed amount of time. Use for budgets, leases,
cooldowns, retry delays, schedule offsets, and age checks.
That split matters because SF currently has many different meanings hidden
behind timestamps and strings:
- exact event ordering
- local user reminders
- project schedule dates
- lease expiry
- retry backoff
- adoption review windows
- elapsed runtime
- "next business day" style planning
`Date` collapses those into one weak type. Temporal lets SF store and validate
the real intent.
### SF Runtime Places That Should Use Temporal
Use Temporal first in the areas where wrong time semantics create real
operational mistakes:
- `sf schedule`: due dates, relative offsets, local-time reminders, audit
windows, and recurrence-ready storage.
- autonomous locks and leases: exact `Instant` plus typed `Duration`, not
implicit millisecond math scattered through code.
- journals and traces: exact event instants with stable ordering and explicit
serialization.
- session reports: elapsed durations and grouped daily summaries without local
timezone drift.
- adoption reviews and decision audits: calendar dates and wall-clock reminders
that survive DST and timezone changes.
- background work surface: task age, stale-running detection, retry-after, and
next-action time should be typed.
### Temporal Design Rule For SF
Store the semantic type, not just the formatted string:
```text
event happened exactly now        -> Instant
run at 09:00 in Europe/Oslo       -> ZonedDateTime or PlainTime + timeZone
review on 2026-06-01              -> PlainDate
retry after 30 minutes            -> Duration
lease expires at exact timestamp  -> Instant
```
Serialization should stay explicit and boring:
- store ISO strings plus a field that says which Temporal type they represent
- include timezone when wall-clock semantics matter
- do not infer local timezone at read time unless the record explicitly asks
for it
- validate schedule and lease records at DB boundaries
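
The serialization rule can be sketched as a tagged record plus a boundary check. This is an assumption-heavy illustration: `StoredTemporal`, `ISO_SHAPES`, and `validateStoredTemporal` are hypothetical names, and the regular expressions only check the rough shape of each string so that the sketch runs without `Temporal`. On Node 26, a real boundary check would more likely hand the string to the matching `Temporal.*.from()` parser.

```typescript
type StoredTemporalKind =
  | "Instant"
  | "ZonedDateTime"
  | "PlainDate"
  | "PlainTime"
  | "Duration";

interface StoredTemporal {
  kind: StoredTemporalKind; // which Temporal type the ISO string represents
  iso: string;
  timeZone?: string; // set when a PlainTime carries wall-clock semantics
}

// Rough shape checks only; these are not full ISO 8601 / RFC 9557 parsers.
const ISO_SHAPES: Record<StoredTemporalKind, RegExp> = {
  Instant: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})$/,
  ZonedDateTime:
    /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?([+-]\d{2}:\d{2})?\[[A-Za-z0-9_+\/-]+\]$/,
  PlainDate: /^\d{4}-\d{2}-\d{2}$/,
  PlainTime: /^\d{2}:\d{2}(:\d{2}(\.\d+)?)?$/,
  Duration: /^P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$/,
};

// Returns an error message, or null when the record looks well-formed.
function validateStoredTemporal(record: StoredTemporal): string | null {
  if (!ISO_SHAPES[record.kind].test(record.iso)) {
    return "value " + JSON.stringify(record.iso) + " is not a valid " + record.kind;
  }
  return null;
}
```

Storing `kind` next to the string means a reader never has to guess whether `2026-06-01` was meant as a calendar date or as midnight in some zone.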
### Node 26 Adoption Path
Target policy:
```text
current compatibility floor:  Node 24.15+
internal target runtime:      Node 26
canonical future baseline:    Node 26 after canary is clean
Node 25:                      skip except quick probes
```
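
The target policy can be turned into a small classifier, for example for a startup check or a CI matrix label. This is a sketch: `RuntimeLane` and `classifyNodeVersion` are hypothetical names, and treating every major version above 26 as `target` is an assumption the policy does not state.

```typescript
type RuntimeLane = "unsupported" | "compatibility-floor" | "probe-only" | "target";

// Classify a Node version string such as "v24.15.0" against the policy above.
function classifyNodeVersion(version: string): RuntimeLane {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    throw new Error("unrecognized Node version: " + version);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);

  if (major < 24 || (major === 24 && minor < 15)) return "unsupported";
  if (major === 24) return "compatibility-floor";
  if (major === 25) return "probe-only";
  return "target"; // assumption: 26 and anything newer
}
```

In a real process the input would be `process.version`; the function takes a string so it can be tested without depending on the running runtime.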
### Runtime Alternatives
Other JavaScript runtimes are useful comparators, but none should replace Node
as SF's primary runtime right now.
SF's current runtime shape is Node-native:
- npm workspaces and `package-lock.json`
- Next.js standalone web host
- Vitest and Node test-runner compatibility scripts
- Rust N-API `.node` addons
- `node-pty` native assets in the web host
- `node:` built-ins across CLI, scripts, packages, and web services
- child process, TTY, stream, module loader, and extension-loader behavior
- installed runtime sync into `~/.sf/agent`
#### Bun
Bun is the strongest speed and developer-experience competitor.
Useful:
- fast package install and script startup
- broad Node API compatibility
- built-in TypeScript, test runner, shell, SQLite, YAML, TOML, JSONL, and other
convenience APIs
- Node-API support is substantial enough to use as a compatibility probe
Not primary for SF:
- Bun's own docs say compatibility reflects Node v23, while SF is targeting
Node 26.
- Some core APIs are partial or behave differently: `child_process`, module
loader hooks, `node:v8`, `node:test`, `node:sqlite`, `worker_threads`, and
the inspector/debugger areas do not match Node exactly.
- SF's highest-risk paths are exactly the places where "almost Node" can hurt:
TTY, child processes, native addons, Next standalone output, loaders, and
extension runtime.
Decision: use Bun only for optional speed probes or isolated tooling. Do not
make it the SF runtime until full `npm test`, web build, native build, smoke
tests, and installed extension runtime all pass under Bun without special
cases.
#### Deno
Deno has the best security and integrated-toolchain story.
Useful:
- explicit permissions model
- first-class TypeScript and web standards
- npm/package.json compatibility
- Node-API support when local `node_modules` and FFI permission are enabled
- good target for thinking about sandboxing and permission profiles
Not primary for SF:
- For a repo like SF, Deno would still run in Node-compatibility mode.
- Deno docs recommend local `node_modules` for frameworks like Next.js and for
Node-API addons, which means SF would keep most Node/npm complexity anyway.
- Native addons require local `node_modules` plus `--allow-ffi`.
- The value would be security posture and packaging experiments, not simpler
runtime execution.
Decision: study Deno for permission-profile design and maybe future packaged
headless workers. Do not switch the core SF runtime to Deno.
#### LLRT, WinterJS, Edge Runtimes
These are not fits for SF's primary runtime.
Useful:
- serverless cold-start research
- constrained worker/edge execution ideas
- tiny isolated helper tasks
Not primary for SF:
- SF is a long-running local CLI/runtime, not a small stateless Lambda handler.
- SF needs native addons, process control, TTY, filesystem state, git, shell,
Next web host, and Node-compatible package behavior.
- LLRT is explicitly experimental and evaluation-oriented.
Decision: rule these out as the primary runtime. Revisit them only for isolated
future worker surfaces.
### Runtime Decision
Node 26 is the target because SF is a Node-native agent runtime, not a generic
JavaScript app.
Use alternatives this way:
```text
Node 26 -> primary internal target and future baseline
Bun     -> speed/compatibility probe, not runtime
Deno    -> permission/sandbox design reference, not runtime
LLRT    -> ignore except tiny serverless worker research
```
The rule is simple: if a runtime cannot run the exact SF stack without special
cases, it is not stronger for SF. Node 26 makes the existing SF stack stronger;
alternative runtimes mostly make a different stack.
Required Node 26 gate:
```text
node@26 --version
npm run lint
npm run typecheck:extensions
npm run build
npm test
sf --version
sf --help
sf --print "ping"
```
If Node 26 passes those gates, SF should run itself on Node 26 internally even
before raising public `engines.node`. Once stable, raise the repo baseline and
start replacing fragile `Date`/millisecond logic with Temporal in the schedule,
lease, journal, and background task surfaces.