docs(sf): record node 26 runtime target
parent d640aa0949
commit 760564dbfb
1 changed files with 495 additions and 1 deletions
@@ -455,6 +455,256 @@ This complements, not replaces:
|
|||
Copilot's `/tasks` and `/session` are less powerful internally, but clearer as
|
||||
control surfaces. SF should keep its deeper state and expose it better.
|
||||
|
||||
## Actual Source Pass: Awesome CLI Agent Repos
|
||||
|
||||
Checked locally under `/tmp/sf-agent-research`:
|
||||
|
||||
- `bradAGI/awesome-cli-coding-agents`
|
||||
- `plandex-ai/plandex`
|
||||
- `leonardcser/smelt`
|
||||
- `mikeyobrien/ralph-orchestrator`
|
||||
- `subsy/ralph-tui`
|
||||
- `oxgeneral/ORCH`
|
||||
- `LucasDuys/forge`
|
||||
- `ramarlina/agx`
|
||||
- `youwangd/SageCLI`
|
||||
- `jcast90/relay`
|
||||
- `basilisk-labs/agentplane`
|
||||
- `amaar-mc/wit`
|
||||
- `fastxyz/skill-optimizer`
|
||||
- `0xmariowu/AgentLint`
|
||||
- `ZENG3LD/gate4agent`
|
||||
|
||||
`arosstale/pi-builder` was listed, but the GitHub repository could not be found
when a clone was attempted on 2026-05-08.
|
||||
|
||||
### Smelt
|
||||
|
||||
Smelt's source has four modes:
|
||||
|
||||
```text
|
||||
normal -> plan -> apply -> yolo
|
||||
```
|
||||
|
||||
It also has a separate reasoning-effort setting:
|
||||
|
||||
```text
|
||||
off | low | medium | high | max
|
||||
```
|
||||
|
||||
Useful:
|
||||
|
||||
- mode cycling is explicit and configurable
|
||||
- permissions differ by mode
|
||||
- read-only commands are allowed, writes usually ask for approval, and deny
  rules win
|
||||
- approval scopes are explicit: once, session, workspace
|
||||
- workspace approvals persist under a workspace hash
|
||||
|
||||
Do not copy:
|
||||
|
||||
- `yolo` as a name
|
||||
- putting work kind and trust level into one mode axis
|
||||
|
||||
SF should keep Smelt's visible cycling and approval scopes, but preserve SF's
|
||||
separate axes: `workMode`, `runControl`, `permissionProfile`, and `modelMode`.
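
A minimal sketch of keeping the four axes independent instead of folding them into one mode enum. The axis names come from the text above; every member value listed for an axis is a hypothetical placeholder, not SF's actual vocabulary.

```typescript
// Axis names are from the design notes; the member values are assumptions.
type WorkMode = "plan" | "build" | "repair" | "review";
type RunControl = "manual" | "assisted" | "autonomous";
type PermissionProfile = "read-only" | "ask-on-write" | "trusted";
type ModelMode = "fast" | "balanced" | "deep";

interface SessionState {
  workMode: WorkMode;
  runControl: RunControl;
  permissionProfile: PermissionProfile;
  modelMode: ModelMode;
}

// Advance one axis through its configured order, wrapping at the end.
function cycle<T>(order: readonly T[], current: T): T {
  const index = order.indexOf(current);
  return order[(index + 1) % order.length];
}

const RUN_CONTROL_ORDER: readonly RunControl[] = ["manual", "assisted", "autonomous"];

// Cycling run control leaves work kind, trust level, and model posture untouched.
function cycleRunControl(state: SessionState): SessionState {
  return { ...state, runControl: cycle(RUN_CONTROL_ORDER, state.runControl) };
}

// Compact badge text that keeps every axis visible.
function badge(state: SessionState): string {
  return [state.workMode, state.runControl, state.permissionProfile, state.modelMode].join(" | ");
}
```

Because each axis has its own type, a trust-level change can never silently alter the work kind, which is the failure mode of Smelt's single mode axis.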
|
||||
|
||||
### ORCH
|
||||
|
||||
ORCH has the cleanest small task state machine:
|
||||
|
||||
```text
|
||||
todo -> in_progress -> review -> done
|
||||
\-> retrying -> in_progress
|
||||
\-> failed
|
||||
review -> todo
|
||||
* -> cancelled
|
||||
```
|
||||
|
||||
It also keeps runtime state separately:
|
||||
|
||||
- `running`
|
||||
- `claimed`
|
||||
- `retry_queue`
|
||||
- total run/task/token/runtime stats
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- `/tasks` should show both durable task status and ephemeral running state
|
||||
- successful completion should pass through review, even when auto-approved
|
||||
- dependency blockers should be computed, not implied from ordering
|
||||
- retrying should be an explicit state, not hidden inside logs
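
The ORCH diagram and the separate runtime state can be sketched as a transition table plus a row builder for `/tasks`. This assumes the two `\->` branches (`retrying`, `failed`) leave `in_progress` and that `* -> cancelled` applies to non-terminal states only; `taskRow` and the `live` labels are hypothetical names for illustration.

```typescript
// Durable task status, following the ORCH diagram (branch origins are assumed).
type TaskStatus =
  | "todo" | "in_progress" | "review" | "done"
  | "retrying" | "failed" | "cancelled";

const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  todo: ["in_progress", "cancelled"],
  in_progress: ["review", "retrying", "failed", "cancelled"],
  review: ["done", "todo", "cancelled"],
  retrying: ["in_progress", "cancelled"],
  done: [],
  failed: [],
  cancelled: [],
};

function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Ephemeral runtime state, tracked apart from the durable status.
interface RuntimeState {
  running: Set<string>;  // task ids with a live worker
  claimed: Set<string>;  // task ids reserved but not started
  retryQueue: string[];  // task ids waiting for another attempt
}

interface TaskRow {
  id: string;
  status: TaskStatus;
  live: "running" | "claimed" | "queued-retry" | "idle";
}

// One row of a /tasks listing: durable status next to the ephemeral state.
function taskRow(id: string, status: TaskStatus, runtime: RuntimeState): TaskRow {
  const live = runtime.running.has(id) ? "running"
    : runtime.claimed.has(id) ? "claimed"
    : runtime.retryQueue.includes(id) ? "queued-retry"
    : "idle";
  return { id, status, live };
}
```

Note that `in_progress -> done` is not in the table, so even auto-approved work has to pass through `review`.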
|
||||
|
||||
### AgentPlane
|
||||
|
||||
AgentPlane's strongest idea is schema-first task artifacts. Task README
|
||||
frontmatter includes:
|
||||
|
||||
- `risk_level`
|
||||
- `status`
|
||||
- `depends_on`
|
||||
- `task_kind`
|
||||
- `mutation_scope`
|
||||
- `risk_flags`
|
||||
- `blueprint_request`
|
||||
- `verify`
|
||||
- `plan_approval`
|
||||
- `verification`
|
||||
- `runner`
|
||||
|
||||
Its workflow file also makes operational policy explicit:
|
||||
|
||||
- workflow mode
|
||||
- status commit policy
|
||||
- workspace isolation
|
||||
- retry policy
|
||||
- scheduler concurrency
|
||||
- required evaluator checks
|
||||
- event log location
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- task artifacts should have schema-backed frontmatter, not loose markdown
|
||||
- plan approval and verification state deserve durable fields
|
||||
- mutation scope and risk flags should feed `permissionProfile`
|
||||
- workflow policy should be inspectable by `/status` and `/tasks`
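
A rough sketch of what schema-backed frontmatter could look like at the storage boundary. The field names are taken from the AgentPlane list above; the value types, the allowed values, and the function names are assumptions, and only a subset of fields is covered.

```typescript
// Field names come from AgentPlane's frontmatter; types and allowed values are assumed.
interface TaskFrontmatter {
  status: string;
  risk_level: "low" | "medium" | "high";
  depends_on: string[];
  mutation_scope: string[];
  risk_flags: string[];
  plan_approval: "pending" | "approved" | "rejected";
}

const RISK_LEVELS: readonly string[] = ["low", "medium", "high"];
const PLAN_APPROVALS: readonly string[] = ["pending", "approved", "rejected"];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a list of problems; an empty list means the record is well-formed.
function validateFrontmatter(raw: Record<string, unknown>): string[] {
  const errors: string[] = [];
  const status = raw["status"];
  const risk = raw["risk_level"];
  const approval = raw["plan_approval"];
  if (typeof status !== "string" || status === "") {
    errors.push("status: expected non-empty string");
  }
  if (typeof risk !== "string" || !RISK_LEVELS.includes(risk)) {
    errors.push("risk_level: expected low | medium | high");
  }
  if (typeof approval !== "string" || !PLAN_APPROVALS.includes(approval)) {
    errors.push("plan_approval: expected pending | approved | rejected");
  }
  for (const key of ["depends_on", "mutation_scope", "risk_flags"]) {
    if (!isStringArray(raw[key])) errors.push(`${key}: expected array of strings`);
  }
  return errors;
}

// Reject loose markdown metadata instead of guessing at missing fields.
function parseFrontmatter(raw: Record<string, unknown>): TaskFrontmatter {
  const errors = validateFrontmatter(raw);
  if (errors.length > 0) throw new Error(`invalid task frontmatter: ${errors.join("; ")}`);
  return raw as unknown as TaskFrontmatter;
}
```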
|
||||
|
||||
### Relay
|
||||
|
||||
Relay's useful concepts:
|
||||
|
||||
- a channel is the workspace for one piece of work
|
||||
- tickets are parallelizable units with dependency DAGs, retry budgets,
|
||||
specialty tags, optional repo routing, and verification commands
|
||||
- decisions are first-class durable records
|
||||
- crosslink lets agents discover and message other sessions
|
||||
- complexity tiers drive approval behavior
|
||||
- CLI/TUI/GUI all read the same state
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- keep decisions as first-class records, not buried in summaries
|
||||
- remote steering should become full-session steering and cross-session
|
||||
messaging, not only remote questions
|
||||
- multi-repo work needs explicit repo routing on tasks
|
||||
- one state store should power TUI, web, headless, and RPC
|
||||
|
||||
### Ralph
|
||||
|
||||
Ralph's hat system is useful as a coordination topology:
|
||||
|
||||
- hats declare triggers, publishes, instructions, backend overrides, max
|
||||
activations, and disallowed tools
|
||||
- events flow through a bus
|
||||
- scope violations are detected when hats publish undeclared topics
|
||||
- exhaustion emits explicit events
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- specialized helpers should declare trigger/publish contracts
|
||||
- helper activation should have max activation limits
|
||||
- helper output should be checked against declared output topics
|
||||
- mode transitions can be modeled as events, not ad hoc flags
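
A small sketch of helper contracts in the spirit of Ralph's hats: declared triggers, declared output topics, and an activation cap. All type names and the `helper.*` topic names are hypothetical; Ralph's actual schema may differ.

```typescript
// A helper declares what wakes it, what it may publish, and how often it may run.
interface HelperContract {
  name: string;
  triggers: string[];
  publishes: string[];
  maxActivations: number;
}

interface HelperState {
  contract: HelperContract;
  activations: number;
}

interface BusEvent {
  topic: string;
  source: string;
}

// Decide whether an incoming event activates the helper. Exhaustion is reported
// as an explicit event rather than a silent no-op.
function activate(state: HelperState, event: BusEvent): BusEvent[] {
  const { contract } = state;
  if (!contract.triggers.includes(event.topic)) return [];
  if (state.activations >= contract.maxActivations) {
    return [{ topic: "helper.exhausted", source: contract.name }];
  }
  state.activations += 1;
  return [{ topic: "helper.activated", source: contract.name }];
}

// Check helper output against the declared topics; undeclared topics are violations.
function checkPublish(contract: HelperContract, topic: string): BusEvent | null {
  return contract.publishes.includes(topic)
    ? null
    : { topic: "helper.scope_violation", source: contract.name };
}
```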
|
||||
|
||||
### Sage
|
||||
|
||||
Sage's real value is runtime-neutral orchestration:
|
||||
|
||||
- agents are processes
|
||||
- messages are files
|
||||
- tasks are templates with frontmatter
|
||||
- plans decompose into dependency waves
|
||||
- tasks in a wave execute in parallel
|
||||
- resume skips done tasks and resets stale running tasks
|
||||
- runtime fallback is explicit
|
||||
- bench-as-code compares actual agent CLIs on actual tasks
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- `/tasks` should be file/DB-backed enough that headless tools can read it
|
||||
without attaching to a live TUI
|
||||
- dependency waves should be visible in planning output
|
||||
- stale running work should be reset or surfaced clearly on resume
|
||||
- model/provider benchmarking should use actual SF workflows, not isolated
|
||||
model prompts
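
Dependency waves can be computed by repeatedly peeling off the tasks whose dependencies are already done. This is a generic layering sketch, not Sage's implementation; the `dependencyWaves` name and the input shape are assumptions.

```typescript
// Map of task id -> ids it depends on. Every dependency must itself be a key.
type DependencyMap = Record<string, string[]>;

// Group tasks into waves: each task lands in the first wave that follows all of
// its dependencies, so tasks within one wave can run in parallel.
function dependencyWaves(deps: DependencyMap): string[][] {
  const remaining = new Set<string>(Object.keys(deps));
  const done = new Set<string>();
  const waves: string[][] = [];

  while (remaining.size > 0) {
    const wave = Array.from(remaining)
      .filter((id) => deps[id].every((dep) => done.has(dep)))
      .sort();
    if (wave.length === 0) {
      // Nothing is runnable: the rest form a cycle or reference an unknown task.
      const stuck = Array.from(remaining).sort().join(", ");
      throw new Error(`dependency cycle or unknown dependency among: ${stuck}`);
    }
    for (const id of wave) {
      remaining.delete(id);
      done.add(id);
    }
    waves.push(wave);
  }
  return waves;
}
```

Printing the returned waves in planning output makes the parallelism visible before anything runs, and the thrown error surfaces blockers that ordering alone would hide.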
|
||||
|
||||
### AGX
|
||||
|
||||
AGX has useful low-level patterns:
|
||||
|
||||
- graph scheduler with hard, soft, failure, and always dependency conditions
|
||||
- max concurrent work slots
|
||||
- checkpoints with patch files and bounded history
|
||||
- deterministic verify gate before LLM fallback
|
||||
- repeated verification failure count that forces action
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- dependency edges should support more than "depends on success"
|
||||
- checkpoints should store patch references and bounded summaries
|
||||
- deterministic verification should always run before semantic/LLM review
|
||||
- repeated verify failures should force a mode transition to `repair` or
|
||||
`review`, not keep retrying indefinitely
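
The last two points can be sketched as one pure decision function: deterministic checks gate LLM review, and a failure counter forces a mode transition. The policy shape, the threshold, and the step names are assumptions, not AGX's or SF's actual API.

```typescript
interface VerifyPolicy {
  maxFailures: number;             // consecutive failures tolerated before escalation
  escalateTo: "repair" | "review"; // mode to transition into
}

type NextStep =
  | { action: "llm_review" }  // deterministic checks passed; semantic review may run
  | { action: "retry" }
  | { action: "transition"; mode: "repair" | "review" };

interface VerifyDecision {
  step: NextStep;
  consecutiveFailures: number;
}

// Decide what happens after one deterministic verification run.
function afterVerify(
  deterministicPassed: boolean,
  failuresSoFar: number,
  policy: VerifyPolicy,
): VerifyDecision {
  if (deterministicPassed) {
    // Only now is semantic/LLM review allowed; the failure streak resets.
    return { step: { action: "llm_review" }, consecutiveFailures: 0 };
  }
  const consecutiveFailures = failuresSoFar + 1;
  if (consecutiveFailures >= policy.maxFailures) {
    return { step: { action: "transition", mode: policy.escalateTo }, consecutiveFailures };
  }
  return { step: { action: "retry" }, consecutiveFailures };
}
```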
|
||||
|
||||
### Wit
|
||||
|
||||
Wit is the strongest coordination pattern for parallel edits:
|
||||
|
||||
- agents declare intent before editing
|
||||
- agents acquire symbol-level locks
|
||||
- conflicts are warnings, not always hard blocks
|
||||
- contracts can be enforced by git hooks
|
||||
- Tree-sitter provides symbol ranges and call edges
|
||||
- a `coordinate` skill auto-loads when `.wit/` exists
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- parallel SF workers should declare intent before editing
|
||||
- conflict detection should eventually be symbol-aware, not only file-aware
|
||||
- warnings can steer agents away from collisions without freezing work
|
||||
- accepted interface contracts should be enforceable before commit
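
A minimal sketch of symbol-level intent records where a conflict produces a warning instead of a hard block. The registry shape and all names are hypothetical; Wit's real data model is not reproduced here.

```typescript
// One declared intent: a worker says which symbol in which file it plans to edit.
interface Intent {
  worker: string;
  file: string;
  symbol: string;
}

interface ConflictWarning {
  file: string;
  symbol: string;
  heldBy: string;
  requestedBy: string;
}

type IntentRegistry = Map<string, Intent>;

function intentKey(file: string, symbol: string): string {
  return `${file}#${symbol}`;
}

// Record the intent if the symbol is free. If another worker already holds it,
// keep the existing claim and return a warning the caller can use to steer away.
function declareIntent(registry: IntentRegistry, intent: Intent): ConflictWarning | null {
  const key = intentKey(intent.file, intent.symbol);
  const existing = registry.get(key);
  if (existing !== undefined && existing.worker !== intent.worker) {
    return {
      file: intent.file,
      symbol: intent.symbol,
      heldBy: existing.worker,
      requestedBy: intent.worker,
    };
  }
  registry.set(key, intent);
  return null;
}
```

Keying on file plus symbol means two workers can edit different functions in the same file without a warning, which file-level locking would not allow.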
|
||||
|
||||
### skill-optimizer
|
||||
|
||||
skill-optimizer has the best pattern for proving that skills actually work:
|
||||
|
||||
- a case is a user-like task plus deterministic graders
|
||||
- a suite is a case/model matrix
|
||||
- references are copied into `/work`
|
||||
- the agent sees only `/work`, not graders or hidden answers
|
||||
- graders inspect files, artifacts, `answer.json`, `trace.jsonl`, and result
|
||||
state
|
||||
- failed trials preserve workspace for debugging
|
||||
|
||||
Useful for SF:
|
||||
|
||||
- auto-created skills need eval cases
|
||||
- skill acceptance should be grader-backed, not vibes-backed
|
||||
- negative cases should check that irrelevant skills were not loaded
|
||||
- skill optimization should test across model modes/providers
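
A sketch of a grader-backed eval case, including a negative grader that checks an irrelevant skill was not loaded. The `TrialResult` shape and the helper names are assumptions modeled loosely on the description above, not skill-optimizer's actual API.

```typescript
// What a grader is allowed to inspect after a trial has run.
interface TrialResult {
  files: Record<string, string>; // path under /work -> file content
  loadedSkills: string[];        // skills the agent loaded during the trial
}

interface GradeOutcome {
  grader: string;
  pass: boolean;
}

type Grader = (trial: TrialResult) => GradeOutcome;

interface EvalCase {
  id: string;
  task: string;      // the user-like prompt
  graders: Grader[]; // deterministic checks only
}

// Passes when the file exists and contains the expected text.
function fileContains(path: string, needle: string): Grader {
  return (trial) => ({
    grader: `fileContains(${path})`,
    pass: (trial.files[path] ?? "").includes(needle),
  });
}

// Negative check: passes when the named skill was NOT loaded.
function skillNotLoaded(skill: string): Grader {
  return (trial) => ({
    grader: `skillNotLoaded(${skill})`,
    pass: !trial.loadedSkills.includes(skill),
  });
}

// A trial passes only if every grader passes; failed grader names are kept for debugging.
function gradeTrial(evalCase: EvalCase, trial: TrialResult) {
  const failed = evalCase.graders
    .map((grader) => grader(trial))
    .filter((outcome) => !outcome.pass)
    .map((outcome) => outcome.grader);
  return { caseId: evalCase.id, pass: failed.length === 0, failed };
}
```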
|
||||
|
||||
### Plandex And Forge Loop
|
||||
|
||||
Plandex reinforces:
|
||||
|
||||
- chat/tell split
|
||||
- configurable autonomy levels
|
||||
- cumulative diff sandbox before applying changes
|
||||
- model packs for planning vs execution
|
||||
|
||||
Forge Loop reinforces:
|
||||
|
||||
- R-numbered acceptance criteria
|
||||
- task DAGs with tiered parallelism
|
||||
- per-task worktrees
|
||||
- per-task and session token budgets
|
||||
- structural completion markers
|
||||
- backpropagation from runtime failure to spec gap
|
||||
- state on disk as the recovery source
|
||||
|
||||
SF already has many of these ideas. The part to tighten is the explicit product
|
||||
surface: direct commands, visible modes, `/tasks`, schema-backed state, and
|
||||
skill evals.
|
||||
|
||||
## Status And Mode Badge
|
||||
|
||||
The active state should always be visible, especially during full autonomy.
|
||||
|
|
@@ -502,9 +752,14 @@ Still needed:
|
|||
- make `--autonomous` chain into direct `/autonomous`
|
||||
- add visible mode/status surface for TUI and web
|
||||
- expose autonomous continuation limits in settings and status
|
||||
- add `/tasks` as the unified background work surface with durable task state,
|
||||
ephemeral running state, retries, blockers, checkpoints, budget, and steering
|
||||
- make `repair` a first-class workflow over doctor
|
||||
- add policy-aware project skill suggestion/generation
|
||||
- add skill eval cases for generated project skills
|
||||
- add schema-backed task/frontmatter fields for risk, mutation scope,
|
||||
verification, plan approval, and runner status
|
||||
- add intent/claim records for parallel workers before editing
|
||||
- audit subagent provider/model/permission inheritance
|
||||
- audit remote steering as a full-session steering surface, not only remote
|
||||
question delivery
|
||||
|
|
@@ -546,3 +801,242 @@ sf headless --autonomous ...
|
|||
The target model is simple: direct commands for humans, headless commands for
|
||||
machines, durable state for autonomous execution, and explicit axes for mode,
|
||||
control, trust, model posture, and surface.
|
||||
|
||||
## Runtime Target: Node 26
|
||||
|
||||
SF should treat Node 26 as the target runtime, with Node 24 kept as the current
|
||||
compatibility floor until the Node 26 lane is proven clean.
|
||||
|
||||
Source notes checked 2026-05-08:
|
||||
|
||||
- Node 24 is the current LTS line and this repo already requires `>=24.15.0`.
|
||||
- Node 25 is a short-lived current line. It is useful as a compatibility probe,
|
||||
but not a target.
|
||||
- Node 26 is the next meaningful target: current now, LTS-bound, and useful for
|
||||
SF's own runtime model.
|
||||
- Bun moves closer to Node with every release and supports many Node APIs plus
  Node-API, but its compatibility target and partially implemented API areas do
  not yet match SF's risk surface.
|
||||
- Deno supports Node/npm compatibility, package.json, local node_modules, and
|
||||
Node-API addons with FFI permission, but that means SF would still be running
|
||||
a Node-compatibility workload.
|
||||
- LLRT is experimental and serverless-oriented, not a local CLI/runtime fit.
|
||||
|
||||
### Why Node 26 Makes SF Stronger
|
||||
|
||||
Node 26 is not just "newer Node." It gives SF a better platform for long-running
|
||||
agent work:
|
||||
|
||||
- `Temporal` is enabled by default.
|
||||
- V8 14.6 is the JavaScript engine baseline.
|
||||
- Undici 8 is the HTTP/fetch baseline.
|
||||
- Node 26 removes and deprecates more legacy APIs, so running on it exposes old
  loader, stream, HTTP, crypto, and dependency assumptions in SF early.
|
||||
|
||||
### Temporal Is More Than Better Dates
|
||||
|
||||
Temporal gives SF the vocabulary it already needs for durable autonomous work.
|
||||
|
||||
Important Temporal concepts:
|
||||
|
||||
- `Temporal.Instant`: an exact point in history. Use for journal events,
|
||||
checkpoint timestamps, lock leases, provider call start/end, and trace order.
|
||||
- `Temporal.ZonedDateTime`: an exact instant plus time zone and calendar. Use
|
||||
for reminders, schedules, adoption reviews, audits, and "run this at local
|
||||
business time" semantics.
|
||||
- `Temporal.PlainDate`: a calendar date without time or time zone. Use for
|
||||
daily reports, milestone review dates, and human-facing due dates.
|
||||
- `Temporal.PlainTime`: a wall-clock time without date or zone. Use for
|
||||
recurring "at 09:00" style policies.
|
||||
- `Temporal.PlainDateTime`: a date and wall-clock time before binding it to a
|
||||
zone. Use only when the zone is deliberately chosen later.
|
||||
- `Temporal.Duration`: a typed amount of time. Use for budgets, leases,
|
||||
cooldowns, retry delays, schedule offsets, and age checks.
|
||||
|
||||
That split matters because SF currently has many different meanings hidden
|
||||
behind timestamps and strings:
|
||||
|
||||
- exact event ordering
|
||||
- local user reminders
|
||||
- project schedule dates
|
||||
- lease expiry
|
||||
- retry backoff
|
||||
- adoption review windows
|
||||
- elapsed runtime
|
||||
- "next business day" style planning
|
||||
|
||||
`Date` collapses those into one weak type. Temporal lets SF store and validate
|
||||
the real intent.
|
||||
|
||||
### SF Runtime Places That Should Use Temporal
|
||||
|
||||
Use Temporal first in the areas where wrong time semantics create real
|
||||
operational mistakes:
|
||||
|
||||
- `sf schedule`: due dates, relative offsets, local-time reminders, audit
|
||||
windows, and recurrence-ready storage.
|
||||
- autonomous locks and leases: exact `Instant` plus typed `Duration`, not
|
||||
implicit millisecond math scattered through code.
|
||||
- journals and traces: exact event instants with stable ordering and explicit
|
||||
serialization.
|
||||
- session reports: elapsed durations and grouped daily summaries without local
|
||||
timezone drift.
|
||||
- adoption reviews and decision audits: calendar dates and wall-clock reminders
|
||||
that survive DST and timezone changes.
|
||||
- background work surface: task age, stale-running detection, retry-after, and
|
||||
next-action time should be typed.
|
||||
|
||||
### Temporal Design Rule For SF
|
||||
|
||||
Store the semantic type, not just the formatted string:
|
||||
|
||||
```text
|
||||
event happened exactly now -> Instant
|
||||
run at 09:00 in Europe/Oslo -> ZonedDateTime or PlainTime + timeZone
|
||||
review on 2026-06-01 -> PlainDate
|
||||
retry after 30 minutes -> Duration
|
||||
lease expires at exact timestamp -> Instant
|
||||
```
|
||||
|
||||
Serialization should stay explicit and boring:
|
||||
|
||||
- store ISO strings plus a field that says which Temporal type they represent
|
||||
- include timezone when wall-clock semantics matter
|
||||
- do not infer local timezone at read time unless the record explicitly asks
|
||||
for it
|
||||
- validate schedule and lease records at DB boundaries
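
A sketch of the "ISO string plus type tag" storage rule as a discriminated union with a boundary validator. The record shape and names are assumptions. The regular expressions are shape checks only and are stricter than Temporal's own parsers (for example, `Instant` here must end in `Z`); on Node 26 the real validation would go through the matching `Temporal.*.from` call.

```typescript
// Stored form: an ISO string plus a tag naming the Temporal type it represents.
type TemporalRecord =
  | { type: "Instant"; value: string }                     // 2026-05-08T12:00:00Z
  | { type: "PlainDate"; value: string }                   // 2026-06-01
  | { type: "PlainTime"; value: string; timeZone: string } // 09:00 in Europe/Oslo
  | { type: "Duration"; value: string };                   // PT30M

// Shape checks only; they do not validate calendar ranges or time zone names.
const PATTERNS: Record<TemporalRecord["type"], RegExp> = {
  Instant: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$/,
  PlainDate: /^\d{4}-\d{2}-\d{2}$/,
  PlainTime: /^\d{2}:\d{2}(:\d{2})?$/,
  Duration: /^P(?!$)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$/,
};

// Validate a record at the DB boundary; an empty list means it is acceptable.
function validateRecord(record: TemporalRecord): string[] {
  const errors: string[] = [];
  if (!PATTERNS[record.type].test(record.value)) {
    errors.push(`${record.type}: malformed value "${record.value}"`);
  }
  // Wall-clock times are meaningless without the zone they were intended for.
  if (record.type === "PlainTime" && record.timeZone.trim() === "") {
    errors.push("PlainTime: timeZone is required for wall-clock semantics");
  }
  return errors;
}
```

Because the tag travels with the value, a reader never has to guess whether `2026-06-01` is a review date or a truncated timestamp.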
|
||||
|
||||
### Node 26 Adoption Path
|
||||
|
||||
Target policy:
|
||||
|
||||
```text
|
||||
current compatibility floor: Node 24.15+
|
||||
internal target runtime: Node 26
|
||||
canonical future baseline: Node 26 after canary is clean
|
||||
Node 25: skip except quick probes
|
||||
```
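
The policy above can be expressed as a small classifier over a Node version string. This is an illustrative sketch: the lane names and the `classifyNodeVersion` function are hypothetical, and the odd-major rule generalizes the "Node 25: skip except quick probes" line.

```typescript
interface RuntimePolicy {
  floor: [number, number, number]; // lowest supported version
  targetMajor: number;             // internal target runtime
}

const POLICY: RuntimePolicy = { floor: [24, 15, 0], targetMajor: 26 };

type Lane = "unsupported" | "compatibility-floor" | "probe-only" | "target" | "untested";

function parseVersion(version: string): [number, number, number] {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) throw new Error(`unparseable Node version: ${version}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Classify a version such as "v24.15.0" against the adoption policy.
function classifyNodeVersion(version: string, policy: RuntimePolicy = POLICY): Lane {
  const [major, minor, patch] = parseVersion(version);
  const [floorMajor, floorMinor, floorPatch] = policy.floor;
  const belowFloor =
    major < floorMajor ||
    (major === floorMajor &&
      (minor < floorMinor || (minor === floorMinor && patch < floorPatch)));
  if (belowFloor) return "unsupported";
  if (major === policy.targetMajor) return "target";
  if (major > policy.targetMajor) return "untested";
  if (major % 2 === 1) return "probe-only"; // odd majors are short-lived lines
  return "compatibility-floor";
}
```

At startup this could be called with `process.version` to decide whether to warn, continue, or refuse to run.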
|
||||
|
||||
### Runtime Alternatives
|
||||
|
||||
Other JavaScript runtimes are useful comparators, but none should replace Node
|
||||
as SF's primary runtime right now.
|
||||
|
||||
SF's current runtime shape is Node-native:
|
||||
|
||||
- npm workspaces and `package-lock.json`
|
||||
- Next.js standalone web host
|
||||
- Vitest and Node test-runner compatibility scripts
|
||||
- Rust N-API `.node` addons
|
||||
- `node-pty` native assets in the web host
|
||||
- `node:` built-ins across CLI, scripts, packages, and web services
|
||||
- child process, TTY, stream, module loader, and extension-loader behavior
|
||||
- installed runtime sync into `~/.sf/agent`
|
||||
|
||||
#### Bun
|
||||
|
||||
Bun is the strongest speed and developer-experience competitor.
|
||||
|
||||
Useful:
|
||||
|
||||
- fast package install and script startup
|
||||
- broad Node API compatibility
|
||||
- built-in TypeScript, test runner, shell, SQLite, YAML, TOML, JSONL, and other
|
||||
convenience APIs
|
||||
- Node-API support is substantial enough to use as a compatibility probe
|
||||
|
||||
Not primary for SF:
|
||||
|
||||
- Bun's own docs say compatibility reflects Node v23, while SF is targeting
|
||||
Node 26.
|
||||
- Some core APIs are partial or behave differently from Node: `child_process`,
  module loader hooks, `node:v8`, `node:test`, `node:sqlite`, `worker_threads`,
  and the inspector/debugger areas.
|
||||
- SF's highest-risk paths are exactly the places where "almost Node" can hurt:
|
||||
TTY, child processes, native addons, Next standalone output, loaders, and
|
||||
extension runtime.
|
||||
|
||||
Decision: use Bun only for optional speed probes or isolated tooling. Do not
|
||||
make it the SF runtime until full `npm test`, web build, native build, smoke
|
||||
tests, and installed extension runtime all pass under Bun without special
|
||||
cases.
|
||||
|
||||
#### Deno
|
||||
|
||||
Deno has the best security and integrated-toolchain story.
|
||||
|
||||
Useful:
|
||||
|
||||
- explicit permissions model
|
||||
- first-class TypeScript and web standards
|
||||
- npm/package.json compatibility
|
||||
- Node-API support when local `node_modules` and FFI permission are enabled
|
||||
- good target for thinking about sandboxing and permission profiles
|
||||
|
||||
Not primary for SF:
|
||||
|
||||
- For a repo like SF, Deno would run almost entirely in its Node-compatibility
  mode.
|
||||
- Deno docs recommend local `node_modules` for frameworks like Next.js and for
|
||||
Node-API addons, which means SF would keep most Node/npm complexity anyway.
|
||||
- Native addons require local `node_modules` plus `--allow-ffi`.
|
||||
- The value would be security posture and packaging experiments, not simpler
|
||||
runtime execution.
|
||||
|
||||
Decision: study Deno for permission-profile design and maybe future packaged
|
||||
headless workers. Do not switch the core SF runtime to Deno.
|
||||
|
||||
#### LLRT, WinterJS, Edge Runtimes
|
||||
|
||||
These are not fits for SF's primary runtime.
|
||||
|
||||
Useful:
|
||||
|
||||
- serverless cold-start research
|
||||
- constrained worker/edge execution ideas
|
||||
- tiny isolated helper tasks
|
||||
|
||||
Not primary for SF:
|
||||
|
||||
- SF is a long-running local CLI/runtime, not a small stateless Lambda handler.
|
||||
- SF needs native addons, process control, TTY, filesystem state, git, shell,
|
||||
Next web host, and Node-compatible package behavior.
|
||||
- LLRT is explicitly experimental and evaluation-oriented.
|
||||
|
||||
Decision: ignore as primary runtime. Only revisit for isolated future worker
|
||||
surfaces.
|
||||
|
||||
### Runtime Decision
|
||||
|
||||
Node 26 is the target because SF is a Node-native agent runtime, not a generic
|
||||
JavaScript app.
|
||||
|
||||
Use alternatives this way:
|
||||
|
||||
```text
|
||||
Node 26 -> primary internal target and future baseline
|
||||
Bun -> speed/compatibility probe, not runtime
|
||||
Deno -> permission/sandbox design reference, not runtime
|
||||
LLRT -> ignore except tiny serverless worker research
|
||||
```
|
||||
|
||||
The rule is simple: if a runtime cannot run the exact SF stack without special
cases, it is not stronger for SF. Node 26 makes the existing SF stack stronger;
alternative runtimes mostly require a different stack.
|
||||
|
||||
Required Node 26 gate:
|
||||
|
||||
```text
|
||||
node --version   # must report v26.x
|
||||
npm run lint
|
||||
npm run typecheck:extensions
|
||||
npm run build
|
||||
npm test
|
||||
sf --version
|
||||
sf --help
|
||||
sf --print "ping"
|
||||
```
|
||||
|
||||
If Node 26 passes those gates, SF should run itself on Node 26 internally even
|
||||
before raising public `engines.node`. Once stable, raise the repo baseline and
|
||||
start replacing fragile `Date`/millisecond logic with Temporal in the schedule,
|
||||
lease, journal, and background task surfaces.
|
||||
|