docs(sf): record node 26 runtime target

2026-05-08 01:56:55 +02:00 · 760564dbfb
commit 760564dbfb
parent d640aa0949
1 changed file with 495 additions and 1 deletion
--- a/copilot-thoughts.md
+++ b/copilot-thoughts.md
@@ -455,6 +455,256 @@ This complements, not replaces:
 Copilot's `/tasks` and `/session` are less powerful internally, but clearer as
 control surfaces. SF should keep its deeper state and expose it better.

+## Actual Source Pass: Awesome CLI Agent Repos
+
+Checked locally under `/tmp/sf-agent-research`:
+
+- `bradAGI/awesome-cli-coding-agents`
+- `plandex-ai/plandex`
+- `leonardcser/smelt`
+- `mikeyobrien/ralph-orchestrator`
+- `subsy/ralph-tui`
+- `oxgeneral/ORCH`
+- `LucasDuys/forge`
+- `ramarlina/agx`
+- `youwangd/SageCLI`
+- `jcast90/relay`
+- `basilisk-labs/agentplane`
+- `amaar-mc/wit`
+- `fastxyz/skill-optimizer`
+- `0xmariowu/AgentLint`
+- `ZENG3LD/gate4agent`
+
+`arosstale/pi-builder` was listed, but its GitHub repository could not be found
+when a clone was attempted on 2026-05-08.
+
+### Smelt
+
+Smelt's source has four modes:
+
+```text
+normal -> plan -> apply -> yolo
+```
+
+It also has separate reasoning effort:
+
+```text
+off | low | medium | high | max
+```
+
+Useful:
+
+- mode cycling is explicit and configurable
+- permissions differ by mode
+- read-only commands are allowed, writes usually ask, and deny rules win
+- approval scopes are explicit: once, session, workspace
+- workspace approvals persist under a workspace hash
+
+Do not copy:
+
+- `yolo` as a name
+- putting work kind and trust level into one mode axis
+
+SF should keep Smelt's visible cycling and approval scopes, but preserve SF's
+separate axes: `workMode`, `runControl`, `permissionProfile`, and `modelMode`.
+
+### ORCH
+
+ORCH has the cleanest small task state machine:
+
+```text
+todo -> in_progress -> review -> done
+                  \-> retrying -> in_progress
+                  \-> failed
+review -> todo
+* -> cancelled
+```
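
The state machine above can be sketched as a transition table. This is a minimal illustration, not ORCH's actual code: `TaskStatus`, `TRANSITIONS`, and `canTransition` are hypothetical names, and the reading of the branch arrows (both leaving `in_progress`) and of `* -> cancelled` (taken literally) are assumptions.

```typescript
// Sketch of an ORCH-style task state machine as a transition table.
// All names here are hypothetical; only the states come from the diagram.
type TaskStatus =
  | "todo"
  | "in_progress"
  | "review"
  | "done"
  | "retrying"
  | "failed"
  | "cancelled";

// Allowed edges, assuming both branch arrows leave `in_progress`.
const TRANSITIONS: Record<TaskStatus, readonly TaskStatus[]> = {
  todo: ["in_progress"],
  in_progress: ["review", "retrying", "failed"],
  review: ["done", "todo"],
  retrying: ["in_progress"],
  done: [],
  failed: [],
  cancelled: [],
};

// Assumption: "* -> cancelled" is taken literally, so every state except
// `cancelled` itself may move to `cancelled`.
function canTransition(from: TaskStatus, to: TaskStatus): boolean {
  if (to === "cancelled") return from !== "cancelled";
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the edges in one table makes "retrying is an explicit state" checkable: any status change that is not in the table can be rejected at the write boundary.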
+
+It also keeps runtime state separately:
+
+- `running`
+- `claimed`
+- `retry_queue`
+- total run/task/token/runtime stats
+
+Useful for SF:
+
+- `/tasks` should show both durable task status and ephemeral running state
+- successful completion should pass through review, even when auto-approved
+- dependency blockers should be computed, not implied from ordering
+- retrying should be an explicit state, not hidden inside logs
+
+### AgentPlane
+
+AgentPlane's strongest idea is schema-first task artifacts. Task README
+frontmatter includes:
+
+- `risk_level`
+- `status`
+- `depends_on`
+- `task_kind`
+- `mutation_scope`
+- `risk_flags`
+- `blueprint_request`
+- `verify`
+- `plan_approval`
+- `verification`
+- `runner`
+
+Its workflow file also makes operational policy explicit:
+
+- workflow mode
+- status commit policy
+- workspace isolation
+- retry policy
+- scheduler concurrency
+- required evaluator checks
+- event log location
+
+Useful for SF:
+
+- task artifacts should have schema-backed frontmatter, not loose markdown
+- plan approval and verification state deserve durable fields
+- mutation scope and risk flags should feed `permissionProfile`
+- workflow policy should be inspectable by `/status` and `/tasks`
+
+### Relay
+
+Relay's useful concepts:
+
+- a channel is the workspace for one piece of work
+- tickets are parallelizable units with dependency DAGs, retry budgets,
+  specialty tags, optional repo routing, and verification commands
+- decisions are first-class durable records
+- crosslink lets agents discover and message other sessions
+- complexity tiers drive approval behavior
+- CLI/TUI/GUI all read the same state
+
+Useful for SF:
+
+- keep decisions as first-class records, not buried in summaries
+- remote steering should become full-session steering and cross-session
+  messaging, not only remote questions
+- multi-repo work needs explicit repo routing on tasks
+- one state store should power TUI, web, headless, and RPC
+
+### Ralph
+
+Ralph's hat system is useful as a coordination topology:
+
+- hats declare triggers, publishes, instructions, backend overrides, max
+  activations, and disallowed tools
+- events flow through a bus
+- scope violations are detected when hats publish undeclared topics
+- exhaustion emits explicit events
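
A hat contract of this kind could look roughly like the sketch below. The `Hat` shape and both helper functions are hypothetical, written only to illustrate the declared-topics and max-activations checks; they are not Ralph's real types.

```typescript
// Hypothetical hat contract: field names are illustrative, not Ralph's schema.
interface Hat {
  name: string;
  triggers: string[];      // topics that activate this hat
  publishes: string[];     // topics this hat is allowed to emit
  maxActivations: number;  // activation budget before exhaustion
}

// A publish is in scope only if the topic was declared up front.
function isDeclaredTopic(hat: Hat, topic: string): boolean {
  return hat.publishes.indexOf(topic) !== -1;
}

// Activation is allowed only while the budget is not used up.
function canActivate(hat: Hat, activationsUsed: number): boolean {
  return activationsUsed < hat.maxActivations;
}

const reviewer: Hat = {
  name: "reviewer",
  triggers: ["build.done"],
  publishes: ["review.approved", "review.rejected"],
  maxActivations: 3,
};
```

A bus built on these checks would emit a scope-violation event when `isDeclaredTopic` fails and an exhaustion event when `canActivate` fails, rather than silently dropping the message.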
+
+Useful for SF:
+
+- specialized helpers should declare trigger/publish contracts
+- helper activation should have max activation limits
+- helper output should be checked against declared output topics
+- mode transitions can be modeled as events, not ad hoc flags
+
+### Sage
+
+Sage's real value is runtime-neutral orchestration:
+
+- agents are processes
+- messages are files
+- tasks are templates with frontmatter
+- plans decompose into dependency waves
+- tasks in a wave execute in parallel
+- resume skips done tasks and resets stale running tasks
+- runtime fallback is explicit
+- bench-as-code compares actual agent CLIs on actual tasks
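
The "dependency waves" idea can be sketched as repeated layering of a task graph: each wave is the set of tasks whose dependencies are all already scheduled. `PlanTask` and `computeWaves` are hypothetical names for illustration; this is not Sage's implementation.

```typescript
// Hypothetical task shape: an id plus the ids it depends on.
interface PlanTask {
  id: string;
  dependsOn: string[];
}

// Group tasks into waves. Every task in a wave has all of its dependencies
// in earlier waves, so tasks within one wave can run in parallel.
function computeWaves(tasks: PlanTask[]): string[][] {
  const pending = new Map<string, Set<string>>();
  tasks.forEach((t) => pending.set(t.id, new Set(t.dependsOn)));

  const waves: string[][] = [];
  while (pending.size > 0) {
    // Tasks with no unmet dependencies are ready now.
    const ready: string[] = [];
    pending.forEach((deps, id) => {
      if (deps.size === 0) ready.push(id);
    });
    if (ready.length === 0) {
      throw new Error("dependency cycle or unknown dependency id");
    }
    waves.push(ready);
    // Remove the ready tasks and mark them as satisfied for the rest.
    ready.forEach((id) => pending.delete(id));
    pending.forEach((deps) => ready.forEach((id) => deps.delete(id)));
  }
  return waves;
}
```

For a plan where `b` and `c` depend on `a`, and `d` depends on both, this yields three waves with `b` and `c` sharing the middle one, which is the shape planning output would need to show.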
+
+Useful for SF:
+
+- `/tasks` should be file/DB-backed enough that headless tools can read it
+  without attaching to a live TUI
+- dependency waves should be visible in planning output
+- stale running work should be reset or surfaced clearly on resume
+- model/provider benchmarking should use actual SF workflows, not isolated
+  model prompts
+
+### AGX
+
+AGX has useful low-level patterns:
+
+- graph scheduler with hard, soft, failure, and always dependency conditions
+- max concurrent work slots
+- checkpoints with patch files and bounded history
+- deterministic verify gate before LLM fallback
+- repeated verification failure count that forces action
+
+Useful for SF:
+
+- dependency edges should support more than "depends on success"
+- checkpoints should store patch references and bounded summaries
+- deterministic verification should always run before semantic/LLM review
+- repeated verify failures should force a mode transition to `repair` or
+  `review`, not keep retrying indefinitely
+
+### Wit
+
+Wit is the strongest coordination pattern for parallel edits:
+
+- agents declare intent before editing
+- agents acquire symbol-level locks
+- conflicts are warnings, not always hard blocks
+- contracts can be enforced by git hooks
+- Tree-sitter provides symbol ranges and call edges
+- a `coordinate` skill auto-loads when `.wit/` exists
+
+Useful for SF:
+
+- parallel SF workers should declare intent before editing
+- conflict detection should eventually be symbol-aware, not only file-aware
+- warnings can steer agents away from collisions without freezing work
+- accepted interface contracts should be enforceable before commit
+
+### skill-optimizer
+
+skill-optimizer has the best pattern for making skills real:
+
+- a case is a user-like task plus deterministic graders
+- a suite is a case/model matrix
+- references are copied into `/work`
+- the agent sees only `/work`, not graders or hidden answers
+- graders inspect files, artifacts, `answer.json`, `trace.jsonl`, and result
+  state
+- failed trials preserve workspace for debugging
+
+Useful for SF:
+
+- auto-created skills need eval cases
+- skill acceptance should be grader-backed, not vibes-backed
+- negative cases should check that irrelevant skills were not loaded
+- skill optimization should test across model modes/providers
+
+### Plandex And Forge Loop
+
+Plandex reinforces:
+
+- chat/tell split
+- configurable autonomy levels
+- cumulative diff sandbox before applying changes
+- model packs for planning vs execution
+
+Forge Loop reinforces:
+
+- R-numbered acceptance criteria
+- task DAGs with tiered parallelism
+- per-task worktrees
+- per-task and session token budgets
+- structural completion markers
+- backpropagation from runtime failure to spec gap
+- state on disk as the recovery source
+
+SF already has many of these ideas. The part to tighten is the explicit product
+surface: direct commands, visible modes, `/tasks`, schema-backed state, and
+skill evals.
+
 ## Status And Mode Badge

 The active state should always be visible, especially during full autonomy.
@@ -502,9 +752,14 @@ Still needed:
 - make `--autonomous` chain into direct `/autonomous`
 - add visible mode/status surface for TUI and web
 - expose autonomous continuation limits in settings and status
-- add `/tasks` as the unified background work surface
+- add `/tasks` as the unified background work surface with durable task state,
+  ephemeral running state, retries, blockers, checkpoints, budget, and steering
 - make `repair` a first-class workflow over doctor
 - add policy-aware project skill suggestion/generation
+- add skill eval cases for generated project skills
+- add schema-backed task/frontmatter fields for risk, mutation scope,
+  verification, plan approval, and runner status
+- add intent/claim records for parallel workers before editing
 - audit subagent provider/model/permission inheritance
 - audit remote steering as a full-session steering surface, not only remote
  question delivery
@@ -546,3 +801,242 @@ sf headless --autonomous ...
 The target model is simple: direct commands for humans, headless commands for
 machines, durable state for autonomous execution, and explicit axes for mode,
 control, trust, model posture, and surface.
+
+## Runtime Target: Node 26
+
+SF should treat Node 26 as the target runtime, with Node 24 kept as the current
+compatibility floor until the Node 26 lane is proven clean.
+
+Source notes checked 2026-05-08:
+
+- Node 24 is the current LTS line and this repo already requires `>=24.15.0`.
+- Node 25 is a short-lived current line. It is useful as a compatibility probe,
+  but not a target.
+- Node 26 is the next meaningful target: current now, LTS-bound, and useful for
+  SF's own runtime model.
+- Bun moves closer to Node with every release and supports many Node APIs plus
+  Node-API, but its compatibility target and partial API areas do not match
+  SF's risk surface yet.
+- Deno supports Node/npm compatibility, package.json, local node_modules, and
+  Node-API addons with FFI permission, but that means SF would still be running
+  a Node-compatibility workload.
+- LLRT is experimental and serverless-oriented, not a local CLI/runtime fit.
+
+### Why Node 26 Makes SF Stronger
+
+Node 26 is not just "newer Node." It gives SF a better platform for long-running
+agent work:
+
+- `Temporal` is enabled by default.
+- V8 14.6 is the JavaScript engine baseline.
+- Undici 8 is the HTTP/fetch baseline.
+- Node 26 removes and deprecates more legacy APIs, so it hardens SF against old
+  loader, stream, HTTP, crypto, and dependency assumptions.
+
+### Temporal Is More Than Better Dates
+
+Temporal gives SF the vocabulary it already needs for durable autonomous work.
+
+Important Temporal concepts:
+
+- `Temporal.Instant`: an exact point in history. Use for journal events,
+  checkpoint timestamps, lock leases, provider call start/end, and trace order.
+- `Temporal.ZonedDateTime`: an exact instant plus time zone and calendar. Use
+  for reminders, schedules, adoption reviews, audits, and "run this at local
+  business time" semantics.
+- `Temporal.PlainDate`: a calendar date without time or time zone. Use for
+  daily reports, milestone review dates, and human-facing due dates.
+- `Temporal.PlainTime`: a wall-clock time without date or zone. Use for
+  recurring "at 09:00" style policies.
+- `Temporal.PlainDateTime`: a date and wall-clock time before binding it to a
+  zone. Use only when the zone is deliberately chosen later.
+- `Temporal.Duration`: a typed amount of time. Use for budgets, leases,
+  cooldowns, retry delays, schedule offsets, and age checks.
+
+That split matters because SF currently has many different meanings hidden
+behind timestamps and strings:
+
+- exact event ordering
+- local user reminders
+- project schedule dates
+- lease expiry
+- retry backoff
+- adoption review windows
+- elapsed runtime
+- "next business day" style planning
+
+`Date` collapses those into one weak type. Temporal lets SF store and validate
+the real intent.
+
+### SF Runtime Places That Should Use Temporal
+
+Use Temporal first in the areas where wrong time semantics create real
+operational mistakes:
+
+- `sf schedule`: due dates, relative offsets, local-time reminders, audit
+  windows, and recurrence-ready storage.
+- autonomous locks and leases: exact `Instant` plus typed `Duration`, not
+  implicit millisecond math scattered through code.
+- journals and traces: exact event instants with stable ordering and explicit
+  serialization.
+- session reports: elapsed durations and grouped daily summaries without local
+  timezone drift.
+- adoption reviews and decision audits: calendar dates and wall-clock reminders
+  that survive DST and timezone changes.
+- background work surface: task age, stale-running detection, retry-after, and
+  next-action time should be typed.
+
+### Temporal Design Rule For SF
+
+Store the semantic type, not just the formatted string:
+
+```text
+event happened exactly now       -> Instant
+run at 09:00 in Europe/Oslo      -> ZonedDateTime or PlainTime + timeZone
+review on 2026-06-01             -> PlainDate
+retry after 30 minutes           -> Duration
+lease expires at exact timestamp -> Instant
+```
+
+Serialization should stay explicit and boring:
+
+- store ISO strings plus a field that says which Temporal type they represent
+- include timezone when wall-clock semantics matter
+- do not infer local timezone at read time unless the record explicitly asks
+  for it
+- validate schedule and lease records at DB boundaries
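
One way to keep serialization "explicit and boring" is a tagged record checked at the DB boundary. The sketch below is an assumption-heavy illustration: `StoredTemporal`, `SHAPES`, and `validateStored` are hypothetical names, and the regular expressions are only rough shape checks that assume the default string forms (an `Instant` in `Z` form, a `ZonedDateTime` with a bracketed zone). On a runtime with Temporal available, real parsing would go through the matching `from` method, for example `Temporal.Instant.from`.

```typescript
// Hypothetical storage record: an ISO string plus the Temporal type it encodes.
type TemporalKind = "Instant" | "ZonedDateTime" | "PlainDate" | "PlainTime" | "Duration";

interface StoredTemporal {
  kind: TemporalKind;
  iso: string;
}

// Rough shape checks only; they do not validate calendar ranges.
const SHAPES: Record<TemporalKind, RegExp> = {
  Instant: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,9})?Z$/,
  ZonedDateTime: /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,9})?[+-]\d{2}:\d{2}\[[^\]]+\]$/,
  PlainDate: /^\d{4}-\d{2}-\d{2}$/,
  PlainTime: /^\d{2}:\d{2}:\d{2}(\.\d{1,9})?$/,
  Duration: /^-?P(?=.)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?$/,
};

// Returns null when the record looks well-formed, otherwise a reason string.
function validateStored(record: StoredTemporal): string | null {
  const shape = SHAPES[record.kind];
  if (!shape) return "unknown kind: " + String(record.kind);
  if (!shape.test(record.iso)) {
    return record.kind + " has unexpected ISO shape: " + record.iso;
  }
  return null;
}
```

The point of the tag is that a reader never has to guess: a lease expiry stored as `Instant` and a review date stored as `PlainDate` cannot be confused, and a `ZonedDateTime` without its zone annotation is rejected instead of silently picking up the local time zone.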
+
+### Node 26 Adoption Path
+
+Target policy:
+
+```text
+current compatibility floor: Node 24.15+
+internal target runtime:     Node 26
+canonical future baseline:   Node 26 after canary is clean
+Node 25:                     skip except quick probes
+```
+
+### Runtime Alternatives
+
+Other JavaScript runtimes are useful comparators, but none should replace Node
+as SF's primary runtime right now.
+
+SF's current runtime shape is Node-native:
+
+- npm workspaces and `package-lock.json`
+- Next.js standalone web host
+- Vitest and Node test-runner compatibility scripts
+- Rust N-API `.node` addons
+- `node-pty` native assets in the web host
+- `node:` built-ins across CLI, scripts, packages, and web services
+- child process, TTY, stream, module loader, and extension-loader behavior
+- installed runtime sync into `~/.sf/agent`
+
+#### Bun
+
+Bun is the strongest speed and developer-experience competitor.
+
+Useful:
+
+- fast package install and script startup
+- broad Node API compatibility
+- built-in TypeScript, test runner, shell, SQLite, YAML, TOML, JSONL, and other
+  convenience APIs
+- Node-API support is substantial enough to use as a compatibility probe
+
+Not primary for SF:
+
+- Bun's own docs say compatibility reflects Node v23, while SF is targeting
+  Node 26.
+- Some core APIs are partial or behaviorally different: `child_process`, module
+  loader hooks, `node:v8`, `node:test`, `node:sqlite`, `worker_threads`, and
+  inspector/debugger areas are not exact Node.
+- SF's highest-risk paths are exactly the places where "almost Node" can hurt:
+  TTY, child processes, native addons, Next standalone output, loaders, and
+  extension runtime.
+
+Decision: use Bun only for optional speed probes or isolated tooling. Do not
+make it the SF runtime until full `npm test`, web build, native build, smoke
+tests, and installed extension runtime all pass under Bun without special
+cases.
+
+#### Deno
+
+Deno has the best security and integrated-toolchain story.
+
+Useful:
+
+- explicit permissions model
+- first-class TypeScript and web standards
+- npm/package.json compatibility
+- Node-API support when local `node_modules` and FFI permission are enabled
+- good target for thinking about sandboxing and permission profiles
+
+Not primary for SF:
+
+- Deno still becomes a Node-compatibility mode for a repo like SF.
+- Deno docs recommend local `node_modules` for frameworks like Next.js and for
+  Node-API addons, which means SF would keep most Node/npm complexity anyway.
+- Native addons require local `node_modules` plus `--allow-ffi`.
+- The value would be security posture and packaging experiments, not simpler
+  runtime execution.
+
+Decision: study Deno for permission-profile design and maybe future packaged
+headless workers. Do not switch the core SF runtime to Deno.
+
+#### LLRT, WinterJS, Edge Runtimes
+
+These are not fits for SF's primary runtime.
+
+Useful:
+
+- serverless cold-start research
+- constrained worker/edge execution ideas
+- tiny isolated helper tasks
+
+Not primary for SF:
+
+- SF is a long-running local CLI/runtime, not a small stateless Lambda handler.
+- SF needs native addons, process control, TTY, filesystem state, git, shell,
+  Next web host, and Node-compatible package behavior.
+- LLRT is explicitly experimental and evaluation-oriented.
+
+Decision: ignore as primary runtime. Only revisit for isolated future worker
+surfaces.
+
+### Runtime Decision
+
+Node 26 is the target because SF is a Node-native agent runtime, not a generic
+JavaScript app.
+
+Use alternatives this way:
+
+```text
+Node 26 -> primary internal target and future baseline
+Bun     -> speed/compatibility probe, not runtime
+Deno    -> permission/sandbox design reference, not runtime
+LLRT    -> ignore except tiny serverless worker research
+```
+
+The rule is simple: if a runtime cannot run the exact SF stack without special
+cases, it is not stronger for SF. Node 26 makes the existing SF stack stronger;
+alternative runtimes mostly make a different stack.
+
+Required Node 26 gate:
+
+```text
+node@26 --version
+npm run lint
+npm run typecheck:extensions
+npm run build
+npm test
+sf --version
+sf --help
+sf --print "ping"
+```
+
+If Node 26 passes those gates, SF should run itself on Node 26 internally even
+before raising public `engines.node`. Once stable, raise the repo baseline and
+start replacing fragile `Date`/millisecond logic with Temporal in the schedule,
+lease, journal, and background task surfaces.