2026-04-29 17:44:30 +02:00


ADR-015: Flight recorder via charmbracelet/x/vcr

Date: 2026-04-29
Status: proposed (deferred; captured for staged execution)

Context

sf today writes:

  • .sf/event-log.jsonl — structured event stream (phase changes, tool calls, errors).
  • .sf/traces/*.jsonl — per-unit trace spans.
  • .sf/audit/ — historical state snapshots.

These are all structured event streams. They're great for programmatic analysis but they don't record what the auto-loop looked like on the operator's terminal — the actual TUI frames, the stream of tool output, the agent's thinking, the live progress indicators.

When something goes wrong in production (the auto-loop appears to hang, an agent generates surprising output, a hook misbehaves), the operator wants to replay the session — see what was on screen at minute 14 — not reconstruct it from JSON.

charmbracelet/x/vcr records terminal output as a sequence of frames and replays them deterministically. It's the right substrate for a flight recorder.

Decision

  • Language: Go. Standalone service or library; integrates with sf via shared filesystem (writes recordings to .sf/recordings/).
  • Recording substrate: charmbracelet/x/vcr — captures ANSI/VT frames into a portable file format with timestamps.
  • Trigger: every auto-loop unit dispatch records by default. Recording is opt-out per project via .sf/config.toml ([telemetry] flight_recorder = false).
  • Storage: .sf/recordings/{unit-id}.vcr, with a retention policy (default 30 days, configurable). Old recordings auto-expire on the next sweep.
  • Replay: sf replay <unit-id> — opens the recording in a TUI player; supports pause, scrub, frame-step, search-by-text.
  • Format: vcr-native. No reinventing.

Alternatives Considered

  • asciinema — well-known terminal recorder, mature tooling, JSON-based format.
    • Rejected: asciinema runs as a subprocess wrapping the shell. Integrating with sf's auto-loop (which is the driver, not a child of the recorder) requires inverting the model. vcr is library-shaped — sf calls into it.
  • vhs — Charm's CLI video recorder, used for demos.
    • Rejected: vhs is for scripted demos, not live capture. Wrong tool.
  • Re-render from .sf/event-log.jsonl — replay events through pi-tui to reproduce the frames.
    • Rejected: requires keeping pi-tui forever, and rendering depends on terminal geometry that may differ from the original. Frame-accurate replay is not the same as event replay; both have value but they're different products.
  • Build a custom recorder.
    • Rejected: vcr already exists; building our own would be not-invented-here.

Consequences

Positive

  • Frame-accurate post-mortem — when a unit fails or the auto-loop hangs, the operator sees exactly what was on screen, including timing.
  • Onboarding artefact — recordings of "what does sf do?" become shareable demos without scripting.
  • Audit trail for destructive ops — admin actions in the future Charm TUI client (ADR-017) and Singularity Memory admin UI (ADR-014) can be recorded for security audit.
  • Light coupling: vcr is a Go library; sf's TS core invokes a small Go recorder process per unit dispatch. No tight integration with the agent loop.

Negative

  • Disk usage — recordings are bigger than event logs (frame data vs. structured records). Mitigated by retention policy. Estimate: ~1MB per 10-minute unit at typical TUI density.
  • Operator-only — frame replay isn't useful in headless contexts. Headless dispatches should disable recording (SF_FLIGHT_RECORDER=0 env).
  • One more polyglot boundary: sf core (TS) writes recordings via a Go subprocess. This is the same shape as ADR-013 (TS↔Go over stdio), so it is manageable.
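The ADR defines two opt-out knobs, the [telemetry] flight_recorder key in .sf/config.toml and the SF_FLIGHT_RECORDER=0 environment variable, but does not say which takes precedence. Here is a sketch of one plausible rule. recordingEnabled is a hypothetical name, and TOML loading is assumed to happen elsewhere.

```go
package main

import (
	"fmt"
	"os"
)

// recordingEnabled decides whether a unit dispatch should be recorded.
// Assumed precedence (not stated in the ADR):
//  1. SF_FLIGHT_RECORDER=0 disables recording (headless dispatches);
//  2. otherwise [telemetry] flight_recorder from .sf/config.toml, if set;
//  3. otherwise the default, which is on.
//
// cfg is nil when the config key is absent.
func recordingEnabled(env string, cfg *bool) bool {
	if env == "0" {
		return false
	}
	if cfg != nil {
		return *cfg
	}
	return true
}

func main() {
	var cfg *bool // would be populated from .sf/config.toml by sf's own loader
	fmt.Println(recordingEnabled(os.Getenv("SF_FLIGHT_RECORDER"), cfg))
}
```

Under this rule, the environment variable can only switch recording off, so a headless runner does not need to know the project's config.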

Risks and mitigations

  • Risk: vcr API churn — it's in charmbracelet/x (experimental).
    • Mitigation: pin a version; abstract recording behind an interface so a future swap is contained.
  • Risk: Recording overhead measurably slows the auto-loop.
    • Mitigation: benchmark before enabling recording by default. If overhead exceeds 5%, ship as opt-in only.
  • Risk: Sensitive data (tokens, paths, secrets) leaks into recordings.
    • Mitigation: reuse the redaction rules behind event-log.jsonl, applied as a filter on the VT stream so frames are redacted before they are written to disk.

Out of Scope

  • Audio recording. Terminal frames only.
  • Cross-host recording — each host records its own units; flight-recorder doesn't try to stitch SSH-worker output onto orchestrator-side replay. (Each unit attempt has a worker_host; replay is per-host.)
  • Live remote viewing of an in-progress recording — that's a different feature (could be Wish + Bubble Tea showing a "live" view of the auto-loop). Track separately if wanted.

Sequencing

  • Tier 2/3 (after federation primitives land): build a thin Go recorder process; sf core spawns one per unit dispatch.
  • Tier 3: sf replay <unit-id> command, a TUI player built with Bubble Tea.
  • Tier 3: redaction filter parity with event-log.jsonl.
  • Tier 4 (nice-to-have): retention-policy auto-sweep; recording bundle export (sf recording export <unit-id>.vcr.tar.gz for sharing).

Out of Scope (continued — feature-creep guardrails)

  • AI-assisted summarisation of recordings ("show me what failed in the last 5 unit attempts") — possible later via fantasy + recording metadata, but explicitly not v1.
  • Web-based replay UI — server-rendered replay is a separate product surface; v1 is local TUI only.

References

  • charmbracelet/x/vcr — terminal recording library.
  • SPEC.md §19 — Observability (where structured event logs and traces live).
  • ADR-016 — Charm AI stack adoption (frames why Go for new services).
  • ADR-017 — Charm TUI client (future replay UI consumer).