ADR-015: Flight recorder via charmbracelet/x/vcr
Date: 2026-04-29
Status: proposed (deferred; capture for staged execution)
Context
sf today writes:
- .sf/event-log.jsonl: structured event stream (phase changes, tool calls, errors).
- .sf/traces/*.jsonl: per-unit trace spans.
- .sf/audit/: historical state snapshots.
These are all structured event streams. They're great for programmatic analysis but they don't record what the auto-loop looked like on the operator's terminal — the actual TUI frames, the stream of tool output, the agent's thinking, the live progress indicators.
When something goes wrong in production (the auto-loop appears to hang, an agent generates surprising output, a hook misbehaves), the operator wants to replay the session — see what was on screen at minute 14 — not reconstruct it from JSON.
charmbracelet/x/vcr records terminal output as a sequence of frames and replays them deterministically. It's the right substrate for a flight recorder.
Decision
- Language: Go. Standalone service or library; integrates with sf via shared filesystem (writes recordings to .sf/recordings/).
- Recording substrate: charmbracelet/x/vcr, which captures ANSI/VT frames into a portable file format with timestamps.
- Trigger: every auto-loop unit dispatch records by default. Recording is opt-out per project via .sf/config.toml ([telemetry] flight_recorder = false).
- Storage: .sf/recordings/{unit-id}.vcr, with a retention policy (default 30 days, configurable). Old recordings auto-expire on the next sweep.
- Replay: sf replay <unit-id>, which opens the recording in a TUI player; supports pause, scrub, frame-step, search-by-text.
- Format: vcr-native. No reinventing.
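The storage and retention rules above can be sketched as follows. This is a minimal illustration, not the planned implementation. Only the .sf/recordings/{unit-id}.vcr layout and the 30-day default come from the decision; the names recordingPath and sweepExpired, and the use of file modification time as a recording's age, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// defaultRetention mirrors the 30-day default stated in the decision.
const defaultRetention = 30 * 24 * time.Hour

// recordingPath (hypothetical helper) returns .sf/recordings/{unit-id}.vcr
// under the given project root.
func recordingPath(projectRoot, unitID string) string {
	return filepath.Join(projectRoot, ".sf", "recordings", unitID+".vcr")
}

// sweepExpired (hypothetical helper) removes *.vcr files in dir whose
// modification time is older than retention, measured against now.
// It returns the paths it removed.
func sweepExpired(dir string, retention time.Duration, now time.Time) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var removed []string
	for _, e := range entries {
		if e.IsDir() || !strings.HasSuffix(e.Name(), ".vcr") {
			continue
		}
		info, err := e.Info()
		if err != nil {
			return removed, err
		}
		if now.Sub(info.ModTime()) <= retention {
			continue // still inside the retention window
		}
		p := filepath.Join(dir, e.Name())
		if err := os.Remove(p); err != nil {
			return removed, err
		}
		removed = append(removed, p)
	}
	return removed, nil
}

func main() {
	// Demonstrate against a throwaway directory rather than a real project.
	dir, err := os.MkdirTemp("", "recordings")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	now := time.Now()
	ages := map[string]time.Duration{"old.vcr": 31 * 24 * time.Hour, "new.vcr": time.Hour}
	for name, age := range ages {
		p := filepath.Join(dir, name)
		if err := os.WriteFile(p, []byte("frames"), 0o644); err != nil {
			fmt.Println("setup failed:", err)
			return
		}
		// Backdate the file so the sweep sees it as `age` old.
		if err := os.Chtimes(p, now.Add(-age), now.Add(-age)); err != nil {
			fmt.Println("setup failed:", err)
			return
		}
	}

	removed, err := sweepExpired(dir, defaultRetention, now)
	if err != nil {
		fmt.Println("sweep failed:", err)
		return
	}
	fmt.Println("example path:", recordingPath(".", "example-unit"))
	fmt.Printf("removed %d expired recording(s)\n", len(removed))
}
```

Passing now as a parameter keeps the sweep deterministic under test; a real sweep might instead read the recording's own timestamps if the format exposes them.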
Alternatives Considered
- asciinema: well-known terminal recorder, mature tooling, JSON-based format.
  - Rejected: asciinema runs as a subprocess wrapping the shell. Integrating with sf's auto-loop (which is the driver, not a child of the recorder) requires inverting the model. vcr is library-shaped; sf calls into it.
- vhs: Charm's CLI video recorder, used for demos.
  - Rejected: vhs is for scripted demos, not live capture. Wrong tool.
- Re-render from .sf/event-log.jsonl: replay events through pi-tui to reproduce the frames.
  - Rejected: requires keeping pi-tui forever, and rendering depends on terminal geometry that may differ from the original. Frame-accurate replay is not the same as event replay; both have value but they're different products.
- Build a custom recorder.
  - Rejected: vcr exists. NIH-don't.
Consequences
Positive
- Frame-accurate post-mortem — when a unit fails or the auto-loop hangs, the operator sees exactly what was on screen, including timing.
- Onboarding artefact — recordings of "what does sf do?" become shareable demos without scripting.
- Audit trail for destructive ops — admin actions in the future Charm TUI client (ADR-017) and Singularity Memory admin UI (ADR-014) can be recorded for security audit.
- Light coupling: vcr is a Go library; sf's TS core invokes a small Go recorder process per unit dispatch. No tight integration with the agent loop.
Negative
- Disk usage — recordings are bigger than event logs (frame data vs. structured records). Mitigated by retention policy. Estimate: ~1MB per 10-minute unit at typical TUI density.
- Operator-only: frame replay isn't useful in headless contexts. Headless dispatches should disable recording (SF_FLIGHT_RECORDER=0 env).
- Polyglot crosses one more boundary: sf core (TS) writes recordings via a Go subprocess. Same shape as ADR-013 (TS↔Go via stdio); manageable.
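The two opt-out switches mentioned in this ADR (the [telemetry] flight_recorder config key and the SF_FLIGHT_RECORDER=0 environment variable) need a defined precedence. The ADR does not state one, so the sketch below assumes the environment variable wins over the project config and that recording stays on when neither is set. The function name recordingEnabled is hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// recordingEnabled (hypothetical helper) decides whether a unit dispatch
// should be recorded.
//
// configValue is the parsed [telemetry] flight_recorder setting, or nil when
// the key is absent. envValue is the raw SF_FLIGHT_RECORDER value, or "" when
// unset. Assumed precedence: env "0" disables, then config, then default on.
func recordingEnabled(configValue *bool, envValue string) bool {
	if envValue == "0" {
		return false
	}
	if configValue != nil {
		return *configValue
	}
	return true
}

func main() {
	off := false
	env := os.Getenv("SF_FLIGHT_RECORDER")
	fmt.Println("no config:", recordingEnabled(nil, env))
	fmt.Println("config opts out:", recordingEnabled(&off, env))
}
```

Letting the environment override the config makes it easy for a headless runner to disable recording without touching per-project files.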
Risks and mitigations
- Risk: vcr API churn; it's in charmbracelet/x (experimental).
  - Mitigation: pin a version; abstract recording behind an interface so a future swap is contained.
- Risk: Recording overhead measurably slows the auto-loop.
  - Mitigation: benchmark before enabling-by-default. If overhead > 5%, ship as opt-in only.
- Risk: Sensitive data (tokens, paths, secrets) leaks into recordings.
  - Mitigation: same redaction layer as event-log.jsonl, applied at the frame level before write. Enforce via a redaction filter applied to the VT stream.
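Two of the mitigations above (an interface in front of the recorder, and frame-level redaction before write) compose naturally. The sketch below is one possible shape under stated assumptions: the Recorder interface, the redactingRecorder wrapper, and the single secretPattern regex are all hypothetical, and none of this reflects the actual vcr API or the existing event-log.jsonl redaction rules. An in-memory recorder stands in for the real backend.

```go
package main

import (
	"fmt"
	"regexp"
)

// Recorder is a hypothetical abstraction over the recording backend, so a
// future swap of the underlying library stays contained.
type Recorder interface {
	WriteFrame(frame []byte) error
	Close() error
}

// secretPattern is an illustrative rule only. It matches key=value pairs for
// a few sensitive key names and stops at whitespace or an ESC byte, so that
// trailing ANSI sequences are left intact.
var secretPattern = regexp.MustCompile(`(?i)\b(token|secret|password)=[^\s\x1b]+`)

// redactFrame replaces the value of each matched pair with a placeholder.
func redactFrame(frame []byte) []byte {
	return secretPattern.ReplaceAll(frame, []byte("${1}=[REDACTED]"))
}

// redactingRecorder redacts every frame before handing it to next.
type redactingRecorder struct{ next Recorder }

func (r redactingRecorder) WriteFrame(frame []byte) error {
	return r.next.WriteFrame(redactFrame(frame))
}

func (r redactingRecorder) Close() error { return r.next.Close() }

// memoryRecorder is a stand-in backend that keeps frames in memory.
type memoryRecorder struct{ frames [][]byte }

func (m *memoryRecorder) WriteFrame(frame []byte) error {
	m.frames = append(m.frames, append([]byte(nil), frame...)) // store a copy
	return nil
}

func (m *memoryRecorder) Close() error { return nil }

func main() {
	mem := &memoryRecorder{}
	var rec Recorder = redactingRecorder{next: mem}
	_ = rec.WriteFrame([]byte("\x1b[32mexport TOKEN=abc123\x1b[0m"))
	_ = rec.Close()
	fmt.Printf("%q\n", mem.frames[0])
}
```

A per-frame regex cannot catch a secret that is split across frames or interleaved with escape sequences, so a production filter would likely need to operate on decoded cell text rather than raw bytes.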
Out of Scope
- Audio recording. Terminal frames only.
- Cross-host recording: each host records its own units; flight-recorder doesn't try to stitch SSH-worker output onto orchestrator-side replay. (Each unit attempt has a worker_host; replay is per-host.)
- Live remote viewing of an in-progress recording: that's a different feature (could be Wish + Bubble Tea showing a "live" view of the auto-loop). Track separately if wanted.
Sequencing
| When | Action |
|---|---|
| Tier 2/3 — after federation primitives land | Build a thin Go recorder process; sf core spawns one per unit dispatch. |
| Tier 3 | sf replay <unit-id> command — TUI player using Bubble Tea. |
| Tier 3 | Redaction filter parity with event-log.jsonl. |
| Tier 4 (nice-to-have) | Retention policy auto-sweep; recording bundle export (sf recording export <unit-id> → .vcr.tar.gz for sharing). |
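The Tier 4 bundle export could be little more than a gzip-compressed tar archive holding the recording file. The following is a rough sketch using only the Go standard library; the function names bundleName and exportRecording are hypothetical, and the bundle containing exactly one .vcr file with no extra metadata is an assumption.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// bundleName (hypothetical helper) returns the export file name for a unit.
func bundleName(unitID string) string { return unitID + ".vcr.tar.gz" }

// exportRecording (hypothetical helper) writes vcrPath into a .tar.gz at outPath.
func exportRecording(vcrPath, outPath string) (err error) {
	src, err := os.Open(vcrPath)
	if err != nil {
		return err
	}
	defer src.Close()
	info, err := src.Stat()
	if err != nil {
		return err
	}

	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer func() {
		// Surface a close error only if nothing failed earlier.
		if cerr := out.Close(); err == nil {
			err = cerr
		}
	}()

	gz := gzip.NewWriter(out)
	tw := tar.NewWriter(gz)

	hdr, err := tar.FileInfoHeader(info, "")
	if err != nil {
		return err
	}
	hdr.Name = filepath.Base(vcrPath) // store without the directory prefix
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	if _, err := io.Copy(tw, src); err != nil {
		return err
	}
	// Close tar before gzip so both trailers are flushed in order.
	if err := tw.Close(); err != nil {
		return err
	}
	return gz.Close()
}

func main() {
	dir, err := os.MkdirTemp("", "export")
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "example-unit.vcr")
	if err := os.WriteFile(src, []byte("placeholder frame data"), 0o644); err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	dst := filepath.Join(dir, bundleName("example-unit"))
	if err := exportRecording(src, dst); err != nil {
		fmt.Println("export failed:", err)
		return
	}
	fmt.Println("wrote", filepath.Base(dst))
}
```

If exported bundles are meant for sharing outside the team, the export step is also a natural place to re-check redaction.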
Out of Scope (continued — feature-creep guardrails)
- AI-assisted summarisation of recordings ("show me what failed in the last 5 unit attempts") — possible later via fantasy + recording metadata, but explicitly not v1.
- Web-based replay UI — server-rendered replay is a separate product surface; v1 is local TUI only.
References
- charmbracelet/x/vcr: terminal recording library.
- SPEC.md §19: Observability (where structured event logs and traces live).
- ADR-016: Charm AI stack adoption (frames why Go for new services).
- ADR-017: Charm TUI client (future replay UI consumer).