# Metrics Central vs RA.Aid Architecture Review

**Date:** 2026-05-07
**Reviewer:** Claude Code (SF)
**Scope:** `metrics-central.js` and its wiring, compared against RA.Aid patterns
## RA.Aid Architecture Summary
RA.Aid is a Python-based autonomous coding agent with these key architectural decisions:
| Layer | Pattern |
|---|---|
| State | Peewee ORM over SQLite (.ra-aid/pk.db), WAL mode, contextvars for connection scoping |
| Agents | LangGraph agents (research → planning → implementation) with explicit stage boundaries |
| Memory | Key facts, key snippets, research notes, trajectories — all DB-backed with repositories |
| Trajectory | Every tool call recorded: tool_name, parameters, result, cost, tokens, is_error, error_message |
| Config | JSON config file + runtime config repository with defaults |
| Shell | Interactive approval with cowboy_mode bypass, trajectory logging, timeout handling |
| Reasoning | Optional expert model consultation before each stage (reasoning_assist) |
| Recovery | Fallback handlers, retry with backoff, agent thread manager |
## RA.Aid's Observability Model
RA.Aid doesn't have a separate metrics system. Instead, observability is embedded in the trajectory:
- Every tool execution → `Trajectory` record with cost, tokens, timing
- Every stage transition → `Trajectory` record with `record_type="stage_transition"`
- Every human input → `HumanInput` record linked to trajectories
- Every error → `Trajectory` with `is_error=true`, `error_type`, `error_details`
This is event-sourced observability: the DB is the single source of truth for both state AND metrics.
## Our `metrics-central.js` Design

### What We Built
A Prometheus-compatible metrics collector with:
- Counter, Gauge, Histogram types
- In-memory aggregation with 60s flush to `.sf/runtime/sf-metrics.prom`
- Pre-defined metric metadata registry
- Wiring into subagent inheritance and mode transitions
### Design Decisions and Their Trade-offs
| Decision | Rationale | RA.Aid Comparison |
|---|---|---|
| Prometheus text format | Compatible with existing exposition, scrapeable by Grafana | RA.Aid uses DB queries; we support both |
| In-memory aggregation | Zero dependencies, fast | RA.Aid queries DB directly; we add a layer |
| 60s flush interval | Batch writes, reduce I/O | RA.Aid writes per event; we batch |
| Separate from trajectory/audit | Metrics are aggregated views, not individual events | RA.Aid conflates events and metrics |
| Metric metadata registry | Pre-defined help text and labels | RA.Aid uses Peewee model definitions |
## The Review: 5 Lenses

### Lens 1: Data Model Consistency
RA.Aid Pattern: Single SQLite DB with typed models. Trajectory is the universal event log.
Our Pattern: Dual persistence:
- SQLite for operational state (UOK, sessions, tasks)
- Prometheus text file for metrics exposition
- JSONL for event durability
Verdict: ⚠️ NEEDS WORK
We have THREE observability sinks (SQLite, Prometheus file, JSONL) where RA.Aid has one. This creates:
- Risk of inconsistency between `sf-metrics.prom` and `sf.db`
- No unified query surface for "show me all subagent blocks in the last hour"
- Metrics file is write-only; no read path for programmatic consumption
Recommendation: Add a `metrics` table to `sf.db` that mirrors the Prometheus data model. The text file becomes a projection, not a source of truth.
```sql
CREATE TABLE metrics (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  name       TEXT NOT NULL,
  type       TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
  labels     TEXT,              -- JSON object
  value      REAL NOT NULL,
  timestamp  TEXT NOT NULL DEFAULT (datetime('now')),
  session_id TEXT
);
```
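With that table in place, the `.prom` text file can be regenerated from rows instead of from in-memory state. A minimal sketch of such a projection, assuming rows shaped like the columns above (the `renderProm` helper itself is hypothetical, not part of the current module):

```javascript
// Render Prometheus exposition text from metrics-table rows.
// Each row: { name, type, labels (JSON string or null), value }.
function renderProm(rows, helpText = {}) {
  const lines = [];
  const seen = new Set();
  for (const row of rows) {
    // Emit HELP/TYPE headers once per metric name.
    if (!seen.has(row.name)) {
      seen.add(row.name);
      if (helpText[row.name]) lines.push(`# HELP ${row.name} ${helpText[row.name]}`);
      lines.push(`# TYPE ${row.name} ${row.type}`);
    }
    const labels = JSON.parse(row.labels || "{}");
    const pairs = Object.entries(labels)
      .map(([k, v]) => `${k}="${String(v).replace(/"/g, '\\"')}"`)
      .join(",");
    lines.push(pairs ? `${row.name}{${pairs}} ${row.value}` : `${row.name} ${row.value}`);
  }
  return lines.join("\n") + "\n";
}
```

The key design point: the DB rows are authoritative, and the text file is a pure function of them, so the two can never drift apart.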
### Lens 2: Event-Sourced vs Aggregated
RA.Aid Pattern: Every event is a row. Aggregation happens at query time.
Our Pattern: Aggregation happens at write time. Individual events are lost.
Verdict: ✅ ACCEPTABLE for metrics, but incomplete for observability
For counters and gauges, aggregation is correct. But for debugging "why was this subagent blocked?", we need the individual event, not just `sf_subagent_dispatch_blocked{reason="provider"} 5`.
Recommendation: Keep metrics-central for aggregated Prometheus output, but ALSO emit individual events to the audit/trajectory system. The metric is the summary; the trajectory is the detail.
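One way to sketch this pairing, assuming the existing `recordCounter` and a hypothetical `appendAuditEvent` writer standing in for SF's audit/trajectory system:

```javascript
// Record both the aggregated metric (Prometheus summary) and an
// individual event (queryable trajectory detail) in one call.
// recordCounter exists in metrics-central; appendAuditEvent is a
// hypothetical stand-in for the audit/trajectory writer.
function recordCounterWithEvent(recordCounter, appendAuditEvent, name, labels = {}, amount = 1) {
  recordCounter(name, labels, amount); // the summary
  appendAuditEvent({                   // the detail
    kind: "metric_event",
    metric: name,
    labels,
    amount,
    at: new Date().toISOString(),
  });
}
```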
### Lens 3: Context and Session Scoping
RA.Aid Pattern: Every record has a session_id foreign key. Contextvars scope the DB connection.
Our Pattern: Metrics are global to the process. No session scoping.
Verdict: ❌ GAP
Our metrics can't answer: "How many subagent dispatches were blocked in session X?" This is critical for:
- Per-session cost attribution
- Debugging why a specific run failed
- Multi-tenant scenarios (if SF ever serves multiple users)
Recommendation: Add a `session_id` label to all metrics. Use `ctx.sessionId` or `getAutoSession().currentTraceId`.
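A minimal sketch of that scoping, with a hypothetical `getSessionId` accessor standing in for `ctx.sessionId` / `getAutoSession().currentTraceId`:

```javascript
// Merge a session_id label into every metric write so all series
// become filterable by session. getSessionId is a hypothetical
// accessor; "unknown" is the fallback when no session is active.
function withSession(labels, getSessionId) {
  const sessionId = getSessionId() ?? "unknown";
  return { ...labels, session_id: sessionId };
}
```

Call sites would then pass `withSession(labels, getSessionId)` wherever they currently pass `labels`, answering "how many dispatches were blocked in session X?" with a plain label match.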
### Lens 4: Cost and Token Tracking
RA.Aid Pattern: Every trajectory record has `current_cost`, `input_tokens`, `output_tokens`.
Our Pattern: No cost/token metrics in metrics-central yet.
Verdict: ❌ MISSING
RA.Aid tracks cost per tool call. We track cost in metrics.js (SQLite + JSONL) but not in metrics-central. This means:
- No Prometheus-compatible cost metrics
- No cost alerts from Grafana
- No cost attribution by work mode or permission profile
Recommendation: Add cost/token metrics:
```js
"sf_cost_total":          { help: "Total cost in USD",    labels: ["work_mode", "model_id"] },
"sf_tokens_input_total":  { help: "Total input tokens",   labels: ["model_id"] },
"sf_tokens_output_total": { help: "Total output tokens",  labels: ["model_id"] },
```
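A hedged sketch of how these would be recorded after a model call, using the `recordCounter(name, labels, amount)` shape that metrics-central exposes (the `recordModelUsage` helper and its usage fields are illustrative):

```javascript
// Record cost and token counters for one model call.
// recordCounter is injected; the usage object shape is an assumption.
function recordModelUsage(recordCounter, { workMode, modelId, costUsd, inputTokens, outputTokens }) {
  recordCounter("sf_cost_total", { work_mode: workMode, model_id: modelId }, costUsd);
  recordCounter("sf_tokens_input_total", { model_id: modelId }, inputTokens);
  recordCounter("sf_tokens_output_total", { model_id: modelId }, outputTokens);
}
```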
### Lens 5: Error Handling and Resilience
RA.Aid Pattern: Every error is caught, logged, and stored in the trajectory with full context.
Our Pattern: `flushMetrics()` catches and logs with `logWarning()`. No retry.
Verdict: ⚠️ ACCEPTABLE but could be stronger
Our flush failure is best-effort, which matches RA.Aid's philosophy. But RA.Aid also:
- Reopens closed DB connections automatically
- Has fallback handlers for agent failures
- Records error details in the trajectory
Recommendation:
- Add retry with exponential backoff for flush failures
- If flush fails 3 times, emit a `metrics_flush_failed` counter
- On process exit, attempt a final synchronous flush
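The first two points could be sketched like this (the `flushOnce` and `onGiveUp` hooks are hypothetical stand-ins for the real flush and for the `metrics_flush_failed` counter increment):

```javascript
// Retry a failing flush with exponential backoff; after the final
// attempt, report the failure via onGiveUp instead of throwing.
async function flushWithRetry(flushOnce, onGiveUp, { retries = 3, baseMs = 100 } = {}) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      await flushOnce();
      return true;
    } catch (err) {
      if (attempt === retries - 1) {
        onGiveUp(err); // e.g. increment metrics_flush_failed
        return false;
      }
      // Backoff: baseMs, 2*baseMs, 4*baseMs, ...
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** attempt));
    }
  }
  return false;
}
```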
## Specific Code Review Findings

### Finding 1: Unused Import
```js
import { isDbAvailable } from "./sf-db.js";
```
This is imported but never used. The JSDoc mentions "Optional SQLite persistence" but it's not implemented.
Fix: Either implement DB persistence or remove the import.
### Finding 2: Histogram Bucket Sorting
```js
this.buckets = [...buckets].sort((a, b) => a - b);
```
The spread creates a copy before sorting, so the caller's array is not mutated. Prometheus expects buckets in ascending order, which the sort guarantees.
Verdict: ✅ Correct.
### Finding 3: Label Key Serialization
```js
_key(labels) {
  return this.labelNames.map((k) => `${k}=${labels[k] ?? ""}`).join(",");
}
```
If a label value contains `=` or `,`, the key parsing will break.
Fix: Add escaping or use a structured key format (e.g., JSON).
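One possible shape for a structured key, using JSON serialization instead of naive `k=v` joining (the standalone `labelKey` helper is illustrative, not the module's current API):

```javascript
// Build a collision-proof registry key for a label set.
// JSON escaping means "=" and "," in values cannot collide with the
// separator; sorting label names makes the key order-independent.
function labelKey(labelNames, labels) {
  return JSON.stringify(
    labelNames
      .slice()
      .sort()
      .map((k) => [k, String(labels[k] ?? "")]),
  );
}
```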
### Finding 4: No Validation on Metric Names
```js
export function recordCounter(name, labels = {}, amount = 1) {
  const meta = getMetricMeta(name);
  getRegistry().counter(name, meta.help, Object.keys(labels)).inc(labels, amount);
}
```
If `name` contains spaces or other characters invalid in Prometheus, the output will be malformed.
Fix: Add a `validateMetricName(name)` check that rejects invalid characters.
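A minimal sketch against the Prometheus metric-name grammar `[a-zA-Z_:][a-zA-Z0-9_:]*` (the helper name follows the suggestion above; wiring it into `recordCounter` is left implicit):

```javascript
// Prometheus metric names: first char letter/underscore/colon,
// rest letters/digits/underscores/colons.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function validateMetricName(name) {
  if (typeof name !== "string" || !METRIC_NAME_RE.test(name)) {
    throw new Error(`Invalid Prometheus metric name: ${JSON.stringify(name)}`);
  }
  return name;
}
```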
### Finding 5: Timer Unref
```js
if (_flushTimer.unref) _flushTimer.unref();
```
This is correct for Node.js but may not work in all environments (e.g., Bun).
Verdict: ✅ Acceptable with fallback.
## Overall Assessment
| Dimension | Grade | Notes |
|---|---|---|
| Correctness | B+ | Prometheus output is valid, but label escaping needs work |
| Completeness | B | Missing cost/token metrics, session scoping, DB persistence |
| Consistency with SF | A | Fits the extension model, uses existing patterns |
| Consistency with RA.Aid | C | RA.Aid would prefer event-sourced over aggregated |
| Production Readiness | B | Needs retry, validation, and DB projection before GA |
## Priority Fixes
- P0: Add `session_id` label to all metrics
- P0: Remove unused `isDbAvailable` import or implement DB persistence
- P1: Add cost/token metrics
- P1: Fix label value escaping
- P1: Add metric name validation
- P2: Add retry with backoff for flush failures
- P2: Add final flush on process exit
- P2: Consider a `metrics` table in `sf.db` as source of truth
## RA.Aid Patterns Worth Adopting
- Trajectory-style event logging: Every metric should have a corresponding event in the audit/trajectory system
- Session-scoped connections: All observability should be filterable by session
- Per-tool cost tracking: Every tool call should record cost and tokens
- Error detail preservation: When metrics indicate failure, the detail should be queryable
## Conclusion

`metrics-central.js` is a solid Prometheus-compatible metrics layer that fills a real gap in SF's observability. However, it prioritizes exposition format over observability depth. RA.Aid's trajectory model is superior for debugging and audit because it preserves every event.
The right path forward:
- Keep metrics-central for Prometheus output (Grafana compatibility)
- Add a `metrics` table to `sf.db` for queryable aggregation
- Ensure every metric has a corresponding audit/trajectory event
- Add session scoping and cost tracking
This gives us the best of both worlds: Prometheus for dashboards, SQLite for queries, and trajectory for debugging.