Metrics-Central.js Fixes Applied
Date: 2026-05-07
Scope: Address the 4 gaps identified in the RA.Aid comparison review
Fixes Applied
1. ✅ Session Scoping
Problem: Metrics were global to the process, with no way to filter them by session.
Fix:
- Added `_sessionId` module-level variable
- `initMetricsCentral(basePath, { sessionId, dbAdapter })` accepts a session ID
- `recordCounter()` and `recordGauge()` auto-inject a `session_id` label if one is not already present
- `queryMetrics(db, sessionId, name, limit)` for DB queries filtered by session
Test: session_id_auto_injected — verifies session_id appears in Prometheus output
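A minimal sketch of the auto-injection, assuming an in-memory registry and a hypothetical `renderPrometheus()` helper standing in for the Prometheus file writer. Only `_sessionId`, `initMetricsCentral`, and `recordCounter` come from the fix above; the optional `delta` argument is an assumption, and `recordGauge()` would take the same `withSession` path:

```js
// Sketch only: hypothetical in-memory stand-in for metrics-central's
// session scoping. Internals are assumed, not taken from the module.
let _sessionId = null;
const counters = new Map(); // registry key -> { name, labels, value }

function initMetricsCentral(basePath, { sessionId = null } = {}) {
  _sessionId = sessionId; // basePath is unused in this sketch
  counters.clear();
}

// Add session_id only when a session is active and the caller omitted it.
function withSession(labels = {}) {
  if (_sessionId === null || "session_id" in labels) return { ...labels };
  return { ...labels, session_id: _sessionId };
}

// delta is an assumed optional third argument.
function recordCounter(name, labels = {}, delta = 1) {
  const finalLabels = withSession(labels);
  const pairs = Object.keys(finalLabels).sort().map((k) => [k, finalLabels[k]]);
  const key = name + JSON.stringify(pairs);
  const entry = counters.get(key) ?? { name, labels: finalLabels, value: 0 };
  entry.value += delta;
  counters.set(key, entry);
}

// Escape label values as the Prometheus text exposition format requires.
function escapePromValue(v) {
  return String(v).replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Hypothetical stand-in for the Prometheus file writer.
function renderPrometheus() {
  const lines = [];
  for (const { name, labels, value } of counters.values()) {
    const parts = Object.keys(labels).sort().map((k) => `${k}="${escapePromValue(labels[k])}"`);
    lines.push(parts.length ? `${name}{${parts.join(",")}} ${value}` : `${name} ${value}`);
  }
  return lines.join("\n");
}
```

An explicit `session_id` supplied by the caller wins over the module-level one, which keeps cross-session metrics possible.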
2. ✅ Cost/Token Metrics
Problem: metrics-central did not track cost or tokens, whereas RA.Aid tracks them per trajectory.
Fix:
- Added `recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode)` function
- New metrics in `METRIC_META`:
  - `sf_cost_total`: cumulative cost per unit/model/mode
  - `sf_tokens_input_total`: input tokens per model
  - `sf_tokens_output_total`: output tokens per model
  - `sf_cost_last`: gauge for the last recorded cost
Test: cost_metrics_tracked — verifies all 4 cost metrics are emitted
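The fan-out from one `recordCost()` call to the four metrics might look like this. The label names (`unit_id`, `model`, `work_mode`), the optional labels argument on `recordGauge`, and the in-memory `registry` are assumptions for illustration, not the module's actual internals:

```js
// Sketch only: one recordCost() call fanning out to the four cost metrics.
// Label names, the recordGauge labels argument, and this registry are assumed.
const registry = { counters: new Map(), gauges: new Map() };

// Stable key: metric name plus label pairs sorted by label name.
function keyOf(name, labels) {
  const pairs = Object.keys(labels).sort().map((k) => [k, labels[k]]);
  return name + JSON.stringify(pairs);
}

function recordCounter(name, labels = {}, delta = 1) {
  const key = keyOf(name, labels);
  const prev = registry.counters.get(key);
  registry.counters.set(key, { name, labels, value: (prev ? prev.value : 0) + delta });
}

function recordGauge(name, value, labels = {}) {
  registry.gauges.set(keyOf(name, labels), { name, labels, value });
}

function recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) {
  // Cumulative cost, broken down by unit, model, and work mode.
  recordCounter("sf_cost_total", { unit_id: unitId, model: modelId, work_mode: workMode }, cost);
  // Token counters are keyed by model only.
  recordCounter("sf_tokens_input_total", { model: modelId }, inputTokens);
  recordCounter("sf_tokens_output_total", { model: modelId }, outputTokens);
  // The gauge keeps only the most recent cost.
  recordGauge("sf_cost_last", cost, { model: modelId });
}
```

Because `sf_cost_last` is a gauge, each call overwrites it, while the three `_total` counters accumulate across calls.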
3. ✅ DB Persistence
Problem: `isDbAvailable` was imported but never used, and nothing was persisted to SQLite.
Fix:
- `initMetricsCentral(basePath, { dbAdapter })` accepts a DB adapter
- `ensureMetricsTable(db)` creates the `metrics` table with indexes
- `persistMetricsToDb(registry, sessionId, db)` flushes counters, gauges, and histograms to the DB
- `flushMetrics()` now writes to both the Prometheus file AND SQLite
- `queryMetrics(db, sessionId, name, limit)` for programmatic queries
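One way `persistMetricsToDb` could flatten the registry into `metrics` rows is sketched below. It assumes the registry holds `counters`, `gauges`, and `histograms` maps, and that the DB adapter exposes a `prepare(sql).run(...params)` interface (as better-sqlite3 does). Storing each histogram as `_count` and `_sum` rows is also an assumption, since the notes do not say how histograms are flattened:

```js
// Sketch only: registry shape, adapter interface, and histogram
// flattening are all assumptions.
function registryToRows(registry, sessionId) {
  const rows = [];
  const push = (name, type, labels, value) =>
    rows.push({ name, type, labels: JSON.stringify(labels), value, session_id: sessionId ?? null });
  for (const m of registry.counters.values()) push(m.name, "counter", m.labels, m.value);
  for (const m of registry.gauges.values()) push(m.name, "gauge", m.labels, m.value);
  for (const h of registry.histograms.values()) {
    // One REAL column per row, so store observation count and sum separately.
    push(`${h.name}_count`, "histogram", h.labels, h.count);
    push(`${h.name}_sum`, "histogram", h.labels, h.sum);
  }
  return rows;
}

function persistMetricsToDb(registry, sessionId, db) {
  if (!db) return 0; // no adapter: skip SQLite entirely
  const stmt = db.prepare(
    "INSERT INTO metrics (name, type, labels, value, session_id) VALUES (?, ?, ?, ?, ?)"
  );
  const rows = registryToRows(registry, sessionId);
  for (const r of rows) stmt.run(r.name, r.type, r.labels, r.value, r.session_id);
  return rows.length; // number of rows written
}
```

Leaving `timestamp` out of the INSERT lets the column default, `datetime('now')`, stamp each row.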
Schema:
```sql
CREATE TABLE metrics (
  id         INTEGER PRIMARY KEY AUTOINCREMENT,
  name       TEXT NOT NULL,
  type       TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
  labels     TEXT,  -- JSON object
  value      REAL NOT NULL,
  timestamp  TEXT NOT NULL DEFAULT (datetime('now')),
  session_id TEXT
);
CREATE INDEX idx_metrics_name ON metrics(name);
CREATE INDEX idx_metrics_session ON metrics(session_id);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);
```
Test: queryMetrics_returns_empty_without_db — graceful fallback when no DB
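The fallback checked by `queryMetrics_returns_empty_without_db` can be sketched as follows, assuming a `prepare(sql).all(...params)` adapter. Treating `sessionId` and `name` as optional filters, the default `limit`, and newest-first ordering are assumptions:

```js
// Sketch only: adapter interface, optional filters, and ordering are assumed.
function queryMetrics(db, sessionId, name, limit = 100) {
  if (!db) return []; // graceful fallback when no DB adapter is configured
  const clauses = [];
  const params = [];
  if (sessionId != null) {
    clauses.push("session_id = ?");
    params.push(sessionId);
  }
  if (name != null) {
    clauses.push("name = ?");
    params.push(name);
  }
  const where = clauses.length ? ` WHERE ${clauses.join(" AND ")}` : "";
  const sql =
    "SELECT name, type, labels, value, timestamp, session_id FROM metrics" +
    `${where} ORDER BY timestamp DESC, id DESC LIMIT ?`;
  params.push(limit);
  // Parameters are bound, never interpolated, so label text cannot inject SQL.
  return db
    .prepare(sql)
    .all(...params)
    .map((row) => ({ ...row, labels: row.labels ? JSON.parse(row.labels) : {} }));
}
```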
4. ✅ Retry on Flush Failure
Problem: `flushMetrics()` caught errors and logged them with `logWarning()`, but never retried.
Fix:
- `FLUSH_RETRY_MAX = 3` retries after the initial attempt
- `FLUSH_RETRY_BASE_MS = 1000` with exponential backoff (1s, 2s, 4s)
- `_flushFailures` counter tracks consecutive failures
- After max retries, emits the `sf_metrics_flush_failed_total` counter
- `stopMetricsCentral()` attempts a final synchronous flush
Behavior:
Flush fail #1 → retry in 1s
Flush fail #2 → retry in 2s
Flush fail #3 → retry in 4s
Flush fail #4 → emit sf_metrics_flush_failed_total, give up
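The retry schedule above can be expressed as a small async loop. This sketch injects `sleep` and a hypothetical `onGiveUp` hook (where the module would increment `sf_metrics_flush_failed_total`) so the backoff is testable without real timers; the actual module may schedule retries with timers rather than awaiting them:

```js
const FLUSH_RETRY_MAX = 3;
const FLUSH_RETRY_BASE_MS = 1000;

// Delay before retry n (1-based): base * 2^(n-1) gives 1000, 2000, 4000 ms.
function retryDelayMs(retry, baseMs = FLUSH_RETRY_BASE_MS) {
  return baseMs * 2 ** (retry - 1);
}

// Sketch only: flushOnce, sleep, and onGiveUp are hypothetical parameters.
async function flushWithRetry(flushOnce, { sleep, onGiveUp } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  let failures = 0; // consecutive failures, playing the role of _flushFailures
  for (;;) {
    try {
      await flushOnce();
      return { ok: true, failures };
    } catch (err) {
      failures += 1;
      if (failures > FLUSH_RETRY_MAX) {
        if (onGiveUp) onGiveUp(err); // e.g. increment sf_metrics_flush_failed_total
        return { ok: false, failures };
      }
      await wait(retryDelayMs(failures));
    }
  }
}
```

With a flush that always fails, this makes four attempts, waits 1000, 2000, and 4000 ms between them, and calls `onGiveUp` once, matching the behavior table.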
Bonus Fixes (Not in Original 4)
5. ✅ Label Value Escaping
Problem: `=` or `,` in label values broke label-key parsing.
Fix:
- `_escapeLabel(v)` escapes `\` → `\\`, `=` → `\=`, and `,` → `\,`
- `_parseLabelKey(key)` uses a state-machine parser instead of `split(',')`
- Labels are sorted alphabetically for stable output
Test: label_escaping_handles_special_chars — { key: "a=b,c" } round-trips correctly
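A sketch of the escaping and the state-machine parser, using hypothetical `escapeLabel`, `buildLabelKey`, and `parseLabelKey` helpers modeled on the `_escapeLabel` and `_parseLabelKey` names above; the real key format may differ:

```js
// Escape the backslash first so the escapes added for "=" and ","
// are not themselves escaped again.
function escapeLabel(v) {
  return String(v).replace(/\\/g, "\\\\").replace(/=/g, "\\=").replace(/,/g, "\\,");
}

// Sort label names so the same label set always yields the same key.
function buildLabelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${escapeLabel(k)}=${escapeLabel(labels[k])}`)
    .join(",");
}

// Walk the key one character at a time; a backslash makes the next
// character literal, so escaped "=" and "," never act as separators.
function parseLabelKey(key) {
  const labels = {};
  if (key === "") return labels;
  let name = null; // set once the unescaped "=" of a pair has been seen
  let current = "";
  let escaped = false;
  for (const ch of key) {
    if (escaped) {
      current += ch;
      escaped = false;
    } else if (ch === "\\") {
      escaped = true;
    } else if (ch === "=" && name === null) {
      name = current;
      current = "";
    } else if (ch === ",") {
      if (name === null) throw new Error(`malformed label key: ${key}`);
      labels[name] = current;
      name = null;
      current = "";
    } else {
      current += ch;
    }
  }
  if (name === null || escaped) throw new Error(`malformed label key: ${key}`);
  labels[name] = current;
  return labels;
}
```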
6. ✅ Metric Name Validation
Problem: Invalid Prometheus metric names (containing spaces or starting with a digit) were accepted without error.
Fix:
- `validateMetricName(name)` enforces `^[a-zA-Z_:][a-zA-Z0-9_:]*$`
- Throws `TypeError` for non-strings and `Error` for invalid patterns
Test: invalid_metric_name_rejected — spaces and leading numbers rejected
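The check reduces to a type guard plus one regular expression. In this sketch, returning the name on success is an assumption:

```js
// Prometheus metric-name pattern from the fix above.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function validateMetricName(name) {
  // Non-strings are a caller type error, distinct from a bad pattern.
  if (typeof name !== "string") {
    throw new TypeError(`metric name must be a string, got ${typeof name}`);
  }
  if (!METRIC_NAME_RE.test(name)) {
    throw new Error(`invalid Prometheus metric name: ${JSON.stringify(name)}`);
  }
  return name;
}
```

Although the pattern allows colons, the Prometheus naming guidance reserves them for recording rules, so direct instrumentation such as the `sf_` metrics normally sticks to letters, digits, and underscores.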
Test Results
Test Files 1 passed (1)
Tests 10 passed (10)
Full suite: 1029 passed, 5 pre-existing failures (unrelated worktree/staging tests), 1 skipped.
Remaining Gaps vs RA.Aid
| Gap | Status | Notes |
|---|---|---|
| Per-trajectory granularity | ❌ Still gap | Metrics are aggregated; individual events go to audit/trajectory |
| Cost CLI commands | ❌ Still gap | No sf cost --session or sf cost --all commands yet |
| Repository pattern | ❌ Still gap | Data access is functional, not class-based |
| Pydantic models | ❌ Still gap | No typed model layer |
| Expert model consultation | ❌ Still gap | No reasoning_assist equivalent |
| Token limiter | ❌ Still gap | No context window management |
| Model fallback on 429 | ✅ Already had | SF already switches models on rate-limit |
API Summary
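```js
// Import path assumed from the module name; adjust to the real location.
import {
  initMetricsCentral,
  recordCounter,
  recordGauge,
  recordHistogram,
  recordCost,
  queryMetrics,
  stopMetricsCentral,
} from "./metrics-central.js";

// `db` is an existing SQLite adapter (construction not shown).

// Initialize
initMetricsCentral("/project", {
  sessionId: "sess-123",
  dbAdapter: db,
  flushIntervalMs: 60_000,
});

// Record metrics
recordCounter("sf_gate_runs_total", { gate_id: "verify", outcome: "pass" });
recordGauge("sf_cost_guard_hourly_spend", 1.23);
recordHistogram("sf_gate_latency_ms", 150);
recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");

// Query
const rows = queryMetrics(db, "sess-123", "sf_cost_total", 100);

// Shutdown
stopMetricsCentral(); // final flush + cleanup
```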