Metrics-Central.js Fixes Applied

Date: 2026-05-07

Scope: Address 4 gaps identified in RA.Aid comparison review


Fixes Applied

1. Session Scoping

Problem: Metrics were global to the process. No session filtering.

Fix:

  • Added _sessionId module-level variable
  • initMetricsCentral(basePath, { sessionId, dbAdapter }) accepts session ID
  • recordCounter() and recordGauge() auto-inject session_id label if not present
  • queryMetrics(db, sessionId, name, limit) for DB queries filtered by session

Test: session_id_auto_injected — verifies session_id appears in Prometheus output
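
The auto-injection rule above can be sketched roughly as follows. This is a minimal illustration, not the code in metrics-central.js: `setSessionId` and `withSessionLabel` are hypothetical helper names, and only the `_sessionId` variable and the `session_id` label come from these notes.

```javascript
// Hypothetical sketch of session_id auto-injection (helper names are assumptions).
let _sessionId = null;

// Stand-in for the part of initMetricsCentral() that stores the session ID.
function setSessionId(sessionId) {
  _sessionId = sessionId ?? null;
}

// Returns a copy of `labels` with session_id added, unless the caller
// already supplied one or no session is active.
function withSessionLabel(labels = {}) {
  if (_sessionId === null || "session_id" in labels) {
    return { ...labels };
  }
  return { ...labels, session_id: _sessionId };
}

setSessionId("sess-123");
console.log(withSessionLabel({ gate_id: "verify" }));
console.log(withSessionLabel({ session_id: "other" }));
```

Returning a copy rather than mutating the caller's object keeps an explicitly supplied `session_id` intact and avoids leaking the injected label back into caller state.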


2. Cost/Token Metrics

Problem: No cost/token tracking in metrics-central. RA.Aid tracks cost and tokens per trajectory.

Fix:

  • Added recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) function
  • New metrics in METRIC_META:
    • sf_cost_total — cumulative cost per unit/model/mode
    • sf_tokens_input_total — input tokens per model
    • sf_tokens_output_total — output tokens per model
    • sf_cost_last — gauge for last recorded cost

Test: cost_metrics_tracked — verifies all 4 cost metrics are emitted
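
One way a single `recordCost` call could fan out into the four metrics is sketched below. The in-memory maps, the `seriesKey`, `addToCounter`, and `setGauge` helpers, and the label names (`unit_id`, `model_id`, `work_mode`) are all assumptions made for illustration; only the `recordCost` signature and the metric names come from this record.

```javascript
// Hypothetical in-memory stand-ins for the registry; label names are assumptions.
const counters = new Map();
const gauges = new Map();

// Builds a stable series key by sorting label names alphabetically.
function seriesKey(name, labels) {
  const parts = Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`);
  return `${name}{${parts.join(",")}}`;
}

function addToCounter(name, labels, amount) {
  const key = seriesKey(name, labels);
  counters.set(key, (counters.get(key) ?? 0) + amount);
}

function setGauge(name, labels, value) {
  gauges.set(seriesKey(name, labels), value);
}

// Same parameter order as the recordCost() signature listed above.
function recordCost(unitId, modelId, inputTokens, outputTokens, cost, workMode) {
  addToCounter("sf_cost_total", { unit_id: unitId, model_id: modelId, work_mode: workMode }, cost);
  addToCounter("sf_tokens_input_total", { model_id: modelId }, inputTokens);
  addToCounter("sf_tokens_output_total", { model_id: modelId }, outputTokens);
  setGauge("sf_cost_last", {}, cost); // gauge: overwritten on every call
}

recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");
recordCost("unit-42", "claude-sonnet-4", 500, 200, 0.015, "build");
console.log(counters, gauges);
```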


3. DB Persistence

Problem: isDbAvailable imported but unused. No SQLite persistence.

Fix:

  • initMetricsCentral(basePath, { dbAdapter }) accepts DB adapter
  • ensureMetricsTable(db) creates metrics table with indexes
  • persistMetricsToDb(registry, sessionId, db) flushes counters/gauges/histograms to DB
  • flushMetrics() now writes to both Prometheus file AND SQLite
  • queryMetrics(db, sessionId, name, limit) for programmatic queries

Schema:

CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    type TEXT NOT NULL CHECK(type IN ('counter', 'gauge', 'histogram')),
    labels TEXT,           -- JSON object
    value REAL NOT NULL,
    timestamp TEXT NOT NULL DEFAULT (datetime('now')),
    session_id TEXT
);
CREATE INDEX idx_metrics_name ON metrics(name);
CREATE INDEX idx_metrics_session ON metrics(session_id);
CREATE INDEX idx_metrics_timestamp ON metrics(timestamp);

Test: queryMetrics_returns_empty_without_db — graceful fallback when no DB
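
A session-filtered query with the graceful no-DB fallback might look like the sketch below. It assumes the adapter exposes `prepare(sql).all(...params)` in the style of better-sqlite3; the default `limit`, the newest-first ordering, and the JSON decoding of `labels` are illustrative choices, not confirmed behavior of the real `queryMetrics`.

```javascript
// Sketch only: assumes `db` exposes prepare(sql).all(...params), as better-sqlite3 does.
function queryMetrics(db, sessionId, name, limit = 100) {
  if (!db) return []; // graceful fallback when no DB adapter is configured

  const rows = db
    .prepare(
      `SELECT name, type, labels, value, timestamp, session_id
         FROM metrics
        WHERE session_id = ? AND name = ?
        ORDER BY timestamp DESC
        LIMIT ?`
    )
    .all(sessionId, name, limit);

  // The labels column holds a JSON object (or NULL); decode it for callers.
  return rows.map((row) => ({
    ...row,
    labels: row.labels ? JSON.parse(row.labels) : {},
  }));
}

console.log(queryMetrics(null, "sess-123", "sf_cost_total", 100));
```

Both filter columns in the `WHERE` clause are covered by the `idx_metrics_session` and `idx_metrics_name` indexes from the schema above.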


4. Retry on Flush Failure

Problem: flushMetrics() caught flush errors and only logged them with logWarning(). No retry.

Fix:

  • FLUSH_RETRY_MAX = 3 retries after the first failed flush (4 attempts in total)
  • FLUSH_RETRY_BASE_MS = 1000 with exponential backoff (1s, 2s, 4s)
  • _flushFailures counter tracks consecutive failures
  • After max retries, emits sf_metrics_flush_failed_total counter
  • stopMetricsCentral() attempts final synchronous flush

Behavior:

Flush fail #1 → retry in 1s
Flush fail #2 → retry in 2s  
Flush fail #3 → retry in 4s
Flush fail #4 → emit sf_metrics_flush_failed_total, give up
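
The schedule above can be expressed as a small delay function plus a retry loop. This is a sketch under stated assumptions: `retryDelayMs` and `flushWithRetry` are hypothetical names, the loop takes its `flushOnce`, `sleep`, and `onGiveUp` callbacks from the caller, and the real module may schedule retries with timers instead of an async loop. Only the two constants and the 1s/2s/4s sequence come from this record.

```javascript
const FLUSH_RETRY_MAX = 3;
const FLUSH_RETRY_BASE_MS = 1000;

// Delay before the next retry after `failures` consecutive failed flushes,
// or null once the retry budget is exhausted.
function retryDelayMs(failures) {
  if (failures > FLUSH_RETRY_MAX) return null;
  return FLUSH_RETRY_BASE_MS * 2 ** (failures - 1);
}

// Hypothetical driver: all three callbacks are injected by the caller.
async function flushWithRetry(flushOnce, sleep, onGiveUp) {
  let failures = 0;
  for (;;) {
    try {
      await flushOnce();
      return true;
    } catch (err) {
      failures += 1;
      const delay = retryDelayMs(failures);
      if (delay === null) {
        onGiveUp(err); // e.g. increment sf_metrics_flush_failed_total
        return false;
      }
      await sleep(delay);
    }
  }
}

console.log([1, 2, 3, 4].map(retryDelayMs));
```

Injecting `sleep` keeps the loop testable without waiting on real timers.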

Bonus Fixes (Not in Original 4)

5. Label Value Escaping

Problem: = or , in label values broke key parsing.

Fix:

  • _escapeLabel(v) escapes \ as \\, = as \=, and , as \,
  • _parseLabelKey(key) uses state machine parser instead of split(',')
  • Labels sorted alphabetically for stable output

Test: label_escaping_handles_special_chars verifies that { key: "a=b,c" } round-trips correctly
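
The escaping and state-machine parsing described in this fix can be sketched as below. The function names here (`escapeLabel`, `buildLabelKey`, `parseLabelKey`) and the `name=value,name=value` key layout are assumptions for illustration; the real `_escapeLabel` and `_parseLabelKey` may differ in detail.

```javascript
// Escape backslash first so the backslashes added for = and , are not re-escaped.
function escapeLabel(v) {
  return String(v).replace(/\\/g, "\\\\").replace(/=/g, "\\=").replace(/,/g, "\\,");
}

// Serialize labels as name=value pairs, sorted by name for stable output.
function buildLabelKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((k) => `${escapeLabel(k)}=${escapeLabel(labels[k])}`)
    .join(",");
}

// Character-by-character parser: tracks whether we are inside a name or a
// value, and whether the previous character was an escaping backslash.
function parseLabelKey(key) {
  const labels = {};
  let name = "";
  let value = "";
  let inValue = false;
  let escaped = false;
  for (const ch of key) {
    if (escaped) {
      if (inValue) value += ch;
      else name += ch;
      escaped = false;
    } else if (ch === "\\") {
      escaped = true;
    } else if (ch === "=" && !inValue) {
      inValue = true; // unescaped = ends the name
    } else if (ch === "," && inValue) {
      labels[name] = value; // unescaped , ends the pair
      name = "";
      value = "";
      inValue = false;
    } else if (inValue) {
      value += ch;
    } else {
      name += ch;
    }
  }
  if (inValue) labels[name] = value;
  return labels;
}

const key = buildLabelKey({ key: "a=b,c" });
console.log(key, parseLabelKey(key));
```

A plain `split(',')` would cut the value `a=b,c` in two, which is why the parser has to track escape state instead.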

6. Metric Name Validation

Problem: Invalid Prometheus names (spaces, leading numbers) passed through.

Fix:

  • validateMetricName(name) enforces ^[a-zA-Z_:][a-zA-Z0-9_:]*$
  • Throws TypeError for non-strings, Error for invalid patterns

Test: invalid_metric_name_rejected — spaces and leading numbers rejected
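
A validator matching the rules above could look like this minimal sketch. The regular expression and the two error types come from this record; the error messages and the choice to return the name on success are assumptions.

```javascript
// Pattern from the fix description: Prometheus-style metric names.
const METRIC_NAME_RE = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;

function validateMetricName(name) {
  if (typeof name !== "string") {
    throw new TypeError(`metric name must be a string, got ${typeof name}`);
  }
  if (!METRIC_NAME_RE.test(name)) {
    throw new Error(`invalid metric name: ${JSON.stringify(name)}`);
  }
  return name; // returning the name is an illustrative convenience
}

for (const candidate of ["sf_cost_total", "sf cost", "1st_metric"]) {
  try {
    validateMetricName(candidate);
    console.log(`${candidate}: ok`);
  } catch (err) {
    console.log(`${candidate}: ${err.message}`);
  }
}
```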


Test Results

Test Files  1 passed (1)
Tests       10 passed (10)

Full suite: 1029 passed, 5 pre-existing failures (unrelated worktree/staging tests), 1 skipped.


Remaining Gaps vs RA.Aid

| Gap | Status | Notes |
| --- | --- | --- |
| Per-trajectory granularity | Still gap | Metrics are aggregated; individual events go to audit/trajectory |
| Cost CLI commands | Still gap | No sf cost --session or sf cost --all commands yet |
| Repository pattern | Still gap | Data access is functional, not class-based |
| Pydantic models | Still gap | No typed model layer |
| Expert model consultation | Still gap | No reasoning_assist equivalent |
| Token limiter | Still gap | No context window management |
| Model fallback on 429 | Already had | SF already switches models on rate-limit |

API Summary

// Initialize
initMetricsCentral("/project", { 
  sessionId: "sess-123", 
  dbAdapter: db,
  flushIntervalMs: 60_000 
});

// Record metrics
recordCounter("sf_gate_runs_total", { gate_id: "verify", outcome: "pass" });
recordGauge("sf_cost_guard_hourly_spend", 1.23);
recordHistogram("sf_gate_latency_ms", 150);
recordCost("unit-42", "claude-sonnet-4", 1500, 800, 0.045, "build");

// Query
const rows = queryMetrics(db, "sess-123", "sf_cost_total", 100);

// Shutdown
stopMetricsCentral(); // final flush + cleanup