singularity-forge/docs/records/2026-05-07-sf-vs-ra-aid-full-comparison.md


SF vs RA.Aid — Full Feature Comparison

Date: 2026-05-07

Scope: Complete feature-by-feature comparison across all subsystems


Executive Summary

| Dimension | SF | RA.Aid | Verdict |
|---|---|---|---|
| Architecture | TypeScript monorepo, extension-based, DB-first | Python, LangGraph agents, ORM-based | Both valid; SF more modular |
| State Model | SQLite + JSONL dual persistence | SQLite (Peewee ORM) single source | RA.Aid simpler; SF more durable |
| Agent Stages | UOK gates (implicit) | Explicit research → plan → implement | RA.Aid clearer stage boundaries |
| Memory | Key facts, snippets, notes, trajectory | Key facts, snippets, notes, trajectory | Parity |
| Cost Tracking | Per-unit SQLite + JSONL ledger | Per-trajectory DB records + CLI commands | RA.Aid more queryable |
| Shell Safety | Execution policy profiles + inheritance | cowboy_mode + interactive approval | SF more granular |
| Subagents | Full subagent system with inheritance | No subagent delegation | SF wins |
| Mode System | 6 work modes × 3 run controls × 4 permission profiles × 3 model modes | --research-only, --research-and-plan-only, --hil, --chat | SF far ahead |
| Web UI | Next.js web app + TUI + headless + RPC | FastAPI server (optional) | SF more complete |
| Testing | Vitest, 144+ tests | pytest (suite not inspected) | SF appears more tested |
| Observability | Prometheus metrics + journal + audit | Trajectory DB + cost CLI | Different philosophies |
| Skills System | .agents/skills/ with YAML frontmatter | No skill system | SF wins |
| Recovery | Crash recovery, verification retry, rethink | Fallback handler, retry with backoff | Different strengths |
| MCP | MCP client only | No MCP | SF wins |

1. Architecture & State Model

SF

```text
singularity-forge/
├── src/resources/extensions/sf/     # Core extension
│   ├── uok/                         # UOK kernel (safety)
│   ├── auto/                        # Autonomous mode state
│   ├── commands/                    # CLI command handlers
│   ├── skills/                      # Skill system
│   └── metrics-central.js           # Prometheus metrics
├── packages/                        # npm workspaces
│   ├── pi-tui/                      # Terminal UI
│   ├── pi-ai/                       # AI provider abstraction
│   └── ...
├── web/                             # Next.js web UI
└── .sf/                             # Project-local state
    ├── sf.db                        # SQLite (schema v43)
    ├── runtime/                     # Working files
    └── sessions/                    # Per-session state
```

State Philosophy: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.
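The dual-persistence idea can be sketched in a few lines of Python. This is a minimal illustration, not SF's implementation: the `events` table, the `record_event` and `replay_ledger` helpers, and the choice to append to the ledger before writing the row are all assumptions made for the example.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema; SF's real tables are not shown in this document.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS events ("
    "id INTEGER PRIMARY KEY, unit_id TEXT, kind TEXT, payload TEXT)"
)


def record_event(db, ledger, unit_id, kind, payload):
    """Append one line to the JSONL ledger, then insert the queryable row."""
    line = json.dumps({"unit_id": unit_id, "kind": kind, "payload": payload}, sort_keys=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO events (unit_id, kind, payload) VALUES (?, ?, ?)",
            (unit_id, kind, json.dumps(payload, sort_keys=True)),
        )


def replay_ledger(ledger):
    """Read the audit log back as dicts, e.g. to rebuild a lost database."""
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "events.jsonl"
        db = sqlite3.connect(":memory:")
        db.execute(SCHEMA)
        record_event(db, ledger, "unit-123", "gate_passed", {"gate": "verify"})
        record_event(db, ledger, "unit-124", "gate_failed", {"gate": "verify"})
        rows = db.execute("SELECT unit_id, kind FROM events ORDER BY id").fetchall()
        logged = replay_ledger(ledger)
        db.close()
        return rows, logged
```

The database answers queries; the ledger is never rewritten, so it can serve as an audit trail or as a replay source if the database file is lost.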

RA.Aid

```text
ra_aid/
├── agents/                          # LangGraph agents
│   ├── research_agent.py
│   ├── planning_agent.py
│   └── implementation_agent.py
├── database/                        # Peewee ORM
│   ├── models.py                    # Trajectory, Session, KeyFact, ...
│   ├── connection.py                # SQLite with WAL
│   └── repositories/                # Repository pattern
├── tools/                           # Tool implementations
├── prompts/                         # Prompt templates
└── .ra-aid/                         # Project-local state
    └── pk.db                        # SQLite database
```

State Philosophy: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| ORM | Raw SQLite (better-sqlite3) | Peewee (higher-level) |
| Schema Evolution | Manual versioned migrations | Peewee migrate |
| Query Surface | Direct SQL + tool wrappers | Repository pattern + Pydantic models |
| Session Isolation | Per-session files in ~/.sf/sessions/ | Single DB with session_id FK |
| Cross-Process | SQLite WAL + file-based locks | Peewee connection pooling |
| Backup/Export | JSONL ledger + DB file | DB file only |

Verdict: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.


2. Agent Stage Boundaries

SF: UOK Gate System

SF has no explicit research, planning, or implementation agents. Instead, it has:

  • UOK Kernel: Unified Orchestration Kernel that manages unit execution
  • Gates: Pass/fail checkpoints between phases
  • Work Modes: chat | plan | build | review | repair | research
  • Run Control: manual | assisted | autonomous

The stage boundary is implicit in the work mode + unit type combination.
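One way to picture an implicit boundary is a lookup keyed on the mode and unit type, with no dedicated agent per stage. The sketch below is purely illustrative: the unit type names (`task`, `milestone`), the stage labels, and the mapping itself are hypothetical and not taken from SF's source.

```python
# Hypothetical mapping: (work mode, unit type) -> effective stage.
# None of these pairs are confirmed SF behaviour; they only show the shape.
STAGE_BY_MODE_AND_UNIT = {
    ("research", "task"): "research",
    ("plan", "milestone"): "plan",
    ("plan", "task"): "plan",
    ("build", "task"): "implement",
    ("repair", "task"): "implement",
    ("review", "task"): "verify",
}


def effective_stage(work_mode, unit_type):
    """Derive the stage from the combination instead of naming it explicitly."""
    return STAGE_BY_MODE_AND_UNIT.get((work_mode, unit_type), "unstaged")
```

A user never selects "implement" directly in this model; it falls out of being in `build` mode while a `task` unit runs, which is what makes the boundary flexible but harder to see.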

RA.Aid: Explicit Agent Pipeline

```python
# Main flow in __main__.py
if is_informational_query() or args.research_only:
    run_research_agent(...)        # Stage 1
else:
    run_research_agent(...)        # Stage 1
    if not args.research_and_plan_only:
        run_planning_agent(...)    # Stage 2
        run_task_implementation_agent(...)  # Stage 3
```

Each agent is a separate LangGraph agent with its own:

  • Prompt template
  • Tool set
  • Memory/checkpointer
  • Optional expert reasoning assistance

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Stage Definition | Work mode + unit type | Explicit agent function |
| Prompt Separation | Single prompt with mode injection | Separate prompt per agent |
| Tool Separation | All tools available, gated by policy | Different tools per agent |
| Memory Separation | Shared session state | Separate MemorySaver per agent |
| Expert Consultation | Model mode routing | Explicit reasoning_assist prompt |
| Stage Skipping | /mode command | --research-only, --research-and-plan-only |

Verdict: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.


3. Memory System

SF

| Memory Type | Storage | Access |
|---|---|---|
| Key Facts | SQLite (key_facts table) | get_key_facts() / add_key_fact() |
| Code Snippets | SQLite (code_snippets table) | get_code_snippets() |
| Research Notes | SQLite (research_notes table) | get_research_notes() |
| Trajectory | JSONL (uok-audit.jsonl) + SQLite | uok/audit.js |
| Prompt History | JSONL (~/.sf/agent/prompt-history.jsonl) | prompt-history.js |
| Work Log | SQLite (work_log table) | get_work_log() |

RA.Aid

| Memory Type | Storage | Access |
|---|---|---|
| Key Facts | SQLite (key_fact table) | KeyFactRepository |
| Key Snippets | SQLite (key_snippet table) | KeySnippetRepository |
| Research Notes | SQLite (research_note table) | ResearchNoteRepository |
| Trajectory | SQLite (trajectory table) | TrajectoryRepository |
| Human Input | SQLite (human_input table) | HumanInputRepository |
| Work Log | SQLite (work_log table) | WorkLogRepository |
| Related Files | SQLite (related_files table) | RelatedFilesRepository |

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Storage | Mixed (SQLite + JSONL) | Unified (SQLite only) |
| Queryability | SQL + JSONL grep | SQL only |
| Repository Pattern | Ad hoc functions | Formal repository classes |
| Pydantic Models | No | Yes (TrajectoryModel, etc.) |
| Garbage Collection | Manual | Automatic (garbage_collect()) |
| Session Scoping | Per-session files | session_id foreign key |

Verdict: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.
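To make the contrast between ad hoc functions and repository classes concrete, here is a minimal repository sketch using only the standard library. It is not RA.Aid's `KeyFactRepository` (which sits on Peewee); the class body, the `KeyFact` dataclass, and the column names are assumptions for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyFact:
    id: int
    session_id: str
    content: str


class KeyFactRepository:
    """All SQL for key facts lives here; callers only see KeyFact objects."""

    def __init__(self, conn):
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS key_facts ("
            "id INTEGER PRIMARY KEY, session_id TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id, content):
        with self._conn:  # one transaction per write
            cur = self._conn.execute(
                "INSERT INTO key_facts (session_id, content) VALUES (?, ?)",
                (session_id, content),
            )
        return KeyFact(cur.lastrowid, session_id, content)

    def list_for_session(self, session_id):
        rows = self._conn.execute(
            "SELECT id, session_id, content FROM key_facts "
            "WHERE session_id = ? ORDER BY id",
            (session_id,),
        ).fetchall()
        return [KeyFact(*row) for row in rows]
```

Session scoping via a `session_id` column, as in the table above, then becomes a single `WHERE` clause rather than a directory layout.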


4. Cost Tracking

SF

```js
// metrics.js: per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}

// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);
```

  • Per-unit cost in SQLite
  • JSONL ledger for durability
  • Dashboard integration via sf cost command
  • No session-level aggregation

RA.Aid

```python
# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage"
)

# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}

# CLI commands:
#   ra-aid last-cost    # Latest session
#   ra-aid all-costs    # All sessions
```

  • Per-trajectory cost in DB
  • SQL aggregation for session totals
  • Built-in CLI commands for cost queries

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Granularity | Per-unit | Per-trajectory (finer) |
| Aggregation | Manual | SQL SUM |
| CLI Query | sf cost (basic) | ra-aid last-cost, ra-aid all-costs |
| Budget Limits | Cost guard gate | --max-cost, --max-tokens |
| Show Cost | TUI overlay | --show-cost flag |

Verdict: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.
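Session-level aggregation of the kind RA.Aid exposes comes down to a `SUM` over usage rows filtered by session. The sketch below shows that query against a hypothetical `token_usage` table; the table, its columns, and the helper names are assumptions, not either project's schema.

```python
import sqlite3


def open_usage_db():
    """In-memory database with a hypothetical per-unit usage table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE token_usage ("
        "id INTEGER PRIMARY KEY, session_id TEXT NOT NULL, unit_id TEXT NOT NULL, "
        "input_tokens INTEGER NOT NULL, output_tokens INTEGER NOT NULL, cost REAL NOT NULL)"
    )
    return conn


def record_usage(conn, session_id, unit_id, input_tokens, output_tokens, cost):
    with conn:
        conn.execute(
            "INSERT INTO token_usage "
            "(session_id, unit_id, input_tokens, output_tokens, cost) VALUES (?, ?, ?, ?, ?)",
            (session_id, unit_id, input_tokens, output_tokens, cost),
        )


def session_totals(conn, session_id):
    """Aggregate cost and tokens for one session; zeros if it has no rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(cost), 0.0), "
        "COALESCE(SUM(input_tokens + output_tokens), 0), COUNT(*) "
        "FROM token_usage WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return {"total_cost": row[0], "total_tokens": row[1], "records": row[2]}
```

Because the per-unit rows already exist in SQLite on the SF side, a query like this is all that an `sf cost --session` style command would need.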


5. Shell Safety & Execution Policy

SF

```js
// execution-policy.js
const PROFILES = {
  restricted: {  // No destructive tools
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  normal: {      // Read-only + planning writes
    allowDestructive: false,
    allowBash: true,  // But classified commands blocked
    allowWrite: true, // But source mutations gated
  },
  trusted: {     // Most tools allowed
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  unrestricted: { // Everything
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};

// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);
```

  • 4 permission profiles
  • Subagent inheritance (parent → child)
  • Execution policy tool_call hook
  • Destructive command classifier

RA.Aid

```python
# tools/shell.py (excerpt from inside the shell tool function)
from rich.prompt import Prompt

cowboy_mode = get_config_repository().get("cowboy_mode", False)

if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)
```

  • Binary: cowboy_mode on/off
  • Interactive approval per command
  • No subagent delegation (no inheritance needed)

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Policy Granularity | 4 profiles + model mode + work mode | Binary (cowboy_mode) |
| Approval UX | Policy-driven automatic | Interactive per-command |
| Subagent Inheritance | Full envelope propagation | N/A (no subagents) |
| Destructive Classification | Static list + dynamic analysis | None |
| Audit Trail | Journal + metrics | Trajectory |

Verdict: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.
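A profile check combined with a static destructive-command list might look roughly like the following. The profile flags mirror the `PROFILES` table shown for SF, but the classifier, the command list, and the `may_run` helper are assumptions for illustration and are far simpler than a real policy engine.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    allow_bash: bool
    allow_write: bool
    allow_destructive: bool


PROFILES = {
    "restricted": Profile(allow_bash=False, allow_write=False, allow_destructive=False),
    "normal": Profile(allow_bash=True, allow_write=True, allow_destructive=False),
    "trusted": Profile(allow_bash=True, allow_write=True, allow_destructive=True),
    "unrestricted": Profile(allow_bash=True, allow_write=True, allow_destructive=True),
}

# Illustrative static list only; a real classifier needs far more coverage.
DESTRUCTIVE_PROGRAMS = {"rm", "dd", "mkfs", "shred"}


def is_destructive(command):
    """Classify by the first token; unparseable input is treated as unsafe."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True
    if not tokens:
        return False
    if tokens[0] in DESTRUCTIVE_PROGRAMS:
        return True
    return tokens[:2] == ["git", "reset"] and "--hard" in tokens


def may_run(profile_name, command):
    """Policy decision made without prompting the user."""
    profile = PROFILES[profile_name]
    if not profile.allow_bash:
        return False
    return profile.allow_destructive or not is_destructive(command)
```

A first-token check like this misses wrapped commands such as `sudo rm` or shell pipelines, which is presumably why the comparison table lists dynamic analysis alongside the static list.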


6. Subagent System

SF

Full subagent system with:

  • Modes: single, chain, parallel, debate, background
  • Inheritance: Parent mode state propagates to children via env vars
  • Validation: Subagent dispatch blocked if it violates parent policy
  • Coordination: Parallel intent registry prevents conflicting work

```js
// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}
```

RA.Aid

No subagent system. RA.Aid runs its research, planning, and implementation agents in sequence and does not dispatch child agents.

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Subagent Modes | 5 modes | None |
| Inheritance | Full mode envelope | N/A |
| Parallel Work | Parallel intent registry | N/A |
| Debate Mode | Advocate + challenger | N/A |

Verdict: SF has a significant advantage for complex multi-agent workflows.
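The three blocking rules listed in the `validateSubagentDispatch` comments can be generalized as "a child may not exceed its parent on any axis". The Python sketch below takes that reading as an assumption; the `Envelope` and `Proposal` shapes and the rank tables are hypothetical, not SF's actual types.

```python
from dataclasses import dataclass

# Assumed orderings: a higher rank means more capability.
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}
MODEL_MODE_RANK = {"fast": 0, "smart": 1, "deep": 2}


@dataclass(frozen=True)
class Envelope:
    """Mode state inherited from the parent agent."""
    permission_profile: str
    model_mode: str
    allowed_providers: frozenset


@dataclass(frozen=True)
class Proposal:
    """What the child agent asks to run with."""
    permission_profile: str
    model_mode: str
    provider: str


def validate_subagent_dispatch(envelope, proposal):
    """Return a list of violations; an empty list means the dispatch is allowed."""
    violations = []
    if proposal.provider not in envelope.allowed_providers:
        violations.append(f"provider {proposal.provider!r} not allowed")
    if MODEL_MODE_RANK[proposal.model_mode] > MODEL_MODE_RANK[envelope.model_mode]:
        violations.append("model mode exceeds parent")
    if PROFILE_RANK[proposal.permission_profile] > PROFILE_RANK[envelope.permission_profile]:
        violations.append("permission profile exceeds parent")
    return violations
```

Returning every violation rather than failing on the first makes the rejection easier to audit, which fits the journal-based audit trail described in the previous section.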


7. Mode System

SF

Orthogonal axes:

  • Work Mode: chat | plan | build | review | repair | research
  • Run Control: manual | assisted | autonomous
  • Permission Profile: restricted | normal | trusted | unrestricted
  • Model Mode: fast | smart | deep
  • Surface: tui | web | headless | rpc

```
// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep

// TUI shortcuts
Ctrl+Shift+M  // Cycle work mode
Ctrl+Shift+A  // Autonomous
Ctrl+Shift+P  // Cycle permission
```

RA.Aid

Flags:

  • --research-only: Research only, no implementation
  • --research-and-plan-only: Research + plan, then exit
  • --hil: Human-in-the-loop
  • --chat: Chat mode (implies --hil)
  • --cowboy-mode: Skip shell approval

```shell
ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat
```

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Work Mode | 6 modes with transitions | 2 flags (research-only, research-and-plan-only) |
| Run Control | 3 levels | Implicit (hil/chat vs default) |
| Permission | 4 profiles | 1 flag (cowboy-mode) |
| Model Routing | 3 modes (fast/smart/deep) | Per-task provider/model flags |
| Surface | 4 surfaces | 2 (CLI, server) |
| Keyboard Shortcuts | 8 shortcuts | None |
| Mode Persistence | SQLite + terminal title | In-memory only |

Verdict: SF's mode system is far more sophisticated and user-friendly.
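Because the axes are orthogonal, the session state can be modelled as one immutable record with an independent field per axis. The sketch below uses the axis values listed for SF; the `ModeState` class, its defaults, and the `with_axis` helper are assumptions for illustration, and the surface axis is left out because it is chosen at launch rather than switched in session.

```python
from dataclasses import dataclass, replace
from itertools import product

# Axis values as listed in this section.
AXES = {
    "work_mode": ("chat", "plan", "build", "review", "repair", "research"),
    "run_control": ("manual", "assisted", "autonomous"),
    "permission_profile": ("restricted", "normal", "trusted", "unrestricted"),
    "model_mode": ("fast", "smart", "deep"),
}


@dataclass(frozen=True)
class ModeState:
    # Defaults are hypothetical, chosen only so the example runs.
    work_mode: str = "chat"
    run_control: str = "manual"
    permission_profile: str = "normal"
    model_mode: str = "smart"

    def with_axis(self, axis, value):
        """Change one axis and leave the others untouched."""
        if value not in AXES.get(axis, ()):
            raise ValueError(f"invalid {axis}: {value!r}")
        return replace(self, **{axis: value})


def all_combinations():
    """Every reachable state across the four in-session axes."""
    return [ModeState(*combo) for combo in product(*AXES.values())]
```

With six work modes, three run controls, four permission profiles, and three model modes, `all_combinations()` yields 6 × 3 × 4 × 3 = 216 states, compared with the handful of flag combinations RA.Aid accepts.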


8. Web UI

SF

  • TUI: Terminal UI with color bands, emojis, mode badges, cost overlay
  • Web: Next.js app with real-time updates
  • Headless: JSON/JSONL output for automation
  • RPC: gRPC/JSON-RPC for external control

```shell
sf tui          # Terminal UI
sf web          # Start web server
sf headless     # JSON output
sf rpc          # RPC server
```

RA.Aid

  • CLI: Rich console output with panels
  • Server: FastAPI server (optional)

```shell
ra-aid -m "task"           # CLI
ra-aid --server            # FastAPI on :1818
```

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Terminal UI | Full TUI with mode badges | Rich panels |
| Web Interface | Next.js | FastAPI |
| Headless/Machine | JSON/JSONL event stream | None |
| Real-time Updates | WebSocket | HTTP polling |
| Multi-session | Session manager | Single session |

Verdict: SF has a more complete multi-surface architecture.


9. Testing

SF

  • Runner: Vitest
  • Count: 144+ tests across 12 suites
  • Coverage: V8 provider, 40/40/20/20 thresholds
  • Types: Unit + integration + smoke + live

```shell
npm test              # All tests
npm run test:unit     # Unit only
npm run test:integration  # Integration
npm run test:smoke    # Smoke tests
npm run test:live     # Live tests (need env)
```

RA.Aid

  • Runner: pytest
  • Count: Unknown (not inspected)
  • Coverage: Unknown
  • Types: Unit tests

```shell
pytest tests/
```

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Test Runner | Vitest | pytest |
| Test Count | 144+ | Unknown |
| Coverage | Enforced in CI | Unknown |
| Integration Tests | Yes | Unknown |
| Smoke Tests | Yes | Unknown |
| Live Tests | Yes | Unknown |

Verdict: SF appears to have more comprehensive testing infrastructure.


10. Observability

SF

| System | Purpose | Format |
|---|---|---|
| metrics-central.js | Aggregated metrics | Prometheus text |
| uok/audit.js | Per-unit audit trail | JSONL |
| journal.js | Mode transitions, decisions | SQLite |
| self-feedback.js | Inline self-correction | SQLite |
| TUI footer | Real-time cost/context | ANSI text |

RA.Aid

| System | Purpose | Format |
|---|---|---|
| Trajectory | Universal event log | SQLite (Peewee) |
| Cost CLI | Session cost queries | JSON |
| Work Log | Human-readable activity | SQLite |
| Console panels | Real-time status | Rich text |

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Metrics Format | Prometheus | None (DB queries) |
| Event Granularity | Per-unit + per-metric | Per-trajectory |
| Queryability | SQL + Prometheus | SQL only |
| Dashboard Ready | Yes (Grafana) | No |
| Real-time Display | TUI footer | Console panels |

Verdict: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).


11. Skills System

SF

```markdown
# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...
```

  • YAML frontmatter
  • Hierarchical discovery
  • Permission filtering
  • Work-mode relevance
  • Eval harness

RA.Aid

No skill system. RA.Aid has custom tools (--custom-tools) but no structured skill framework.

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Skill Definition | YAML frontmatter | Python module |
| Discovery | Hierarchical .agents/skills/ | --custom-tools flag |
| Permissions | Per-skill profile | None |
| Eval | Built-in harness | None |
| Auto-creation | Pattern detection | None |

Verdict: SF has a significant advantage for structured skill management.
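Reading a skill's frontmatter and filtering by permission profile could be sketched as follows. This is a deliberately small parser for flat `key: value` pairs, not a YAML implementation and not SF's loader; the rule that a skill is usable when its `permission-profile` does not exceed the active profile is an assumption.

```python
# Assumed ordering of profiles, lowest capability first.
PROFILE_RANK = {"restricted": 0, "normal": 1, "trusted": 2, "unrestricted": 3}


def parse_frontmatter(text):
    """Split a SKILL.md into (metadata dict, body). Handles flat key: value only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat the whole file as body
    meta = {}
    for line in lines[1:end]:
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        meta[key.strip()] = {"true": True, "false": False}.get(value, value)
    return meta, "\n".join(lines[end + 1:])


def is_allowed(meta, active_profile):
    """Hypothetical filter: hide skills that need more than the active profile."""
    required = meta.get("permission-profile", "normal")
    return PROFILE_RANK[required] <= PROFILE_RANK[active_profile]
```

A production loader would use a real YAML parser so that lists, nested keys, and quoting behave as authors expect.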


12. Recovery & Resilience

SF

| Mechanism | Purpose |
|---|---|
| Crash recovery | Resume from checkpoint after failure |
| Verification retry | Re-run failed verification gates |
| Rethink | Inject rethink prompt on stuck detection |
| Circuit breaker | Exponential backoff on gate failures |
| Cost guard | Block expensive operations |
| Writer tokens | Prevent concurrent writes |
| Parity system | Detect and recover from drift |

RA.Aid

| Mechanism | Purpose |
|---|---|
| Fallback handler | Switch to alternative models on failure |
| Retry with backoff | Re-run failed agent invocations |
| Token limiter | Remove old messages to prevent overflow |
| Recursion limit | Prevent infinite loops |

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Checkpoint/Resume | Yes | No |
| Model Fallback | Yes (on 429/rate-limit) | Yes |
| Token Management | No | Yes (limiter) |
| Circuit Breaker | Yes | No |
| Cost Guard | Yes | No (budget only) |
| Concurrent Write Prevention | Yes (writer tokens) | No |

Verdict: Different strengths. SF better for operational resilience; RA.Aid better for model resilience.
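The circuit-breaker row above pairs two ideas: delays that double after each consecutive failure, and a threshold after which retries stop. A minimal sketch follows; the `GateBreaker` class, its parameter names, and its default values are assumptions rather than SF's implementation.

```python
from dataclasses import dataclass


@dataclass
class GateBreaker:
    base_delay: float = 1.0      # seconds before the first retry
    max_delay: float = 60.0      # cap on the backoff delay
    failure_threshold: int = 5   # consecutive failures before the breaker opens
    failures: int = 0

    def record_failure(self):
        """Count a failure and return the delay before the next attempt."""
        self.failures += 1
        return min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))

    def record_success(self):
        """Any success closes the breaker and resets the backoff."""
        self.failures = 0

    @property
    def is_open(self):
        """When open, the caller should stop retrying and escalate."""
        return self.failures >= self.failure_threshold
```

With `base_delay=1.0` and `max_delay=8.0`, five consecutive failures produce delays of 1, 2, 4, 8, and 8 seconds.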


13. MCP Integration

SF

  • MCP Client: Full MCP client with tool discovery, resource listing, OAuth
  • MCP Server Guard: Explicitly forbidden (test enforces this)

```js
// No SF MCP server: client only
pi.registerMcpClient("filesystem", { ... });
```

RA.Aid

No MCP integration. RA.Aid uses LangChain tools directly.

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| MCP Client | Yes | No |
| MCP Server | Explicitly forbidden | N/A |
| Tool Discovery | Dynamic from MCP servers | Static tool definitions |

Verdict: SF is ahead for MCP ecosystem integration.


14. Provider Abstraction

SF

```js
// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });
```

  • Abstract provider interface
  • Model mode routing (fast/smart/deep)
  • Temperature/thinking level management
  • Provider allowlists/blocklists

RA.Aid

```python
# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)
```

  • LiteLLM for provider abstraction
  • Per-task provider/model override
  • Temperature support
  • Expert model consultation

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Abstraction Layer | Custom (pi-ai) | LiteLLM |
| Model Routing | Mode-based (fast/smart/deep) | Explicit flags |
| Expert Model | No | Yes (reasoning_assist) |
| Temperature | Yes | Yes |
| Thinking Level | Yes | No |

Verdict: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.


15. Documentation & Prompt Engineering

SF

  • AGENTS.md: Project-specific instructions
  • CLAUDE.md: Claude-specific guidance
  • PDD: Purpose-Driven Development fields
  • Skills: .agents/skills/ with structured prompts
  • Prompt History: Per-project JSONL

RA.Aid

  • Prompt Templates: Separate files per agent
  • Expert Prompts: Optional expert consultation
  • Human Prompts: HIL sections
  • Custom Tools: Dynamic tool injection

Comparison

| Aspect | SF | RA.Aid |
|---|---|---|
| Prompt Organization | Skills + PDD | Agent-specific files |
| Expert Consultation | Model mode routing | Explicit reasoning_assist |
| Human-in-the-loop | Permission profiles | --hil flag |
| Custom Tools | Skill system | --custom-tools flag |
| Prompt Versioning | Git-tracked skills | Package-bundled |

Verdict: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.


Overall Assessment

SF Strengths

  1. Mode system: 5 axes of control vs RA.Aid's binary flags
  2. Subagent system: Full delegation with inheritance
  3. Skills system: Structured, evaluable, discoverable
  4. MCP integration: Client-only, ecosystem-ready
  5. Execution policy: Granular permission profiles
  6. Observability: Prometheus-compatible metrics
  7. Multi-surface: TUI + web + headless + RPC

RA.Aid Strengths

  1. Explicit pipeline: Clear research → plan → implement flow
  2. Expert consultation: Dynamic reasoning assistance
  3. Cost tracking: Built-in aggregation and CLI queries
  4. Repository pattern: Clean data access
  5. Fallback handling: Switches to alternative models on failure (SF's model switching is limited to 429/rate-limit errors)
  6. Token limiting: Prevent context overflow
  7. Simplicity: Easier to understand and modify

Where SF Should Borrow from RA.Aid

  1. Explicit stage boundaries: Add /research, /plan, /implement commands that mirror RA.Aid's agent pipeline
  2. Expert consultation: Add optional "expert model" for reasoning assistance before complex operations
  3. Cost CLI: Add sf cost --session, sf cost --all commands
  4. Repository pattern: Formalize data access with repository classes
  5. Token limiting: Add context window management
  6. Fallback handler: SF already falls back to another model on 429/rate-limit errors; extend that to general model failures, as RA.Aid's fallback handler does
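Of the items above, token limiting is the most self-contained to prototype. The sketch below keeps the first (system) message and as many of the newest messages as fit a budget, dropping the oldest. It is an illustration only: the four-characters-per-token estimate and the function names are assumptions, and a real limiter would use the model's tokenizer.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def trim_messages(messages, max_tokens, count=estimate_tokens):
    """Keep messages[0] plus the newest messages that fit within max_tokens."""
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    budget = max_tokens - count(head["content"])
    kept = []
    for message in reversed(tail):  # walk from newest to oldest
        cost = count(message["content"])
        if cost > budget:
            break  # everything older than this point is dropped
        kept.append(message)
        budget -= cost
    return [head] + kept[::-1]  # restore chronological order
```

The head message is always retained, even if it alone exceeds the budget, so callers that need a hard limit should check for that case separately.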

Where RA.Aid Should Borrow from SF

  1. Mode system: Add work modes, permission profiles, model modes
  2. Subagent system: Add delegation for parallel work
  3. Execution policy: Replace cowboy_mode with granular profiles
  4. Skills system: Add structured skill framework
  5. MCP integration: Add MCP client support
  6. UOK gates: Add safety checkpoints between stages
  7. Observability: Add Prometheus metrics

Conclusion

SF and RA.Aid are complementary rather than competitive:

  • SF is a platform: modular, multi-surface, safety-first, designed for complex multi-agent workflows
  • RA.Aid is a tool: focused, simple, explicit, designed for single-agent coding tasks

The ideal system would combine:

  • SF's mode system + subagent system + skills system
  • RA.Aid's explicit pipeline + expert consultation + cost tracking
  • Both projects' DB-first state philosophy