Mikael Hugo 15269f4176 sf snapshot: uncommitted changes after 202m inactivity

2026-05-08 13:31:08 +02:00

23 KiB

Raw Blame History

SF vs RA.Aid — Full Feature Comparison

Date: 2026-05-07 Scope: Complete feature-by-feature comparison across all subsystems

Executive Summary

Dimension	SF	RA.Aid	Verdict
Architecture	TypeScript monorepo, extension-based, DB-first	Python, LangGraph agents, ORM-based	Both valid; SF more modular
State Model	SQLite + JSONL dual persistence	SQLite (Peewee ORM) single source	RA.Aid simpler; SF more durable
Agent Stages	UOK gates (implicit)	Explicit research → plan → implement	RA.Aid clearer stage boundaries
Memory	Key facts, snippets, notes, trajectory	Key facts, snippets, notes, trajectory	Parity
Cost Tracking	Per-unit SQLite + JSONL ledger	Per-trajectory DB records + CLI commands	RA.Aid more queryable
Shell Safety	Execution policy profiles + inheritance	cowboy_mode + interactive approval	SF more granular
Subagents	Full subagent system with inheritance	No subagent delegation	SF wins
Mode System	5 work modes × 3 run controls × 4 permission profiles × 3 model modes	--research-only, --research-and-plan-only, --hil, --chat	SF far ahead
Web UI	Next.js TUI + headless + RPC	FastAPI server (optional)	SF more complete
Testing	Vitest, 144+ tests	pytest	SF more tested
Observability	Prometheus metrics + journal + audit	Trajectory DB + cost CLI	Different philosophies
Skills System	`.agents/skills/` with YAML frontmatter	No skill system	SF wins
Recovery	Crash recovery, verification retry, rethink	Fallback handler, retry with backoff	Parity
MCP	MCP client only	No MCP	SF wins

1. Architecture & State Model

SF

singularity-forge/
├── src/resources/extensions/sf/     # Core extension
│   ├── uok/                         # UOK kernel (safety)
│   ├── auto/                        # Autonomous mode state
│   ├── commands/                    # CLI command handlers
│   ├── skills/                      # Skill system
│   └── metrics-central.js           # Prometheus metrics
├── packages/                        # npm workspaces
│   ├── pi-tui/                      # Terminal UI
│   ├── pi-ai/                       # AI provider abstraction
│   └── ...
├── web/                             # Next.js web UI
└── .sf/                             # Project-local state
    ├── sf.db                        # SQLite (schema v43)
    ├── runtime/                     # Working files
    └── sessions/                    # Per-session state

State Philosophy: DB-first with JSONL durability. SQLite is the queryable source of truth; JSONL is the append-only audit log.

RA.Aid

ra_aid/
├── agents/                          # LangGraph agents
│   ├── research_agent.py
│   ├── planning_agent.py
│   └── implementation_agent.py
├── database/                        # Peewee ORM
│   ├── models.py                    # Trajectory, Session, KeyFact, ...
│   ├── connection.py                # SQLite with WAL
│   └── repositories/                # Repository pattern
├── tools/                           # Tool implementations
├── prompts/                         # Prompt templates
└── .ra-aid/                         # Project-local state
    └── pk.db                        # SQLite database

State Philosophy: Single SQLite database with Peewee ORM. Everything is a model: sessions, human inputs, trajectories, key facts, snippets, research notes.

Comparison

Aspect	SF	RA.Aid
ORM	Raw SQLite (better-sqlite3)	Peewee (higher-level)
Schema Evolution	Manual versioned migrations	Peewee migrate
Query Surface	Direct SQL + tool wrappers	Repository pattern + Pydantic models
Session Isolation	Per-session files in `~/.sf/sessions/`	Single DB with session_id FK
Cross-Process	SQLite WAL + file-based locks	Peewee connection pooling
Backup/Export	JSONL ledger + DB file	DB file only

Verdict: SF's dual persistence (DB + JSONL) is more durable for audit trails. RA.Aid's ORM is more ergonomic for queries.

2. Agent Stage Boundaries

SF: UOK Gate System

SF doesn't have explicit "research agent" / "planning agent" / "implementation agent". Instead, it has:

UOK Kernel: Unified Orchestration Kernel that manages unit execution
Gates: Pass/fail checkpoints between phases
Work Modes: chat → plan → build → review → repair → research
Run Control: manual → assisted → autonomous

The stage boundary is implicit in the work mode + unit type combination.

RA.Aid: Explicit Agent Pipeline

# Main flow in __main__.py
if is_informational_query() or args.research_only:
    run_research_agent(...)        # Stage 1
else:
    run_research_agent(...)        # Stage 1
    if not args.research_and_plan_only:
        run_planning_agent(...)    # Stage 2
        run_task_implementation_agent(...)  # Stage 3

Each agent is a separate LangGraph agent with its own:

Prompt template
Tool set
Memory/checkpointer
Optional expert reasoning assistance

Comparison

Aspect	SF	RA.Aid
Stage Definition	Work mode + unit type	Explicit agent function
Prompt Separation	Single prompt with mode injection	Separate prompt per agent
Tool Separation	All tools available, gated by policy	Different tools per agent
Memory Separation	Shared session state	Separate MemorySaver per agent
Expert Consultation	Model mode routing	Explicit reasoning_assist prompt
Stage Skipping	`/mode` command	`--research-only`, `--research-and-plan-only`

Verdict: RA.Aid's explicit pipeline is clearer for users. SF's implicit gates are more flexible but harder to reason about.

3. Memory System

SF

Memory Type	Storage	Access
Key Facts	SQLite (`key_facts` table)	`get_key_facts()` / `add_key_fact()`
Code Snippets	SQLite (`code_snippets` table)	`get_code_snippets()`
Research Notes	SQLite (`research_notes` table)	`get_research_notes()`
Trajectory	JSONL (`uok-audit.jsonl`) + SQLite	`uok/audit.js`
Prompt History	JSONL (`~/.sf/agent/prompt-history.jsonl`)	`prompt-history.js`
Work Log	SQLite (`work_log` table)	`get_work_log()`

RA.Aid

Memory Type	Storage	Access
Key Facts	SQLite (`key_fact` table)	`KeyFactRepository`
Key Snippets	SQLite (`key_snippet` table)	`KeySnippetRepository`
Research Notes	SQLite (`research_note` table)	`ResearchNoteRepository`
Trajectory	SQLite (`trajectory` table)	`TrajectoryRepository`
Human Input	SQLite (`human_input` table)	`HumanInputRepository`
Work Log	SQLite (`work_log` table)	`WorkLogRepository`
Related Files	SQLite (`related_files` table)	`RelatedFilesRepository`

Comparison

Aspect	SF	RA.Aid
Storage	Mixed (SQLite + JSONL)	Unified (SQLite only)
Queryability	SQL + JSONL grep	SQL only
Repository Pattern	Ad hoc functions	Formal repository classes
Pydantic Models	No	Yes (`TrajectoryModel`, etc.)
Garbage Collection	Manual	Automatic (`garbage_collect()`)
Session Scoping	Per-session files	`session_id` foreign key

Verdict: RA.Aid's unified repository pattern is cleaner. SF's dual persistence is more audit-friendly.

4. Cost Tracking

SF

// metrics.js — per-unit cost tracking
export function recordTokenUsage(unitId, modelId, inputTokens, outputTokens, cost) {
  // Writes to SQLite + JSONL
}

// Usage:
recordTokenUsage("unit-123", "claude-sonnet-4", 1500, 800, 0.045);

Per-unit cost in SQLite
JSONL ledger for durability
Dashboard integration via sf cost command
No session-level aggregation

RA.Aid

# Trajectory record with cost
trajectory_repo.create(
    tool_name="llm_call",
    current_cost=0.045,
    input_tokens=1500,
    output_tokens=800,
    record_type="model_usage"
)

# Session-level aggregation
session_totals = trajectory_repo.get_session_usage_totals(session_id)
# Returns: {"total_cost": 1.23, "total_tokens": 45000, ...}

# CLI commands:
#   ra-aid last-cost    # Latest session
#   ra-aid all-costs    # All sessions

Per-trajectory cost in DB
SQL aggregation for session totals
Built-in CLI commands for cost queries

Comparison

Aspect	SF	RA.Aid
Granularity	Per-unit	Per-trajectory (finer)
Aggregation	Manual	SQL SUM
CLI Query	`sf cost` (basic)	`ra-aid last-cost`, `ra-aid all-costs`
Budget Limits	Cost guard gate	`--max-cost`, `--max-tokens`
Show Cost	TUI overlay	`--show-cost` flag

Verdict: RA.Aid's cost tracking is more mature with built-in aggregation and CLI queries.

5. Shell Safety & Execution Policy

SF

// execution-policy.js
const PROFILES = {
  restricted: {  // No destructive tools
    allowDestructive: false,
    allowBash: false,
    allowWrite: false,
  },
  normal: {      // Read-only + planning writes
    allowDestructive: false,
    allowBash: true,  // But classified commands blocked
    allowWrite: true, // But source mutations gated
  },
  trusted: {     // Most tools allowed
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
  unrestricted: { // Everything
    allowDestructive: true,
    allowBash: true,
    allowWrite: true,
  },
};

// Subagent inheritance enforces parent policy
validateSubagentDispatch(envelope, proposal);

4 permission profiles
Subagent inheritance (parent → child)
Execution policy tool_call hook
Destructive command classifier

RA.Aid

# tools/shell.py
cowboy_mode = get_config_repository().get("cowboy_mode", False)

if not cowboy_mode:
    response = Prompt.ask(
        "Execute this command? (y=yes, n=no, c=enable cowboy mode)",
        choices=["y", "n", "c"],
        default="y",
    )
    if response == "n":
        return {"success": False, "output": "Cancelled"}
    elif response == "c":
        get_config_repository().set("cowboy_mode", True)

Binary: cowboy_mode on/off
Interactive approval per command
No subagent delegation (no inheritance needed)

Comparison

Aspect	SF	RA.Aid
Policy Granularity	4 profiles + model mode + work mode	Binary (cowboy_mode)
Approval UX	Policy-driven automatic	Interactive per-command
Subagent Inheritance	Full envelope propagation	N/A (no subagents)
Destructive Classification	Static list + dynamic analysis	None
Audit Trail	Journal + metrics	Trajectory

Verdict: SF's execution policy is far more sophisticated. RA.Aid's cowboy_mode is simpler but less safe.

6. Subagent System

SF

Full subagent system with:

Modes: single, chain, parallel, debate, background
Inheritance: Parent mode state propagates to children via env vars
Validation: Subagent dispatch blocked if it violates parent policy
Coordination: Parallel intent registry prevents conflicting work

// subagent-inheritance.js
export function validateSubagentDispatch(envelope, proposal) {
  // Block if provider not allowed
  // Block if heavy model in fast mode
  // Block if destructive tools in restricted mode
}

RA.Aid

No subagent system. RA.Aid is a single-agent system. It does not dispatch child agents.

Comparison

Aspect	SF	RA.Aid
Subagent Modes	5 modes	None
Inheritance	Full mode envelope	N/A
Parallel Work	Parallel intent registry	N/A
Debate Mode	Advocate + challenger	N/A

Verdict: SF has a significant advantage for complex multi-agent workflows.

7. Mode System

SF

Orthogonal axes:

Work Mode: chat | plan | build | review | repair | research
Run Control: manual | assisted | autonomous
Permission Profile: restricted | normal | trusted | unrestricted
Model Mode: fast | smart | deep
Surface: tui | web | headless | rpc

// Direct commands
/mode build
/control autonomous
/trust trusted
/model-mode deep

// TUI shortcuts
Ctrl+Shift+M  // Cycle work mode
Ctrl+Shift+A  // Autonomous
Ctrl+Shift+P  // Cycle permission

RA.Aid

Flags:

--research-only: Research only, no implementation
--research-and-plan-only: Research + plan, then exit
--hil: Human-in-the-loop
--chat: Chat mode (implies --hil)
--cowboy-mode: Skip shell approval

ra-aid -m "task" --research-only
ra-aid -m "task" --research-and-plan-only
ra-aid -m "task" --hil --chat

Comparison

Aspect	SF	RA.Aid
Work Mode	6 modes with transitions	2 flags (research-only, research-and-plan-only)
Run Control	3 levels	Implicit (hil/chat vs default)
Permission	4 profiles	1 flag (cowboy-mode)
Model Routing	3 modes (fast/smart/deep)	Per-task provider/model flags
Surface	4 surfaces	2 (CLI, server)
Keyboard Shortcuts	8 shortcuts	None
Mode Persistence	SQLite + terminal title	In-memory only

Verdict: SF's mode system is far more sophisticated and user-friendly.

8. Web UI

SF

TUI: Terminal UI with color bands, emojis, mode badges, cost overlay
Web: Next.js app with real-time updates
Headless: JSON/JSONL output for automation
RPC: gRPC/JSON-RPC for external control

sf tui          # Terminal UI
sf web          # Start web server
sf headless     # JSON output
sf rpc          # RPC server

RA.Aid

CLI: Rich console output with panels
Server: FastAPI server (optional)

ra-aid -m "task"           # CLI
ra-aid --server            # FastAPI on :1818

Comparison

Aspect	SF	RA.Aid
Terminal UI	Full TUI with mode badges	Rich panels
Web Interface	Next.js	FastAPI
Headless/Machine	JSON/JSONL event stream	None
Real-time Updates	WebSocket	HTTP polling
Multi-session	Session manager	Single session

Verdict: SF has a more complete multi-surface architecture.

9. Testing

SF

Runner: Vitest
Count: 144+ tests across 12 suites
Coverage: V8 provider, 40/40/20/20 thresholds
Types: Unit + integration + smoke + live

npm test              # All tests
npm run test:unit     # Unit only
npm run test:integration  # Integration
npm run test:smoke    # Smoke tests
npm run test:live     # Live tests (need env)

RA.Aid

Runner: pytest
Count: Unknown (not inspected)
Coverage: Unknown
Types: Unit tests

pytest tests/

Comparison

Aspect	SF	RA.Aid
Test Runner	Vitest	pytest
Test Count	144+	Unknown
Coverage	Enforced in CI	Unknown
Integration Tests	Yes	Unknown
Smoke Tests	Yes	Unknown
Live Tests	Yes	Unknown

Verdict: SF appears to have more comprehensive testing infrastructure.

10. Observability

SF

System	Purpose	Format
metrics-central.js	Aggregated metrics	Prometheus text
uok/audit.js	Per-unit audit trail	JSONL
journal.js	Mode transitions, decisions	SQLite
self-feedback.js	Inline self-correction	SQLite
TUI footer	Real-time cost/context	ANSI text

RA.Aid

System	Purpose	Format
Trajectory	Universal event log	SQLite (Peewee)
Cost CLI	Session cost queries	JSON
Work Log	Human-readable activity	SQLite
Console panels	Real-time status	Rich text

Comparison

Aspect	SF	RA.Aid
Metrics Format	Prometheus	None (DB queries)
Event Granularity	Per-unit + per-metric	Per-trajectory
Queryability	SQL + Prometheus	SQL only
Dashboard Ready	Yes (Grafana)	No
Real-time Display	TUI footer	Console panels

Verdict: SF is better for external observability (Prometheus). RA.Aid is better for internal debugging (unified trajectory).

11. Skills System

SF

# .agents/skills/my-skill/SKILL.md
---
name: my-skill
user-invocable: true
model-invocable: true
side-effects: none
permission-profile: normal
---
# Skill documentation...

YAML frontmatter
Hierarchical discovery
Permission filtering
Work-mode relevance
Eval harness

RA.Aid

No skill system. RA.Aid has custom tools (--custom-tools) but no structured skill framework.

Comparison

Aspect	SF	RA.Aid
Skill Definition	YAML frontmatter	Python module
Discovery	Hierarchical `.agents/skills/`	`--custom-tools` flag
Permissions	Per-skill profile	None
Eval	Built-in harness	None
Auto-creation	Pattern detection	None

Verdict: SF has a significant advantage for structured skill management.

12. Recovery & Resilience

SF

Mechanism	Purpose
Crash recovery	Resume from checkpoint after failure
Verification retry	Re-run failed verification gates
Rethink	Inject rethink prompt on stuck detection
Circuit breaker	Exponential backoff on gate failures
Cost guard	Block expensive operations
Writer tokens	Prevent concurrent writes
Parity system	Detect and recover from drift

RA.Aid

Mechanism	Purpose
Fallback handler	Switch to alternative models on failure
Retry with backoff	Re-run failed agent invocations
Token limiter	Remove old messages to prevent overflow
Recursion limit	Prevent infinite loops

Comparison

Aspect	SF	RA.Aid
Checkpoint/Resume	Yes	No
Model Fallback	Yes (on 429/rate-limit)	Yes
Token Management	No	Yes (limiter)
Circuit Breaker	Yes	No
Cost Guard	Yes	No (budget only)
Concurrent Write Prevention	Yes (writer tokens)	No

Verdict: Different strengths. SF better for operational resilience; RA.Aid better for model resilience.

13. MCP Integration

SF

MCP Client: Full MCP client with tool discovery, resource listing, OAuth
MCP Server Guard: Explicitly forbidden (test enforces this)

// No SF MCP server — client only
pi.registerMcpClient("filesystem", { ... });

RA.Aid

No MCP integration. RA.Aid uses LangChain tools directly.

Comparison

Aspect	SF	RA.Aid
MCP Client	Yes	No
MCP Server	Explicitly forbidden	N/A
Tool Discovery	Dynamic from MCP servers	Static tool definitions

Verdict: SF is ahead for MCP ecosystem integration.

14. Provider Abstraction

SF

// pi-ai package
const provider = await resolveProvider("anthropic", "claude-sonnet-4");
const response = await provider.complete(prompt, { thinking: true });

Abstract provider interface
Model mode routing (fast/smart/deep)
Temperature/thinking level management
Provider allowlists/blocklists

RA.Aid

# llm.py
model = initialize_llm(provider, model, temperature=temperature)
response = model.invoke(prompt)

LiteLLM for provider abstraction
Per-task provider/model override
Temperature support
Expert model consultation

Comparison

Aspect	SF	RA.Aid
Abstraction Layer	Custom (pi-ai)	LiteLLM
Model Routing	Mode-based (fast/smart/deep)	Explicit flags
Expert Model	No	Yes (reasoning_assist)
Temperature	Yes	Yes
Thinking Level	Yes	No

Verdict: RA.Aid's expert model consultation is a unique feature. SF's mode-based routing is more automatic.

15. Documentation & Prompt Engineering

SF

AGENTS.md: Project-specific instructions
CLAUDE.md: Claude-specific guidance
PDD: Purpose-Driven Development fields
Skills: .agents/skills/ with structured prompts
Prompt History: Per-project JSONL

RA.Aid

Prompt Templates: Separate files per agent
Expert Prompts: Optional expert consultation
Human Prompts: HIL sections
Custom Tools: Dynamic tool injection

Comparison

Aspect	SF	RA.Aid
Prompt Organization	Skills + PDD	Agent-specific files
Expert Consultation	Model mode routing	Explicit reasoning_assist
Human-in-the-loop	Permission profiles	--hil flag
Custom Tools	Skill system	--custom-tools flag
Prompt Versioning	Git-tracked skills	Package-bundled

Verdict: SF's skill system is more structured. RA.Aid's expert consultation is more dynamic.

Overall Assessment

SF Strengths

Mode system: 5 axes of control vs RA.Aid's binary flags
Subagent system: Full delegation with inheritance
Skills system: Structured, evaluable, discoverable
MCP integration: Client-only, ecosystem-ready
Execution policy: Granular permission profiles
Observability: Prometheus-compatible metrics
Multi-surface: TUI + web + headless + RPC

RA.Aid Strengths

Explicit pipeline: Clear research → plan → implement flow
Expert consultation: Dynamic reasoning assistance
Cost tracking: Built-in aggregation and CLI queries
Repository pattern: Clean data access
~~Fallback handling~~: SF already has model switching on 429/rate-limit
Token limiting: Prevent context overflow
Simplicity: Easier to understand and modify

Where SF Should Borrow from RA.Aid

Explicit stage boundaries: Add /research, /plan, /implement commands that mirror RA.Aid's agent pipeline
Expert consultation: Add optional "expert model" for reasoning assistance before complex operations
Cost CLI: Add sf cost --session, sf cost --all commands
Repository pattern: Formalize data access with repository classes
Token limiting: Add context window management
~~Fallback handler~~: SF already has model fallback on 429/rate-limit errors

Where RA.Aid Should Borrow from SF

Mode system: Add work modes, permission profiles, model modes
Subagent system: Add delegation for parallel work
Execution policy: Replace cowboy_mode with granular profiles
Skills system: Add structured skill framework
MCP integration: Add MCP client support
UOK gates: Add safety checkpoints between stages
Observability: Add Prometheus metrics

Conclusion

SF and RA.Aid are complementary rather than competitive:

SF is a platform: modular, multi-surface, safety-first, designed for complex multi-agent workflows
RA.Aid is a tool: focused, simple, explicit, designed for single-agent coding tasks

The ideal system would combine:

SF's mode system + subagent system + skills system
RA.Aid's explicit pipeline + expert consultation + cost tracking
Both projects' DB-first state philosophy

23 KiB Raw Blame History Unescape Escape

SF vs RA.Aid — Full Feature Comparison

Executive Summary

1. Architecture & State Model

SF

RA.Aid

Comparison

2. Agent Stage Boundaries

SF: UOK Gate System

RA.Aid: Explicit Agent Pipeline

Comparison

3. Memory System

SF

RA.Aid

Comparison

4. Cost Tracking

SF

RA.Aid

Comparison

5. Shell Safety & Execution Policy

SF

RA.Aid

Comparison

6. Subagent System

SF

RA.Aid

Comparison

7. Mode System

SF

RA.Aid

Comparison

8. Web UI

SF

RA.Aid

Comparison

9. Testing

SF

RA.Aid

Comparison

10. Observability

SF

RA.Aid

Comparison

11. Skills System

SF

RA.Aid

Comparison

12. Recovery & Resilience

SF

RA.Aid

Comparison

13. MCP Integration

SF

RA.Aid

Comparison

14. Provider Abstraction

SF

RA.Aid

Comparison

15. Documentation & Prompt Engineering

SF

RA.Aid

Comparison

Overall Assessment

SF Strengths

RA.Aid Strengths

Where SF Should Borrow from RA.Aid

Where RA.Aid Should Borrow from SF

Conclusion

23 KiB

Raw Blame History