singularity-forge/docs/dev/ADR-014-singularity-knowledge-and-agent-platform.md

ADR-014: Singularity Knowledge + Agent Platform stack

Date: 2026-04-29
Status: proposed (deferred; capture for staged execution)
Revised: 2026-05-02 (Phase 4 cancelled, see ADR-019)

Context

Older SPEC notes define a cross-instance knowledge layer (Singularity Memory) and sketch persistent agents plus inter-agent messaging. Treat those notes as historical input, not as the current SF source of truth. Today each sf instance carries its own local memory store (memory-store.ts); persistent agents are not implemented at all.

Two trajectories converge:

  • Knowledge federates — anti-patterns, learnings, contracts should be reachable across sf instances and across other agent products on the tailnet (Hermes, OpenClaw, Claude Code, Cursor).
  • Persistent agents centralise — long-lived cross-project agents (code-reviewer with cross-project memory, memory-curator, security-auditor, build-watch) are too heavy and too cross-cutting to live per-project.

These two needs collapse into one service: the Singularity Knowledge + Agent Platform, a single Go server hosting the federated memory store and the central persistent-agent runtime. (Note: the persistent-agent runtime portion, Phase 4, has since been cancelled by ADR-019. This ADR's active scope is the knowledge layer only, Phases 0–3.)

This ADR fixes the stack.

The implementation arm of this ADR lives in singularity-memory/MIGRATION.md.

Decision

  • Language: Go.
  • Storage backbone: Postgres + vchord (existing) — accessed from Go via pgx. No data migration; same schema, same vchord index.
  • Identity / auth / sync layer: charmbracelet/charm-server patterns — SSH-key identity, JWT issuance, encrypted KV for user-level prefs and config. Adopted as ported library code; not run as a sidecar.
  • Agent runtime: charmbracelet/fantasy — multi-provider LLM access (Anthropic, OpenAI, Google, Bedrock, OpenRouter, etc. via catwalk). Used for embeddings/summarisation today. (The original plan to grow this into a full central persistent-agent runtime — Phase 4 — is cancelled by ADR-019. fantasy is retained for embeddings/summarisation within the knowledge layer only.)
  • HTTP API: Go net/http + chi or echo router, serving the exact current OpenAPI contract.
  • HTTP API compatibility: preserve the current OpenAPI contract. SF remains an HTTP/RPC client of the knowledge layer and does not expose its workflow as an MCP server.
  • CLI scaffolding: charmbracelet/fang.
  • Observability: promwish-style Prometheus metrics, scraped from a shared metrics endpoint.
  • Admin UI (Phase 3): pony + ultraviolet for the view layer (reversed from earlier deferral; now adopted as a deliberate foundation bet — admin UI tolerates churn better than user-facing surfaces). Served over SSH via wish.

Alternatives Considered

Stack

  • Stay Python + FastAPI + Postgres. Status quo. Works today.
    • Rejected: misses the foundation bet for central persistent agents from the older SPEC notes. Building those on Python + raw OpenAI/Anthropic SDK calls means retrofitting fantasy-style agent semantics later — real refactor cost. The trigger to migrate isn't pain in the current server; it's foundation laying for what comes next.
  • Rust + axum + Postgres. Uniformly fast, but Charm's agentic ecosystem (fantasy, catwalk, wish, charm-server, the entire Bubble Tea family) is Go-native. Rust on the server side would mean reimplementing those abstractions or shelling out. Rejected — wrong ecosystem.
  • TypeScript + Node + Postgres. Keeps language alignment with sf core. But sf is moving toward parallel-build (ADR-016): TS in sf core, Go in new services. The Node ecosystem doesn't have an equivalent to fantasy + charm-server + Wish. Rejected.

Storage backbone

  • Replace Postgres + vchord with charm-server's native KV. charm-server is a personal/team encrypted KV; it's not a vector DB or BM25 index. We'd lose retrieval sophistication. Rejected.
  • Replace Postgres with sqlite-vec. Embeddable single-binary deployment is appealing, but BM25 quality on tsvector is hard to match without a full re-tune, and we'd be redoing data migration on top. Rejected for v1; revisit in a v2 retrieval ADR if the Go server needs to ship without Postgres.
  • Keep Postgres + vchord, connect via Go pgx. ← chosen. Battle-tested retrieval, zero data migration, focus the migration on language/runtime/agent-platform changes only.

Agent runtime

  • Direct SDK calls (anthropic-sdk-go, openai-go, go-genai). Simplest for today's narrow LLM use (embeddings + summarisation). But future central persistent agents need agent-loop semantics (multi-turn, tool calls); building those on raw SDKs reinvents fantasy's abstractions. Rejected: foundation bet. (Phase 4 is now cancelled by ADR-019, so the persistent-agent motivation no longer applies; however, fantasy is still chosen for its clean multi-provider API for embeddings/summarisation.)
  • Build our own agent runtime in Go. Pure NIH. Rejected.
  • charmbracelet/fantasy. ← chosen. 730 stars, actively developed, clean API, multi-provider via catwalk.

Consequences

Positive

  • Foundation is right for the knowledge layer. (The original "foundation for central persistent agents" rationale is superseded — Phase 4 is cancelled by ADR-019. Persistent agents now live as Firecracker VM snapshots managed by ACE.)
  • Single static Go binary is operationally simpler than Python uv/venv + Alembic + worker on each deployment host.
  • Charm ecosystem alignment with sf-worker (ADR-013), flight recorder (ADR-015), Charm TUI client (ADR-017). One language for the new-services tier.
  • Wire contract preserved — clients are zero-touch.

Negative

  • Migration is a real undertaking — ~12 weeks total, with the recall endpoint as the critical parity gate. See MIGRATION.md.
  • Polyglot deployment grows — Python (during transition) + Go (new) + TS (sf core) + Rust (sf native). Bounded; once Python retires, three languages with clear boundaries.
  • fantasy and pony are pre-1.0 — API churn is real.

Risks and mitigations

  • Risk: recall quality regression between Python and Go.
    • Mitigation: held-out evaluation set; ±2% recall@k threshold enforced in CI before flipping traffic.
  • Risk: pgx + vchord custom-type decoder edge cases.
    • Mitigation: prove out in Phase 1 against a small endpoint; engage vchord author if blocked.
  • Risk: fantasy API churn during the migration.
    • Mitigation: pin a version; one planned upgrade midway through the migration.
  • Risk: central agents prove unworkable as a model and we've over-built the foundation.
    • Mitigation: the foundation cost is incremental (fantasy ≈ raw SDK + a thin abstraction). Worst case we use fantasy for embeddings only and never grow it. No wasted bet. (Moot — Phase 4 is cancelled by ADR-019; fantasy stays scoped to the knowledge layer.)
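
The recall parity gate in the first mitigation can be sketched as a small check that CI might run against the held-out set. This is an illustration under assumptions: the function names are hypothetical, relevance is modelled as a set of document IDs, and "±2%" is read as an absolute difference of 0.02 in recall@k (the ADR does not say whether the threshold is absolute or relative).

```go
package main

import "fmt"

// RecallAtK returns the fraction of relevant IDs found in the first k
// retrieved IDs. It returns 0 when the relevant set is empty.
func RecallAtK(retrieved []string, relevant map[string]struct{}, k int) float64 {
	if len(relevant) == 0 {
		return 0
	}
	if k < 0 {
		k = 0
	}
	if k > len(retrieved) {
		k = len(retrieved)
	}
	seen := make(map[string]struct{}, k)
	hits := 0
	for _, id := range retrieved[:k] {
		if _, dup := seen[id]; dup {
			continue // count each document at most once
		}
		seen[id] = struct{}{}
		if _, ok := relevant[id]; ok {
			hits++
		}
	}
	return float64(hits) / float64(len(relevant))
}

// WithinParity reports whether the candidate score is within an absolute
// tolerance of the baseline score. Assumption: 0.02 stands for "±2%".
func WithinParity(baseline, candidate, tolerance float64) bool {
	diff := candidate - baseline
	if diff < 0 {
		diff = -diff
	}
	return diff <= tolerance
}

func main() {
	relevant := map[string]struct{}{"a": {}, "b": {}, "c": {}, "d": {}}
	pythonTop := []string{"a", "b", "c", "x", "y"} // baseline server results
	goTop := []string{"a", "b", "x", "c", "y"}     // candidate server results

	base := RecallAtK(pythonTop, relevant, 3)
	cand := RecallAtK(goTop, relevant, 3)
	fmt.Printf("baseline=%.2f candidate=%.2f parity=%v\n",
		base, cand, WithinParity(base, cand, 0.02))
}
```

In a real gate the two result lists would come from the Python and Go servers answering the same held-out queries, and the score would be averaged over the whole evaluation set rather than taken from a single query.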

Out of Scope

  • Cross-tenant Singularity Memory — single trust domain per deployment.
  • Retrieval-pipeline redesign — BM25 + vector + RRF + reranker semantics are preserved exactly.
  • DB migration — Postgres + vchord stay.
  • Public-internet endpoint — tailnet only per ADR-013.
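
For readers unfamiliar with the RRF step named in the retrieval-pipeline item, the following is a minimal sketch of reciprocal rank fusion over a BM25 ranking and a vector ranking. It is not the production pipeline: the function name is hypothetical, the constant k = 60 is a commonly used default rather than a value taken from this system, each input list is assumed to contain unique IDs, and the reranker stage is omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked ID lists with reciprocal rank fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// Each list is assumed to contain a given ID at most once.
func FuseRRF(k float64, rankings ...[]string) []string {
	scores := make(map[string]float64)
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		sa, sb := scores[ids[a]], scores[ids[b]]
		if sa != sb {
			return sa > sb // higher fused score first
		}
		return ids[a] < ids[b] // deterministic tie-break by ID
	})
	return ids
}

func main() {
	bm25 := []string{"doc1", "doc2", "doc3"}   // lexical ranking
	vector := []string{"doc3", "doc1", "doc4"} // embedding ranking
	fmt.Println(FuseRRF(60, bm25, vector))
}
```

Because fusion depends only on ranks, not raw scores, preserving these semantics in the Go server mainly requires that each underlying retriever returns the same ordering as its Python counterpart.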

Sequencing

| Phase | What | Cost |
|---|---|---|
| 0 | Prep: commit OpenAPI spec, build test suite, set up CI (per existing TODO.md) | 1–2 weeks |
| 1 | Greenfield Go scaffold parallel to Python; first endpoint (GET /v1/banks) | 2–3 weeks |
| 2 | Endpoint parity (recall is the critical gate) | 4–8 weeks |
| 3 | Worker + admin UI (pony + ultraviolet on wish) | 2–3 weeks |
| 4 | Central persistent-agent host | variable |
| 5 | Python deprecation | 1 week |

Total: ~12 weeks for Phases 0–3 + Phase 5. Phase 4 is cancelled; see the section below.

Phase 4 — Cancelled (See ADR-019)

Phase 4 was originally planned as a "central persistent-agent runtime" built on charmbracelet/fantasy inside singularity-memory's Go server. ADR-019 (Workspace VM Convergence, 2026-05-01) supersedes this plan entirely.

What replaced it: Persistent agents now live as Firecracker VM snapshots managed by ACE's orchestration layer. A "persistent agent" is a named VM snapshot: restore it, and the agent wakes with its full memory and context intact. singularity-memory's scope is now strictly the knowledge layer (Phases 0–3). See ADR-019 § "ADR-014 Phase 4 is reassigned" for the authoritative statement.

Historical: Original Phase 4 Plan

The content below is the original Phase 4 design, preserved as a historical record. It is not the current plan.

The original Phase 4 called for singularity-memory's Go server to host a central persistent-agent runtime using charmbracelet/fantasy. Long-lived cross-project agents (code-reviewer, memory-curator, security-auditor, build-watch) would run there, with their state managed by the same Postgres store. This depended on the older persistent-agent notes being fully scoped ("status NEW" at ADR-014's writing date).

The rationale for building this in singularity-memory was ecosystem alignment with fantasy + charm-server + wish and avoiding per-project agent redundancy. The timeline was listed as "variable" because persistent-agent scope had not been fully defined.

ADR-019 made this moot by choosing a cleaner isolation model (hypervisor-level VM snapshots) that is language-agnostic inside the VM, multi-tenant by construction, and owned by ACE rather than a shared Go server.

References

  • MIGRATION.md (singularity-memory repo) — implementation arm.
  • Older SPEC notes §16 — Knowledge Layer historical input.
  • Older SPEC notes §17–18: Persistent Agents and Inter-Agent Messaging historical input.
  • ADR-012 — Multi-instance federation (this is one of its surfaces).
  • ADR-013 — Network and remote-execution (deployment substrate).
  • ADR-016 — Charm AI stack adoption (frames the polyglot decision).
  • charmbracelet/charm — KV with sync (auth/identity patterns ported here).
  • charmbracelet/fantasy — agent runtime.
  • charmbracelet/catwalk — provider/model registry.