
ADR-007: Model Catalog Split and Provider API Encapsulation

Status: Proposed
Date: 2026-04-03
Deciders: Jeremy McSpadden
Related: ADR-004 (capability-aware model routing), ADR-005, ADR-006, packages/pi-ai/src/providers/, packages/pi-ai/src/models.ts

Context

The model/provider system in pi-ai has two structural problems worth fixing — but the system is not fundamentally broken. The heavy lifting (lazy SDK imports, registry-based dispatch, extension-based registration) is already well-designed. This ADR targets the two areas where the current design creates real friction without proposing unnecessary runtime changes.

Current Architecture

stream.ts
  └─ import "./providers/register-builtins.js"  ← side-effect import at load time
       ├─ import anthropic.ts            (6.8 KB)
       ├─ import anthropic-vertex.ts     (3.9 KB)
       ├─ import openai-completions.ts   (26 KB)
       ├─ import openai-responses.ts     (6.4 KB)
       ├─ import openai-codex-responses.ts (29 KB)
       ├─ import azure-openai-responses.ts (7.8 KB)
       ├─ import google.ts              (13.6 KB)
       ├─ import google-vertex.ts       (14.5 KB)
       ├─ import google-gemini-cli.ts   (30 KB)
       ├─ import mistral.ts             (18.9 KB)
       └─ amazon-bedrock.ts             (24 KB) ← only lazy-loaded provider

models.ts
  └─ import models.generated.ts   ← 13,848 lines, ALL providers, loaded at init
  └─ import models.custom.ts      ← 197 lines, additional providers

What Already Works Well

  1. SDK lazy loading. Every provider file uses async function getXxxClass() with a cached dynamic import(). The heavy npm packages (@anthropic-ai/sdk, openai, @google/genai, @aws-sdk/*, @mistralai/*) are only loaded on first API call. This is where the real startup cost would be — and it's already handled.

  2. Registry-based dispatch. api-registry.ts cleanly maps API types to stream functions. Callers use stream(model, context) and the registry routes to the right provider. This pattern is sound.

  3. Extension registration. Ollama and Claude Code CLI register via registerApiProvider() at runtime. This extensibility point works correctly.

  4. Provider implementation code loading (~200KB total). While all providers load eagerly, V8 parses local .js files in single-digit milliseconds each. The total parse cost for all provider files is ~10-30ms — not a user-visible bottleneck on a CLI that's about to make a multi-second API call anyway.
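
The cached dynamic import() pattern described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in the provider files: lazyOnce, FakeSdkModule, getFakeSdk, and createClient are hypothetical names, and a fake async loader stands in for a real `import("some-sdk")` call so the block runs without any npm packages.

```typescript
// Hypothetical helper: caches the promise returned by a loader so the
// underlying load runs at most once, even with concurrent callers.
function lazyOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Stand-in for a heavy SDK module. In a real provider file the loader would
// be a dynamic import of the SDK package instead of an inline object.
interface FakeSdkModule {
  Client: new (apiKey: string) => { apiKey: string };
}

let loadCount = 0;

const getFakeSdk = lazyOnce<FakeSdkModule>(async () => {
  loadCount += 1;
  return {
    Client: class {
      apiKey: string;
      constructor(apiKey: string) {
        this.apiKey = apiKey;
      }
    },
  };
});

// Only the first call triggers the load; later calls reuse the cached promise.
async function createClient(apiKey: string): Promise<{ apiKey: string }> {
  const { Client } = await getFakeSdk();
  return new Client(apiKey);
}
```

Caching the promise rather than the resolved module means two callers that race on the first request share a single load. One trade-off of this sketch is that a rejected load stays cached; a real implementation may want to clear the cache on failure.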

What's Actually Worth Fixing

Problem 1: Monolithic model catalog — developer experience, not runtime

models.generated.ts is 13,848 lines in a single file. This creates real friction:

  • PR reviews are painful. When the generation script runs, the diff is a wall of changes across unrelated providers. Reviewers can't tell what actually changed for a specific provider.
  • Navigation is slow. Finding a specific model requires scrolling or searching through thousands of lines of static object literals.
  • Merge conflicts are frequent. Any two PRs that touch model generation will conflict on the same monolithic file.
  • Git blame is useless. Every line was "last changed" by the generation script, obscuring the history of individual provider additions.

The runtime cost of loading all model definitions is negligible — a Map of ~200 model objects is maybe 50-100KB of heap. The problem is purely about code organization and developer workflow.

Problem 2: Barrel export leaks provider internals — API design

packages/pi-ai/src/index.ts re-exports every provider module's internals:

export * from "./providers/anthropic.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
// ... etc

This is a public API problem:

  • Consumers can bypass the registry. Any code that writes import { streamAnthropic } from "pi-ai" depends directly on an implementation detail that should be internal.
  • Refactoring is blocked. Renaming a function inside a provider file is a breaking change because it's re-exported from the package root.
  • API surface is unnecessarily large. The public API should be stream(), streamSimple(), registerApiProvider(), model utilities, and types. Provider-specific stream functions are implementation details.

What Is NOT Worth Changing

Lazy provider loading (converting register-builtins.ts to async on-demand loading). This was considered and rejected because:

  1. The SDKs are already lazy. The heavy cost is handled. Provider implementation code (~200KB of local .js) parses in ~10-30ms total.
  2. Async resolution adds complexity to the hot path. stream.ts currently does a synchronous Map.get(). Making resolveApiProvider async adds a microtask hop to every API call — not just the first. Small but measurable, and for no user-visible gain.
  3. High blast radius, low payoff. Touching stream.ts, api-registry.ts, and the registration lifecycle simultaneously risks regressions in the core streaming path for an optimization that wouldn't show up in profiling.
  4. Bedrock's lazy loading is a special case, not a template. It exists because @aws-sdk/client-bedrock-runtime is uniquely massive. Generalizing this pattern to providers where the SDK is already lazy-imported doesn't compound the benefit.
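
The microtask hop mentioned in item 2 can be illustrated with a small sketch. The registry, StreamFn, resolveSync, and resolveAsync below are hypothetical stand-ins, not the actual api-registry.ts code; the point is only that an async resolver defers its result on every call, even when the lookup itself is a plain Map.get().

```typescript
type StreamFn = (prompt: string) => string;

// Hypothetical registry with a single fake entry.
const registry = new Map<string, StreamFn>([
  ["fake-api", (prompt) => `echo: ${prompt}`],
]);

// Synchronous resolution: the handler is available in the same tick.
function resolveSync(api: string): StreamFn | undefined {
  return registry.get(api);
}

// Hypothetical async variant: the same lookup, but the result is only
// observable after at least one microtask, on every call.
async function resolveAsync(api: string): Promise<StreamFn | undefined> {
  return registry.get(api);
}

const events: string[] = [];

// Started first, but its callback cannot run until the current synchronous
// code has finished.
void resolveAsync("fake-api").then((fn) => {
  events.push(`async: ${fn ? "resolved" : "missing"}`);
});

const syncFn = resolveSync("fake-api");
events.push(`sync: ${syncFn ? "resolved" : "missing"}`);

// At this point `events` holds only the sync entry; the async entry is
// appended after the current synchronous code completes.
```

Although the async lookup is issued first, the sync entry is always recorded first, which is the per-call deferral the rejection argument refers to.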

Decision

Make two targeted improvements to code organization and API hygiene. Do not change runtime loading behavior.

Change 1: Split models.generated.ts into per-provider files

Replace the monolithic 13,848-line generated file with per-provider files:

packages/pi-ai/src/models/
  ├── index.ts                  ← re-exports combined registry, same public API
  ├── generated/
  │   ├── anthropic.ts          ← Anthropic model definitions
  │   ├── openai.ts             ← OpenAI model definitions
  │   ├── google.ts             ← Google model definitions
  │   ├── mistral.ts            ← Mistral model definitions
  │   ├── amazon-bedrock.ts     ← Bedrock model definitions
  │   ├── groq.ts               ← Groq model definitions
  │   ├── xai.ts                ← xAI model definitions
  │   ├── cerebras.ts           ← Cerebras model definitions
  │   ├── openrouter.ts         ← OpenRouter model definitions
  │   └── ...                   ← one file per provider in the catalog
  ├── custom.ts                 ← replaces models.custom.ts (unchanged content)
  └── capability-patches.ts     ← CAPABILITY_PATCHES extracted for clarity

models/index.ts keeps the exact same synchronous public API:

// models/index.ts
// SF — Model registry (split by provider for maintainability)

import { ANTHROPIC_MODELS } from "./generated/anthropic.js";
import { OPENAI_MODELS } from "./generated/openai.js";
import { GOOGLE_MODELS } from "./generated/google.js";
// ... one import per provider

import { CUSTOM_MODELS } from "./custom.js";
import { CAPABILITY_PATCHES, applyCapabilityPatches } from "./capability-patches.js";
import type { Api, KnownProvider, Model, Usage } from "../types.js";

// Combine all generated models into single registry — same as today
const MODELS = {
  ...ANTHROPIC_MODELS,
  ...OPENAI_MODELS,
  ...GOOGLE_MODELS,
  // ...
};

// Rest of the file is identical to current models.ts:
// modelRegistry Map construction, capability patch application,
// getModel(), getProviders(), getModels(), calculateCost(),
// supportsXhigh(), modelsAreEqual()

Key constraint: loading stays synchronous and eager. All model files are statically imported. The Map is built at module init exactly as today. No async, no lazy loading, no runtime behavior change. This is purely a file organization change.
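
As a rough sketch of what synchronous, eager registry construction could look like, the block below merges per-provider catalogs into a nested Map at module init. The ModelDef shape, the provider-keyed catalog layout, buildRegistry, the placeholder model ids, and the getModel signature are all assumptions for illustration; the real models.ts may differ. The duplicate check is an addition, included because an object spread silently lets a later key overwrite an earlier one.

```typescript
// Simplified, hypothetical model shape; the real Model type has more fields.
interface ModelDef {
  id: string;
  provider: string;
  contextWindow: number;
}

// Assumed layout: provider name -> model id -> definition.
type ProviderCatalog = Record<string, Record<string, ModelDef>>;

// Stand-ins for the per-provider generated files (placeholder ids and values).
const ANTHROPIC_MODELS: ProviderCatalog = {
  anthropic: {
    "example-model-a": { id: "example-model-a", provider: "anthropic", contextWindow: 1000 },
  },
};
const OPENAI_MODELS: ProviderCatalog = {
  openai: {
    "example-model-b": { id: "example-model-b", provider: "openai", contextWindow: 2000 },
  },
};

// Builds the registry synchronously and fails loudly on duplicate providers
// instead of letting a later catalog overwrite an earlier one.
function buildRegistry(catalogs: ProviderCatalog[]): Map<string, Map<string, ModelDef>> {
  const registry = new Map<string, Map<string, ModelDef>>();
  for (const catalog of catalogs) {
    for (const [provider, models] of Object.entries(catalog)) {
      if (registry.has(provider)) {
        throw new Error(`Provider "${provider}" is defined in more than one catalog file`);
      }
      registry.set(provider, new Map(Object.entries(models)));
    }
  }
  return registry;
}

// Built once at module init; no async and no lazy loading.
const modelRegistry = buildRegistry([ANTHROPIC_MODELS, OPENAI_MODELS]);

function getModel(provider: string, modelId: string): ModelDef | undefined {
  return modelRegistry.get(provider)?.get(modelId);
}
```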

Update generate-models.ts to emit one file per provider instead of a single models.generated.ts. The script already groups models by provider internally — it just needs to write separate files instead of one.
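
The per-provider emit step might look something like the sketch below. It is not the actual generate-models.ts logic: GeneratedModel, toConstName, renderProviderFiles, and the emitted file layout are hypothetical, and the function returns file contents in memory rather than writing to disk. Sorting by model id is an added choice intended to keep regenerated diffs stable.

```typescript
// Hypothetical, simplified model record as the generator might hold it.
interface GeneratedModel {
  id: string;
  provider: string;
  name: string;
}

// "amazon-bedrock" -> "AMAZON_BEDROCK_MODELS"
function toConstName(provider: string): string {
  return provider.toUpperCase().replace(/[^A-Z0-9]+/g, "_") + "_MODELS";
}

// Groups models by provider and renders one TypeScript source file per group.
// Returns a map of relative file path -> file content.
function renderProviderFiles(models: GeneratedModel[]): Map<string, string> {
  const byProvider = new Map<string, GeneratedModel[]>();
  for (const model of models) {
    const group = byProvider.get(model.provider) ?? [];
    group.push(model);
    byProvider.set(model.provider, group);
  }

  const files = new Map<string, string>();
  for (const [provider, group] of byProvider) {
    // Locale-independent sort by id so output order does not depend on input order.
    const sorted = [...group].sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
    const entries: Record<string, GeneratedModel> = {};
    for (const model of sorted) {
      entries[model.id] = model;
    }
    const literal = JSON.stringify({ [provider]: entries }, null, 2);
    const content =
      "// Generated file. Do not edit by hand.\n" +
      `export const ${toConstName(provider)} = ${literal};\n`;
    files.set(`generated/${provider}.ts`, content);
  }
  return files;
}
```

Keeping the render step a pure function makes it straightforward to unit test without touching the filesystem; the caller can then write each entry to disk.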

Why this matters

| Before | After |
| --- | --- |
| PR diffs show 13K-line file changes | PR diffs scoped to the provider that changed |
| Merge conflicts on any concurrent model update | Conflicts only when same provider is touched |
| git blame shows "regenerate models" for every line | git blame shows per-provider history |
| Finding a model = search through 13K lines | Finding a model = open the provider file |
| One reviewer must understand all providers | Reviewers only need context for affected provider |

Change 2: Stop barrel-exporting provider internals

Update packages/pi-ai/src/index.ts:

// Before (current — 17 re-exports including all providers):
export * from "./providers/anthropic.js";
export * from "./providers/azure-openai-responses.js";
export * from "./providers/google.js";
export * from "./providers/google-gemini-cli.js";
export * from "./providers/google-vertex.js";
export * from "./providers/mistral.js";
export * from "./providers/openai-completions.js";
export * from "./providers/openai-responses.js";
export * from "./providers/register-builtins.js";
// ...

// After (clean public API):
export * from "./api-registry.js";
export * from "./env-api-keys.js";
export * from "./models/index.js";
export * from "./providers/register-builtins.js";  // resetApiProviders() is public
export * from "./stream.js";
export * from "./types.js";
export * from "./utils/event-stream.js";
export * from "./utils/json-parse.js";
export type { OAuthAuthInfo, OAuthCredentials, /* ... */ } from "./utils/oauth/types.js";
export * from "./utils/overflow.js";
export * from "./utils/typebox-helpers.js";
export * from "./utils/repair-tool-json.js";
export * from "./utils/validation.js";

Provider-specific exports (streamAnthropic, streamGoogle, etc.) are removed from the public API. Any external consumer that imported them directly should use the registry-based stream() / streamSimple() functions instead — which is how all internal callers already work.

Why this matters

  • Enforces the registry pattern. The correct way to call a provider is stream(model, context). Direct provider function imports create fragile coupling.
  • Enables future refactoring. Provider internal function signatures can change without breaking the package API. Today, renaming streamAnthropic would be a semver-breaking change.
  • Reduces API surface. Consumers see only what they need: stream, streamSimple, registerApiProvider, model utilities, and types.
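
To make the registry pattern concrete, here is a deliberately simplified sketch of registration and dispatch. The Model, Context, and ApiProvider shapes are assumptions (the real stream functions return event streams, not string arrays), and the function bodies are illustrative rather than copies of api-registry.ts or stream.ts.

```typescript
// Simplified, hypothetical shapes for illustration only.
interface Model {
  id: string;
  api: string;
}
interface Context {
  messages: string[];
}
interface ApiProvider {
  api: string;
  stream(model: Model, context: Context): string[];
  streamSimple(model: Model, context: Context): string[];
}

const providers = new Map<string, ApiProvider>();

// Extension point: built-ins and extensions register the same way.
function registerApiProvider(provider: ApiProvider): void {
  providers.set(provider.api, provider);
}

// Public entry point: callers never import a provider's stream function.
function stream(model: Model, context: Context): string[] {
  const provider = providers.get(model.api);
  if (provider === undefined) {
    throw new Error(`No provider registered for api "${model.api}"`);
  }
  return provider.stream(model, context);
}

// A fake provider standing in for something like an Anthropic implementation.
registerApiProvider({
  api: "fake-api",
  stream: (_model, context) => context.messages.map((m) => `fake:${m}`),
  streamSimple: (_model, context) => context.messages.map((m) => `fake:${m}`),
});

const chunks = stream({ id: "example-model", api: "fake-api" }, { messages: ["hello"] });
```

Because callers depend only on stream() and the Model value, the fake provider's internal function could be renamed or restructured without any caller changing.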

What Does NOT Change

  • Runtime behavior — all providers still load eagerly, same as today
  • The Model<TApi> type system — all types, interfaces, and generics stay the same
  • The ApiProvider interface — providers still implement { api, stream, streamSimple }
  • The api-registry.ts registry — synchronous Map.get() dispatch, unchanged
  • stream.ts — no changes to the streaming entry point
  • register-builtins.ts — still eagerly imports and registers all providers (only resetApiProviders remains in barrel export)
  • The extension system: registerApiProvider() continues to work for Ollama, Claude Code CLI, etc.
  • models.json user config — custom models, overrides, provider settings are unaffected
  • Model discovery — discovery adapters are already lazy and independent
  • Model routing — ADR-004's capability-aware routing is orthogonal

Consequences

Positive

  1. Cleaner PRs. Model catalog changes are scoped to the provider that changed. Reviewers see a 200-line diff in models/generated/openai.ts instead of a 13K-line diff in models.generated.ts.

  2. Fewer merge conflicts. Two PRs that update different providers no longer conflict on the same file.

  3. Better navigability. Developers can jump directly to models/generated/anthropic.ts to see Anthropic's model definitions instead of searching through a monolith.

  4. Cleaner package API. pi-ai exports only what consumers need. Provider internals are properly encapsulated.

  5. Future-proofs refactoring. Provider implementation details can evolve without breaking the public API contract.

  6. Zero runtime risk. No changes to loading, registration, streaming, or dispatch. The refactor is purely structural.

Negative

  1. More files. Instead of 1 generated file + 1 custom file, we'll have ~15-20 generated files. Marginal complexity increase, but each file is focused and small.

  2. Generation script update. generate-models.ts needs to write per-provider files. The script already groups by provider, so this is straightforward but requires testing.

  3. Import audit for barrel export change. Any code that directly imports streamAnthropic (etc.) from pi-ai needs to be updated. Based on research, the main consumer is register-builtins.ts itself, which imports providers directly (not through the barrel). External usage should be minimal.

Alternatives Considered

1. Full lazy provider loading (original ADR-005 proposal)

Make all providers load on-demand via async dynamic imports, generalizing the Bedrock pattern. Rejected because:

  • SDK imports are already lazy — the heavy cost is handled
  • Provider implementation parsing is ~10-30ms total — not a bottleneck
  • Adds async complexity to the synchronous stream dispatch hot path
  • High migration effort and regression risk for unmeasurable performance gain

2. Plugin architecture with separate npm packages

Move each provider to its own package (@sf/provider-anthropic, etc.). Maximum isolation but dramatically more complex build/release/versioning. Overkill for a monorepo where all providers ship together.

3. Do nothing

The current architecture works. This is a valid choice. The split is justified by the developer experience friction (13K-line file, merge conflicts, unusable git blame) and the API hygiene issue (leaking provider internals), not by a runtime problem. If the team is not experiencing these friction points, deferring is reasonable.

Implementation Plan

Wave 1: Split Model Catalog (Low-Medium Risk)

  1. Update generate-models.ts to emit per-provider files into models/generated/
  2. Create models/index.ts that imports all per-provider files and builds the same registry
  3. Extract CAPABILITY_PATCHES into models/capability-patches.ts
  4. Move models.custom.ts to models/custom.ts
  5. Update imports in models.ts (or replace it with the new models/index.ts)
  6. Verify npm run build and npm run test pass
  7. Delete models.generated.ts and models.custom.ts
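
Before deleting the old files in step 7, one option is a parity check that the split catalog matches the monolithic one. The sketch below is a hypothetical helper, not part of the existing test suite: Catalog, CatalogDiff, flattenCatalog, and diffCatalogs are invented names, and the provider-keyed catalog layout is an assumption.

```typescript
// Assumed layout: provider name -> model id -> definition.
type Catalog = Record<string, Record<string, unknown>>;

interface CatalogDiff {
  missing: string[]; // present before the split, absent after
  extra: string[]; // absent before the split, present after
  changed: string[]; // present in both, but with different definitions
}

// Flattens a catalog to "provider/modelId" -> serialized definition.
// Note: JSON.stringify is sensitive to key order, so a pure reordering of
// fields would be reported as a change.
function flattenCatalog(catalog: Catalog): Map<string, string> {
  const flat = new Map<string, string>();
  for (const [provider, models] of Object.entries(catalog)) {
    for (const [id, definition] of Object.entries(models)) {
      flat.set(`${provider}/${id}`, JSON.stringify(definition));
    }
  }
  return flat;
}

function diffCatalogs(before: Catalog, after: Catalog): CatalogDiff {
  const oldFlat = flattenCatalog(before);
  const newFlat = flattenCatalog(after);
  const diff: CatalogDiff = { missing: [], extra: [], changed: [] };
  for (const [key, serialized] of oldFlat) {
    const next = newFlat.get(key);
    if (next === undefined) {
      diff.missing.push(key);
    } else if (next !== serialized) {
      diff.changed.push(key);
    }
  }
  for (const key of newFlat.keys()) {
    if (!oldFlat.has(key)) {
      diff.extra.push(key);
    }
  }
  return diff;
}
```

For a pure file reorganization, all three lists would be expected to come back empty.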

Wave 2: Clean Up Barrel Export (Low Risk)

  1. Remove provider re-exports from index.ts
  2. Grep for direct provider imports from "pi-ai" across the codebase
  3. Migrate any found usages to use stream() / streamSimple() through the registry
  4. Verify build and tests
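
The grep in step 2 could also be approximated in code, for example as part of a lint check. The sketch below is a heuristic, not a full parser: findProviderImports and PROVIDER_INTERNALS are hypothetical, the name list contains only the two functions named in this ADR, and the regex handles named imports only (it will miss namespace imports, re-exports, and dynamic imports).

```typescript
// Hypothetical list of provider-internal names that should no longer be
// imported from the package root. Extend as needed.
const PROVIDER_INTERNALS = new Set<string>(["streamAnthropic", "streamGoogle"]);

// Returns the provider-internal names that `source` imports from `packageName`.
function findProviderImports(source: string, packageName: string = "pi-ai"): string[] {
  // Matches: import { a, b as c } from "pkg"  (also `import type { ... }`).
  const importPattern = /import\s*(?:type\s+)?\{([^}]*)\}\s*from\s*["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    const [, names = "", from = ""] = match;
    if (from !== packageName) {
      continue;
    }
    for (const specifier of names.split(",")) {
      // Strip an inline `type` modifier and any `as alias` suffix.
      const [imported = ""] = specifier.trim().replace(/^type\s+/, "").split(/\s+as\s+/);
      if (PROVIDER_INTERNALS.has(imported)) {
        found.push(imported);
      }
    }
  }
  return found;
}

const sample = [
  'import { stream, streamAnthropic as sa } from "pi-ai";',
  'import { streamGoogle } from "./local-module.js";',
].join("\n");

const offenders = findProviderImports(sample);
```

In this sample only the first import is flagged, since the second one does not come from the package root.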

Wave 3: Validate

  1. Run full test suite
  2. Verify extension registration (Ollama, Claude Code CLI) still works
  3. Verify resetApiProviders() test helper still works
  4. Spot-check a few providers end-to-end

References

  • Current model catalog: packages/pi-ai/src/models.generated.ts (13,848 lines)
  • Current barrel export: packages/pi-ai/src/index.ts
  • Model registry: packages/pi-ai/src/models.ts
  • API provider registry: packages/pi-ai/src/api-registry.ts
  • Eager registration: packages/pi-ai/src/providers/register-builtins.ts
  • Stream dispatch: packages/pi-ai/src/stream.ts
  • Generation script: packages/pi-ai/scripts/generate-models.ts
  • Extension registration: packages/pi-coding-agent/src/core/model-registry.ts
  • ADR-004: docs/ADR-004-capability-aware-model-routing.md