singularity-forge/.plans/issue-125-provider-fallback.md
Flux Labs adca6901ec feat: add cross-provider fallback when rate/quota limits are hit (#125)
When all credentials for a provider are exhausted, the system now
automatically falls back to the next available provider in a
user-configured fallback chain. Higher-priority providers are
restored automatically when their backoff expires.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 15:45:44 -05:00

Issue #125: Provider Fallback When Multiple Providers Configured

Copyright (c) 2026 Jeremy McSpadden jeremy@fluxlabs.net

Overview

Add cross-provider fallback so that when a provider hits rate/quota limits, the system automatically switches to another provider that serves an equivalent model (or a user-configured fallback chain of different models).

Current State

The codebase already supports:

  • Multi-credential per provider — round-robin or session-sticky selection
  • Per-credential backoff tracking — rate_limit (30s), quota_exhausted (30min), server_error (20s)
  • Credential rotation on error: markUsageLimitReached() backs off one key and returns whether another key exists for the same provider
  • Retry with exponential backoff — 3 retries, 2s/4s/8s delays
  • Error classification — quota_exhausted, rate_limit, server_error, unknown
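The numbers in the list above can be written down as constants. This is a minimal sketch for reference only: the constant and function names are hypothetical, and the delay formula is an assumption inferred from the 2s/4s/8s sequence, not a copy of the existing implementation.

```typescript
// Error classes as listed above; the union members are taken from that list.
type UsageLimitErrorType = "quota_exhausted" | "rate_limit" | "server_error" | "unknown";

// Per-credential backoff durations in milliseconds (hypothetical constant name).
// "unknown" has no documented backoff, so it is left out.
const CREDENTIAL_BACKOFF_MS: Partial<Record<UsageLimitErrorType, number>> = {
  rate_limit: 30_000,
  quota_exhausted: 30 * 60_000,
  server_error: 20_000,
};

const MAX_RETRIES = 3;
const BASE_DELAY_MS = 2_000;

// Delay before retry number `attempt` (1-based): 2s, 4s, 8s.
function retryDelayMs(attempt: number): number {
  if (!Number.isInteger(attempt) || attempt < 1 || attempt > MAX_RETRIES) {
    throw new RangeError(`attempt must be an integer between 1 and ${MAX_RETRIES}`);
  }
  return BASE_DELAY_MS * 2 ** (attempt - 1);
}
```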

The gap: fallback only works within a single provider (multiple API keys). There is no mechanism to fall back to a different provider serving the same or equivalent model.


Architecture

Phase 1: Fallback Chain Configuration & Storage

Goal: Let users define ordered fallback chains that map a primary model to backup model+provider combos.

1.1 — Settings Schema (settings-manager.ts)

Add a new top-level setting:

interface FallbackChainEntry {
  provider: string;       // e.g. "zai", "alibaba", "openai"
  model: string;          // e.g. "glm-5", "claude-opus-4-6"
  priority: number;       // lower = higher priority (1 = primary)
}

interface FallbackSettings {
  enabled: boolean;                          // default: false
  chains: Record<string, FallbackChainEntry[]>;  // keyed by chain name
  // Example:
  // "coding": [
  //   { provider: "zai", model: "glm-5", priority: 1 },
  //   { provider: "alibaba", model: "glm-5", priority: 2 },
  //   { provider: "openai", model: "gpt-4.1", priority: 3 }
  // ]
}

Files to modify:

  • packages/pi-coding-agent/src/core/settings-manager.ts — add getFallbackSettings(), setFallbackChain(), removeFallbackChain(), getter/setter for fallback.enabled
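Since chains are user-authored, the setter will probably want to validate entries before saving. A minimal sketch follows, assuming a hypothetical validateChain() helper; the specific rules (unique priorities, integer priority of at least 1, no duplicate provider+model pairs) are assumptions of this sketch, not requirements taken from the issue.

```typescript
interface FallbackChainEntry {
  provider: string;
  model: string;
  priority: number; // lower = higher priority (1 = primary)
}

// Hypothetical helper: returns a list of problems, empty when the chain looks valid.
function validateChain(name: string, entries: FallbackChainEntry[]): string[] {
  const problems: string[] = [];
  if (name.trim() === "") problems.push("chain name must not be empty");
  if (entries.length === 0) problems.push("chain must contain at least one entry");

  const seenPriorities = new Set<number>();
  const seenTargets = new Set<string>();
  for (const entry of entries) {
    const target = `${entry.provider}/${entry.model}`;
    if (entry.provider.trim() === "" || entry.model.trim() === "") {
      problems.push(`entry "${target}" needs both a provider and a model`);
    }
    if (!Number.isInteger(entry.priority) || entry.priority < 1) {
      problems.push(`entry "${target}" has invalid priority ${entry.priority}`);
    }
    if (seenPriorities.has(entry.priority)) {
      problems.push(`priority ${entry.priority} is used more than once`);
    }
    if (seenTargets.has(target)) {
      problems.push(`entry "${target}" appears more than once`);
    }
    seenPriorities.add(entry.priority);
    seenTargets.add(target);
  }
  return problems;
}

// The "coding" chain from the example above.
const coding: FallbackChainEntry[] = [
  { provider: "zai", model: "glm-5", priority: 1 },
  { provider: "alibaba", model: "glm-5", priority: 2 },
  { provider: "openai", model: "gpt-4.1", priority: 3 },
];
const codingProblems = validateChain("coding", coding);
```

Returning a list of messages rather than throwing lets the CLI print every problem at once; whether the model actually exists would still need a lookup against the model registry, which this sketch leaves out.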

1.2 — Settings File Location

Stored in the existing ~/.pi/agent/settings.json under a new fallback key.

1.3 — CLI Configuration Commands

Add subcommands to the existing settings CLI:

  • pi settings fallback enable/disable
  • pi settings fallback add-chain <name> --provider <p> --model <m> --priority <n>
  • pi settings fallback remove-chain <name>
  • pi settings fallback list

Files to modify:

  • packages/pi-coding-agent/src/cli/commands/settings.ts (or equivalent CLI entry point)
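The add-chain and remove-chain subcommands reduce to two small updates on the settings object. A sketch under assumptions: addChainEntry() and removeChain() are hypothetical names, and replacing an existing provider+model entry instead of rejecting it is a choice made here for illustration.

```typescript
interface FallbackChainEntry {
  provider: string;
  model: string;
  priority: number;
}

interface FallbackSettings {
  enabled: boolean;
  chains: Record<string, FallbackChainEntry[]>;
}

// Hypothetical: add an entry to a chain (creating the chain if needed).
// An existing entry for the same provider+model is replaced, and the
// chain is kept sorted by ascending priority.
function addChainEntry(
  settings: FallbackSettings,
  chainName: string,
  entry: FallbackChainEntry,
): FallbackSettings {
  const existing = settings.chains[chainName] ?? [];
  const withoutDuplicate = existing.filter(
    (e) => !(e.provider === entry.provider && e.model === entry.model),
  );
  const updated = [...withoutDuplicate, entry].sort((a, b) => a.priority - b.priority);
  return { ...settings, chains: { ...settings.chains, [chainName]: updated } };
}

// Hypothetical: remove a whole chain; unknown names are ignored.
function removeChain(settings: FallbackSettings, chainName: string): FallbackSettings {
  const rest = { ...settings.chains };
  delete rest[chainName];
  return { ...settings, chains: rest };
}
```

Both functions return a new object instead of mutating their input, which keeps them easy to unit-test independently of settings.json persistence.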

Phase 2: Provider-Level Backoff Tracking

Goal: Track backoff state at the provider level (not just credential level) so the fallback system knows when an entire provider is unavailable.

2.1 — Extend AuthStorage (auth-storage.ts)

Add a provider-level backoff map alongside the existing credential-level one:

private providerBackoff: Map<string, number> = new Map();
// Map<provider, backoffExpiresAt>

New methods:

markProviderExhausted(provider: string, errorType: UsageLimitErrorType): void
isProviderAvailable(provider: string): boolean
getProviderBackoffRemaining(provider: string): number  // ms until available, 0 if available

Logic: When markUsageLimitReached() returns false (all credentials for a provider are backed off), also mark the provider itself as backed off with the longest remaining credential backoff duration.

Files to modify:

  • packages/pi-coding-agent/src/core/auth-storage.ts
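The provider-level map can be sketched in isolation from AuthStorage. Assumptions in this sketch: the ProviderBackoffTracker class name and the injectable clock are hypothetical, and markProviderExhausted() takes the remaining credential backoffs directly instead of an error type so that the example stays self-contained.

```typescript
class ProviderBackoffTracker {
  // provider -> timestamp (ms since epoch) at which the provider backoff expires
  private providerBackoff = new Map<string, number>();
  private now: () => number;

  // `now` is injectable so tests can control the clock (an assumption of this sketch).
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Called once every credential of `provider` is backed off.
  // `credentialBackoffsMs` holds the remaining backoff of each credential;
  // the provider is marked with the longest of them, as described above.
  markProviderExhausted(provider: string, credentialBackoffsMs: number[]): void {
    const longest = Math.max(0, ...credentialBackoffsMs);
    this.providerBackoff.set(provider, this.now() + longest);
  }

  // Milliseconds until the provider is available again, 0 if available.
  getProviderBackoffRemaining(provider: string): number {
    const expiresAt = this.providerBackoff.get(provider);
    if (expiresAt === undefined) return 0;
    const remaining = expiresAt - this.now();
    if (remaining <= 0) {
      this.providerBackoff.delete(provider); // lazily drop expired entries
      return 0;
    }
    return remaining;
  }

  isProviderAvailable(provider: string): boolean {
    return this.getProviderBackoffRemaining(provider) === 0;
  }
}
```

Using the shortest remaining credential backoff instead would make the provider eligible again as soon as any one key recovers. The plan specifies the longest, so the sketch follows that.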

Phase 3: Fallback Resolution Engine

Goal: Given a current model+provider that just failed, find the next available fallback from the configured chain.

3.1 — FallbackResolver (fallback-resolver.ts — new file)

// packages/pi-coding-agent/src/core/fallback-resolver.ts

export interface FallbackResult {
  model: Model<Api>;
  reason: string;  // "quota_exhausted on zai, falling back to alibaba"
}

export class FallbackResolver {
  constructor(
    private settings: SettingsManager,
    private authStorage: AuthStorage,
    private modelRegistry: ModelRegistry,
  ) {}

  /**
   * Find the next available fallback for the current model.
   * Returns null if no fallback is configured or available.
   */
  async findFallback(
    currentModel: Model<Api>,
    errorType: UsageLimitErrorType,
  ): Promise<FallbackResult | null> {
    // 1. Check if fallback is enabled
    // 2. Find chain(s) containing currentModel's provider+model
    // 3. Sort by priority
    // 4. Skip entries where provider is backed off
    // 5. Skip entries without valid API keys
    // 6. Return first available, or null
  }

  /**
   * Find the chain a model belongs to.
   */
  findChainForModel(provider: string, modelId: string): FallbackChainEntry[] | null

  /**
   * Get the highest-priority available model from a chain.
   * Used on session start to pick the best available model.
   */
  async getBestAvailable(chainName: string): Promise<FallbackResult | null>
}
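Steps 2 to 6 of findFallback() can be sketched as a pure function. This is an illustration under assumptions, not the planned implementation: findNextFallback() and the ResolverDeps interface are hypothetical stand-ins for AuthStorage and ModelRegistry lookups, the function returns a chain entry instead of a Model<Api>, and when a model appears in several chains only the first matching chain is used.

```typescript
interface FallbackChainEntry {
  provider: string;
  model: string;
  priority: number;
}

// Hypothetical stand-ins for the AuthStorage checks the resolver would make.
interface ResolverDeps {
  isProviderAvailable(provider: string): boolean;
  hasApiKey(provider: string): boolean;
}

function findNextFallback(
  chains: Record<string, FallbackChainEntry[]>,
  current: { provider: string; model: string },
  deps: ResolverDeps,
): FallbackChainEntry | null {
  const isCurrent = (e: FallbackChainEntry) =>
    e.provider === current.provider && e.model === current.model;

  // Step 2: find the first chain that contains the current provider+model.
  const chain = Object.values(chains).find((entries) => entries.some(isCurrent));
  if (!chain) return null;

  // Steps 3-5: sort by priority, then drop the failing entry itself,
  // providers that are backed off, and providers without an API key.
  const candidates = [...chain]
    .sort((a, b) => a.priority - b.priority)
    .filter((e) => !isCurrent(e))
    .filter((e) => deps.isProviderAvailable(e.provider))
    .filter((e) => deps.hasApiKey(e.provider));

  // Step 6: first available entry, or null.
  return candidates[0] ?? null;
}

const exampleChains: Record<string, FallbackChainEntry[]> = {
  coding: [
    { provider: "zai", model: "glm-5", priority: 1 },
    { provider: "alibaba", model: "glm-5", priority: 2 },
    { provider: "openai", model: "gpt-4.1", priority: 3 },
  ],
};
```

Keeping the availability and key checks behind a small interface is what makes the unit tests in the testing strategy straightforward to write with mocks.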

3.2 — Model Equivalence

For same-model cross-provider fallback (the first stage of the feature), chain entries name each provider+model pair explicitly. No automatic equivalence detection is needed; the user defines what counts as equivalent.


Phase 4: Integrate Fallback into Retry Flow

Goal: When credential rotation fails (all keys for a provider exhausted), try the fallback chain before giving up or doing exponential backoff.

4.1 — Modify _handleRetryableError() (agent-session.ts)

Current flow:

1. Classify error
2. Try credential rotation within provider → if success, retry immediately
3. If quota_exhausted and all backed off → give up
4. Exponential backoff retry

New flow:

1. Classify error
2. Try credential rotation within provider → if success, retry immediately
3. Try provider fallback via FallbackResolver (new step)
   a. If fallback found → swap model on agent, retry immediately
   b. Emit event: "fallback_provider_switch" with old/new provider info
4. If quota_exhausted and no fallback → give up
5. Exponential backoff retry

Key changes in agent-session.ts (~lines 2317-2370):

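// After credential rotation fails:
if (!hasAlternate) {
  const previousModel = this.agent.model;
  const fallbackResult = await this.fallbackResolver?.findFallback(
    previousModel,
    errorType,
  );

  if (fallbackResult) {
    // Swap to the fallback model
    this.agent.setModel(fallbackResult.model);
    this._removeLastError();
    // Step 3b of the new flow: report which provider we left and which we moved to
    this._emitEvent("fallback_provider_switch", {
      from: previousModel.provider,
      to: fallbackResult.model.provider,
      reason: fallbackResult.reason,
    });
    this._emitEvent("auto_retry_start", {
      attempt: this._retryAttempt + 1,
      delayMs: 0, // retry immediately; no backoff when a fallback exists
      reason: fallbackResult.reason,
    });
    await this.agent.continue();
    return true; // handled: the retry runs against the fallback provider
  }
}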
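The ordering of the new flow can also be expressed as a single decision function, which is easier to unit-test than the session method. A sketch under assumptions: decideRecovery(), RecoveryInput, and RecoveryAction are hypothetical names, and the retry limit and delays reuse the 3 retries and 2s/4s/8s values from Current State.

```typescript
type UsageLimitErrorType = "quota_exhausted" | "rate_limit" | "server_error" | "unknown";

type RecoveryAction =
  | { kind: "retry_same_provider" }
  | { kind: "switch_provider"; to: string }
  | { kind: "give_up" }
  | { kind: "backoff"; delayMs: number };

interface RecoveryInput {
  errorType: UsageLimitErrorType;
  hasAlternateCredential: boolean; // result of credential rotation
  fallbackProvider: string | null; // result of the fallback lookup
  attempt: number;                 // 1-based retry attempt
}

function decideRecovery(input: RecoveryInput): RecoveryAction {
  // Step 2: another key for the same provider is the cheapest recovery.
  if (input.hasAlternateCredential) return { kind: "retry_same_provider" };
  // Step 3: otherwise prefer switching provider over waiting.
  if (input.fallbackProvider !== null) {
    return { kind: "switch_provider", to: input.fallbackProvider };
  }
  // Step 4: waiting seconds will not restore an exhausted quota.
  if (input.errorType === "quota_exhausted") return { kind: "give_up" };
  // Step 5: exponential backoff, up to 3 retries (2s, 4s, 8s).
  if (input.attempt > 3) return { kind: "give_up" };
  return { kind: "backoff", delayMs: 2000 * 2 ** (input.attempt - 1) };
}
```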

4.2 — Agent Model Swapping

The agent needs a method to swap its model mid-conversation:

// agent.ts or agent-loop.ts
setModel(model: Model<Api>): void {
  this.config.model = model;
  // Re-resolve API key for new provider
}

Important: The API key must also be re-resolved since we're switching providers. The getApiKey callback in AgentOptions already takes a provider string, so this should work naturally.
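The point about key re-resolution is easiest to see when the key is looked up at request time rather than cached at construction. A minimal sketch, assuming a hypothetical SketchAgent class and ModelRef type; only the idea of a getApiKey callback that takes a provider string comes from the existing code.

```typescript
// Hypothetical, reduced stand-in for Model<Api>.
interface ModelRef {
  provider: string;
  id: string;
}

class SketchAgent {
  private model: ModelRef;
  private getApiKey: (provider: string) => string | undefined;

  constructor(model: ModelRef, getApiKey: (provider: string) => string | undefined) {
    this.model = model;
    this.getApiKey = getApiKey;
  }

  // Swapping the model does not touch conversation state.
  setModel(model: ModelRef): void {
    this.model = model;
  }

  // The key is resolved per request, so it always matches the current provider.
  buildRequest(): { provider: string; model: string; apiKey: string } {
    const apiKey = this.getApiKey(this.model.provider);
    if (apiKey === undefined) {
      throw new Error(`no API key for provider "${this.model.provider}"`);
    }
    return { provider: this.model.provider, model: this.model.id, apiKey };
  }
}
```

If the real agent caches the key anywhere between requests, setModel() would have to invalidate that cache as well.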

Files to modify:

  • packages/pi-coding-agent/src/core/agent-session.ts
  • packages/pi-ai/src/agent.ts or packages/pi-ai/src/agent-loop.ts

Phase 5: Provider Restoration (Auto-Upgrade)

Goal: When a higher-priority provider's backoff expires, switch back to it.

5.1 — Pre-Request Priority Check

Before each LLM request, check if a higher-priority provider in the chain has become available again:

// In agent-loop.ts streamAssistantResponse(), before calling streamFn:
if (this.fallbackResolver) {
  const bestAvailable = await this.fallbackResolver.getBestAvailable(currentChain);
  if (bestAvailable && bestAvailable.model.provider !== currentModel.provider) {
    // Upgrade back to higher-priority provider
    this.setModel(bestAvailable.model);
    this._emitEvent("fallback_provider_restored", { ... });
  }
}
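The pre-request check only needs to look at entries that rank above the current one. A sketch of that comparison, assuming a hypothetical findUpgrade() helper and an isProviderAvailable predicate passed in by the caller:

```typescript
interface FallbackChainEntry {
  provider: string;
  model: string;
  priority: number; // lower = higher priority
}

// Hypothetical: return the best entry that outranks the current one and is
// available again, or null if the session should stay where it is.
function findUpgrade(
  chain: FallbackChainEntry[],
  current: { provider: string; model: string },
  isProviderAvailable: (provider: string) => boolean,
): FallbackChainEntry | null {
  const currentEntry = chain.find(
    (e) => e.provider === current.provider && e.model === current.model,
  );
  if (!currentEntry) return null; // current model is not part of this chain

  const better = chain
    .filter((e) => e.priority < currentEntry.priority)
    .filter((e) => isProviderAvailable(e.provider))
    .sort((a, b) => a.priority - b.priority);

  return better[0] ?? null;
}

const codingChain: FallbackChainEntry[] = [
  { provider: "zai", model: "glm-5", priority: 1 },
  { provider: "alibaba", model: "glm-5", priority: 2 },
  { provider: "openai", model: "gpt-4.1", priority: 3 },
];
```

Restricting the search to strictly higher-priority entries means a session already on its primary provider does no work beyond one lookup.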

5.2 — Quota Reset Awareness (Future Enhancement)

For now, rely on backoff expiry times. A future enhancement could:

  • Parse rate limit headers for reset timestamps
  • Store per-provider quota windows (5-hour, daily, weekly, monthly)
  • Predict when quota will restore based on usage patterns

This is complex and should be a separate issue.


Phase 6: User-Facing Events & UI

Goal: Surface fallback activity to the user so they know what's happening.

6.1 — New Events

type FallbackEvent =
  | { type: "fallback_provider_switch"; from: string; to: string; reason: string }
  | { type: "fallback_provider_restored"; provider: string; reason: string }
  | { type: "fallback_chain_exhausted"; chain: string; reason: string }

6.2 — TUI Integration

Display a brief notification in the TUI when fallback occurs:

  • ⚡ Switched from zai/glm-5 → alibaba/glm-5 (rate limit)
  • ✓ Restored to zai/glm-5 (quota available)
  • ⚠ All providers in chain "coding" exhausted

Files to modify:

  • packages/pi-tui/src/ — event handler for new fallback events
  • Status bar or notification area in the TUI
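The three notifications map one-to-one onto the event union from 6.1. A small formatter sketch, assuming a hypothetical formatFallbackEvent() function and assuming that from, to, and provider carry provider/model strings such as "zai/glm-5":

```typescript
type FallbackEvent =
  | { type: "fallback_provider_switch"; from: string; to: string; reason: string }
  | { type: "fallback_provider_restored"; provider: string; reason: string }
  | { type: "fallback_chain_exhausted"; chain: string; reason: string };

// Hypothetical formatter producing the one-line TUI notifications.
function formatFallbackEvent(event: FallbackEvent): string {
  switch (event.type) {
    case "fallback_provider_switch":
      return `⚡ Switched from ${event.from} → ${event.to} (${event.reason})`;
    case "fallback_provider_restored":
      return `✓ Restored to ${event.provider} (${event.reason})`;
    case "fallback_chain_exhausted":
      return `⚠ All providers in chain "${event.chain}" exhausted`;
    default: {
      // Compile-time exhaustiveness check: adding an event type without
      // handling it here becomes a type error.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```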

Implementation Order

| Step | Phase | Effort | Dependencies |
|------|-------|--------|--------------|
| 1 | Phase 1.1-1.2: Settings schema | Small | None |
| 2 | Phase 2: Provider-level backoff | Small | None |
| 3 | Phase 3: FallbackResolver | Medium | Steps 1, 2 |
| 4 | Phase 4: Retry integration | Medium | Step 3 |
| 5 | Phase 5.1: Auto-restoration | Small | Step 4 |
| 6 | Phase 1.3: CLI commands | Small | Step 1 |
| 7 | Phase 6: Events & UI | Small | Step 4 |

Steps 1 and 2 can be done in parallel. Steps 6 and 7 can be done in parallel.


Key Design Decisions

1. Explicit chains vs automatic model equivalence

Decision: Explicit user-configured chains. Why: Automatic equivalence is unreliable — models with the same name from different providers may have different capabilities, limits, or pricing. Users should explicitly opt in to which models they consider interchangeable.

2. Where fallback sits in the retry flow

Decision: After credential rotation, before exponential backoff. Why: Provider fallback is a better recovery than waiting and retrying the same exhausted provider. If the fallback also fails, exponential backoff still kicks in.

3. Model swap vs new agent

Decision: Swap model on existing agent mid-conversation. Why: Creating a new agent would lose conversation context. The agent's streamFn already accepts model as a parameter, and getApiKey resolves per-provider, so swapping is straightforward.

4. Restoration strategy

Decision: Check before each request (lazy check on backoff expiry). Why: No background timers needed. The cost of one isProviderAvailable() check per request is negligible. More sophisticated quota tracking can be added later.

5. Scope of fallback

Decision: Per-session, not per-agent-type (initially). Why: The issue mentions per-agent-type toggle, but the simpler initial implementation is a global fallback chain that applies to any session using a model in the chain. Per-agent-type scoping can be added by extending the chain config with an agentTypes filter.
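One possible shape for the later per-agent-type extension is sketched below. Everything here is hypothetical: the ScopedFallbackChain type, the agentTypes field layout, the chainApplies() helper, and the example agent type names are illustrations, not part of the initial implementation.

```typescript
interface FallbackChainEntry {
  provider: string;
  model: string;
  priority: number;
}

// Hypothetical extension: a chain plus an optional agent-type filter.
interface ScopedFallbackChain {
  entries: FallbackChainEntry[];
  agentTypes?: string[]; // omitted = applies to every agent type
}

// A chain without a filter keeps today's global behaviour.
function chainApplies(chain: ScopedFallbackChain, agentType: string): boolean {
  return chain.agentTypes === undefined || chain.agentTypes.includes(agentType);
}

const globalChain: ScopedFallbackChain = {
  entries: [{ provider: "zai", model: "glm-5", priority: 1 }],
};
const scopedChain: ScopedFallbackChain = {
  entries: [{ provider: "openai", model: "gpt-4.1", priority: 1 }],
  agentTypes: ["reviewer"], // example agent type name, made up for this sketch
};
```

Making the filter optional keeps existing settings files valid, since a chain saved without agentTypes continues to apply everywhere.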


Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Model swap mid-conversation changes behavior | Medium | Log the swap, let user disable fallback |
| Different providers have different tool/feature support | High | Validate fallback model supports same API features before swapping |
| Credential resolution race conditions | Low | Use existing file-lock mechanism in auth-storage |
| Chain misconfiguration (nonexistent model) | Low | Validate chain entries on save, warn on invalid |
| Backoff timing mismatch with actual quota reset | Medium | Conservative backoff defaults; Phase 5.2 for future improvement |

Testing Strategy

  1. Unit tests for FallbackResolver — mock auth-storage and model-registry, test chain resolution, priority ordering, backoff skipping
  2. Unit tests for extended auth-storage — provider-level backoff tracking
  3. Integration test for retry flow — simulate rate limit → credential fallback → provider fallback → restoration
  4. E2E test — configure a chain, hit rate limit on provider A, verify automatic switch to provider B
  5. Settings tests — validate chain CRUD operations, persistence, invalid input handling

Files Summary

| File | Action | Changes |
|------|--------|---------|
| packages/pi-coding-agent/src/core/settings-manager.ts | Modify | Add FallbackSettings types, getters/setters |
| packages/pi-coding-agent/src/core/auth-storage.ts | Modify | Add provider-level backoff tracking |
| packages/pi-coding-agent/src/core/fallback-resolver.ts | New | FallbackResolver class |
| packages/pi-coding-agent/src/core/agent-session.ts | Modify | Integrate fallback into retry flow |
| packages/pi-ai/src/agent.ts | Modify | Add setModel() method |
| packages/pi-coding-agent/src/cli/commands/settings.ts | Modify | Add fallback CLI subcommands |
| packages/pi-tui/src/ | Modify | Fallback event display |