---
title: "Dynamic model routing"
description: "Automatically select cheaper models for simple work and reserve expensive models for complex tasks."
---
Dynamic model routing classifies each dispatched unit into a complexity tier and selects an appropriate model. This reduces token consumption by 20-50% without sacrificing quality where it matters.

The key rule: **downgrade-only semantics**. Your configured model is always the ceiling: routing never upgrades beyond what you've configured.
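
The ceiling rule can be pictured as a clamp over an ordered list of tiers. The following is a minimal sketch, not the router's actual code: the `TIERS` list and the `apply_ceiling` helper are hypothetical names, and it assumes the configured model maps to one of the three tiers.

```python
# Hypothetical illustration of downgrade-only semantics.
# Tiers are ordered from cheapest to most capable.
TIERS = ["light", "standard", "heavy"]


def apply_ceiling(classified: str, ceiling: str) -> str:
    """Return the classified tier, capped at the configured ceiling tier."""
    capped_index = min(TIERS.index(classified), TIERS.index(ceiling))
    return TIERS[capped_index]
```

With a `standard` ceiling, a unit classified as `heavy` is served at `standard`, while a unit classified as `light` stays at `light`.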
## Enabling
```yaml
dynamic_routing:
  enabled: true
```
## Complexity tiers
| Tier | Typical work | Default model level |
|------|-------------|-------------------|
| **Light** | Slice completion, UAT, hooks | Haiku-class |
| **Standard** | Research, planning, execution | Sonnet-class |
| **Heavy** | Replanning, roadmap reassessment | Opus-class |
## Configuration
```yaml
dynamic_routing:
  enabled: true
  tier_models:
    light: claude-haiku-4-5
    standard: claude-sonnet-4-6
    heavy: claude-opus-4-6
  escalate_on_failure: true   # bump tier on task failure
  budget_pressure: true       # auto-downgrade near budget ceiling
  cross_provider: true        # consider models from other providers
```
### `escalate_on_failure`
When a task fails at a given tier, the router escalates to the next tier up: Light → Standard → Heavy. This prevents cheap models from burning retries on work that needs more reasoning.
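
The escalation order can be sketched as a one-step move up an ordered tier list. This is an illustrative sketch only: `escalate` and its `ceiling` parameter are hypothetical, and capping escalation at the ceiling is an assumption drawn from the downgrade-only rule rather than documented behavior.

```python
# Hypothetical sketch of failure escalation: Light -> Standard -> Heavy.
TIERS = ["light", "standard", "heavy"]


def escalate(tier: str, ceiling: str = "heavy") -> str:
    """Return the next tier up after a failure.

    ASSUMPTION: escalation never goes past the configured ceiling,
    and the input tier is at or below that ceiling.
    """
    next_index = min(TIERS.index(tier) + 1, TIERS.index(ceiling))
    return TIERS[next_index]
```

A failed `light` task would be retried at `standard`; a failed `heavy` task has nowhere higher to go and stays at `heavy`.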
### `budget_pressure`
Downgrading becomes progressively more aggressive as spend approaches the budget ceiling:
| Budget used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive |
| > 90% | Nearly everything → Light |
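
The thresholds above can be read as a step function over the fraction of budget used. The sketch below is one possible reading, not the router's implementation: `apply_budget_pressure` is a hypothetical name, and the 75-90% and > 90% rules are assumptions, because the table only describes them as "more aggressive" and "nearly everything".

```python
def apply_budget_pressure(tier: str, budget_used: float) -> str:
    """Downgrade a tier based on the fraction of budget used (0.0 to 1.0)."""
    if budget_used < 0.50:
        return tier  # no adjustment
    if budget_used < 0.75:
        # Documented: Standard -> Light; other tiers are left alone.
        return "light" if tier == "standard" else tier
    if budget_used <= 0.90:
        # ASSUMPTION: "more aggressive" is modeled as every tier dropping one level.
        return {"heavy": "standard", "standard": "light", "light": "light"}[tier]
    # ASSUMPTION: "nearly everything -> Light" is modeled as everything -> Light.
    return "light"
```

Under these assumptions, a `heavy` unit is untouched at 60% of budget, drops to `standard` at 80%, and runs at `light` above 90%.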
### `cross_provider`
The router may select models from providers other than your primary, using a built-in cost table to find the cheapest model at each tier.
## Task plan analysis
For `execute-task` units, the classifier analyzes the task plan:
| Signal | Simple → Light | Complex → Heavy |
|--------|---------------|----------------|
| Step count | ≤ 3 | ≥ 8 |
| File count | ≤ 3 | ≥ 8 |
| Description length | < 500 chars | > 2000 chars |
| Code blocks | — | ≥ 5 |
| Complexity keywords | None | Present |
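
To make the signals concrete, here is a rough sketch of a classifier built on the thresholds in the table. How the real classifier combines signals is not documented here, so the combination rule (any complex signal gives Heavy, all simple signals give Light, otherwise Standard), the `TaskPlan` type, and the keyword list are all assumptions for illustration.

```python
from dataclasses import dataclass

# ASSUMPTION: the real keyword list is not documented; these are placeholders.
COMPLEXITY_KEYWORDS = ("refactor", "migrate", "concurrency")


@dataclass
class TaskPlan:
    """Hypothetical summary of an execute-task plan."""
    steps: int
    files: int
    description: str
    code_blocks: int = 0


def classify(plan: TaskPlan) -> str:
    """Map a task plan to a tier using the documented thresholds."""
    text = plan.description.lower()
    has_keywords = any(keyword in text for keyword in COMPLEXITY_KEYWORDS)
    complex_signals = [
        plan.steps >= 8,
        plan.files >= 8,
        len(plan.description) > 2000,
        plan.code_blocks >= 5,
        has_keywords,
    ]
    simple_signals = [
        plan.steps <= 3,
        plan.files <= 3,
        len(plan.description) < 500,
        not has_keywords,
    ]
    if any(complex_signals):
        return "heavy"
    if all(simple_signals):
        return "light"
    return "standard"
```

For example, `classify(TaskPlan(steps=2, files=1, description="Fix typo in README"))` lands in the Light tier, while a plan with ten steps is treated as Heavy regardless of its other signals.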
## Adaptive learning
The routing history (`.sf/routing-history.json`) tracks success and failure per tier per unit type. If a tier's failure rate exceeds 20%, future classifications for that unit type are bumped up a tier.

User feedback (`/sf rate`) is weighted twice as heavily as automatic outcomes.
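
The 20% threshold and the 2x feedback weight can be combined into a weighted failure rate. The sketch below works on in-memory records and makes no claim about the actual schema of `.sf/routing-history.json`; the `(succeeded, source)` tuple shape and both function names are hypothetical.

```python
def weighted_failure_rate(outcomes):
    """Compute a failure rate where user-rated outcomes count double.

    outcomes: list of (succeeded, source) tuples, source is "auto" or "user".
    ASSUMPTION: this record shape is illustrative, not the real history format.
    """
    total = 0.0
    failed = 0.0
    for succeeded, source in outcomes:
        weight = 2.0 if source == "user" else 1.0
        total += weight
        if not succeeded:
            failed += weight
    return failed / total if total else 0.0


def should_bump(outcomes, threshold=0.20):
    """Return True when the weighted failure rate exceeds the threshold."""
    return weighted_failure_rate(outcomes) > threshold
```

With seven automatic successes, a single automatic failure gives a rate of 1/8 and no bump, whereas a single user-reported failure gives 2/9 and crosses the threshold.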
## Cost table
| Model | Input (per M) | Output (per M) |
|-------|-------|--------|
| claude-haiku-4-5 | $0.80 | $4.00 |
| claude-sonnet-4-6 | $3.00 | $15.00 |
| claude-opus-4-6 | $15.00 | $75.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4o | $2.50 | $10.00 |
| gemini-2.0-flash | $0.10 | $0.40 |

The cost table is for comparison only; actual billing comes from your provider.
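
As a worked example of how the table's rates translate into a comparison, the sketch below estimates the cost of a workload and picks the cheapest model from a candidate list. The rates are copied from the table; `estimate_cost` and `cheapest` are hypothetical helpers, not part of the router's API, and which models count as candidates for a given tier is left to the caller.

```python
# Rates in USD per million tokens, copied from the cost table: (input, output).
COST_PER_MILLION = {
    "claude-haiku-4-5": (0.80, 4.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-opus-4-6": (15.00, 75.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
    "gemini-2.0-flash": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one workload on one model."""
    input_rate, output_rate = COST_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def cheapest(candidates, input_tokens: int, output_tokens: int) -> str:
    """Return the candidate model with the lowest estimated cost."""
    return min(candidates, key=lambda m: estimate_cost(m, input_tokens, output_tokens))
```

For 1M input tokens and 200k output tokens, this estimates about $1.60 on `claude-haiku-4-5`, $6.00 on `claude-sonnet-4-6`, and $30.00 on `claude-opus-4-6`.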
## Interaction with token profiles
- **Token profiles** control phase skipping and context compression
- **Dynamic routing** controls per-unit model selection within those constraints

Combining the `budget` profile with dynamic routing provides maximum cost savings.