---
title: "Token optimization"
description: "Token profiles, context compression, and complexity-based task routing to reduce costs by 40-60%."
---
SF's token optimization system has three pillars: **token profiles**, **context compression**, and **complexity-based task routing**.
## Token profiles
A token profile coordinates model selection, phase skipping, and context compression. Set it in preferences:
```yaml
token_profile: balanced
```
### `budget` — maximum savings (40-60% reduction)
| Dimension | Setting |
|-----------|---------|
| Planning model | Sonnet |
| Execution model | Sonnet |
| Simple task model | Haiku |
| Completion model | Haiku |
| Milestone research | Skipped |
| Slice research | Skipped |
| Reassessment | Skipped |
| Context level | Minimal |
Best for: prototyping, small projects, well-understood codebases.
### `balanced` — smart defaults
| Dimension | Setting |
|-----------|---------|
| All models | User's default |
| Subagent model | Sonnet |
| Milestone research | Runs |
| Slice research | Skipped |
| Reassessment | Runs |
| Context level | Standard |
Best for: most projects, day-to-day development.
### `quality` — full context
Every phase runs. Every context artifact is inlined. No shortcuts. Best for: complex architectures, greenfield projects, critical production work.
## Context compression
Each profile maps to an **inline level** controlling how much context is pre-loaded into dispatch prompts:
| Profile | Level | What's included |
|---------|-------|-----------------|
| `budget` | Minimal | Task plan, essential prior summaries (truncated). Drops decisions, requirements, templates. |
| `balanced` | Standard | Task plan, prior summaries, slice plan, roadmap excerpt. |
| `quality` | Full | Everything — all plans, summaries, decisions, requirements, templates. |
### Prompt compression
SF can apply deterministic text compression before falling back to section-boundary truncation:
```yaml
compression_strategy: compress # or "truncate"
```
| Strategy | Behavior | Default for |
|----------|----------|------------|
| `truncate` | Drop entire sections at boundaries | `quality` |
| `compress` | Heuristic text compression first, then truncate | `budget`, `balanced` |
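The compress-then-truncate pipeline can be sketched as below. This is an illustrative approximation, not SF's implementation: it assumes markdown `## ` section boundaries, a character budget, and whitespace-collapsing as the "heuristic compression" step.

```python
import re

# Sketch of compress-then-truncate. The heuristics and the "## " section
# boundary convention are assumptions for illustration.
def fit_prompt(text: str, max_chars: int) -> str:
    # Pass 1: deterministic compression -- strip trailing whitespace and
    # collapse runs of blank lines without dropping any content.
    compressed = re.sub(r"[ \t]+\n", "\n", text)
    compressed = re.sub(r"\n{3,}", "\n\n", compressed)
    if len(compressed) <= max_chars:
        return compressed
    # Pass 2: fall back to truncation at section boundaries, dropping
    # whole trailing sections until the prompt fits.
    sections = re.split(r"(?=^## )", compressed, flags=re.M)
    while len(sections) > 1 and len("".join(sections)) > max_chars:
        sections.pop()
    return "".join(sections)[:max_chars]
```

With `compression_strategy: truncate`, only the second pass would run; `compress` tries the lossless pass first so sections are dropped only when compression alone is insufficient.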
### Context selection
```yaml
context_selection: smart # or "full"
```
| Mode | Behavior | Default for |
|------|----------|------------|
| `full` | Inline entire files | `balanced`, `quality` |
| `smart` | TF-IDF semantic chunking for large files | `budget` |
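Smart selection can be sketched as TF-IDF scoring of file chunks against the task description, keeping only the best matches. The chunking granularity, tokenizer, and scoring details below are assumptions, not SF's code:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

# Illustrative sketch: rank chunks of a large file by TF-IDF relevance
# to the task description and inline only the top matches.
def select_chunks(chunks: list[str], query: str, top_k: int = 2) -> list[str]:
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency
    q = set(tokenize(query))

    def score(doc: list[str]) -> float:
        tf = Counter(doc)
        return sum(
            (tf[w] / len(doc)) * math.log((1 + n) / (1 + df[w]))
            for w in q if w in tf
        )

    ranked = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)
    return [chunks[i] for i in ranked[:top_k]]
```

The payoff is that a task touching one function in a 2,000-line file inlines a few relevant chunks instead of the entire file.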
## Complexity-based task routing
SF classifies each task by complexity and routes it to an appropriate model tier.
<Warning>
Dynamic routing requires explicit `models` in your preferences. Without a `models` section, routing is skipped.
</Warning>
### Classification signals
| Signal | Simple | Standard | Complex |
|--------|--------|----------|---------|
| Step count | ≤ 3 | 4-7 | ≥ 8 |
| File count | ≤ 3 | 4-7 | ≥ 8 |
| Description length | < 500 chars | 500-2000 | > 2000 chars |
| Code blocks | — | — | ≥ 5 |
| Complexity keywords | None | Any present | — |
**Complexity keywords:** `research`, `investigate`, `refactor`, `migrate`, `integrate`, `complex`, `architect`, `redesign`, `security`, `performance`, `concurrent`, `parallel`
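The signal table can be sketched as a classifier where any "complex" signal wins, then any "standard" signal, and everything else is simple. The function name, signature, and code-block counting below are illustrative assumptions, not SF's actual API:

```python
# Keyword list taken from the table above.
COMPLEXITY_KEYWORDS = {
    "research", "investigate", "refactor", "migrate", "integrate", "complex",
    "architect", "redesign", "security", "performance", "concurrent", "parallel",
}

def classify_task(steps: int, files: int, description: str) -> str:
    # Approximate fenced-block count by pairing ``` markers.
    code_blocks = description.count("```") // 2
    has_keyword = any(k in description.lower() for k in COMPLEXITY_KEYWORDS)
    if steps >= 8 or files >= 8 or len(description) > 2000 or code_blocks >= 5:
        return "complex"
    if steps >= 4 or files >= 4 or len(description) >= 500 or has_keyword:
        return "standard"
    return "simple"
```

Note that per the table, a complexity keyword alone only promotes a task to "standard"; reaching "complex" requires a hard signal like step count or description length.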
### Budget pressure
When approaching the budget ceiling, the classifier automatically downgrades tiers:
| Budget used | Effect |
|------------|--------|
| < 50% | No adjustment |
| 50-75% | Standard → Light |
| 75-90% | More aggressive |
| > 90% | Everything except Heavy → Light |
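The downgrade ladder can be sketched as below. Tier names and the exact "more aggressive" behavior in the 75-90% band are assumptions (modeled here as also stepping heavy down to standard); only the other rows come from the table:

```python
# Sketch of budget-pressure downgrading; not SF's exact rules.
def downgrade_tier(tier: str, budget_used: float) -> str:
    if budget_used < 0.50:
        return tier  # < 50%: no adjustment
    if budget_used < 0.75:
        # 50-75%: standard work drops to the light tier
        return "light" if tier == "standard" else tier
    if budget_used < 0.90:
        # 75-90%: "more aggressive" -- assumed to also step heavy down
        return {"standard": "light", "heavy": "standard"}.get(tier, tier)
    # > 90%: everything except heavy routes to light (per the table above)
    return tier if tier == "heavy" else "light"
```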
## Adaptive learning
SF tracks success/failure per tier and adjusts classifications over time. User feedback via `/sf rate` is weighted 2x:
```
/sf rate over # model was overpowered
/sf rate ok # appropriate
/sf rate under # too weak
```
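One way the 2x weighting could work is a per-tier score where automatic success/failure counts ±1 and explicit ratings count ±2. The scoring scheme below is an assumption for illustration; SF's actual bookkeeping may differ:

```python
# Hypothetical per-tier feedback score with user ratings weighted 2x.
def tier_score(auto_results: list[bool], user_ratings: list[str]) -> float:
    """Positive suggests the tier is adequate or overpowered; negative, too weak."""
    score = sum(1 if ok else -1 for ok in auto_results)
    for rating in user_ratings:
        if rating == "under":    # model too weak: push toward a heavier tier
            score -= 2
        elif rating == "over":   # model overpowered: safe to route lighter
            score += 2
        # "ok" contributes nothing
    return score / max(1, len(auto_results) + len(user_ratings))
```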
## Configuration examples
<Tabs>
<Tab title="Cost-optimized">
```yaml
---
version: 1
token_profile: budget
budget_ceiling: 25.00
models:
  execution_simple: claude-haiku-4-5-20250414
---
```
</Tab>
<Tab title="Balanced with custom models">
```yaml
---
version: 1
token_profile: balanced
models:
  planning:
    model: claude-opus-4-6
    fallbacks:
      - openrouter/z-ai/glm-5
  execution: claude-sonnet-4-6
---
```
</Tab>
<Tab title="Full quality">
```yaml
---
version: 1
token_profile: quality
models:
  planning: claude-opus-4-6
  execution: claude-opus-4-6
---
```
</Tab>
</Tabs>
Per-phase overrides always win over profile defaults:
```yaml
---
version: 1
token_profile: budget
phases:
  skip_research: false # keep research despite budget profile
models:
  planning: claude-opus-4-6 # use Opus for planning despite budget
---
```