singularity-forge/docs/user-docs/cost-management.md

# Cost Management

SF tracks token usage and cost for every unit of work dispatched during autonomous mode. This data powers the dashboard, budget enforcement, and cost projections.

## Cost Tracking

Every unit's metrics are captured automatically:

- **Token counts** — input, output, cache read, cache write, total
- **Cost** — USD cost per unit
- **Duration** — wall-clock time
- **Tool calls** — number of tool invocations
- **Message counts** — assistant and user messages

Data is stored in `.sf/metrics.json` and survives across sessions.

### Viewing Costs

**Dashboard:** `Ctrl+Alt+G` or `/status` shows real-time cost breakdown.

**Aggregations available:**
- By phase (research, planning, execution, completion, reassessment)
- By slice (M001/S01, M001/S02, ...)
- By model (which models consumed the most budget)
- Project totals

## Budget Ceiling

Set a maximum spend for a project:

```yaml
---
version: 1
budget_ceiling: 50.00
---
```

### Enforcement Modes

Control what happens when the ceiling is reached:

```yaml
budget_enforcement: pause    # default when ceiling is set
```

| Mode | Behavior |
|------|----------|
| `warn` | Log a warning, continue executing |
| `pause` | Pause autonomous mode, wait for user action |
| `halt` | Stop autonomous mode entirely |

## Cost Projections

Once at least two slices have completed, SF projects the remaining cost:

```
Projected remaining: $12.40 ($6.20/slice avg × 2 remaining)
```

Projections use per-slice averages from completed work. If the budget ceiling has been reached, a warning is appended.

## Budget Pressure & Model Downgrading

When approaching the budget ceiling, the [complexity router](./token-optimization.md#budget-pressure) automatically downgrades model assignments to cheaper tiers. This is graduated:

- **< 50% used** — no adjustment
- **50-75% used** — standard tasks downgrade to light
- **75-90% used** — same, more aggressive
- **> 90% used** — nearly everything downgrades; only heavy tasks stay at standard

This ensures the budget is spread across remaining work instead of being exhausted early on complex tasks.

## Token Profiles & Cost

The `token_profile` preference directly affects cost:

| Profile | Typical Savings | How |
|---------|----------------|-----|
| `budget` | 40-60% | Cheaper models, phase skipping, minimal context |
| `balanced` | 10-20% | Default models, skip slice research, standard context |
| `quality` | 0% (baseline) | Full models, all phases, full context |

See [Token Optimization](./token-optimization.md) for details.

## Tips

- Start with `balanced` profile and a generous `budget_ceiling` to establish baseline costs
- Check `/status` after a few slices to see per-slice cost averages
- Switch to `budget` profile for well-understood, repetitive work
- Use `quality` only when architectural decisions are being made
- Per-phase model selection lets you use Opus only for planning while keeping execution on Sonnet
- Enable `dynamic_routing` for automatic model downgrading on simple tasks — see [Dynamic Model Routing](./dynamic-model-routing.md)
- Use `/visualize` → Metrics tab to see where your budget is going