Product Sense
The Core Thesis
SF is a purpose-to-software compiler. It exists to take bounded intent, turn it into a falsifiable PDD contract, research missing context, decide whether autonomy is allowed, and then run the resulting milestone to completion with clean git history, passing tests, and recorded evidence.
Every design decision should be evaluated against this question: does it make purpose-to-software compilation more reliable, more observable, more recoverable, or more falsifiable?
User Goals
- Hand off a milestone and have it complete without babysitting
- Know the agent won't make irreversible mistakes (write gates, protected files, budget ceilings)
- Resume after a crash without losing work (state-on-disk, crash recovery)
- See what the agent did and why (trace files, decision register, records keeper)
- Steer mid-run without breaking the loop (message queue, steering gate)
Non-Goals
- Being a chat interface — use the Pi interactive mode for exploratory conversation
- Replacing CI — SF triggers verification but does not replace your existing CI pipeline
- Working without context — SF needs a spec, a roadmap, and a task plan; it does not invent work from nothing
What Good Product Judgment Looks Like
Fresh context per unit, not accumulated context. Each task gets a new session with exactly the context it needs pre-injected (task plan, slice plan, prior summaries, relevant skills). This prevents quality degradation from context accumulation — one of the primary failure modes of naive LLM agents on long projects.
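As a minimal sketch of the fresh-context idea, the snippet below builds a new session transcript for each unit from only the pre-injected parts. The names `UnitContext`, `render`, and `start_session` are hypothetical and are not taken from SF's codebase; the four fields mirror the items listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitContext:
    """Everything one unit's session is allowed to see (hypothetical shape)."""

    task_plan: str
    slice_plan: str
    prior_summaries: tuple[str, ...] = ()
    skills: tuple[str, ...] = ()

    def render(self) -> str:
        """Join the pre-injected parts into the opening prompt of a new session."""
        sections = [
            ("Task plan", self.task_plan),
            ("Slice plan", self.slice_plan),
            ("Prior summaries", "\n".join(self.prior_summaries)),
            ("Relevant skills", "\n".join(self.skills)),
        ]
        # Empty sections are skipped so the prompt carries only what the unit needs.
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


def start_session(context: UnitContext) -> list[str]:
    """Start a brand-new transcript; nothing carries over from earlier units."""
    return [context.render()]
```

Earlier units reach a later unit only through `prior_summaries`, so raw tool output and dead-end reasoning from one session never enter the next.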
State machine, not LLM guessing. The loop is deterministic: read STATE.md → validate → dispatch → post-unit → verify → advance. The LLM executes work inside a unit; it does not decide what the next unit is. Separating orchestration from execution keeps the system predictable.
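The loop above can be sketched as a fixed transition table. This is an illustration under assumptions, not SF's implementation: `Phase`, `NEXT_PHASE`, and `run_unit` are hypothetical names, and `execute_unit` stands in for whatever invokes the model.

```python
from enum import Enum


class Phase(Enum):
    READ_STATE = "read STATE.md"
    VALIDATE = "validate"
    DISPATCH = "dispatch"
    POST_UNIT = "post-unit"
    VERIFY = "verify"
    ADVANCE = "advance"


# Fixed transition table: the orchestrator, not the LLM, picks the next phase.
NEXT_PHASE = {
    Phase.READ_STATE: Phase.VALIDATE,
    Phase.VALIDATE: Phase.DISPATCH,
    Phase.DISPATCH: Phase.POST_UNIT,
    Phase.POST_UNIT: Phase.VERIFY,
    Phase.VERIFY: Phase.ADVANCE,
    Phase.ADVANCE: Phase.READ_STATE,
}


def run_unit(unit_id, execute_unit):
    """Walk one unit through every phase and return the phases visited."""
    trace = []
    phase = Phase.READ_STATE
    while True:
        trace.append(phase.value)
        if phase is Phase.DISPATCH:
            execute_unit(unit_id)  # the only step where the model does work
        if phase is Phase.ADVANCE:
            return trace
        phase = NEXT_PHASE[phase]
```

Because the model is called from exactly one phase and never consulted about transitions, the same input state always produces the same sequence of phases.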
Spec-first. No behavior change without a failing test first. No completion without a real consumer. This is the iron law, not a suggestion. A system that marks tasks complete without PDD fields and executable evidence is asserting success it cannot demonstrate.
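One way to picture the rule is a completion check that refuses to mark a unit done while any requirement is unmet. The sketch below is an assumption-heavy illustration: `CompletionRecord` and its four fields are hypothetical stand-ins and do not represent the actual PDD fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletionRecord:
    """Hypothetical evidence bundle for one unit; not the real PDD schema."""

    test_failed_before: bool
    test_passes_after: bool
    consumer: str
    evidence: tuple[str, ...]


def completion_blockers(record: CompletionRecord) -> list[str]:
    """Return every reason the unit may not be marked complete."""
    blockers = []
    if not record.test_failed_before:
        blockers.append("no failing test was recorded before the change")
    if not record.test_passes_after:
        blockers.append("the test does not pass after the change")
    if not record.consumer.strip():
        blockers.append("no real consumer is named")
    if not record.evidence:
        blockers.append("no executable evidence is recorded")
    return blockers
```

An empty list means the unit may complete; anything else is reported back rather than silently waived.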
Crash recovery must be invisible. A crashed session should resume within seconds with no visible data loss. If recovery requires human intervention, it is a product failure.
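A minimal sketch of how state-on-disk makes resumption automatic, assuming progress is persisted after every unit. The JSON file, the `completed` key, and the function names are illustrative assumptions; SF's own state lives in STATE.md, whose format is not shown here.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state to a temp file, then rename it over the real file."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # The rename is atomic on POSIX, so readers never see a half-written file.
    os.replace(tmp_name, path)


def load_state(path: Path) -> dict:
    """Return the last saved state, or a fresh one on first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"completed": []}


def run(path: Path, units: list[str], do_unit) -> None:
    """Run units in order, skipping any already recorded as completed."""
    state = load_state(path)
    for unit in units:
        if unit in state["completed"]:
            continue
        do_unit(unit)
        state["completed"].append(unit)
        save_state(path, state)  # persist before moving to the next unit
```

Calling `run` again after a crash needs no special recovery path: it reloads the file and continues from the first unit not yet recorded.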
User stays in the loop via gates, not via interrupts. Discussion gates, write gates, budget ceilings, and approval prompts are the designed points of human interaction. The agent should not need to ask for help in the middle of a task.
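The gate idea can be sketched as checks that run before an action and raise when a human decision is required. Everything here is hypothetical: the `Gates` class, the `GateBlocked` exception, the default protected set, and the budget unit are assumptions for illustration only.

```python
from dataclasses import dataclass


class GateBlocked(Exception):
    """Raised when an action must wait for a human decision."""


@dataclass
class Gates:
    protected: frozenset[str] = frozenset({"SPEC.md"})
    budget_ceiling: float = 10.0  # hypothetical unit, e.g. dollars
    spent: float = 0.0
    writes_allowed: bool = True

    def check_write(self, path: str) -> None:
        """Allow a write only if the write gate is open and the file is unprotected."""
        if not self.writes_allowed:
            raise GateBlocked(f"write gate closed: {path}")
        if path in self.protected:
            raise GateBlocked(f"protected file needs approval: {path}")

    def charge(self, cost: float) -> None:
        """Record spend, refusing any charge that would cross the ceiling."""
        if self.spent + cost > self.budget_ceiling:
            raise GateBlocked("budget ceiling reached")
        self.spent += cost
```

The agent calls these checks at fixed points, so human attention is requested only when a gate trips, not at arbitrary moments inside a task.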
Tradeoffs
| Choice | What we gave up | Why |
|---|---|---|
| Fresh session per unit | Conversational continuity across units | Quality and predictability over convenience |
| State on disk (not in memory) | Speed of in-memory state | Crash recovery and multi-process visibility |
| Write gate during queue | Faster iteration in planning | Safety: prevents accidental file mutations during discussion |
| Protected files (ADRs, SPEC.md) | Agent autonomy over architecture docs | Human oversight over durable decisions |
| Serial execution default | Throughput | Correctness before parallelism; parallel locking is deferred debt |