Product Sense

The Core Thesis

SF is a purpose-to-software compiler. It exists to take bounded intent, turn it into a falsifiable PDD contract, research missing context, decide whether autonomy is allowed, and then run the resulting milestone to completion with clean git history, passing tests, and recorded evidence.

Every design decision should be evaluated against this question: does it make purpose-to-software compilation more reliable, more observable, more recoverable, or more falsifiable?

User Goals

Hand off a milestone and have it complete without babysitting
Know the agent won't make irreversible mistakes (write gates, protected files, budget ceilings)
Resume after a crash without losing work (state-on-disk, crash recovery)
See what the agent did and why (trace files, decision register, records keeper)
Steer mid-run without breaking the loop (message queue, steering gate)

Non-Goals

Being a chat interface — use the Pi interactive mode for exploratory conversation
Replacing CI — SF triggers verification but does not replace your existing CI pipeline
Working without context — SF needs a spec, a roadmap, and a task plan; it does not invent work from nothing

What Good Product Judgment Looks Like

Fresh context per unit, not accumulated context. Each task gets a new session with exactly the context it needs pre-injected (task plan, slice plan, prior summaries, relevant skills). This prevents quality degradation from context accumulation — one of the primary failure modes of naive LLM agents on long projects.
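The fresh-context rule can be sketched as a small data structure. This is a minimal illustration, not SF's actual implementation: `UnitContext`, `render`, and `run_unit` are hypothetical names, and the four fields simply mirror the items listed in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitContext:
    """Everything one unit's session is allowed to see (hypothetical shape)."""

    task_plan: str
    slice_plan: str
    prior_summaries: tuple[str, ...] = ()
    skills: tuple[str, ...] = ()

    def render(self) -> str:
        """Build the pre-injected prompt for a brand-new session."""
        sections = [
            ("Task plan", self.task_plan),
            ("Slice plan", self.slice_plan),
            ("Prior summaries", "\n".join(self.prior_summaries)),
            ("Relevant skills", "\n".join(self.skills)),
        ]
        # Empty sections are skipped so the session sees only what it needs.
        return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)


def run_unit(context: UnitContext, execute) -> str:
    """Run one unit in a fresh session.

    `execute` stands in for whatever starts a new LLM session (an assumption).
    It receives only the rendered context: earlier units contribute their
    summaries, never their full transcripts.
    """
    return execute(context.render())
```

Because the context object is frozen and rebuilt for each unit, nothing from an earlier session can leak in unless it was deliberately written into a summary.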

State machine, not LLM guessing. The loop is deterministic: read STATE.md → validate → dispatch → post-unit → verify → advance. The LLM executes work inside a unit; it does not decide what the next unit is. Separating orchestration from execution keeps the system predictable.
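One way to picture the split between orchestration and execution is a driver whose phase order is fixed in code. The sketch below is an assumption-laden illustration: the phase names come from the paragraph above, but `run_iteration`, `pick_next_unit`, and the dictionary-shaped state are hypothetical and do not describe SF's real data model.

```python
from typing import Callable

# Phase order from the paragraph above; fixed in code, never chosen by the LLM.
PHASES = ("read_state", "validate", "dispatch", "post_unit", "verify", "advance")

Handler = Callable[[dict], dict]


def run_iteration(handlers: dict[str, Handler], state: dict) -> dict:
    """Run one pass through every phase in order, threading state through."""
    missing = [p for p in PHASES if p not in handlers]
    if missing:
        raise KeyError(f"no handler for phases: {missing}")
    trace = list(state.get("trace", []))
    for phase in PHASES:
        state = handlers[phase](state)  # only "dispatch" would call the LLM
        trace.append(phase)
    return {**state, "trace": trace}


def pick_next_unit(state: dict) -> dict:
    """Hypothetical 'advance' handler: choose the next unit from recorded status alone."""
    pending = [u for u in state["units"] if u not in state["done"]]
    return {**state, "current": pending[0] if pending else None}
```

In this shape the LLM can only influence what happens inside the `dispatch` handler. Which unit runs next is a pure function of recorded state, so the same state always produces the same next step.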

Spec-first. No behavior change without a failing test first. No completion without a real consumer. This is the iron law, not a suggestion. A system that marks tasks complete without PDD fields and executable evidence is just making things up.

Crash recovery must be invisible. A crashed session should resume within seconds with no visible data loss. If recovery requires human intervention, it is a product failure.
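Invisible recovery depends on the on-disk state never being half-written. A common technique for that is write-to-temp-then-rename, sketched below. The JSON file and the `save_state` / `load_state` helpers are assumptions chosen for brevity; they are not the actual STATE.md format or SF's recovery code.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: a crash leaves either the old file or the new one."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # rename over the old file in one step
    except BaseException:
        os.unlink(tmp_name)  # never leave a stray temp file behind
        raise


def load_state(path: Path) -> dict:
    """Resume from disk; a missing file means a fresh start, not an error."""
    if not path.exists():
        return {"current": None, "done": []}
    return json.loads(path.read_text(encoding="utf-8"))
```

A restarted process calls `load_state` and continues from the last unit that was fully recorded, which is what makes a crash a non-event from the user's point of view.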

User stays in the loop via gates, not via interrupts. Discussion gates, write gates, budget ceilings, and approval prompts are the designed points of human interaction. The agent should not need to ask for help in the middle of a task.
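Gates can be thought of as checks that run before an action, so the agent is stopped at a designed point instead of asking for help mid-task. The following is a rough sketch under stated assumptions: the `Gates` class, the glob patterns, and the dollar-denominated budget are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


class GateBlocked(Exception):
    """Raised when an action must wait for a human decision."""


@dataclass
class Gates:
    # Hypothetical defaults; real protected paths and budgets would come from config.
    protected_globs: tuple[str, ...] = ("SPEC.md", "docs/adr/*")
    budget_ceiling_usd: float = 10.0
    spent_usd: float = 0.0
    discussion_open: bool = False

    def check_write(self, path: str) -> None:
        """Refuse writes during discussion and writes to protected files."""
        if self.discussion_open:
            raise GateBlocked(f"write gate: discussion in progress, refusing {path}")
        if any(fnmatch(path, pattern) for pattern in self.protected_globs):
            raise GateBlocked(f"protected file: {path} needs human approval")

    def charge(self, cost_usd: float) -> None:
        """Record spend, stopping before the ceiling would be exceeded."""
        if self.spent_usd + cost_usd > self.budget_ceiling_usd:
            raise GateBlocked("budget ceiling reached")
        self.spent_usd += cost_usd
```

The orchestrator would catch `GateBlocked` and surface an approval prompt, so the human decision happens at the gate and the unit itself never has to pause for input.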

Tradeoffs

Choice	What we gave up	Why
Fresh session per unit	Conversational continuity across units	Quality and predictability over convenience
State on disk (not in memory)	Speed of in-memory state	Crash recovery and multi-process visibility
Write gate during queue	Faster iteration in planning	Safety: prevents accidental file mutations during discussion
Protected files (ADRs, SPEC.md)	Agent autonomy over architecture docs	Human oversight over durable decisions
Serial execution default	Throughput	Correctness before parallelism; parallel locking is deferred debt
