ADR-011: Swarm chat and debate mode for ephemeral subagents
Date: 2026-04-29
Status: accepted (Option A implemented; full swarm chat deferred)
Context
sf's subagent tool originally dispatched one or more subagents in parallel fire-and-forget mode (subagent({ tasks: [...] })). All tasks ran concurrently; none saw each other; the parent collected results and synthesised.
This is sufficient for many cases (parallel research, parallel gate evaluation), but it has a structural gap for adversarial review and multi-stakeholder negotiation:
- An advocate's strongest defence never gets stress-tested by the challenger — they fire monologues in parallel.
- A multi-stakeholder swarm (the canonical Vision Alignment Meeting roles in plan-milestone: PM, User Advocate, Combatant, Architect, …) never actually negotiates; each issues a verdict the parent then weighs.
- The parent is the only synthesiser: there's no convergence dynamic among the subagents themselves.
The user asked whether agent-to-agent communication could happen inside ephemeral swarm tasks, sharing the chat machinery rather than waiting for the long-lived persistent-agent layer (SPEC §17–18) to land.
Decision
Implement Option A now; defer Option C.
Round-robin debate mode is implemented on the existing subagent tool as
subagent({ mode: "debate", rounds: N, tasks: [...] }). It gives each
participant the prior rounds' transcript and keeps the parent as synthesiser.
Full inbox-based swarm chat remains deferred until the persistent-agent layer
(agents, agent_messages, agent_inbox, send_message tool) lands in
current .sf/DB-backed state. That machinery should not be rebuilt inside the
ephemeral subagent extension.
Alternatives Considered
Option A — Round-robin debate mode (IMPLEMENTED)
Add mode: "debate" and rounds: N to the subagent tool. Each round, every task sees the previous round's outputs.
    subagent({
      mode: "debate",
      rounds: 3,
      tasks: [
        { agent: "reviewer", task: "Make case for X. ..." },
        { agent: "reviewer", task: "Attack X. ..." }
      ]
    })
- Cost: tokens scale with rounds × tasks.
- Determinism: still reasonable, since outputs are sequenced deterministically per round.
- Fit: best for adversarial review where the challenger should engage with the advocate's strongest defence. Minor extension of the existing subagent contract.
- Why not: doesn't support free-form many-to-many messaging. Each task speaks once per round in a fixed order.
- Why first: smallest change, biggest immediate quality win, reusable as a primitive.
Implementation: src/resources/extensions/subagent/index.ts.
Regression test: src/tests/subagent-debate-mode.test.ts.
Option B — Shared scratchpad
Subagents share a JSON scratchpad written between turns. Each subagent reads what the others wrote, appends, hands off.
- Pros: state is explicit and auditable; low protocol complexity.
- Cons: feels mechanical — agents don't "talk", they write to a buffer. No spontaneous response.
- Verdict: rejected. If we're going to add inter-agent state, do it as messaging (Option A or C), not a buffer.
Option C — Ephemeral swarm with inbox (long-term target)
Reuse the future persistent-agent infrastructure once it exists in current .sf/DB-backed state, but scope each ephemeral swarm by swarm_id with a TTL. Swarm agents can send_message to each other freely during the task; on synthesize(), the swarm's rows get archived.
    swarm({
      ttl_ms: 600_000,
      agents: [
        { id: "pm", model_tier: "planning", system: "..." },
        { id: "user", model_tier: "validation", system: "..." },
        { id: "combatant", model_tier: "validation", system: "..." },
        { id: "architect", model_tier: "validation", system: "..." }
      ],
      initial: { from: "moderator", to: "all", content: "Roadmap proposal: ..." }
    })
- Pros: open negotiation; most powerful for multi-stakeholder Vision Alignment Meeting; reuses persistent-agent machinery.
- Cons: path-dependent (harder to reproduce); harder to budget tokens; swarm convergence isn't guaranteed without a moderator. Depends on the persistent-agent layer landing first.
- Verdict: target end state; not first.
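To make the scoping idea concrete, here is a minimal in-memory sketch of a swarm keyed by `swarm_id` with a TTL, free `send_message` traffic, and archival on `synthesize()`. Everything in it is hypothetical: the `EphemeralSwarm` class, the `Message` shape, and the injected `now` clock are illustration only, and the intended design would sit on the persistent-agent tables rather than an array.

```typescript
// Hypothetical sketch of Option C; no such class exists in the codebase.
interface Message {
  swarmId: string;
  from: string;
  to: string; // an agent id, or "all" for a broadcast
  content: string;
  sentAt: number;
}

class EphemeralSwarm {
  readonly swarmId: string;
  private readonly agentIds: string[];
  private readonly ttlMs: number;
  private readonly createdAt: number;
  private readonly messages: Message[] = [];
  private archived = false;

  constructor(swarmId: string, agentIds: string[], ttlMs: number, createdAt: number) {
    this.swarmId = swarmId;
    this.agentIds = agentIds;
    this.ttlMs = ttlMs;
    this.createdAt = createdAt;
  }

  // Any participant may message any other while the swarm is live.
  sendMessage(from: string, to: string, content: string, now: number): void {
    if (this.archived) throw new Error("swarm is archived (read-only)");
    if (now - this.createdAt > this.ttlMs) throw new Error("swarm TTL expired");
    if (to !== "all" && !this.agentIds.includes(to)) {
      throw new Error(`unknown recipient: ${to}`);
    }
    this.messages.push({ swarmId: this.swarmId, from, to, content, sentAt: now });
  }

  // Messages addressed to this agent, plus broadcasts it did not send itself.
  inboxFor(agentId: string): Message[] {
    return this.messages.filter(
      (m) => m.from !== agentId && (m.to === agentId || m.to === "all"),
    );
  }

  // Freeze the swarm and hand the full message log to the synthesiser.
  synthesize(): Message[] {
    this.archived = true;
    return [...this.messages];
  }
}
```

Passing `now` explicitly keeps the TTL check deterministic in tests; a real implementation would more likely read a clock and expire rows in storage.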
Consequences
Positive
- Higher-quality adversarial review — the challenger actually engages the advocate's strongest defence, instead of issuing a parallel monologue.
- Multi-stakeholder pressure testing — the Vision Alignment Meeting can use bounded debate rounds instead of only a parallel survey.
- Reusable primitive: debate mode can be invoked from any skill that today does subagent({ tasks: [advocate, challenger] }) (currently advisory-partner, brainstorming, requesting-code-review).
Negative
- Cost grows linearly with rounds. A 3-round debate is roughly 3× the tokens. Callers should reserve budget accordingly.
- Determinism drops. A fire-and-collect batch is reproducible from prompts alone; a debate is path-dependent. Trace recording becomes more important: .sf/traces/ must capture each round.
- Synthesis complexity rises: the parent must summarise a debate transcript, not just collect verdicts. The synthesis prompt itself becomes a tunable artefact.
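The linear cost point can be turned into a rough pre-flight budget check. The sketch below is an assumption-laden illustration: `estimateDebateTokens`, `affordableRounds`, and the flat `tokensPerTurn` figure are hypothetical, and the estimate ignores the extra prompt tokens that later rounds spend on the accumulated transcript.

```typescript
// Hypothetical helpers; not part of the subagent extension.

// Rough estimate: every task speaks once per round at a flat per-turn cost.
function estimateDebateTokens(tasks: number, rounds: number, tokensPerTurn: number): number {
  return tasks * rounds * tokensPerTurn;
}

// Largest round count that fits the budget, clamped to the rounds cap (5 in this ADR).
function affordableRounds(
  budget: number,
  tasks: number,
  tokensPerTurn: number,
  cap: number = 5,
): number {
  const perRound = tasks * tokensPerTurn;
  if (perRound <= 0) throw new Error("tasks and tokensPerTurn must be positive");
  return Math.min(cap, Math.floor(budget / perRound));
}
```

A caller could treat a result below 2 from `affordableRounds` as a signal to fall back to a plain parallel batch, since a one-round debate carries no transcript.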
Risks and mitigations
- Risk: runaway debate, where agents loop without converging.
  - Mitigation: hard rounds cap; convergence heuristic (stop when no new claims appear in a round).
- Risk: one agent dominates and silences the others.
  - Mitigation: moderator role injects a turn-order constraint; per-agent token budget within a round.
- Risk: debate quality is only marginally better than parallel-fire-and-collect.
  - Mitigation: A/B harness. Run both modes on the same fixture set and compare verdict accuracy on a benchmark of known good/bad designs. If the lift is < 10 % accuracy, defer Option A indefinitely.
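The convergence heuristic named in the first mitigation ("stop when no new claims appear in a round") could look roughly like the sketch below. The claim extraction here is a deliberately crude stand-in (split on sentence punctuation, lowercase, collapse whitespace); a real check would likely need semantic comparison. All function names are hypothetical.

```typescript
// Hypothetical sketch: treat each normalised sentence as one "claim".
function extractClaims(output: string): string[] {
  return output
    .split(/[.!?\n]+/)
    .map((s) => s.trim().toLowerCase().replace(/\s+/g, " "))
    .filter((s) => s.length > 0);
}

// Records the round's claims in `seen`; returns true if any were new.
function roundAddsClaims(seen: Set<string>, roundOutputs: string[]): boolean {
  let added = false;
  for (const output of roundOutputs) {
    for (const claim of extractClaims(output)) {
      if (!seen.has(claim)) {
        seen.add(claim);
        added = true;
      }
    }
  }
  return added;
}

// Number of rounds to keep: stops after the first round that adds nothing,
// and never exceeds the hard cap.
function roundsUntilConverged(outputsByRound: string[][], maxRounds: number): number {
  const seen = new Set<string>();
  const limit = Math.min(outputsByRound.length, maxRounds);
  for (let i = 0; i < limit; i++) {
    if (!roundAddsClaims(seen, outputsByRound[i] ?? [])) return i + 1;
  }
  return limit;
}
```

The hard cap stays in force regardless of the heuristic, so a debate that keeps producing novel phrasing still terminates.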
Out of Scope
- Persistent inter-agent messaging across runs: belongs in current .sf/DB-backed persistent-agent state when that layer exists; orthogonal to ephemeral swarms.
- Cross-session swarm replay: a swarm session, once archived, is read-only. No "fork from round 2" support in v1.
- Human-in-the-loop debate: swarms are agent-to-agent only. If the user wants to inject a turn, that's a different surface (the existing discuss flow).
Implementation Notes (Option A)
- subagent accepts mode: "parallel" | "debate" on tasks batches.
- rounds defaults to 2, is capped at 5, and is valid only with mode: "debate".
- Debate requires at least two participants.
- Each round runs the participant tasks, then appends their outputs to an in-memory transcript.
- Later rounds receive the transcript under "Debate transcript so far".
- The final round asks each participant to end with FINAL_VERDICT.
- The parent still owns synthesis and persistence; debate mode does not create persistent agent messages.
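The notes above can be sketched as a small round loop. This is not the code in src/resources/extensions/subagent/index.ts; it is a minimal illustration under stated assumptions: the type names, `resolveRounds`, `buildPrompt`, and `runDebate` are hypothetical, the cap is modelled as clamping rather than rejecting, the exact prompt wording is invented, and `dispatch` is a synchronous stand-in for the real subagent call.

```typescript
// Hypothetical types and names, for illustration only.
type Mode = "parallel" | "debate";
interface Task { agent: string; task: string }
interface Turn { round: number; agent: string; output: string }

const DEFAULT_ROUNDS = 2;
const MAX_ROUNDS = 5;

// Validate the options and return the effective round count.
function resolveRounds(mode: Mode, tasks: Task[], rounds?: number): number {
  if (mode !== "debate") {
    if (rounds !== undefined) throw new Error('rounds is valid only with mode: "debate"');
    return 1;
  }
  if (tasks.length < 2) throw new Error("debate requires at least two participants");
  const requested = rounds ?? DEFAULT_ROUNDS;
  if (!Number.isInteger(requested) || requested < 1) {
    throw new Error("rounds must be a positive integer");
  }
  return Math.min(requested, MAX_ROUNDS); // assumption: clamp to the cap
}

// Task text, then the prior rounds' transcript, then the final-round instruction.
function buildPrompt(task: Task, transcript: Turn[], round: number, total: number): string {
  const parts = [task.task];
  if (transcript.length > 0) {
    const lines = transcript.map((t) => `[round ${t.round}] ${t.agent}: ${t.output}`);
    parts.push("Debate transcript so far:\n" + lines.join("\n"));
  }
  if (round === total) parts.push("End your reply with FINAL_VERDICT.");
  return parts.join("\n\n");
}

function runDebate(
  tasks: Task[],
  rounds: number | undefined,
  dispatch: (agent: string, prompt: string) => string,
): Turn[] {
  const total = resolveRounds("debate", tasks, rounds);
  const transcript: Turn[] = [];
  for (let round = 1; round <= total; round++) {
    // Collect the whole round first, so participants see prior rounds only.
    const turns = tasks.map((t): Turn => ({
      round,
      agent: t.agent,
      output: dispatch(t.agent, buildPrompt(t, transcript, round, total)),
    }));
    transcript.push(...turns);
  }
  return transcript; // the parent still owns synthesis of this transcript
}
```

Building each round's turns before appending them is what keeps same-round participants from seeing each other's output.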
Sequencing
| When | Why |
|---|---|
| Now | Option A is available as bounded debate mode on subagent. |
| Six months of Option A in production | Decide whether full swarm-chat with inbox is worth the build. |
| Persistent-agent layer projected into .sf/DB state | Revisit Option C when inbox/message persistence exists as current runtime state. |
References
- Older persistent-agent and inter-agent messaging SPEC notes: external design evidence only; project accepted facts into .sf/DB-backed state before treating them as operational.
- src/resources/extensions/sf/skills/dispatching-subagents/SKILL.md: current single/parallel/debate/chain guidance.
- src/resources/extensions/sf/skills/advisory-partner/SKILL.md: primary consumer of adversarial dispatch today.
- src/resources/extensions/sf/prompts/gate-evaluate.md: pre-execution Q3/Q4 gates.
- src/resources/extensions/sf/prompts/validate-milestone.md: post-execution 3-reviewer pattern.