Integration of 3 quick wins into existing UOK infrastructure:
1. Model Learning (Quick Win #2) → metrics.js
- Record outcomes to model-learner for per-task-type performance tracking
- Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
- Fire-and-forget: never blocks outcome recording on learning failure
- Enables adaptive model routing decisions in downstream gates
2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
- Auto-fix high-confidence reports (>0.85) in applyTriageReport()
- Hook: After triage and requirement promotion, apply auto-fixes
- Fire-and-forget: never blocks report application on fix failure
- Returns reportsAutoFixed count for triage metrics
3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
- Already active in execute-task prompt template
- Semantic matching with graceful degradation
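A minimal sketch of the auto-fix gate from item 2, for illustration only. The 0.85 threshold and the reportsAutoFixed count come from the notes above; selectAutoFixable, autoFixHighConfidence, the report shape, and the applyFix callback are hypothetical names, not the actual triage-self-feedback.js code.

```javascript
// Hypothetical sketch: only the 0.85 threshold and the reportsAutoFixed
// field name are taken from the commit notes; everything else is assumed.
const AUTO_FIX_CONFIDENCE_THRESHOLD = 0.85;

// Keep only reports whose confidence is strictly above the threshold (">0.85").
function selectAutoFixable(reports) {
  return reports.filter((r) => r.confidence > AUTO_FIX_CONFIDENCE_THRESHOLD);
}

// Apply fixes one at a time; a failing fix is skipped and never rethrown,
// so report application cannot fail because of an auto-fix.
async function autoFixHighConfidence(reports, applyFix) {
  let reportsAutoFixed = 0;
  for (const report of selectAutoFixable(reports)) {
    try {
      await applyFix(report);
      reportsAutoFixed += 1;
    } catch {
      // Swallowed by design: fix failure must not block triage.
    }
  }
  return { reportsAutoFixed };
}
```

Only fixes that complete are counted, so the returned number can feed triage metrics directly.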
All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Caller-compatible: applyTriageReport is now async (signature change), and existing callers handle the returned promise
- No new dependencies: all modules already in codebase
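The fire-and-forget rule above can be sketched roughly as follows. This is an assumption-laden illustration, not the metrics.js implementation: the store array and the learner object passed as parameters are hypothetical stand-ins for the real outcome db and ModelLearner.

```javascript
// Hypothetical sketch: `store` and `learner` are illustrative stand-ins.
function recordUnitOutcome(outcome, store, learner) {
  // Primary outcome recording happens first and is never skipped.
  store.push(outcome);

  try {
    // Promise.resolve() normalizes sync and async learners; the attached
    // .catch() keeps an async failure from becoming an unhandled rejection.
    Promise.resolve(learner.recordOutcome(outcome)).catch(() => {
      // Async learning failure: ignored by design.
    });
  } catch {
    // Sync learning failure: ignored by design.
  }

  // The caller does not wait for learning to finish.
  return outcome;
}
```

The trade-off is that learning errors are invisible to the caller; a real hook would probably log them rather than drop them silently.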
Testing: 2934 tests pass (no regressions from integration)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Caveman skill (output compression) installed at ~/.claude/skills/caveman/
and activated for dr-repo. Two follow-ups for INPUT-side compression
remain. sf's own prompts are verbose (execute-task alone carries 10-step
instructions, runtime context, and multiple inlined plans), and that
cost is paid on every dispatch:
- Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in
caveman style. Preserve intent + nuance, drop fluff. Compare against
current to confirm no quality regression.
- Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt
through caveman-compress (sub-skill, ~46% reduction) before dispatch.
Behind a terse_prompts: true flag. Adds drift risk vs authored intent;
needs comparison harness.
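A rough sketch of how the Tier 3 flag gate could look. Only the terse_prompts flag name comes from the note above; preparePrompt and the injected compress callback (standing in for caveman-compress) are hypothetical.

```javascript
// Hypothetical sketch: `compress` stands in for the caveman-compress
// sub-skill; its real interface is not specified in these notes.
function preparePrompt(renderedPrompt, config, compress) {
  // Flag off (or unset): dispatch the authored prompt untouched.
  if (config.terse_prompts !== true) return renderedPrompt;

  try {
    const compressed = compress(renderedPrompt);
    // Fall back if the compressor returned nothing usable or saved nothing.
    const usable =
      typeof compressed === "string" &&
      compressed.length > 0 &&
      compressed.length < renderedPrompt.length;
    return usable ? compressed : renderedPrompt;
  } catch {
    // Compression failure must not block dispatch.
    return renderedPrompt;
  }
}
```

Keeping the compressor injectable also makes it easy for a comparison harness to run the same prompt through both paths.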
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked
.gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use
.sf instead'. That was a superficial read.
The fix is about detecting and recovering from a broken/redirected
staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is
a one-line path translation, not a design difference. The
symlink-resilience logic is exactly what we need for our staging.
Path-translate .gsd/ → .sf/ in the port. The substance ports.
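For orientation, the detection half of such a check might look like the sketch below. This is not the code from 9340f1e9b; checkStagingDir, its return labels, and the expectedTarget parameter are hypothetical, and recovery is left out entirely. It uses only standard node:fs calls.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: classify the state of a staging dir such as .sf
// that may be a plain directory or a symlink to one.
function checkStagingDir(stagingPath, expectedTarget) {
  let stat;
  try {
    // lstat inspects the link itself instead of following it.
    stat = fs.lstatSync(stagingPath);
  } catch (err) {
    if (err.code === "ENOENT") return "missing";
    throw err;
  }

  if (!stat.isSymbolicLink()) {
    return stat.isDirectory() ? "ok" : "not-a-directory";
  }

  // Resolve a relative link target against the link's parent directory.
  const target = path.resolve(
    path.dirname(stagingPath),
    fs.readlinkSync(stagingPath)
  );
  if (!fs.existsSync(target)) return "broken";

  // Optional: flag a link that points somewhere other than expected.
  if (
    expectedTarget &&
    fs.realpathSync(target) !== fs.realpathSync(expectedTarget)
  ) {
    return "redirected";
  }
  return "ok";
}
```

A caller would treat "broken" and "redirected" as the cases that risk silent data loss and refuse to write until the link is repaired.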
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
OpenRouter is already neutered via the provider_model_allow allowlist
(see d38e5ea09 fix(schema): auto-coerce string → [string] for sf_* list
fields + provider_model_allow tests). The 248 model entries in
models.generated.ts are inert — no dispatch path reaches them.
Removing the data entries would be aesthetic cleanup with zero
behavioral effect. Not worth a Tier-1 follow-up.
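To illustrate why unlisted entries are inert, here is a simplified sketch of an allowlist gate with string → [string] coercion. The flat list of model ids and the "empty list allows nothing" rule are assumptions made for the example; the real provider_model_allow schema may differ, and coerceToList and reachableModels are hypothetical names.

```javascript
// Hypothetical sketch: assumes the allowlist is a flat list of model ids.
// Accept either a single string or an array, mirroring string -> [string].
function coerceToList(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Only catalog entries named in the allowlist are reachable by dispatch.
// Assumption: an empty or missing allowlist allows nothing.
function reachableModels(catalog, providerModelAllow) {
  const allowed = new Set(coerceToList(providerModelAllow));
  return catalog.filter((entry) => allowed.has(entry.id));
}
```

Under these assumptions, catalog entries outside the allowlist can stay in the generated file without ever being selected.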
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After attempting cluster B (4 surgical agent-session fixes), even the
first commit conflicted because of structural namespace divergence
(gsd_*→sf_* rename, @sf-run/*→@singularity-forge/* rename, prior
pi-mono direct cherry-picks). The conflicts are real semantic
divergence, not noise.
Conclusion: sf is a fork; we do not periodically sync from
gsd-build/gsd-2. Pretending we still track upstream means weeks of
merge work for diminishing returns.
BUILD_PLAN.md adds an explicit "Upstream stance" section documenting
the fork posture and the rationale for the three irreversible naming
choices.
UPSTREAM_CHERRY_PICK_CANDIDATES.md is reframed as a reference list,
not an action plan. The clusters and SHAs remain useful as an
intelligence source — port specific fixes by hand when one bites us;
do not run automated cherry-picks against the list.
Pi-mono SDK syncs continue separately — that path doesn't have the
same divergence problem.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Of the 56 NEW items in SPEC.md, not all are worth building for v3.
This plan groups them by tier:
- Tier 1 ESSENTIAL (~5 weeks): Vault resolver, sm integration decision,
schema reconciliation, config alignment.
- Tier 2 STRONG (~3-4 weeks): doc-sync, intent chapters, PhaseReview
3-pass, turn_status marker, last_error cap, cost_micro_usd.
- Tier 3 NICE (v3.1+): persistent agents, inter-agent messaging,
workflow content pinning, runs table, pending_retain.
- Tier 4 DEFER: SSH workers, HTTP API auth, trace_index, PhaseUAT —
build when a deployment demands it.
- Tier 5 DROP: items from late adversarial-review iterations that
don't earn their keep (workflow_pins separate table, snap_ columns,
agent_capabilities separate index).
Includes a recommended ~6-8 week v3.0 schedule and four decision
points that should be settled before starting work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>