Commit graph

13 commits

Author SHA1 Message Date
Mikael Hugo
8088489e38 sf snapshot: uncommitted changes after 258m inactivity 2026-05-07 15:37:55 +02:00
Mikael Hugo
553ba23b89 integrate: hook quick wins into UOK dispatch loop
Integration of 3 quick wins into existing UOK infrastructure:

1. Model Learning (Quick Win #2) → metrics.js
   - Record outcomes to model-learner for per-task-type performance tracking
   - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome()
   - Fire-and-forget: never blocks outcome recording on learning failure
   - Enables adaptive model routing decisions in downstream gates

2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js
   - Auto-fix high-confidence reports (>0.85) in applyTriageReport()
   - Hook: After triage and requirement promotion, apply auto-fixes
   - Fire-and-forget: never blocks report application on fix failure
   - Returns reportsAutoFixed count for triage metrics

3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js
   - Already active in execute-task prompt template
   - Semantic matching with graceful degradation
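
A minimal sketch of the high-confidence selection described in item 2, assuming a report carries a numeric `confidence` in the range 0..1. The `SelfReport` shape and the `selectAutoFixable` helper are hypothetical names for illustration, not the actual code in triage-self-feedback.js:

```typescript
// Hypothetical report shape; the real one in triage-self-feedback.js may differ.
interface SelfReport {
  id: string;
  confidence: number; // assumed to be in the range 0..1
}

// Threshold taken from the commit message: "high-confidence reports (>0.85)".
const AUTO_FIX_THRESHOLD = 0.85;

// Returns only the reports strictly above the threshold.
// A report at exactly 0.85 is NOT selected, matching the ">" in the message.
function selectAutoFixable(
  reports: SelfReport[],
  threshold: number = AUTO_FIX_THRESHOLD,
): SelfReport[] {
  return reports.filter((report) => report.confidence > threshold);
}
```

Under this assumption, the `reportsAutoFixed` count would be at most the length of the returned list, since individual fixes may still fail.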

All integration points:
- Fire-and-forget: learning/fixing failures never block dispatch
- UOK-native: use existing outcome recording, db, gates
- Backward compatible: applyTriageReport now async, but callers handle it
- No new dependencies: all modules already in codebase
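
The fire-and-forget rule above could look roughly like the sketch below. `UnitOutcome`, `Learner`, and `recordUnitOutcomeSketch` are stand-in names chosen for illustration; the real `recordUnitOutcome()` and `ModelLearner.recordOutcome()` signatures are not shown in this commit and may differ:

```typescript
// Hypothetical shapes for illustration only.
interface UnitOutcome {
  unitId: string;
  taskType: string;
  model: string;
  success: boolean;
}

interface Learner {
  recordOutcome(outcome: UnitOutcome): Promise<void>;
}

// Records the outcome first, then notifies the learner without awaiting it.
// Neither a synchronous throw nor a rejected promise from the learner can
// prevent or delay the primary recording.
function recordUnitOutcomeSketch(
  outcome: UnitOutcome,
  store: UnitOutcome[],
  learner: Learner,
): void {
  store.push(outcome); // primary path: always runs

  try {
    // Not awaited: the caller returns immediately.
    learner.recordOutcome(outcome).catch(() => {
      // Asynchronous learner failure is deliberately ignored.
    });
  } catch {
    // Synchronous learner failure is deliberately ignored as well.
  }
}
```

Attaching `.catch` in the same tick keeps a rejected learner promise from surfacing as an unhandled rejection, which is the part that is easy to miss when dropping an `await`.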

Testing: 2934 tests pass (no regressions from integration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-06 22:34:41 +02:00
Mikael Hugo
93c1bbcb9a docs: plan judge calibration service 2026-04-29 18:28:45 +02:00
Mikael Hugo
b32fe7acd1 docs: clarify SF harness rollout boundaries 2026-04-29 17:47:51 +02:00
Mikael Hugo
d78c5ac198 feat: add SF skills and subagent debate mode 2026-04-29 17:44:30 +02:00
Mikael Hugo
ffa216d6ad docs: log caveman input-compression follow-ups in BUILD_PLAN
Caveman skill (output compression) installed at ~/.claude/skills/caveman/
and activated for dr-repo. Two follow-ups for INPUT-side compression
remain. sf's own prompts are verbose (execute-task alone has 10-step
instructions, runtime context, multiple inlined plans), and that
overhead is paid on every dispatch:

- Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in
  caveman style. Preserve intent + nuance, drop fluff. Compare against
  current to confirm no quality regression.
- Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt
  through caveman-compress (sub-skill, ~46% reduction) before dispatch.
  Behind a terse_prompts: true flag. Adds drift risk vs authored intent;
  needs comparison harness.
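
One way the Tier 3 flag gate might be wired, as a sketch only. `DispatchConfig`, `preparePrompt`, and the toy `collapseWhitespace` compressor are hypothetical; the whitespace collapse merely stands in for caveman-compress and is not its algorithm. The fall-back-to-original guard is an added assumption, not something the plan specifies:

```typescript
// Hypothetical config shape; only the terse_prompts flag comes from the plan.
interface DispatchConfig {
  terse_prompts?: boolean;
}

type Compressor = (prompt: string) => string;

// Returns the prompt to dispatch. With the flag off (the default) the rendered
// prompt passes through untouched. With the flag on, the compressed prompt is
// used only if compression succeeded and actually made the prompt shorter.
function preparePrompt(
  rendered: string,
  config: DispatchConfig,
  compress: Compressor,
): string {
  if (!config.terse_prompts) return rendered;
  try {
    const compressed = compress(rendered);
    const usable = compressed.length > 0 && compressed.length < rendered.length;
    return usable ? compressed : rendered;
  } catch {
    return rendered; // a failing compressor must not block dispatch
  }
}

// Toy stand-in compressor: collapses whitespace runs. NOT caveman-compress.
const collapseWhitespace: Compressor = (prompt) =>
  prompt.replace(/\s+/g, " ").trim();
```

Keeping the compressor behind a function parameter would also let a comparison harness run the same prompt through both paths.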

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 15:46:32 +02:00
Mikael Hugo
1bbd20bf78 docs: correct gsd-2 self-heal port — substance applies after path translation
Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked
.gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use
.sf instead'. That was a superficial read.

The fix is about detecting and recovering from a broken/redirected
staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is
a one-line path translation, not a design difference. The
symlink-resilience logic is exactly what we need for our staging.

Path-translate .gsd/ → .sf/ in the port. The substance ports.
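
To make the "broken/redirected symlink" distinction concrete, here is a sketch of the detection half only. It is not the gsd-2 fix itself: `LinkProbe`, `LinkState`, and `classifyStagingLink` are hypothetical names, and the probe values are assumed to come from something like Node's `fs.readlinkSync` and `fs.existsSync`:

```typescript
// Directory name after path translation (.gsd/ in gsd-2, .sf/ here).
const STAGING_DIR = ".sf";

// What was observed on disk for the staging link (hypothetical shape).
interface LinkProbe {
  linkTarget: string | null; // where the symlink points, or null if it is absent
  targetExists: boolean;     // whether that target currently exists
}

type LinkState =
  | { kind: "healthy" }
  | { kind: "missing" }
  | { kind: "redirected"; target: string }
  | { kind: "dangling"; target: string };

// Pure classification: compares the observed link against the expected target.
// Recovery (recreating or re-pointing the link) is intentionally left out.
function classifyStagingLink(probe: LinkProbe, expectedTarget: string): LinkState {
  if (probe.linkTarget === null) return { kind: "missing" };
  if (probe.linkTarget !== expectedTarget) {
    return { kind: "redirected", target: probe.linkTarget };
  }
  if (!probe.targetExists) return { kind: "dangling", target: probe.linkTarget };
  return { kind: "healthy" };
}
```

Nothing in the classification depends on the directory name, which is the point of the correction above: only `STAGING_DIR` changes in the port.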

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:49:05 +02:00
Mikael Hugo
dea4c2dbc1 docs: update Tier 0 with port status; flag SSE parser refactor as bigger work
4 of 9 Tier 0 items landed (5 commits):
- #1 HTML export escape (security)            701ec8fb8 + 92c6d933c
- #2 Empty tools array fix                    58b1d7c60
- #4 undici 5min timeout                      d0907b6d8
- #5 Bedrock inference profile                7c487bb60

Deferred:
- #3 Anthropic SSE proxy event tolerance — fix applies to pi-mono's
  custom SSE parser, but we still use @anthropic-ai/sdk directly.
  To get protection we'd need to port the full "own Anthropic SSE
  parsing" refactor (3 commits, ~200 LOC). Added as a separate Tier 0
  item.

Remaining TODO from Tier 0: items #6-#9 (symlinked dedup, setWorkingVisible
extension API, Cloudflare provider, Azure Cognitive Services).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:35:55 +02:00
Mikael Hugo
a3c487c918 docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first
Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence):
9 items ranked security → bug-fixes → infra → features.

  Critical:
    1. HTML export escape (security)
    2. Empty tools array fix (provider compatibility)
    3. Anthropic SSE proxy event tolerance
    4. Long local-LLM SSE 5min timeout fix

  Infrastructure:
    5. Bedrock inference profile normalization
    6. Symlinked packages dedup
    7. ctx.ui.setWorkingVisible() extension API

  Features:
    8. Cloudflare Workers AI provider
    9. Azure Cognitive Services endpoint

Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace):

  Critical fixes (11):
    1-6.  bash race, security hardening, web_search injection narrowing,
          symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock
    7-10. agent_end transition fixes (4 commits)
    11.   claude-code-cli Always-Allow persistence

  Normal-value features (6):
    12. /gsd eval-review slim port (prompt + tool + template)
    13. Workflow state machine hardening (5 commits as unit)
    14. Proactive rate limiting (min_request_interval_ms)
    15. Per-call token telemetry (opt-in pi-coding-agent hooks)
    16. Worktree TUI commands
    17. Doctor check for orphan milestone directories

Items skipped from each upstream are documented as well. Everything is
in BUILD_PLAN.md so sf can work the list systematically.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 14:04:31 +02:00
Mikael Hugo
bb1c68b7ab docs: drop OpenRouter-removal follow-up
OpenRouter is already neutered via the provider_model_allow allowlist
(see d38e5ea09 fix(schema): auto-coerce string → [string] for sf_* list
fields + provider_model_allow tests). The 248 model entries in
models.generated.ts are inert — no dispatch path reaches them.

Removing the data entries would be aesthetic cleanup with zero
behavioral effect. Not worth a Tier-1 follow-up.
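
A sketch of why allowlisted dispatch leaves the extra entries inert, under loud assumptions: the `"provider/model"` entry format, the `coerceToList` and `isDispatchable` helpers, and the deny-when-empty behavior are all hypothetical and not taken from the actual provider_model_allow schema:

```typescript
// Mirrors the "auto-coerce string → [string]" idea from d38e5ea09:
// a single string is accepted wherever a list is expected.
function coerceToList(value: string | string[] | undefined): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// ASSUMPTION: allowlist entries are exact "provider/model" strings, and an
// empty or missing allowlist permits nothing. The real semantics may differ.
function isDispatchable(
  provider: string,
  model: string,
  allow: string | string[] | undefined,
): boolean {
  const key = provider + "/" + model;
  return coerceToList(allow).indexOf(key) !== -1;
}
```

With a gate like this in front of dispatch, a model entry that is present in the generated data but absent from the allowlist is never selected, whether or not the data entry is deleted.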

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:58:33 +02:00
Mikael Hugo
310ce963ea docs: add session follow-ups to BUILD_PLAN
Six items surfaced during 2026-04-29 ports/refactors that didn't get
tracked anywhere:

- Tier 1: Remove OpenRouter (~248 model entries; user confirmed unused)
- Tier 1: Minimax search tests (deferred from initial port)
- Tier 2: Search provider registry refactor (get rid of the 9-files-per-provider layout)
- Tier 2: Product-audit phase machine wire-up (slim port shipped tool;
  phase dispatch not yet wired)
- Tier 2: Headless assistant-text preview (bunker pattern, deferred from
  headless UX commit)
- Tier 3: Pi-mono SDK sync cadence

Each entry has rationale + effort estimate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 13:56:55 +02:00
Mikael Hugo
7a6169705a docs: lock in fork stance, reframe cherry-pick list as reference-only
After attempting cluster B (4 surgical agent-session fixes), even the
first commit conflicted because of structural namespace divergence
(gsd_*→sf_* rename, @sf-run/*→@singularity-forge/* rename, prior
pi-mono direct cherry-picks). The conflicts are real semantic
divergence, not noise.

Conclusion: sf is a fork; we do not periodically sync from
gsd-build/gsd-2. Pretending we still track upstream means weeks of
merge work for diminishing return.

BUILD_PLAN.md adds an explicit "Upstream stance" section documenting
the fork posture and the rationale for the three irreversible naming
choices.

UPSTREAM_CHERRY_PICK_CANDIDATES.md is reframed as a reference list,
not an action plan. The clusters and SHAs remain useful as an
intelligence source — port specific fixes by hand when one bites us;
do not run automated cherry-picks against the list.

Pi-mono SDK syncs continue separately — that path doesn't have the
same divergence problem.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:57:44 +02:00
Mikael Hugo
31842885ea docs: add BUILD_PLAN.md — tiered cut of v3 NEW items
Of the 56 NEW items in SPEC.md, not all are worth building for v3.
This plan groups them by tier:

- Tier 1 ESSENTIAL (~5 weeks): Vault resolver, sm integration decision,
  schema reconciliation, config alignment.
- Tier 2 STRONG (~3-4 weeks): doc-sync, intent chapters, PhaseReview
  3-pass, turn_status marker, last_error cap, cost_micro_usd.
- Tier 3 NICE (v3.1+): persistent agents, inter-agent messaging,
  workflow content pinning, runs table, pending_retain.
- Tier 4 DEFER: SSH workers, HTTP API auth, trace_index, PhaseUAT —
  build when a deployment demands it.
- Tier 5 DROP: items from late adversarial-review iterations that
  don't earn their keep (workflow_pins separate table, snap_ columns,
  agent_capabilities separate index).

Includes a recommended ~6-8 week v3.0 schedule and four decision
points that should be settled before starting work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 12:33:07 +02:00