singularity/singularity-forge

Author	SHA1	Message	Date
Mikael Hugo	8088489e38	sf snapshot: uncommitted changes after 258m inactivity	2026-05-07 15:37:55 +02:00
Mikael Hugo	553ba23b89	integrate: hook quick wins into UOK dispatch loop Integration of 3 quick wins into existing UOK infrastructure: 1. Model Learning (Quick Win #2) → metrics.js - Record outcomes to model-learner for per-task-type performance tracking - Hook: recordUnitOutcome() now calls ModelLearner.recordOutcome() - Fire-and-forget: never blocks outcome recording on learning failure - Enables adaptive model routing decisions in downstream gates 2. Self-Report Fixing (Quick Win #1) → triage-self-feedback.js - Auto-fix high-confidence reports (>0.85) in applyTriageReport() - Hook: After triage and requirement promotion, apply auto-fixes - Fire-and-forget: never blocks report application on fix failure - Returns reportsAutoFixed count for triage metrics 3. Knowledge Injection (Quick Win #3) → already integrated in auto-prompts.js - Already active in execute-task prompt template - Semantic matching with graceful degradation All integration points: - Fire-and-forget: learning/fixing failures never block dispatch - UOK-native: use existing outcome recording, db, gates - Backward compatible: applyTriageReport now async, but callers handle it - No new dependencies: all modules already in codebase Testing: 2934 tests pass (no regressions from integration) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-06 22:34:41 +02:00
Mikael Hugo	93c1bbcb9a	docs: plan judge calibration service	2026-04-29 18:28:45 +02:00
Mikael Hugo	b32fe7acd1	docs: clarify SF harness rollout boundaries	2026-04-29 17:47:51 +02:00
Mikael Hugo	d78c5ac198	feat: add SF skills and subagent debate mode	2026-04-29 17:44:30 +02:00
Mikael Hugo	ffa216d6ad	docs: log caveman input-compression follow-ups in BUILD_PLAN Caveman skill (output compression) installed at ~/.claude/skills/caveman/ and activated for dr-repo. Two follow-ups for INPUT-side compression remain — sf's own prompts are verbose (execute-task alone has 10-step instructions, runtime context, multiple inlined plans), and that's paid on every dispatch: - Tier 2 (1-2 days): Manually rewrite heaviest prompt sections in caveman style. Preserve intent + nuance, drop fluff. Compare against current to confirm no quality regression. - Tier 3 (3-4 days): Runtime input preprocessor — pipe rendered prompt through caveman-compress (sub-skill, ~46% reduction) before dispatch. Behind a terse_prompts: true flag. Adds drift risk vs authored intent; needs comparison harness. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-29 15:46:32 +02:00
Mikael Hugo	1bbd20bf78	docs: correct gsd-2 self-heal port — substance applies after path translation Earlier I (and sf parroting BUILD_PLAN.md) dismissed gsd-2's symlinked .gsd self-heal fix (9340f1e9b / #4423) as 'doesn't apply because we use .sf instead'. That was a superficial read. The fix is about detecting and recovering from a broken/redirected staging-dir symlink to prevent silent data loss. The .gsd/ vs .sf/ is a one-line path translation, not a design difference. The symlink-resilience logic is exactly what we need for our staging. Path-translate .gsd/ → .sf/ in the port. The substance ports. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:49:05 +02:00
Mikael Hugo	dea4c2dbc1	docs: update Tier 0 with port status; flag SSE parser refactor as bigger work 5 of 9 Tier 0 items landed: - #1 HTML export escape (security) `701ec8fb8` + `92c6d933c` - #2 Empty tools array fix `58b1d7c60` - #4 undici 5min timeout `d0907b6d8` - #5 Bedrock inference profile `7c487bb60` Deferred: - #3 Anthropic SSE proxy event tolerance — fix applies to pi-mono's custom SSE parser, but we still use @anthropic-ai/sdk directly. To get protection we'd need to port the full "own Anthropic SSE parsing" refactor (3 commits, ~200 LOC). Added as a separate Tier 0 item. Remaining TODO from Tier 0: items #6-#9 (symlinked dedup, setWorkingVisible extension API, Cloudflare provider, Azure Cognitive Services). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:35:55 +02:00
Mikael Hugo	a3c487c918	docs: add Tier 0 (pi-mono ports) and Tier 0.5 (gsd-2 manual ports) — sf does these first Tier 0 (pi-mono — should land cleanly via cherry-pick, no namespace divergence): 9 items ranked security → bug-fixes → infra → features. Critical: 1. HTML export escape (security) 2. Empty tools array fix (provider compatibility) 3. Anthropic SSE proxy event tolerance 4. Long local-LLM SSE 5min timeout fix Infrastructure: 5. Bedrock inference profile normalization 6. Symlinked packages dedup 7. ctx.ui.setWorkingVisible() extension API Features: 8. Cloudflare Workers AI provider 9. Azure Cognitive Services endpoint Tier 0.5 (gsd-2 — must be MANUALLY ported; cherry-pick fails on namespace): Critical fixes (11): 1-6. bash race, security hardening, web_search injection narrowing, symlinked staging self-heal, KNOWLEDGE budget, mcp-server deadlock 7-10. agent_end transition fixes (4 commits) 11. claude-code-cli Always-Allow persistence Normal-value features (6): 12. /gsd eval-review slim port (prompt + tool + template) 13. Workflow state machine hardening (5 commits as unit) 14. Proactive rate limiting (min_request_interval_ms) 15. Per-call token telemetry (opt-in pi-coding-agent hooks) 16. Worktree TUI commands 17. Doctor check for orphan milestone directories Skipped from each upstream is documented. All in BUILD_PLAN.md so sf can work the list systematically. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 14:04:31 +02:00
Mikael Hugo	bb1c68b7ab	docs: drop OpenRouter-removal follow-up OpenRouter is already neutered via the provider_model_allow allowlist (see `d38e5ea09` fix(schema): auto-coerce string → [string] for sf_* list fields + provider_model_allow tests). The 248 model entries in models.generated.ts are inert — no dispatch path reaches them. Removing the data entries would be aesthetic cleanup with zero behavioral effect. Not worth a Tier-1 follow-up. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:58:33 +02:00
Mikael Hugo	310ce963ea	docs: add session follow-ups to BUILD_PLAN Six items surfaced during 2026-04-29 ports/refactors that didn't get tracked anywhere: - Tier 1: Remove OpenRouter (~248 model entries; user confirmed unused) - Tier 1: Minimax search tests (deferred from initial port) - Tier 2: Search provider registry refactor (rid of 9-file-per-provider) - Tier 2: Product-audit phase machine wire-up (slim port shipped tool; phase dispatch not yet wired) - Tier 2: Headless assistant-text preview (bunker pattern, deferred from headless UX commit) - Tier 3: Pi-mono SDK sync cadence Each entry has rationale + effort estimate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 13:56:55 +02:00
Mikael Hugo	7a6169705a	docs: lock in fork stance, reframe cherry-pick list as reference-only After attempting cluster B (4 surgical agent-session fixes), even the first commit conflicted because of structural namespace divergence (gsd_→sf_ rename, @sf-run/→@singularity-forge/ rename, prior pi-mono direct cherry-picks). The conflicts are real semantic divergence, not noise. Conclusion: sf is a fork; we do not periodically sync from gsd-build/gsd-2. Pretending we still track upstream means weeks of merge work for diminishing return. BUILD_PLAN.md adds an explicit "Upstream stance" section documenting the fork posture and the rationale for the three irreversible naming choices. UPSTREAM_CHERRY_PICK_CANDIDATES.md is reframed as a reference list, not an action plan. The clusters and SHAs remain useful as an intelligence source — port specific fixes by hand when one bites us; do not run automated cherry-picks against the list. Pi-mono SDK syncs continue separately — that path doesn't have the same divergence problem. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 12:57:44 +02:00
Mikael Hugo	31842885ea	docs: add BUILD_PLAN.md — tiered cut of v3 NEW items Of the 56 NEW items in SPEC.md, not all are worth building for v3. This plan groups them by tier: - Tier 1 ESSENTIAL (~5 weeks): Vault resolver, sm integration decision, schema reconciliation, config alignment. - Tier 2 STRONG (~3-4 weeks): doc-sync, intent chapters, PhaseReview 3-pass, turn_status marker, last_error cap, cost_micro_usd. - Tier 3 NICE (v3.1+): persistent agents, inter-agent messaging, workflow content pinning, runs table, pending_retain. - Tier 4 DEFER: SSH workers, HTTP API auth, trace_index, PhaseUAT — build when a deployment demands it. - Tier 5 DROP: items from late adversarial-review iterations that don't earn their keep (workflow_pins separate table, snap_ columns, agent_capabilities separate index). Includes a recommended ~6-8 week v3.0 schedule and four decision points that should be settled before starting work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-29 12:33:07 +02:00

13 commits
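
Commit 553ba23b89 describes its hooks as fire-and-forget: a failure in learning or auto-fixing must never block outcome recording. The snippet below is a minimal sketch of that pattern only. Every name in it (`recordUnitOutcomeSketch`, `recordToLearner`, `outcomes`, `learnerErrors`) is a hypothetical stand-in, not the repository's actual code, and the real modules may well be structured differently.

```javascript
// Hypothetical in-memory stores, used only for this illustration.
const outcomes = [];
const learnerErrors = [];

// Hypothetical stand-in for a learning hook that can fail.
async function recordToLearner(outcome) {
  if (outcome.taskType === "bad") {
    throw new Error("learner unavailable");
  }
  return outcome;
}

// Record the outcome first, then start the learner without awaiting it.
// Wrapping the call in a promise chain means both synchronous throws and
// asynchronous rejections end up in .catch() instead of reaching the caller.
function recordUnitOutcomeSketch(outcome, learner = recordToLearner) {
  outcomes.push(outcome); // primary path: always completes

  Promise.resolve()
    .then(() => learner(outcome))
    .catch((err) => learnerErrors.push(err.message)); // swallow and note the failure

  return outcomes.length;
}
```

The caller gets its return value before the learner has run, so a slow or failing learner costs the dispatch loop nothing. The trade-off is that failures are only visible through whatever the `.catch()` handler records.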