singularity-forge/.plans/native-perf-optimizations.md
Flux Labs c8b42ed2ae feat: native perf optimizations — deriveState, JSONL, paths, parsing (#576)
Four native Rust optimizations to eliminate hot-path bottlenecks:

1. deriveState raw content (gsd_parser.rs, state.ts):
   - Added rawContent field to ParsedGsdFile in batch parser
   - Eliminates 43-line frontmatter re-serialization loop in state.ts
   - Batch cache now stores original file content directly

2. JSONL streaming parser (gsd_parser.rs, session-forensics.ts):
   - Added parseJsonlTail() — reads from file tail with constant memory
   - Handles arbitrary file sizes (no more 10MB OOM risk)
   - synthesizeCrashRecovery and readLastActivityLog use native first

3. Native directory tree index (gsd_parser.rs, paths.ts):
   - Added scanGsdTree() — walks .gsd/ tree once, returns all entries
   - paths.ts builds lookup map from native scan
   - cachedReaddirWithTypes/cachedReaddir check native cache first
   - Eliminates 20-50 readdirSync calls per dispatch

4. Native plan/summary parsers (gsd_parser.rs, files.ts):
   - Added parsePlanFile() — parses tasks, must-haves, estimates
   - Added parseSummaryFile() — parses frontmatter, sections, files
   - files.ts calls native first, falls back to JS regex parsers
   - 3-5x faster per file, ~20 files per deriveState

All optimizations follow the established pattern: native-first with
JS fallback when native module unavailable.
2026-03-15 20:16:42 -06:00


Native Performance Optimizations — deriveState, JSONL, Paths, Parsing

Overview

Four native Rust optimizations to eliminate hot-path bottlenecks in GSD's dispatch cycle. They build on the existing git2 migration and native parser infrastructure.


1. Native deriveState — Eliminate Frontmatter Re-serialization

Problem

In state.ts:134-176, when nativeBatchParseGsdFiles() returns parsed files, the JS side re-serializes frontmatter back into YAML strings so downstream parsers can re-parse them. The round trip is wasted work: Rust parses → JS re-serializes → JS re-parses.

Solution

The native batch parser already returns { metadata: JSON, body, sections }. One option is to modify cachedLoadFile() to return the raw body directly and update downstream parsers to accept pre-parsed metadata, instead of re-serializing frontmatter to YAML in JS. That would eliminate the entire re-serialization loop at lines 143-172.

However, the parsers (parseRoadmap, parseSummary, parsePlan, etc.) all expect raw markdown strings with frontmatter. Changing their signatures would be a massive refactor. Instead:

Approach: Make Rust return the original file content alongside parsed data.

Add a new field rawContent: String to ParsedGsdFile that contains the complete original file content. The JS batch cache stores this directly, eliminating the re-serialization entirely. Downstream parsers get exactly what loadFile() would return.

Implementation

  • Rust (gsd_parser.rs): Add raw_content field to ParsedGsdFile, populate with the original file content read from disk.
  • TS (native-parser-bridge.ts): Expose rawContent in BatchParsedFile.
  • TS (state.ts): Replace the 30-line re-serialization loop with fileContentCache.set(absPath, f.rawContent).
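
The state.ts change can be sketched roughly as follows. This is a minimal illustration, not the actual code: rawContent and fileContentCache come from the plan above, while populateBatchCache, the path field, and the way absPath is built are assumptions made for the example.

```typescript
// Only the fields this sketch needs; the real BatchParsedFile also carries
// parsed sections. The `path` field and its meaning are assumptions.
interface BatchParsedFile {
  path: string;       // assumed: path relative to the scanned directory
  metadata: string;   // frontmatter as a JSON string
  body: string;       // markdown body without frontmatter
  rawContent: string; // complete original file content (the new field)
}

const fileContentCache = new Map<string, string>();

// Hypothetical helper: fill the cache straight from the native result.
// No YAML is rebuilt and metadata is never JSON.parse'd here.
function populateBatchCache(baseDir: string, files: BatchParsedFile[]): void {
  for (const f of files) {
    const absPath = `${baseDir}/${f.path}`;
    fileContentCache.set(absPath, f.rawContent);
  }
}
```

Because the cached value is the original file text, downstream parsers see the same string they would get from loadFile() and need no signature changes.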

Impact

Eliminates ~30 lines of JS string building per dispatch. Removes JSON.parse of metadata that was only used to re-serialize back to YAML.


2. Native JSONL Streaming Parser

Problem

session-forensics.ts:68-78 — Parses JSONL by split("\n").map(JSON.parse) with a 10MB cap. Large session files cause OOM or slowness.

Solution

Add a Rust JSONL parser that streams through the file with constant memory, returning structured data. Uses serde_json for parsing and handles arbitrary file sizes.

Implementation

  • Rust (gsd_parser.rs): Add parse_jsonl_tail(path, max_entries?) function that:
    1. Memory-maps or streams the file from the tail
    2. Parses each line as JSON
    3. Returns the last N entries as a JSON array string
  • TS (native-parser-bridge.ts): Add bridge function.
  • TS (session-forensics.ts): Use native parser, fall back to JS implementation.
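
The tail-reading steps above can be illustrated in TypeScript. The planned implementation is Rust with serde_json; the sketch below only demonstrates the technique. The name readJsonlTail, the chunk size, and the choice to skip malformed lines are all assumptions. Memory stays bounded by one chunk, the longest single line, and the requested entries, regardless of file size.

```typescript
import * as fs from "node:fs";

// Parse one line and collect it; blank or malformed lines are skipped
// (an assumption, e.g. for a line truncated by a crash).
function pushLine(line: Buffer, out: unknown[]): void {
  const text = line.toString("utf8").trim();
  if (text.length === 0) return;
  try {
    out.push(JSON.parse(text));
  } catch {
    // ignore lines that are not valid JSON
  }
}

// Hypothetical sketch: return the last `maxEntries` JSONL entries in file order.
function readJsonlTail(path: string, maxEntries: number, chunkSize = 64 * 1024): unknown[] {
  const fd = fs.openSync(path, "r");
  try {
    let pos = fs.fstatSync(fd).size;
    let carry: Buffer = Buffer.alloc(0); // start of a line not fully read yet
    const entries: unknown[] = [];       // collected newest-first
    while (pos > 0 && entries.length < maxEntries) {
      const len = Math.min(chunkSize, pos);
      pos -= len;
      const chunk = Buffer.alloc(len);
      fs.readSync(fd, chunk, 0, len, pos);
      const data = Buffer.concat([chunk, carry]);
      let end = data.length;
      // Walk newline bytes backwards; splitting on bytes keeps UTF-8 intact.
      while (end > 0 && entries.length < maxEntries) {
        const nl = data.lastIndexOf(0x0a, end - 1);
        if (nl === -1) break;
        pushLine(data.subarray(nl + 1, end), entries);
        end = nl;
      }
      carry = data.subarray(0, end);
    }
    // Reaching the start of the file means the carry is the first line.
    if (pos === 0 && entries.length < maxEntries) pushLine(carry, entries);
    return entries.reverse();
  } finally {
    fs.closeSync(fd);
  }
}
```

A call such as readJsonlTail(sessionPath, 50) would read only as many chunks from the end of the file as are needed to find 50 parseable lines; sessionPath here is a placeholder.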

Impact

Handles arbitrary file sizes. 3-5x faster parsing on 10MB files.


3. Native Directory Tree Index

Problem

In paths.ts:20-34, cachedReaddirSync() caches per directory, but the caches are cleared on every dispatch via invalidateAllCaches(). Each call to resolveMilestoneFile, resolveSliceFile, or resolveTaskFile therefore triggers separate directory reads.

Solution

Add a Rust function that walks the entire .gsd/ tree once and returns a flat file listing. The JS side builds a Map from this, making all path resolution O(1) lookups instead of repeated readdirSync + regex matching.

Implementation

  • Rust (gsd_parser.rs): batchParseGsdFiles already walks the tree. Add scan_gsd_tree(directory) that returns Vec<{ path, isDir, name }> for ALL entries (not just .md files).
  • TS (native-parser-bridge.ts): Add bridge function.
  • TS (paths.ts): Add native tree cache. On first access, call native scan and build lookup maps. clearPathCache() clears the native cache too.
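
One possible shape for the paths.ts side is sketched below. The entry fields (path, isDir, name) and clearPathCache() come from the plan. The other names (buildTreeIndex, getTreeIndex, listDir), the injected scan function, and the assumption that paths are relative to the scanned directory with "/" separators are hypothetical.

```typescript
interface TreeEntry {
  path: string;   // assumed: relative to the scanned root, "/"-separated
  isDir: boolean;
  name: string;
}

// Group the flat listing by parent directory so a directory read becomes
// a single Map lookup.
function buildTreeIndex(entries: TreeEntry[]): Map<string, TreeEntry[]> {
  const byDir = new Map<string, TreeEntry[]>();
  for (const e of entries) {
    const slash = e.path.lastIndexOf("/");
    const parent = slash === -1 ? "" : e.path.slice(0, slash);
    const list = byDir.get(parent);
    if (list) list.push(e);
    else byDir.set(parent, [e]);
  }
  return byDir;
}

let nativeTreeCache: Map<string, TreeEntry[]> | null = null;

// Build the index on first access; `scan` stands in for the native bridge call.
function getTreeIndex(scan: () => TreeEntry[]): Map<string, TreeEntry[]> {
  if (!nativeTreeCache) nativeTreeCache = buildTreeIndex(scan());
  return nativeTreeCache;
}

// Clearing the path cache drops the native tree index too.
function clearPathCache(): void {
  nativeTreeCache = null;
}

// Replacement for a per-directory readdir: no filesystem access.
function listDir(index: Map<string, TreeEntry[]>, dir: string): TreeEntry[] {
  return index.get(dir) ?? [];
}
```

With this layout the tree is scanned once per cache lifetime, and each resolver call costs a Map lookup plus a filter over one directory's entries.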

Impact

Eliminates 20-50 readdirSync calls per dispatch. Makes resolveDir/resolveFile O(1) lookups.


4. Expand Native Markdown Parsing

Problem

files.ts parsers (parsePlan, parseSummary, parseContinue) still use JS regex. Each runs ~10-20 regex patterns per file. Only parseRoadmap has a native implementation.

Solution

Add native Rust implementations for parsePlan and parseSummary — the two parsers called most frequently during deriveState. parseContinue is called infrequently and can stay in JS.

Implementation

  • Rust (gsd_parser.rs): Add parse_plan_file(content) and parse_summary_file(content).
  • TS (native-parser-bridge.ts): Add bridge functions with JS fallback.
  • TS (files.ts): Call native versions first, fall back to JS.
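
The native-first pattern with a JS fallback might look like the following generic wrapper. It is a sketch under stated assumptions: nativeFirst and the Parser type are hypothetical names, and the native function is modeled as undefined when the native module is unavailable.

```typescript
type Parser<T> = (content: string) => T;

// Hypothetical wrapper: prefer the native parser when it was loaded,
// otherwise use the existing JS regex parser.
function nativeFirst<T>(native: Parser<T> | undefined, fallback: Parser<T>): Parser<T> {
  return (content: string): T => (native ? native(content) : fallback(content));
}
```

In files.ts, parsePlan and parseSummary could each be built this way from their native bridge function and their current JS implementation, so callers keep the same signature.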

Impact

3-5x faster parsing per file. With ~20 files per deriveState, this saves an estimated 20-40ms per call.


Implementation Order

  1. deriveState raw content (smallest change, biggest immediate impact)
  2. Directory tree index (eliminates readdirSync overhead)
  3. JSONL streaming parser (helps crash recovery path)
  4. Plan/Summary native parsers (improves parsing throughput)

Files Modified

Rust

  • native/crates/engine/src/gsd_parser.rs — new functions + rawContent field

TypeScript

  • src/resources/extensions/gsd/native-parser-bridge.ts — new bridge functions
  • src/resources/extensions/gsd/state.ts — simplified batch cache
  • src/resources/extensions/gsd/paths.ts — native tree cache
  • src/resources/extensions/gsd/session-forensics.ts — native JSONL
  • src/resources/extensions/gsd/files.ts — native plan/summary parsers