Commit graph

28 commits

Author SHA1 Message Date
Snider
3606ff994b fix: memory, error handling, and signal improvements across pkg/lem
- Stream parquet export rows instead of unbounded memory allocation
- Replace QueryGoldenSet/QueryExpansionPrompts with iter.Seq2 iterators
- Remove legacy runtime.GC() calls from distill (go-mlx handles cleanup)
- Replace log.Fatalf with error return in tier_score.go
- Add SIGINT/SIGTERM signal handling to agent and worker daemon loops
- Add error checks for unchecked db.conn.Exec in import.go and tier_score.go
- Update tests for iterator-based database methods

Co-Authored-By: Gemini <noreply@google.com>
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 04:46:51 +00:00
Snider
de3d6a70f1 lems configs 2026-02-23 04:38:37 +00:00
Snider
56eda1a081 refactor: migrate all 25 commands from passthrough to cobra framework
Replace passthrough() + stdlib flag.FlagSet anti-pattern with proper
cobra integration. Every Run* function now takes a typed *Opts struct
and returns error. Flags registered via cli.StringFlag/IntFlag/etc.
Commands participate in Core lifecycle with full cobra flag parsing.

- 6 command groups: gen, score, data, export, infra, mon
- 25 commands converted, 0 passthrough() calls remain
- Delete passthrough() helper from lem.go
- Update export_test.go to use ExportOpts struct

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 03:32:53 +00:00
Snider
42c0af728b fix: raise GQA threshold to ≤4 KV heads for position-wise analysis
Gemma3-4B has 4 KV heads — too few for meaningful pairwise head
coherence (only 6 pairs). Position-wise differentiation gives richer
signal. Multi-head path now requires ≥5 heads.

4B baseline (260 sovereign probes): mean=6487, stdev=153, range=6170-6886.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 01:02:13 +00:00
Snider
d99384f1e6 feat: GQA position-wise analysis + integer composite (0-10000)
Single KV head models (Gemma3-1B) now use position-wise differentiation
instead of pairwise head coherence. Composite switched from float64 to
int on 0-10000 scale — same principle as blockchain atomic units.

Signal validated: degenerate=5234, sovereign=6031, creative=6480.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:52:47 +00:00
Snider
b621baaded feat: add 19D full feature vector (grammar + heuristic + attention)
FullFeatures concatenates 6D grammar + 8D heuristic + 5D attention
for Poindexter spatial indexing. Nil BOResult zero-fills attention dims.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:34:22 +00:00
Snider
fbc636ee29 feat: integrate attention scoring into distill pipeline (opt-in via config)
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:30:36 +00:00
Snider
e3331920c4 feat: add 'lem score attention' CLI for Q/K Bone Orientation analysis
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:29:41 +00:00
Snider
28309b26dc feat: add Q/K Bone Orientation analysis engine (pure Go CPU math)
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 00:28:48 +00:00
Snider
c701c2e0af feat(lem): integrate Poindexter for spatial score indexing and analytics
- Add feature vector extraction (6D grammar, 8D heuristic, 14D combined)
- Add KDTree ScoreIndex with cosine distance for probe clustering
- Add score distribution analytics (percentiles, variance, skewness)
- Add grammar-profile dedup filtering to distill pipeline
- Add spatial gap detection (FindGaps) for coverage analysis
- Wire analytics into coverage CLI (PrintScoreAnalytics)

New files: features.go, cluster.go, analytics.go + tests
Modified: distill.go (dedup filter), coverage.go (analytics output)
Dep: github.com/Snider/Poindexter

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 21:26:06 +00:00
Snider
f75458bce6 refactor: apply go fix modernizers for Go 1.26
Automated fixes: interface{} → any, range-over-int, t.Context(),
wg.Go(), strings.SplitSeq, strings.Builder, slices.Contains,
maps helpers, min/max builtins.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-22 21:00:17 +00:00
Snider
a3e9a1e035 fix: handle error in score resume merge path
ReadScorerOutput error was silently discarded during resume merge,
risking partial data loss on TOCTOU file changes. Also clean up
compare command construction to pass RunE directly to NewCommand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 19:03:41 +00:00
Snider
a0a0118155 refactor: move runScore and runProbe to pkg/lem
All 28 commands now accessible as exported lem.Run* functions.
Prerequisite for CLI framework migration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:53:15 +00:00
Snider
8532077e46 style: remove redundant named import for go-ml
Package declares itself as 'ml', so the named import alias is
unnecessary. Go resolves the package name from the declaration,
not the module path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:08:01 +00:00
Snider
55519b24aa feat(distill): migrate from go-inference to go-ml Backend
Replace inference.LoadModel() with ml.NewMLXBackend() which wraps
the same Metal model with memory management (SetCacheLimit,
SetMemoryLimit). Replace raw iter.Seq token loop with backend.Chat()
returning Result{Text, Metrics}. Add runtime.GC() between probes
to prevent incremental memory leak.

Reference: go-ml/cmd/cmd_ab.go memory management pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:02:16 +00:00
Snider
8408cc0bab feat(distill): add --cache-limit and --mem-limit flags
Override ai.yaml memory config per-run. Values in GB.
Not yet wired to model loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:00:04 +00:00
Snider
b9da23a0be feat(distill): add Metal memory limit config fields
CacheLimit (8GB) and MemoryLimit (16GB) in DistillConfig control
mlx.SetCacheLimit/SetMemoryLimit before model load. Conservative
defaults for 1B model on 96GB machine.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 17:59:11 +00:00
Snider
d233e76648 feat: add training data to repo + make paths repo-relative
Move training/lem/ (probes, lessons, eval sets) into git so the
full curriculum is publicly releasable. Update .core/ai configs
and distill.go to use repo-relative paths instead of /Volumes/Data/.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:49:12 +00:00
Snider
1b742bf92c feat: native Metal distillation command + .core/ai config
Add `lem distill` — full Go pipeline for self-distillation using
go-mlx (native Metal inference) and go-i18n/reversal (v3 grammar
scoring). Replaces the Python distill.py bridge entirely.

New files:
- .core/ai/ai.yaml: global defaults (scorer, generation, distill)
- .core/ai/models/gemma3/{27b,1b}.yaml: model configs with paths,
  kernel, lessons, baselines
- .core/ai/probes.yaml: probe sets grouped by training phase
- pkg/lem/config.go: YAML config loaders for .core/ai/
- pkg/lem/grammar.go: in-process grammar scoring (ComputeGrammarScore,
  ComputeDelta, ScoreResponse) extracted from cmd/scorer
- pkg/lem/distill.go: RunDistill command — best-of-N generation,
  grammar quality gate, training JSONL output
- pkg/lem/backend_metal.go: blank import for go-mlx Metal registration

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-21 23:42:55 +00:00
Claude
08363ee1af
feat: add lem worker command for distributed inference network
Go client for the LEM distributed inference API (BugSETI/Agentic).
Workers register via Forgejo PAT auth, pull prompt batches, run local
inference (MLX/vLLM/llama.cpp), submit results. Credits tracked as
Phase 1 stub for Phase 2 blockchain LEM token.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 18:10:59 +00:00
Claude
774f097855
feat: scaffold LEM Desktop app (Wails v3 system tray + Docker stack)
Inspired by BugSETI architecture — system tray with WebView2 windows,
Docker Compose stack (Forgejo + InfluxDB + inference proxy), and
scoring agent integration. Builds as signed native binary on macOS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 17:43:19 +00:00
Claude
9fac5749c2
feat: add scoring agent + 23 capability probes (replaces scoring_agent.py)
Go scoring daemon that polls M3 for unscored LoRA checkpoints,
converts MLX→PEFT, runs 23 binary capability probes via OpenAI-
compatible API, and pushes results to InfluxDB. Zero Python deps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 17:22:40 +00:00
Claude
91ee389377
feat: convert all pipeline.py commands to Go
Complete conversion of pipeline.py into Go `lem` CLI:
- import-all: bulk import all LEM data into DuckDB from M3
- consolidate: pull worker JSONLs, merge, deduplicate
- normalize: seeds → deduplicated expansion_prompts table
- approve: filter scored expansions → training JSONL
- tier-score: heuristic/judge tiered expansion scoring
- expand-status: expansion pipeline progress from DuckDB
- inventory: DuckDB table counts and summary
- coverage: seed coverage gap analysis
- seed-influx: bootstrap InfluxDB from DuckDB golden_gen
- query: ad-hoc SQL against DuckDB

22 commands total, 49 Go files. Replaces entire pipeline.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 17:12:03 +00:00
Claude
4eaf1bfb39
feat: add parquet, publish, metrics, convert commands
- `lem parquet` — export JSONL training splits to Parquet (parquet-go)
- `lem publish` — push Parquet files to HuggingFace dataset repo
- `lem metrics` — push DuckDB golden set stats to InfluxDB
- `lem convert` — MLX LoRA adapter → HuggingFace PEFT format
  (pure Go safetensors read/write/transpose, no PyTorch needed)

Dependencies added: parquet-go, go-huggingface, go-rocm, go-pytorch, gotch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 17:05:08 +00:00
Claude
0afa5e9147
feat: add lem ingest command + go-huggingface dependency
Ingests benchmark data (content scores, capability scores, training
curves) from JSONL files and mlx_lm logs into InfluxDB. Batched
writes, iteration extraction from checkpoint labels.

Also adds github.com/hupe1980/go-huggingface for future HF sync.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:55:17 +00:00
Claude
a18fd1c44e
refactor: remove Vi identity from calm conversations
Vi identity is a separate training concern. Seed conversations now
contain only philosophical/mindfulness content for the R300 calm phase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:48:23 +00:00
Claude
c4fb775298
feat: add lem conv command for conversational training data
Ports conversational_training.py to Go with InfluxDB reporting.
24 built-in seed conversations (Vi identity, philosophy, mindfulness).
Supports extra JSONL files and golden set conversion to chat format.

Also fixes InfluxDB client to accept 204 No Content on writes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:42:46 +00:00
Claude
70dd18c065
refactor: move Go library to pkg/lem, thin main.go
All scoring/influx/export/expand logic moves to pkg/lem as an
importable package. main.go is now a thin CLI dispatcher.

This lets new commands import the shared library directly —
ready for converting Python scripts to Go subcommands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-15 16:30:09 +00:00