Claude edited this page 2026-02-23 19:41:13 +00:00

Go Pipeline Commands

The core ml command suite is the Go port of all Python pipeline scripts.

Global Flags

| Flag | Default | Description |
| --- | --- | --- |
| --api-url | http://10.69.69.108:8090 | OpenAI API endpoint |
| --db | env LEM_DB | DuckDB path |
| --influx | http://10.69.69.165:8181 | InfluxDB URL |
| --judge-model | gemma3:27b | Judge model name |
| --judge-url | http://10.69.69.108:11434 | Ollama endpoint |
| --model | | Model name for API |
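
The defaults in the table above can be illustrated with Go's standard flag package. This is only a sketch: the real CLI may well use a different flag library, and the names GlobalFlags and parseGlobalFlags are hypothetical. It also assumes that "env LEM_DB" means the --db flag falls back to the LEM_DB environment variable when it is not given on the command line.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// GlobalFlags is a hypothetical container for the documented global flags.
type GlobalFlags struct {
	APIURL     string
	DB         string
	Influx     string
	JudgeModel string
	JudgeURL   string
	Model      string
}

// parseGlobalFlags parses args using the defaults from the table.
// getenv is injected so the assumed LEM_DB fallback can be exercised in tests.
func parseGlobalFlags(args []string, getenv func(string) string) (GlobalFlags, error) {
	var g GlobalFlags
	fs := flag.NewFlagSet("core ml", flag.ContinueOnError)
	fs.StringVar(&g.APIURL, "api-url", "http://10.69.69.108:8090", "OpenAI API endpoint")
	fs.StringVar(&g.DB, "db", getenv("LEM_DB"), "DuckDB path")
	fs.StringVar(&g.Influx, "influx", "http://10.69.69.165:8181", "InfluxDB URL")
	fs.StringVar(&g.JudgeModel, "judge-model", "gemma3:27b", "Judge model name")
	fs.StringVar(&g.JudgeURL, "judge-url", "http://10.69.69.108:11434", "Ollama endpoint")
	fs.StringVar(&g.Model, "model", "", "Model name for API")
	if err := fs.Parse(args); err != nil {
		return GlobalFlags{}, err
	}
	return g, nil
}

func main() {
	g, err := parseGlobalFlags(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", g)
}
```

In this sketch an explicit --db value always wins over LEM_DB, because the environment variable is only used as the flag's default.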

Commands

Inference

| Command | Description |
| --- | --- |
| core ml serve | OpenAI-compatible inference server (--model-path, --bind, default 8090) |
| core ml chat | Interactive chat session |
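
Because core ml serve is described as OpenAI-compatible, a client could plausibly talk to it with a standard chat completion request. The sketch below only builds such a request and does not send it. The /v1/chat/completions path is the usual OpenAI route, and assuming the server exposes it is a guess, not something this page confirms. The helper newChatRequest and the model name "example-model" are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// chatMessage and chatRequest mirror the common OpenAI chat completion payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds a POST request with a single user message.
// The route is an assumption about what the server exposes.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/v1/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The base URL is the documented --api-url default.
	req, err := newChatRequest("http://10.69.69.108:8090", "example-model", "Hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

To actually send the request, pass it to http.DefaultClient.Do and decode the JSON response body.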

Scoring & Probing

| Command | Description |
| --- | --- |
| core ml probe | 23 capability + 6 content probes (--model, --output) |
| core ml score | 3-tier scoring: heuristic, judge, exact (--input, --output, --suites, --concurrency) |
| core ml benchmark | Run benchmark suite against models |

Generation

| Command | Description |
| --- | --- |
| core ml expand | Generate expansion responses from DuckDB expansion_prompts |
| core ml sandwich | Generate with axiom sandwich signing |
| core ml lesson | Generate training lessons |
| core ml sequence | Run training sequences |

Data Management

| Command | Description |
| --- | --- |
| core ml export | Export golden set to JSONL/Parquet (train/test/valid splits) |
| core ml ingest | JSONL → DuckDB golden_set |
| core ml normalize | 87K seeds → 46K deduped expansion_prompts |
| core ml consolidate | Merge worker JSONLs, dedup by idx |
| core ml import-all | Pull all data from M3 + ingest |
| core ml query | Ad-hoc SQL against DuckDB |
| core ml approve | Filter scored expansions (--threshold 6.0), export chat training format |
| core ml publish | Push Parquet + dataset card to HuggingFace |

Training

| Command | Description |
| --- | --- |
| core ml train | Native LoRA training via MLX backend |

Monitoring

| Command | Description |
| --- | --- |
| core ml status | Training/generation progress (reads InfluxDB + DuckDB) |
| core ml metrics | Push stats to InfluxDB |
| core ml live | Show live generation progress from InfluxDB |
| core ml expand-status | Expansion pipeline status dashboard |
| core ml coverage | Seed coverage analysis (underrepresented regions/domains) |
| core ml inventory | Full table inventory with per-table stats |

Model Conversion

| Command | Description |
| --- | --- |
| core ml convert | MLX LoRA → PEFT format |
| core ml gguf | MLX LoRA → GGUF format |

Infrastructure

| Command | Description |
| --- | --- |
| core ml worker | Distributed scoring worker (--infer endpoint) |
| core ml agent | Scoring agent daemon |
| core ml seed-influx | Seed InfluxDB gold_gen from DuckDB |

Python → Go Migration

| Python Script | Go Replacement |
| --- | --- |
| pipeline.py (all commands) | core ml status/score/export/expand |
| lem_generate.py | core ml serve + expand |
| lem_expand.py | core ml expand |
| lem_scorer.py | core ml score |
| lem_semantic_scorer.py | core ml score --suites semantic |
| lem_standard_scorer.py | core ml score --suites exact |
| lem_train_15k.py | core ml train |

Source Locations

  • Commands: internal/cmd/ml/ in core/cli
  • ML package: pkg/ml/ — backend interface, scoring, heuristics, judge, expand, export, influx, db
  • MLX package: pkg/mlx/ — CGo wrapper, array, ops, model, cache, tokenizer, sampler

DuckDB Access

  • Driver: marcboeker/go-duckdb
  • Location: pkg/ml/db.go
  • Database: golden-set.duckdb (8 tables, 155K+ rows)