Table of Contents

Go Pipeline Commands

Global Flags
Commands

Inference
Scoring & Probing
Generation
Data Management
Training
Monitoring
Model Conversion
Infrastructure

Python → Go Migration
Source Locations
DuckDB Access

Go Pipeline Commands

The `core ml` command suite: all Python pipeline scripts, ported to Go.

Global Flags

| Flag | Default | Description |
| --- | --- | --- |
| `--api-url` | `http://10.69.69.108:8090` | OpenAI-compatible API endpoint |
| `--db` | env `LEM_DB` | DuckDB path |
| `--influx` | `http://10.69.69.165:8181` | InfluxDB URL |
| `--judge-model` | `gemma3:27b` | Judge model name |
| `--judge-url` | `http://10.69.69.108:11434` | Ollama endpoint |
| `--model` | (none) | Model name for API |
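
As an illustration of how these defaults fit together, here is a minimal sketch using Go's standard `flag` package. The `GlobalFlags` struct and `parseGlobalFlags` function are hypothetical names, not taken from the `core/cli` source, and the real CLI may well use a different flag library. The one behaviour worth noting is that `--db` falls back to the `LEM_DB` environment variable when the flag is not given.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// GlobalFlags mirrors the flag table above. The type and field names are
// illustrative assumptions, not the actual core/cli definitions.
type GlobalFlags struct {
	APIURL     string
	DB         string
	Influx     string
	JudgeModel string
	JudgeURL   string
	Model      string
}

// parseGlobalFlags parses args with the documented defaults. getenv is
// injected so the LEM_DB fallback can be exercised without touching the
// real environment.
func parseGlobalFlags(args []string, getenv func(string) string) (GlobalFlags, error) {
	var g GlobalFlags
	fs := flag.NewFlagSet("core ml", flag.ContinueOnError)
	fs.StringVar(&g.APIURL, "api-url", "http://10.69.69.108:8090", "OpenAI API endpoint")
	fs.StringVar(&g.DB, "db", getenv("LEM_DB"), "DuckDB path (defaults to $LEM_DB)")
	fs.StringVar(&g.Influx, "influx", "http://10.69.69.165:8181", "InfluxDB URL")
	fs.StringVar(&g.JudgeModel, "judge-model", "gemma3:27b", "Judge model name")
	fs.StringVar(&g.JudgeURL, "judge-url", "http://10.69.69.108:11434", "Ollama endpoint")
	fs.StringVar(&g.Model, "model", "", "Model name for API")
	if err := fs.Parse(args); err != nil {
		return GlobalFlags{}, err
	}
	return g, nil
}

func main() {
	g, err := parseGlobalFlags(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", g)
}
```

An explicit `--db` on the command line overrides the environment value, because the environment is only consulted to compute the flag's default.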

Commands

Inference

| Command | Description |
| --- | --- |
| `core ml serve` | OpenAI-compatible inference server (`--model-path`, `--bind`; default port 8090) |
| `core ml chat` | Interactive chat session |
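
Since `core ml serve` is described as OpenAI-compatible, a client would typically talk to it the way it talks to any OpenAI-style server. The sketch below assumes the server exposes the conventional `/v1/chat/completions` route; this page does not confirm the route, and the model name `example-model` is a placeholder. `main` only prints the request so the example runs without a live server. `postChat` shows how the request could be sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// chatEndpoint joins the base URL with the conventional OpenAI chat route.
// Assumption: core ml serve exposes this path.
func chatEndpoint(base string) string {
	return strings.TrimRight(base, "/") + "/v1/chat/completions"
}

// buildChatRequest encodes a single-turn user prompt as an OpenAI-style body.
func buildChatRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
}

// postChat sends the body and returns the raw JSON response.
func postChat(base string, body []byte) (string, error) {
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Post(chatEndpoint(base), "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("server returned %s: %s", resp.Status, raw)
	}
	return string(raw), nil
}

func main() {
	body, err := buildChatRequest("example-model", "Hello")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print the request instead of sending it; call postChat against a
	// running server to get a completion.
	fmt.Println(chatEndpoint("http://10.69.69.108:8090"))
	fmt.Println(string(body))
}
```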

Scoring & Probing

| Command | Description |
| --- | --- |
| `core ml probe` | 23 capability + 6 content probes (`--model`, `--output`) |
| `core ml score` | 3-tier scoring: heuristic, judge, exact (`--input`, `--output`, `--suites`, `--concurrency`) |
| `core ml benchmark` | Run benchmark suite against models |
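
The `--concurrency` flag suggests that scoring requests run in parallel up to a limit. The following is a generic sketch of that pattern, not the actual `core ml score` implementation: `scoreAll` is a hypothetical helper, and the scoring function passed to it is a placeholder standing in for a heuristic, judge, or exact scorer.

```go
package main

import (
	"fmt"
	"sync"
)

// scoreAll applies score to every input with at most `concurrency` calls in
// flight, and returns results in input order.
func scoreAll(inputs []string, concurrency int, score func(string) float64) []float64 {
	if concurrency < 1 {
		concurrency = 1
	}
	results := make([]float64, len(inputs))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		sem <- struct{}{} // blocks while the limit is reached
		go func(i int, in string) {
			defer wg.Done()
			defer func() { <-sem }()
			results[i] = score(in) // each goroutine writes its own slot
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	// Placeholder scorer: response length. A real scorer would call a
	// heuristic or a judge model here.
	byLength := func(s string) float64 { return float64(len(s)) }
	fmt.Println(scoreAll([]string{"alpha", "be", "gam"}, 2, byLength))
}
```

Writing each result into its own slice index keeps the output ordered without a mutex, since no two goroutines touch the same element.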

Generation

| Command | Description |
| --- | --- |
| `core ml expand` | Generate expansion responses from DuckDB `expansion_prompts` |
| `core ml sandwich` | Generate with axiom sandwich signing |
| `core ml lesson` | Generate training lessons |
| `core ml sequence` | Run training sequences |

Data Management

| Command | Description |
| --- | --- |
| `core ml export` | Export golden set to JSONL/Parquet (train/test/valid splits) |
| `core ml ingest` | JSONL → DuckDB `golden_set` |
| `core ml normalize` | 87K seeds → 46K deduplicated `expansion_prompts` |
| `core ml consolidate` | Merge worker JSONL files, deduplicating by `idx` |
| `core ml import-all` | Pull all data from M3 and ingest it |
| `core ml query` | Ad-hoc SQL against DuckDB |
| `core ml approve` | Filter scored expansions (`--threshold 6.0`), export chat training format |
| `core ml publish` | Push Parquet + dataset card to HuggingFace |

Training

| Command | Description |
| --- | --- |
| `core ml train` | Native LoRA training via MLX backend |

Monitoring

| Command | Description |
| --- | --- |
| `core ml status` | Training/generation progress (reads InfluxDB + DuckDB) |
| `core ml metrics` | Push stats to InfluxDB |
| `core ml live` | Show live generation progress from InfluxDB |
| `core ml expand-status` | Expansion pipeline status dashboard |
| `core ml coverage` | Seed coverage analysis (underrepresented regions/domains) |
| `core ml inventory` | Full table inventory with per-table stats |

Model Conversion

| Command | Description |
| --- | --- |
| `core ml convert` | MLX LoRA → PEFT format |
| `core ml gguf` | MLX LoRA → GGUF format |

Infrastructure

| Command | Description |
| --- | --- |
| `core ml worker` | Distributed scoring worker (`--infer` endpoint) |
| `core ml agent` | Scoring agent daemon |
| `core ml seed-influx` | Seed InfluxDB `gold_gen` from DuckDB |

Python → Go Migration

| Python Script | Go Replacement |
| --- | --- |
| `pipeline.py` (all commands) | `core ml status`, `core ml score`, `core ml export`, `core ml expand` |
| `lem_generate.py` | `core ml serve` + `core ml expand` |
| `lem_expand.py` | `core ml expand` |
| `lem_scorer.py` | `core ml score` |
| `lem_semantic_scorer.py` | `core ml score --suites semantic` |
| `lem_standard_scorer.py` | `core ml score --suites exact` |
| `lem_train_15k.py` | `core ml train` |

Source Locations

Commands: `internal/cmd/ml/` in `core/cli`
ML package: `pkg/ml/` (backend interface, scoring, heuristics, judge, expand, export, influx, db)
MLX package: `pkg/mlx/` (CGo wrapper, array, ops, model, cache, tokenizer, sampler)

DuckDB Access

Driver: `marcboeker/go-duckdb`
Location: `pkg/ml/db.go`
Database: `golden-set.duckdb` (8 tables, 155K+ rows)