Training Process
Claude edited this page 2026-02-23 19:41:13 +00:00

LoRA fine-tuning on the M3 Ultra turns golden set examples into model weights: adapters are trained per model and then fused into the base weights.

Training Script

  • Script: lem_train_15k.py, run on the M3 Ultra
  • Location: /Volumes/Data/lem/scripts/lem_train_15k.py
  • Python: must use /opt/homebrew/bin/python3 (not system python — MLX needs it)
  • Unbuffered: must pass the -u flag when running via nohup

Model Configurations

| Model | Iterations | Batch size | Notes |
|---|---|---|---|
| gemma-3-1b | 1,000 | 4 | Fast, good for testing |
| gemma-3-12b | 600 | 2 | Strong reasoning |
| gemma-3-27b | 400 | 1 | Benchmark leader; gradient checkpointing required |
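
The per-model settings above can be sketched as a lookup table. This is a hypothetical illustration of the structure, not the actual code in lem_train_15k.py; the values come from the table, the dict layout and function name are assumptions.

```python
# Per-model training settings from the table above.
# Values are from this page; the structure is an illustrative sketch.
MODEL_CONFIGS = {
    "gemma-3-1b":  {"iters": 1000, "batch_size": 4, "grad_checkpoint": False},
    "gemma-3-12b": {"iters": 600,  "batch_size": 2, "grad_checkpoint": False},
    "gemma-3-27b": {"iters": 400,  "batch_size": 1, "grad_checkpoint": True},
}

def config_for(model: str) -> dict:
    """Look up training settings, failing loudly on unknown models."""
    try:
        return MODEL_CONFIGS[model]
    except KeyError:
        raise ValueError(f"no training config for {model!r}") from None
```

Note the trade-off the table encodes: as the model grows, batch size shrinks to fit memory, and the 27B additionally needs gradient checkpointing.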

Running Training

# Single model
nohup /opt/homebrew/bin/python3 -u scripts/lem_train_15k.py --models gemma-3-1b > /tmp/lem-train-1b.log 2>&1 &

# Monitor
tail -f /tmp/lem-train-1b.log

Training Data

Location: /Volumes/Data/lem/training-15k/

| Split | Count |
|---|---|
| Train | 13,498 |
| Validation | 750 |
| Test | 750 |
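
A quick sanity check on the splits can be done by counting JSONL lines. This sketch assumes the usual mlx_lm layout of train.jsonl / valid.jsonl / test.jsonl inside the data directory; the filenames are an assumption, not confirmed by this page.

```python
from pathlib import Path

# Expected split sizes from the table above (13,498 + 750 + 750 = 14,998).
EXPECTED = {"train": 13_498, "valid": 750, "test": 750}

def count_examples(data_dir: str) -> dict:
    """Count non-blank JSONL lines per split.

    Assumes {split}.jsonl filenames (the common mlx_lm convention --
    an assumption here, not documented on this page).
    """
    counts = {}
    for split in EXPECTED:
        path = Path(data_dir) / f"{split}.jsonl"
        with path.open() as f:
            counts[split] = sum(1 for line in f if line.strip())
    return counts
```

Running count_examples("/Volumes/Data/lem/training-15k") and comparing against EXPECTED catches truncated or duplicated split files before a long training run.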

Output

  • Fused models: /Volumes/Data/lem/LEM-{model}-15k/
  • LoRA adapters: /Volumes/Data/lem/adapters-15k/{model}/

DeepSeek R1 Layered LoRA

The DeepSeek R1 models use a multi-layer training approach instead of single-pass LoRA. See DeepSeek R1 Research for details.

Layer Sequence (v1: 3-layer)

Ethics (training-2k, 1839 examples) → Composure (watts-full, 72 examples) → Western (merged, 156 examples)

Layer Sequence (v2: 5-layer)

Ethics → Composure → Western → Ethics-sandwich (600 iterations) → Western-fresh (checkpoint @200, val loss 2.321)

Layer Sequence (v3: 7-layer sovereignty)

Ethics → Composure → Western → Sandwich → WesternFresh → Russian → Gold-full (7019 examples, 1600 iter)
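
The essence of the layered approach is that each layer resumes from the previous layer's adapter rather than from the base model. A minimal sketch, where train_layer is a hypothetical stub standing in for one mlx_lm LoRA run:

```python
# The v3 7-layer sequence from above. Layer names come from this page;
# train_layer is a hypothetical stand-in for a single LoRA training run.
V3_LAYERS = [
    "ethics", "composure", "western", "sandwich",
    "western-fresh", "russian", "gold-full",
]

def train_stack(layers, train_layer):
    """Run each layer in order, feeding the previous layer's adapter in
    as the starting point for the next (the core of layered LoRA)."""
    adapter = None  # the first layer starts from the base model
    for name in layers:
        adapter = train_layer(name, resume_from=adapter)
    return adapter  # the final stacked adapter
```

The ordering matters: a later layer can overwrite earlier behavior, which is why v2 added the ethics "sandwich" pass after the western layer.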

Adapter Management

Adapters on M3 (/Volumes/Data/lem/):

| Adapter | Model | Notes |
|---|---|---|
| adapters-deepseek-r1-7b | R1-Distill-Qwen-7B | Ethics base |
| adapters-*-composure | R1-Distill-Qwen-7B | Watts composure layer |
| adapters-*-western | R1-Distill-Qwen-7B | Western philosophy layer |
| adapters-*-sandwich | R1-Distill-Qwen-7B | Ethics sandwich |
| adapters-*-sandwich-watts | R1-Distill-Qwen-7B | OVERFIT — do not use |
| adapters-*-western-fresh | R1-Distill-Qwen-7B | @200 checkpoint, canonical (best) |
| adapters-15k/{model} | Various | 15K golden set LoRA |

InfluxDB Metrics

Training progress is reported to InfluxDB (training database):

| Measurement | Fields |
|---|---|
| training_loss | train_loss, val_loss, iteration, learning_rate |
| training_status | model, status (running/complete/failed), total_iters |
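
A point in the training_loss measurement might be built like this. The measurement and field names come from the table above; the tag layout, host, and the use of the InfluxDB 1.x Python client are assumptions for illustration.

```python
# Build a training_loss point matching the schema in the table above.
# Measurement/field names are from this page; tags and client details
# are illustrative assumptions.
def loss_point(model: str, iteration: int, train_loss: float,
               val_loss: float, learning_rate: float) -> dict:
    return {
        "measurement": "training_loss",
        "tags": {"model": model},
        "fields": {
            "train_loss": train_loss,
            "val_loss": val_loss,
            "iteration": iteration,
            "learning_rate": learning_rate,
        },
    }

# Writing it would look roughly like this (needs the `influxdb` 1.x client
# and a reachable server, so it is not executed here):
#   from influxdb import InfluxDBClient
#   client = InfluxDBClient(host="localhost", database="training")
#   client.write_points([loss_point("gemma-3-1b", 100, 1.82, 1.95, 1e-5)])
```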

GGUF Conversion

After training, models are converted for distribution:

MLX QAT → dequantize (mlx_lm convert -d) → convert_hf_to_gguf.py (F16) → llama-quantize (Q4_K_M)
  • 7 models converted (GPT-OSS MoE not supported by llama.cpp)
  • 9 GGUF files total at /Volumes/Data/lem/gguf/
  • Quantization: Q4_K_M standard, 1B also has Q5_K_M + Q8_0
  • F16 intermediates deleted to save space
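
The conversion chain above can be sketched as three subprocess commands. The tool names are from this page; the exact flags and output paths are assumptions and should be checked against the installed mlx_lm and llama.cpp versions before use.

```python
# Sketch of the GGUF conversion pipeline as subprocess command lists.
# Tool names come from this page; flags and paths are assumptions.
def gguf_pipeline(mlx_model: str, dequant_dir: str, gguf_dir: str,
                  quant: str = "Q4_K_M") -> list:
    f16 = f"{gguf_dir}/model-f16.gguf"
    out = f"{gguf_dir}/model-{quant.lower()}.gguf"
    return [
        # 1. Dequantize the MLX QAT model back to full precision.
        ["mlx_lm", "convert", "--hf-path", mlx_model,
         "--mlx-path", dequant_dir, "-d"],
        # 2. Convert the HF-format weights to an F16 GGUF (llama.cpp script).
        ["python3", "convert_hf_to_gguf.py", dequant_dir,
         "--outfile", f16, "--outtype", "f16"],
        # 3. Quantize the F16 GGUF down to the distribution format.
        ["llama-quantize", f16, out, quant],
    ]
```

Each list can be handed to subprocess.run(cmd, check=True); remember to delete the F16 intermediate afterwards, as noted above.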

Go Native Training (New)

The core ml train command implements native LoRA training in Go via the MLX backend, replacing the Python training script. See Go Pipeline Commands for usage.