# Training Process
LoRA fine-tuning on the M3 Ultra turns golden set examples into trained model weights.
## Training Script
- Script: `lem_train_15k.py` on M3 Ultra
- Location: `/Volumes/Data/lem/scripts/lem_train_15k.py`
- Python: must use `/opt/homebrew/bin/python3` (not the system Python; MLX requires the Homebrew build)
- Unbuffered: pass the `-u` flag when running via `nohup` so log output is flushed immediately
## Model Configurations
| Model | Iterations | Batch Size | Notes |
|---|---|---|---|
| gemma-3-1b | 1,000 | 4 | Fast, good for testing |
| gemma-3-12b | 600 | 2 | Strong reasoning |
| gemma-3-27b | 400 | 1 | Benchmark leader, gradient checkpointing required |
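The per-model settings above can be captured in a config map. A minimal sketch; the names `MODEL_CONFIGS`, `config_for`, and the `grad_checkpoint` field are illustrative assumptions, not the actual structure of `lem_train_15k.py`:

```python
# Hypothetical per-model training configs mirroring the table above;
# the real lem_train_15k.py may organize this differently.
MODEL_CONFIGS = {
    "gemma-3-1b":  {"iters": 1000, "batch_size": 4, "grad_checkpoint": False},
    "gemma-3-12b": {"iters": 600,  "batch_size": 2, "grad_checkpoint": False},
    "gemma-3-27b": {"iters": 400,  "batch_size": 1, "grad_checkpoint": True},
}

def config_for(model: str) -> dict:
    """Look up a model's training config, failing fast on unknown names."""
    try:
        return MODEL_CONFIGS[model]
    except KeyError:
        raise ValueError(f"unknown model {model!r}; choose from {sorted(MODEL_CONFIGS)}")
```

Note how the 27B config pairs the smallest batch size with gradient checkpointing, trading compute for memory headroom.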
## Running Training

```sh
# Single model
nohup /opt/homebrew/bin/python3 -u scripts/lem_train_15k.py --models gemma-3-1b > /tmp/lem-train-1b.log 2>&1 &

# Monitor
tail -f /tmp/lem-train-1b.log
```
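Beyond `tail -f`, progress can be extracted from the log programmatically. A sketch assuming MLX-style lines such as `Iter 100: Train loss 1.234`; the actual format emitted by `lem_train_15k.py` may differ, so adjust the regex accordingly:

```python
import re

# Matches hypothetical progress lines like "Iter 100: Train loss 1.234";
# tune the pattern to whatever the training script actually logs.
PROGRESS = re.compile(r"Iter (\d+): Train loss ([\d.]+)")

def parse_progress(log_text: str) -> list[tuple[int, float]]:
    """Extract (iteration, train_loss) pairs from a training log."""
    return [(int(i), float(loss)) for i, loss in PROGRESS.findall(log_text)]
```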
## Training Data
Location: `/Volumes/Data/lem/training-15k/`
| Split | Count |
|---|---|
| Train | 13,498 |
| Validation | 750 |
| Test | 750 |
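The split counts above amount to roughly a 90/5/5 division of the ~15K set. A quick sanity-check sketch (the `SPLITS` dict and `split_fractions` helper are illustrative, not part of the pipeline):

```python
# Split counts from the table above; train + valid + test ≈ 15K examples.
SPLITS = {"train": 13_498, "valid": 750, "test": 750}

def split_fractions(splits: dict) -> dict:
    """Return each split's share of the total (train is ~0.90 here)."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}
```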
## Output
- Fused models: `/Volumes/Data/lem/LEM-{model}-15k/`
- LoRA adapters: `/Volumes/Data/lem/adapters-15k/{model}/`
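Fusing an adapter into its base model to produce the fused layout above can be done with mlx-lm's fuse entry point. A sketch that only builds the command; the `mlx_lm.fuse` flags (`--model`, `--adapter-path`, `--save-path`) should be verified against your installed mlx-lm version, and the base model path is caller-supplied:

```python
# Build (but do not run) a fuse command that merges a 15K LoRA adapter
# into its base model, matching the output directories listed above.
def fuse_command(model: str, base_model_path: str) -> list[str]:
    adapter = f"/Volumes/Data/lem/adapters-15k/{model}/"
    fused = f"/Volumes/Data/lem/LEM-{model}-15k/"
    return [
        "/opt/homebrew/bin/python3", "-m", "mlx_lm.fuse",
        "--model", base_model_path,
        "--adapter-path", adapter,
        "--save-path", fused,
    ]
```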
## DeepSeek R1 Layered LoRA
The DeepSeek R1 models use a multi-layer training approach instead of single-pass LoRA. See DeepSeek R1 Research for details.
### Layer Sequence (v1: 3-layer)
Ethics (training-2k, 1839 examples) → Composure (watts-full, 72 examples) → Western (merged, 156 examples)
### Layer Sequence (v2: 5-layer)
Ethics → Composure → Western → Ethics-sandwich (600 iter) → Western-fresh (@200, val loss 2.321)
### Layer Sequence (v3: 7-layer sovereignty)
Ethics → Composure → Western → Sandwich → WesternFresh → Russian → Gold-full (7019 examples, 1600 iter)
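The key property of layered LoRA is that each stage resumes from the previous stage's adapter, so ordering matters. A sketch of the v3 sequence as a driver loop; `train_layers`, `train_one`, and the adapter-path naming are illustrative assumptions:

```python
# Layer order for the v3 "sovereignty" run; each LoRA layer is trained
# on top of the adapter produced by the previous one.
V3_LAYERS = ["ethics", "composure", "western", "sandwich",
             "western-fresh", "russian", "gold-full"]

def train_layers(layers, train_one):
    """Run layered training: feed each stage the prior stage's adapter.

    train_one(layer, resume_from) trains one layer and returns the path
    of the adapter it produced; resume_from is None for the first layer
    (i.e. training starts from the base model).
    """
    adapter = None
    for layer in layers:
        adapter = train_one(layer, resume_from=adapter)
    return adapter
```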
## Adapter Management
Adapters on M3 (`/Volumes/Data/lem/`):
| Adapter | Model | Notes |
|---|---|---|
| adapters-deepseek-r1-7b | R1-Distill-Qwen-7B | Ethics base |
| adapters-*-composure | R1-Distill-Qwen-7B | Watts composure layer |
| adapters-*-western | R1-Distill-Qwen-7B | Western philosophy layer |
| adapters-*-sandwich | R1-Distill-Qwen-7B | Ethics sandwich |
| adapters-*-sandwich-watts | R1-Distill-Qwen-7B | OVERFIT — do not use |
| adapters-*-western-fresh | R1-Distill-Qwen-7B | @200 canonical (best) |
| adapters-15k/{model} | Various | 15K golden set LoRA |
## InfluxDB Metrics
Training progress is reported to InfluxDB (`training` database):
| Measurement | Fields |
|---|---|
| `training_loss` | `train_loss`, `val_loss`, `iteration`, `learning_rate` |
| `training_status` | `model`, `status` (running/complete/failed), `total_iters` |
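A point for the `training_loss` measurement can be built in InfluxDB line protocol without any client library. A sketch; the field names come from the table above, but tagging by `model` is an assumption about how the script writes points:

```python
# Build an InfluxDB line-protocol point for the training_loss measurement.
# Integer fields carry the "i" suffix per the line-protocol format.
def training_loss_line(model: str, iteration: int, train_loss: float,
                       val_loss: float, learning_rate: float) -> str:
    fields = (f"train_loss={train_loss},val_loss={val_loss},"
              f"iteration={iteration}i,learning_rate={learning_rate}")
    return f"training_loss,model={model} {fields}"
```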
## GGUF Conversion
After training, models are converted for distribution:

MLX QAT → dequantize (`mlx_lm convert -d`) → `convert_hf_to_gguf.py` (F16) → `llama-quantize` (Q4_K_M)

- 7 models converted (GPT-OSS MoE not supported by llama.cpp)
- 9 GGUF files total at `/Volumes/Data/lem/gguf/`
- Quantization: Q4_K_M standard; the 1B model also has Q5_K_M and Q8_0
- F16 intermediates deleted to save space
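The quantization matrix above (Q4_K_M for everything, extra quants for the 1B) can be expressed as a small lookup. A sketch; the filename pattern `LEM-{model}-15k-{quant}.gguf` is an assumption based on the directory layout above, not a literal listing:

```python
# Derive the expected GGUF artifacts per model; naming is an assumption
# modeled on the fused-model layout, not a listing of the actual files.
GGUF_DIR = "/Volumes/Data/lem/gguf"
EXTRA_QUANTS = {"gemma-3-1b": ["Q4_K_M", "Q5_K_M", "Q8_0"]}  # 1B gets extras

def gguf_paths(model: str) -> list[str]:
    quants = EXTRA_QUANTS.get(model, ["Q4_K_M"])
    return [f"{GGUF_DIR}/LEM-{model}-15k-{q}.gguf" for q in quants]
```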
## Go Native Training (New)
The core ml train command implements native LoRA training in Go via the MLX backend, replacing the Python training script. See Go Pipeline Commands for usage.