Training Process
Claude edited this page 2026-02-23 19:41:13 +00:00

LoRA fine-tuning on the M3 Ultra turns golden set examples into model weights: adapters are trained per model and then fused into the base weights.

Training Script

  • Script: lem_train_15k.py, run on the M3 Ultra
  • Location: /Volumes/Data/lem/scripts/lem_train_15k.py
  • Python: must use /opt/homebrew/bin/python3 (not system python — MLX needs it)
  • Unbuffered: must pass the -u flag when running via nohup

Model Configurations

| Model | Iterations | Batch size | Notes |
|---|---|---|---|
| gemma-3-1b | 1,000 | 4 | Fast, good for testing |
| gemma-3-12b | 600 | 2 | Strong reasoning |
| gemma-3-27b | 400 | 1 | Benchmark leader; gradient checkpointing required |
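
The per-model settings above can be sketched as a lookup table. This is a hypothetical illustration of the structure, not the actual code in lem_train_15k.py; the values come from the table, the dict layout and function name are assumptions.

```python
# Per-model training settings from the table above.
# Values are from this page; the structure is an illustrative sketch.
MODEL_CONFIGS = {
    "gemma-3-1b":  {"iters": 1000, "batch_size": 4, "grad_checkpoint": False},
    "gemma-3-12b": {"iters": 600,  "batch_size": 2, "grad_checkpoint": False},
    "gemma-3-27b": {"iters": 400,  "batch_size": 1, "grad_checkpoint": True},
}

def config_for(model: str) -> dict:
    """Look up training settings, failing loudly on unknown models."""
    try:
        return MODEL_CONFIGS[model]
    except KeyError:
        raise ValueError(f"no training config for {model!r}") from None
```

Note the trade-off the table encodes: as the model grows, batch size shrinks to fit memory, and the 27B additionally needs gradient checkpointing.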

Running Training

# Single model
nohup /opt/homebrew/bin/python3 -u scripts/lem_train_15k.py --models gemma-3-1b > /tmp/lem-train-1b.log 2>&1 &

# Monitor
tail -f /tmp/lem-train-1b.log

Training Data

Location: /Volumes/Data/lem/training-15k/

| Split | Count |
|---|---|
| Train | 13,498 |
| Validation | 750 |
| Test | 750 |
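
A quick sanity check on the splits can be done by counting JSONL lines. This sketch assumes the usual mlx_lm layout of train.jsonl / valid.jsonl / test.jsonl inside the data directory; the filenames are an assumption, not confirmed by this page.

```python
from pathlib import Path

# Expected split sizes from the table above (13,498 + 750 + 750 = 14,998).
EXPECTED = {"train": 13_498, "valid": 750, "test": 750}

def count_examples(data_dir: str) -> dict:
    """Count non-blank JSONL lines per split.

    Assumes {split}.jsonl filenames (the common mlx_lm convention --
    an assumption here, not documented on this page).
    """
    counts = {}
    for split in EXPECTED:
        path = Path(data_dir) / f"{split}.jsonl"
        with path.open() as f:
            counts[split] = sum(1 for line in f if line.strip())
    return counts
```

Running count_examples("/Volumes/Data/lem/training-15k") and comparing against EXPECTED catches truncated or duplicated split files before a long training run.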

Output

  • Fused models: /Volumes/Data/lem/LEM-{model}-15k/
  • LoRA adapters: /Volumes/Data/lem/adapters-15k/{model}/

DeepSeek R1 Layered LoRA

The DeepSeek R1 models use a multi-layer training approach instead of single-pass LoRA. See DeepSeek R1 Research for details.

Layer Sequence (v1: 3-layer)

Ethics (training-2k, 1839 examples) → Composure (watts-full, 72 examples) → Western (merged, 156 examples)

Layer Sequence (v2: 5-layer)

Ethics → Composure → Western → Ethics-sandwich (600 iterations) → Western-fresh (checkpoint @200, val loss 2.321)

Layer Sequence (v3: 7-layer sovereignty)

Ethics → Composure → Western → Sandwich → WesternFresh → Russian → Gold-full (7019 examples, 1600 iter)
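
The essence of the layered approach is that each layer resumes from the previous layer's adapter rather than from the base model. A minimal sketch, where train_layer is a hypothetical stub standing in for one mlx_lm LoRA run:

```python
# The v3 7-layer sequence from above. Layer names come from this page;
# train_layer is a hypothetical stand-in for a single LoRA training run.
V3_LAYERS = [
    "ethics", "composure", "western", "sandwich",
    "western-fresh", "russian", "gold-full",
]

def train_stack(layers, train_layer):
    """Run each layer in order, feeding the previous layer's adapter in
    as the starting point for the next (the core of layered LoRA)."""
    adapter = None  # the first layer starts from the base model
    for name in layers:
        adapter = train_layer(name, resume_from=adapter)
    return adapter  # the final stacked adapter
```

The ordering matters: a later layer can overwrite earlier behavior, which is why v2 added the ethics "sandwich" pass after the western layer.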

Adapter Management

Adapters on M3 (/Volumes/Data/lem/):

| Adapter | Model | Notes |
|---|---|---|
| adapters-deepseek-r1-7b | R1-Distill-Qwen-7B | Ethics base |
| adapters-*-composure | R1-Distill-Qwen-7B | Watts composure layer |
| adapters-*-western | R1-Distill-Qwen-7B | Western philosophy layer |
| adapters-*-sandwich | R1-Distill-Qwen-7B | Ethics sandwich |
| adapters-*-sandwich-watts | R1-Distill-Qwen-7B | OVERFIT — do not use |
| adapters-*-western-fresh | R1-Distill-Qwen-7B | @200 checkpoint, canonical (best) |
| adapters-15k/{model} | Various | 15K golden set LoRA |

InfluxDB Metrics

Training progress is reported to InfluxDB (training database):

| Measurement | Fields |
|---|---|
| training_loss | train_loss, val_loss, iteration, learning_rate |
| training_status | model, status (running/complete/failed), total_iters |
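
A point in the training_loss measurement might be built like this. The measurement and field names come from the table above; the tag layout, host, and the use of the InfluxDB 1.x Python client are assumptions for illustration.

```python
# Build a training_loss point matching the schema in the table above.
# Measurement/field names are from this page; tags and client details
# are illustrative assumptions.
def loss_point(model: str, iteration: int, train_loss: float,
               val_loss: float, learning_rate: float) -> dict:
    return {
        "measurement": "training_loss",
        "tags": {"model": model},
        "fields": {
            "train_loss": train_loss,
            "val_loss": val_loss,
            "iteration": iteration,
            "learning_rate": learning_rate,
        },
    }

# Writing it would look roughly like this (needs the `influxdb` 1.x client
# and a reachable server, so it is not executed here):
#   from influxdb import InfluxDBClient
#   client = InfluxDBClient(host="localhost", database="training")
#   client.write_points([loss_point("gemma-3-1b", 100, 1.82, 1.95, 1e-5)])
```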

GGUF Conversion

After training, models are converted for distribution:

MLX QAT → dequantize (mlx_lm convert -d) → convert_hf_to_gguf.py (F16) → llama-quantize (Q4_K_M)
  • 7 models converted (GPT-OSS MoE not supported by llama.cpp)
  • 9 GGUF files total at /Volumes/Data/lem/gguf/
  • Quantization: Q4_K_M standard, 1B also has Q5_K_M + Q8_0
  • F16 intermediates deleted to save space
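
The conversion chain above can be sketched as three subprocess commands. The tool names are from this page; the exact flags and output paths are assumptions and should be checked against the installed mlx_lm and llama.cpp versions before use.

```python
# Sketch of the GGUF conversion pipeline as subprocess command lists.
# Tool names come from this page; flags and paths are assumptions.
def gguf_pipeline(mlx_model: str, dequant_dir: str, gguf_dir: str,
                  quant: str = "Q4_K_M") -> list:
    f16 = f"{gguf_dir}/model-f16.gguf"
    out = f"{gguf_dir}/model-{quant.lower()}.gguf"
    return [
        # 1. Dequantize the MLX QAT model back to full precision.
        ["mlx_lm", "convert", "--hf-path", mlx_model,
         "--mlx-path", dequant_dir, "-d"],
        # 2. Convert the HF-format weights to an F16 GGUF (llama.cpp script).
        ["python3", "convert_hf_to_gguf.py", dequant_dir,
         "--outfile", f16, "--outtype", "f16"],
        # 3. Quantize the F16 GGUF down to the distribution format.
        ["llama-quantize", f16, out, quant],
    ]
```

Each list can be handed to subprocess.run(cmd, check=True); remember to delete the F16 intermediate afterwards, as noted above.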

Go Native Training (New)

The core ml train command implements native LoRA training in Go via the MLX backend, replacing the Python training script. See Go Pipeline Commands for usage.