The Decode method strips the SentencePiece leading space from every
token, which loses word boundaries during streaming. DecodeToken
preserves the space (it represents the word boundary) and only the
first token of each generation has its leading space stripped.
Fixes Gemma3 space prefix appearing in chat UI output.
Co-Authored-By: Virgil <virgil@lethean.io>
Add LoRA field to Linear for transparent adapter injection via model's
Forward() path. ApplyLoRA() on Qwen3/Gemma3 wraps target projections.
Deterministic param ordering for adapter save/load consistency.
MaskedCrossEntropyLoss for training on assistant tokens only.
Co-Authored-By: Virgil <virgil@lethean.io>
LoRA: low-rank adaptation with trainable A/B matrices, Kaiming normal
init, safetensors save/load. AdamW: decoupled weight decay optimizer
with positional moment tracking for gradient-replaced params.
14 tests passing including end-to-end LoRA+AdamW training loop.
Co-Authored-By: Virgil <virgil@lethean.io>
Native Go bindings for MLX-C gradient computation on Apple Silicon.
Foundation for LoRA training without Python.
- VJP (reverse-mode autodiff) for backward pass
- JVP (forward-mode autodiff) for directional derivatives
- ValueAndGrad for combined loss + gradient computation
- Checkpoint for memory-efficient gradient recomputation
- CrossEntropyLoss (numerically stable via LogSumExp)
- MSELoss, Log, SumAll, MeanAll, OnesLike helpers
- TakeAlongAxis and LogSumExp ops
- Fix closure callback null vector bug (affects compile.go too)
- Fix Float() returning 0 for float32 arrays
14 tests passing on Metal GPU.
Co-Authored-By: Virgil <virgil@lethean.io>
Remove the manual -tags mlx requirement. MLX is now automatically
compiled on darwin/arm64 via build constraints. Stubs remain for
other platforms. No functional change.
Co-Authored-By: Virgil <virgil@lethean.io>
LEM scoring pipeline, native MLX Metal bindings, Claude SDK wrapper,
RAG with Qdrant/Ollama, unified AI facade, and MCP protocol server.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>