go-mlx

Author	SHA1	Message	Date
Snider	2edb45e2c5	chore: set macOS deployment target to 26.0 Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-26 05:38:53 +00:00
Claude	421d0c42ff	docs: archive completed backend-abstraction and batch-inference plans Move both plans to docs/plans/completed/ with summaries. MLX backend implements shared interfaces and batch inference at 5K sentences/sec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 13:51:21 +00:00
Snider	c0f07478c8	docs: document InspectAttention KV cache extraction in architecture guide Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-23 12:34:31 +00:00
Snider	1ea90b03b4	docs: graduate TODO/FINDINGS into production documentation Replace internal task tracking with structured docs covering CGO/mlx-c architecture, 4 model architectures, training pipeline, mlxlm backend, development guide, and full project history across 5 phases. Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-20 15:03:39 +00:00
Snider	ce1acef462	docs: batch inference API design (Phase 5) Two new TextModel methods: Classify (prefill-only, fast path for classification) and BatchGenerate (autoregressive, multi-prompt). Adds attention masking for padded batches. Primary consumer: go-i18n Phase 2a domain classification at ~5K sentences/sec. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 23:18:38 +00:00
Snider	443347a2f8	fix(metal): address 4 minor code review items - Rename New() → newArray() to signal internal-only intent (112 usages) - Remove unused Collect() function and its test - Fix discarded json.Unmarshal error in qwen3.go - Document AsStrided stride formula in gemma3.go Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 21:36:40 +00:00
Snider	97d9041455	docs(plan): fold Virgil review into design and implementation plan Virgil review items integrated: - context.Context on Generate/Chat (required for HTTP cancellation) - Err() error on TextModel (distinguish EOS from OOM) - Chat() on TextModel (model owns its chat template) - Memory control functions exposed at root package level - Functional options convention confirmed - pkg/process confirmed — no changes needed for mlxlm Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 19:25:05 +00:00
Snider	28e2a07316	docs(plan): backend abstraction implementation plan (12 tasks) Detailed step-by-step plan for restructuring go-mlx: - Tasks 1-8: mechanical migration to internal/metal/ - Task 9: new Generate loop with iter.Seq[Token] streaming - Task 10: deterministic memory cleanup (fixes leak) - Tasks 11-12: integration tests and doc updates Critical checkpoint at Task 7: all 148 tests must pass. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 19:14:59 +00:00
Snider	c881813872	docs(design): backend abstraction with internal/metal reorganisation Approved design for restructuring go-mlx: - Root package becomes clean interface (TextModel, LoadModel, Token) - All CGO code moves to internal/metal/ - Deterministic memory management (Close + per-step cleanup) - Error propagation instead of silent logging - mlxlm/ backend placeholder for Python subprocess support Includes API breaking change communication in FINDINGS.md and memory management research tasks in cpp/TODO.md. See: docs/plans/2026-02-19-backend-abstraction-design.md Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 19:12:04 +00:00

9 commits