Commit graph

9 commits

Author SHA1 Message Date
Snider
2edb45e2c5 chore: set macOS deployment target to 26.0
All checks were successful
Security Scan / security (push) Successful in 11s
Test / Vet & Build (push) Successful in 54s
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-26 05:38:53 +00:00
Claude
421d0c42ff docs: archive completed backend-abstraction and batch-inference plans
All checks were successful
Security Scan / security (push) Successful in 15s
Test / Vet & Build (push) Successful in 50s
Move both plans to docs/plans/completed/ with summaries. MLX backend
implements shared interfaces and batch inference at 5K sentences/sec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 13:51:21 +00:00
Snider
c0f07478c8 docs: document InspectAttention KV cache extraction in architecture guide
All checks were successful
Security Scan / security (push) Successful in 11s
Test / Vet & Build (push) Successful in 31s
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-23 12:34:31 +00:00
Snider
1ea90b03b4 docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking with structured docs covering CGO/mlx-c
architecture, 4 model architectures, training pipeline, mlxlm backend,
development guide, and full project history across 5 phases.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:03:39 +00:00
Snider
ce1acef462 docs: batch inference API design (Phase 5)
Two new TextModel methods: Classify (prefill-only, fast path for
classification) and BatchGenerate (autoregressive, multi-prompt).
Adds attention masking for padded batches. Primary consumer: go-i18n
Phase 2a domain classification at ~5K sentences/sec.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 23:18:38 +00:00
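The commit above mentions attention masking for padded batches. The sketch below is only an illustration and is not go-mlx's implementation. `causalPaddingMask` is a hypothetical helper, and the example assumes right-padded sequences. It combines a causal mask with a padding mask so that no query position attends to a pad key.

```go
package main

import "fmt"

// causalPaddingMask returns one [maxLen][maxLen] mask per sequence in a
// right-padded batch. mask[q][k] is true when query position q may attend
// to key position k. Hypothetical helper for illustration; not part of go-mlx.
func causalPaddingMask(lengths []int, maxLen int) [][][]bool {
	masks := make([][][]bool, len(lengths))
	for b, n := range lengths {
		m := make([][]bool, maxLen)
		for q := 0; q < maxLen; q++ {
			m[q] = make([]bool, maxLen)
			for k := 0; k <= q; k++ { // causal: only current and earlier keys
				m[q][k] = k < n // padding: keys at or past the real length are masked
			}
		}
		masks[b] = m
	}
	return masks
}

func main() {
	masks := causalPaddingMask([]int{3, 1}, 3)
	// Sequence 1 has one real token, so its last (pad) row sees only key 0.
	fmt.Println(masks[1][2]) // [true false false]
}
```

Rows that belong to pad queries still attend to key 0. This keeps every softmax row from being fully masked, which would otherwise produce NaNs. A prefill-only path such as `Classify` would then read the logits at position `n-1` of each sequence.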
Snider
443347a2f8 fix(metal): address 4 minor code review items
- Rename New() → newArray() to signal internal-only intent (112 usages)
- Remove unused Collect() function and its test
- Fix discarded json.Unmarshal error in qwen3.go
- Document AsStrided stride formula in gemma3.go

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:36:40 +00:00
Snider
97d9041455 docs(plan): fold Virgil review into design and implementation plan
Virgil review items integrated:
- context.Context on Generate/Chat (required for HTTP cancellation)
- Err() error on TextModel (distinguish EOS from OOM)
- Chat() on TextModel (model owns its chat template)
- Memory control functions exposed at root package level
- Functional options convention confirmed
- pkg/process confirmed — no changes needed for mlxlm

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:25:05 +00:00
Snider
28e2a07316 docs(plan): backend abstraction implementation plan (12 tasks)
Detailed step-by-step plan for restructuring go-mlx:
- Tasks 1-8: mechanical migration to internal/metal/
- Task 9: new Generate loop with iter.Seq[Token] streaming
- Task 10: deterministic memory cleanup (fixes leak)
- Tasks 11-12: integration tests and doc updates

Critical checkpoint at Task 7: all 148 tests must pass.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:14:59 +00:00
Snider
c881813872 docs(design): backend abstraction with internal/metal reorganisation
Approved design for restructuring go-mlx:
- Root package becomes clean interface (TextModel, LoadModel, Token)
- All CGO code moves to internal/metal/
- Deterministic memory management (Close + per-step cleanup)
- Error propagation instead of silent logging
- mlxlm/ backend placeholder for Python subprocess support

Includes API breaking change communication in FINDINGS.md and
memory management research tasks in cpp/TODO.md.

See: docs/plans/2026-02-19-backend-abstraction-design.md

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:12:04 +00:00
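The design commit calls for deterministic memory management (Close plus per-step cleanup) and for error propagation instead of silent logging. The toy sketch below shows that pattern. `pool`, `buffer`, `model`, `load`, and `step` are all hypothetical stand-ins for native allocations, not go-mlx types. A live-allocation counter makes any leak visible.

```go
package main

import (
	"errors"
	"fmt"
)

// pool counts live native allocations; a stand-in for Metal memory (hypothetical).
type pool struct{ live int }

// buffer stands in for a native array handle (hypothetical).
type buffer struct {
	p     *pool
	freed bool
}

func (p *pool) alloc() *buffer {
	p.live++
	return &buffer{p: p}
}

// Free releases the allocation; calling it twice is a no-op.
func (b *buffer) Free() {
	if !b.freed {
		b.freed = true
		b.p.live--
	}
}

// model owns long-lived weights until Close is called.
type model struct {
	p       *pool
	weights *buffer
}

func load(p *pool) *model { return &model{p: p, weights: p.alloc()} }

// step frees its intermediates before returning and reports failures
// to the caller instead of logging them.
func (m *model) step(i int) error {
	logits := m.p.alloc()
	defer logits.Free() // per-step cleanup on both success and error paths
	if i < 0 {
		return errors.New("step: negative index")
	}
	return nil
}

// Close releases the weights deterministically instead of waiting for a finalizer.
func (m *model) Close() error {
	m.weights.Free()
	return nil
}

func main() {
	p := &pool{}
	m := load(p)
	for i := 0; i < 4; i++ {
		if err := m.step(i); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("live after steps:", p.live) // live after steps: 1
	m.Close()
	fmt.Println("live after Close:", p.live) // live after Close: 0
}
```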