LEM/pkg
Snider 55519b24aa feat(distill): migrate from go-inference to go-ml Backend
Replace inference.LoadModel() with ml.NewMLXBackend(), which wraps
the same Metal model and adds memory management (SetCacheLimit,
SetMemoryLimit). Replace the raw iter.Seq token loop with
backend.Chat(), which returns Result{Text, Metrics}. Add runtime.GC()
between probes to prevent an incremental memory leak.

Reference: go-ml/cmd/cmd_ab.go memory management pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 18:02:16 +00:00
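
The probe loop described in the commit message can be sketched roughly as follows. This is a minimal, self-contained illustration, not the go-ml API: the `Backend` interface, the `Chat(prompt string)` signature, the `GeneratedTokens` field, and the `echoBackend` stand-in are all assumptions made for the example. Only the `Result{Text, Metrics}` shape and the `runtime.GC()` call between probes come from the commit message itself.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// Metrics is a hypothetical stand-in for the backend's metrics type.
// The GeneratedTokens field is an assumption for illustration.
type Metrics struct {
	GeneratedTokens int
}

// Result mirrors the Result{Text, Metrics} shape named in the commit message.
type Result struct {
	Text    string
	Metrics Metrics
}

// Backend is a hypothetical interface; the real go-ml Chat signature
// may take a context, a message list, and generation options.
type Backend interface {
	Chat(prompt string) (Result, error)
}

// echoBackend is a fake backend so the sketch runs without a model.
type echoBackend struct{}

func (echoBackend) Chat(prompt string) (Result, error) {
	text := "echo: " + prompt
	return Result{
		Text:    text,
		Metrics: Metrics{GeneratedTokens: len(strings.Fields(text))},
	}, nil
}

// runProbes sends each probe through the backend and forces a garbage
// collection after every probe, so per-probe allocations are reclaimed
// before the next one starts.
func runProbes(b Backend, probes []string) ([]Result, error) {
	results := make([]Result, 0, len(probes))
	for _, p := range probes {
		res, err := b.Chat(p)
		if err != nil {
			return results, fmt.Errorf("probe %q: %w", p, err)
		}
		results = append(results, res)
		runtime.GC() // collect between probes, as the commit describes
	}
	return results, nil
}

func main() {
	results, err := runProbes(echoBackend{}, []string{"first probe", "second probe"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s (%d tokens)\n", r.Text, r.Metrics.GeneratedTokens)
	}
}
```

Returning a single `Result` per probe, instead of ranging over a token iterator, keeps the loop body short and gives one obvious point at which to trigger collection. Note that `runtime.GC()` only reclaims Go heap memory; any limits on Metal-side buffers would presumably still rely on the backend's own SetCacheLimit and SetMemoryLimit settings.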
lem feat(distill): migrate from go-inference to go-ml Backend 2026-02-22 18:02:16 +00:00