Replace inference.LoadModel() with ml.NewMLXBackend() which wraps
the same Metal model with memory management (SetCacheLimit,
SetMemoryLimit). Replace raw iter.Seq token loop with backend.Chat()
returning Result{Text, Metrics}. Add runtime.GC() between probes
to prevent incremental memory leak.
Reference: go-ml/cmd/cmd_ab.go memory management pattern.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| lem | ||