- Add backend_mlxlm.go blank import to register mlx-lm subprocess backend
- Select backend from ai.yaml config (metal, mlx_lm, rocm, api)
- Only set Metal cache/memory limits when using metal backend
- Add --no-dedup flag to disable grammar-profile deduplication
  (trained models with consistent voice trigger false positives at 0.02)
- Add --context-len flag and context_len config for KV cache sizing
- Pass WithBackend and WithContextLen to go-ml backend loader

Co-Authored-By: Virgil <virgil@lethean.io>
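The ai.yaml backend selection described in the commit might look roughly like this sketch; only the `backend` values and `context_len` key are named in the commit message, so the exact schema and file layout here are assumptions:

```yaml
# ai.yaml — hypothetical sketch, not the verified schema.
# `backend` selects the inference backend; `context_len` sizes the KV cache
# and is forwarded to the go-ml loader via WithContextLen.
backend: mlx_lm    # one of: metal, mlx_lm, rocm, api
context_len: 8192  # assumed example value
```

With `backend: mlx_lm`, the Metal cache/memory limits would be skipped, since the commit only sets those when the metal backend is active.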
6 lines
284 B
Go
package lem

// Blank import registers the mlx-lm subprocess backend with go-inference.
// This spawns a Python process using mlx-lm for inference; memory
// management is handled natively via Python's refcounting (2.4 GB vs 17+ GB in CGO).
import _ "forge.lthn.ai/core/go-mlx/mlxlm"
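The blank import above works because the `mlxlm` package registers itself in an `init` function. A minimal, self-contained sketch of that registration pattern follows; the registry and constructor names here are illustrative, not the actual go-inference API:

```go
package main

import "fmt"

// registry maps backend names to constructors, mimicking how a
// go-inference-style library lets backend packages self-register.
var registry = map[string]func() string{}

// Register is what a backend package would call from its init().
func Register(name string, ctor func() string) {
	registry[name] = ctor
}

// In the real project this init() lives inside the mlxlm package,
// so a blank import (`import _ ".../mlxlm"`) is enough to run it
// and make the backend selectable by name.
func init() {
	Register("mlx_lm", func() string { return "mlx-lm subprocess backend" })
}

func main() {
	fmt.Println(registry["mlx_lm"]())
}
```

The importing package never references any identifier from `mlxlm`; the underscore import exists purely to trigger the side effect of `init()`.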