core/go - Lethean Network

core/go

generated from lthn/LEM

Author	SHA1	Message	Date
Snider	adaa4131f9	refactor: strip to pure package library (#3 ) - Fix remaining 187 pkg/ files referencing core/cli → core/go - Move SDK library code from internal/cmd/sdk/ → pkg/sdk/ (new package) - Create pkg/rag/helpers.go with convenience functions from internal/cmd/rag/ - Fix pkg/mcp/tools_rag.go to use pkg/rag instead of internal/cmd/rag - Fix pkg/build/buildcmd/cmd_sdk.go and pkg/release/sdk.go to use pkg/sdk - Remove all non-library content: main.go, internal/, cmd/, docker/, scripts/, tasks/, tools/, .core/, .forgejo/, .woodpecker/, Taskfile.yml - Run go mod tidy to trim unused dependencies core/go is now a pure Go package suite (library only). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Claude <developers@lethean.io> Reviewed-on: #3	2026-02-16 14:23:45 +00:00
Claude	3269a773f4	feat: add Metal memory budget monitoring after each request Tracks model size at load time and checks Metal active memory after each generation. If usage exceeds 3× model size, forces double GC and cache clear as a safety net. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	8cdafc8d66	fix: add GC-based memory management for MLX array handles Go GC cannot see Metal/C memory pressure, so intermediate arrays from each forward pass accumulated without bound, causing OOM kills after 3-4 requests. Fix: runtime.SetFinalizer on every Array releases C handles when GC collects them, and runtime.GC() is forced every 4 tokens during generation. Also adds SetMemoryLimit(24GB) as a hard Metal ceiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	9688e086ca	fix: add Metal cache management to prevent memory growth - Add ClearCache() wrapping mlx_clear_cache - Clear Metal allocator cache every 8 tokens during generation - Set 16GB cache limit on backend init - Prevents GPU memory from growing unbounded during inference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	098f496364	fix: correct SDPA mask mode and slice logits to last position	2026-02-16 05:53:52 +00:00
Claude	bc28aad526	feat: add native MLX backend for Apple Silicon inference (pkg/mlx) CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

6 commits