core/cli - Forgejo: Beyond coding. We Forge.

core/cli

Fork 0

Commit graph

Author	SHA1	Message	Date
Claude	8cdafc8d66	fix: add GC-based memory management for MLX array handles Go GC cannot see Metal/C memory pressure, so intermediate arrays from each forward pass accumulated without bound, causing OOM kills after 3-4 requests. Fix: runtime.SetFinalizer on every Array releases C handles when GC collects them, and runtime.GC() is forced every 4 tokens during generation. Also adds SetMemoryLimit(24GB) as a hard Metal ceiling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	5e2d941b4d	fix: correct 20 mlx-c API mismatches for v0.4.1 - Use _axis/_axes variants for softmax, argmax, topk, sum, mean, squeeze, concatenate, argpartition - Fix size_t vs int for count parameters throughout - Fix int64_t strides in as_strided - Add mlx_optional_int + mode param to quantized_matmul - Use mlx_array_new() for null arrays (freqs, key, mask, sinks) - Fix expand_dims to single-axis signature - Fix compile callback signature (size_t index) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00
Claude	bc28aad526	feat: add native MLX backend for Apple Silicon inference (pkg/mlx) CGo wrapper for mlx-c providing zero-Python Metal GPU inference. Includes Gemma 3 model architecture, BPE tokenizer, KV cache, composable sampling, and OpenAI-compatible serve command. Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 05:53:52 +00:00

Author

SHA1

Message

Date

Claude

8cdafc8d66

fix: add GC-based memory management for MLX array handles

Go GC cannot see Metal/C memory pressure, so intermediate arrays from
each forward pass accumulated without bound, causing OOM kills after
3-4 requests. Fix: runtime.SetFinalizer on every Array releases C
handles when GC collects them, and runtime.GC() is forced every 4
tokens during generation. Also adds SetMemoryLimit(24GB) as a hard
Metal ceiling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-16 05:53:52 +00:00

Claude

5e2d941b4d

fix: correct 20 mlx-c API mismatches for v0.4.1

- Use _axis/_axes variants for softmax, argmax, topk, sum, mean, squeeze,
  concatenate, argpartition
- Fix size_t vs int for count parameters throughout
- Fix int64_t strides in as_strided
- Add mlx_optional_int + mode param to quantized_matmul
- Use mlx_array_new() for null arrays (freqs, key, mask, sinks)
- Fix expand_dims to single-axis signature
- Fix compile callback signature (size_t index)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-16 05:53:52 +00:00

Claude

bc28aad526

feat: add native MLX backend for Apple Silicon inference (pkg/mlx)

CGo wrapper for mlx-c providing zero-Python Metal GPU inference.
Includes Gemma 3 model architecture, BPE tokenizer, KV cache,
composable sampling, and OpenAI-compatible serve command.

Build-tagged (darwin && arm64 && mlx) with stubs for cross-platform.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-16 05:53:52 +00:00

3 commits