- Rename New() → newArray() to signal internal-only intent (112 usages) - Remove unused Collect() function and its test - Fix discarded json.Unmarshal error in qwen3.go - Document AsStrided stride formula in gemma3.go Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# TODO.md — go-mlx C++ Task Queue
Tasks for the CLion Claude session. Written by GoLand Claude or Virgil.
## Orientation (First Session)
- Map the mlx-c API surface — Read all 27 headers. Full map in FINDINGS.md: ~180 ops functions, Go binds ~40. Identified 8 high-priority unbound functions. (Done 2026-02-19)
- Understand the error model — Free-form strings only (`"<msg> at <file>:<line>"`). No error codes or categories. The handler stores the string; Go checks the return code. Details in FINDINGS.md. (Done 2026-02-19)
- Check memory management patterns — Arrays are refcounted via `shared_ptr<ArrayDesc>`. Double-free is UB. Freeing an array during async evaluation is safe. Freeing NULL is safe. Details in FINDINGS.md. (Done 2026-02-19)
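The error model above (a return code plus a free-form string in the form `"<msg> at <file>:<line>"`) can be turned into a Go error on the binding side. The following is a minimal pure-Go sketch that assumes the stored string always ends in ` at <file>:<line>` and falls back to the raw text otherwise. `mlxError`, `parseMLXError` and `checkRC` are hypothetical names, not existing go-mlx or mlx-c API, and the message in `main` is an invented example, not a real MLX error.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// mlxError is a hypothetical structured view of an mlx-c error string.
// mlx-c only provides free-form text, so File/Line are recovered by parsing.
type mlxError struct {
	Msg  string
	File string
	Line int // 0 when the location suffix could not be parsed
}

func (e *mlxError) Error() string {
	if e.File == "" {
		return e.Msg
	}
	return fmt.Sprintf("%s (%s:%d)", e.Msg, e.File, e.Line)
}

// parseMLXError splits "<msg> at <file>:<line>" into its parts.
// The LAST " at " is used so messages that contain " at " still parse.
// Anything that does not match becomes a plain Msg.
func parseMLXError(s string) *mlxError {
	i := strings.LastIndex(s, " at ")
	if i < 0 {
		return &mlxError{Msg: s}
	}
	loc := s[i+len(" at "):]
	j := strings.LastIndex(loc, ":")
	if j < 0 {
		return &mlxError{Msg: s}
	}
	line, err := strconv.Atoi(loc[j+1:])
	if err != nil {
		return &mlxError{Msg: s}
	}
	return &mlxError{Msg: s[:i], File: loc[:j], Line: line}
}

// checkRC converts a return code (0 = success, non-zero = failure) plus the
// last message stored by the error handler into a Go error.
func checkRC(rc int, lastMsg string) error {
	if rc == 0 {
		return nil
	}
	if lastMsg == "" {
		return errors.New("mlx: operation failed with no stored message")
	}
	return parseMLXError(lastMsg)
}

func main() {
	// Invented example string, used only to exercise the parser.
	err := checkRC(1, "example failure at example.cpp:42")
	fmt.Println(err)
}
```

Keeping the raw text as a fallback means an unexpected format change degrades to a plain message instead of losing the error.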
## Priority Tasks (from GoLand Claude)
- Find `mlx_contiguous` or equivalent — FOUND: `mlx_contiguous(res, a, allow_col_major, stream)` at `ops.h:220`, plus `_mlx_array_is_row_contiguous()` for checking. GoLand Claude: see FINDINGS.md for the recommended pattern. (Done 2026-02-19)
- Verify `mlx_array_data_*` eval semantics — Does NOT auto-evaluate. Returns a raw buffer pointer (crash/garbage if unevaluated), so `Materialise()` before data access is essential. `item()` auto-evaluates but `data()` does not. (Done 2026-02-19)
- Check if `mlx_cumsum` exists — FOUND: `mlx_cumsum(res, a, axis, reverse, inclusive, stream)` at `ops.h:344`. GoLand Claude can now implement proper TopP sampling. (Done 2026-02-19)
- Survey `mlx_contiguous`/`mlx_flatten`/`mlx_copy` — All three exist. `mlx_contiguous` is the correct tool (forces row-major). `mlx_copy` may preserve a non-contiguous layout. `mlx_flatten` works but changes shape semantics. (Done 2026-02-19)
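To make "row-contiguous" concrete: an array view is row-major contiguous when each stride (in elements) equals the product of all later dimensions. The pure-Go sketch below illustrates that rule only. It is not the implementation behind `_mlx_array_is_row_contiguous()`, and `rowMajorStrides`, `isRowContiguous` and `flatOffset` are hypothetical helpers. It also shows why a strided view (for example, a transpose that shares its input buffer) may return elements in the wrong order if the raw data pointer is read linearly.

```go
package main

import "fmt"

// rowMajorStrides returns the element strides a densely packed row-major
// (C-order) array of the given shape would have.
func rowMajorStrides(shape []int) []int {
	strides := make([]int, len(shape))
	acc := 1
	for i := len(shape) - 1; i >= 0; i-- {
		strides[i] = acc
		acc *= shape[i]
	}
	return strides
}

// isRowContiguous reports whether a view with the given shape and element
// strides is densely laid out in row-major order. Size-1 dimensions are
// skipped because their stride never affects addressing.
func isRowContiguous(shape, strides []int) bool {
	if len(shape) != len(strides) {
		return false
	}
	want := rowMajorStrides(shape)
	for i := range shape {
		if shape[i] != 1 && strides[i] != want[i] {
			return false
		}
	}
	return true
}

// flatOffset computes the buffer offset of a multi-index from its strides.
func flatOffset(idx, strides []int) int {
	off := 0
	for i := range idx {
		off += idx[i] * strides[i]
	}
	return off
}

func main() {
	shape := []int{2, 3}
	strides := rowMajorStrides(shape) // [3 1]
	fmt.Println(isRowContiguous(shape, strides))

	// A transposed view of the 2x3 array: shape and strides swapped over the
	// same buffer, so it is no longer row-contiguous.
	tShape, tStrides := []int{3, 2}, []int{1, 3}
	fmt.Println(isRowContiguous(tShape, tStrides))

	// Element (0,1) of the view lives at buffer offset 3, not 1, so a linear
	// read of the buffer would not match the view's logical order.
	fmt.Println(flatOffset([]int{0, 1}, tStrides))
}
```

This prints `true`, `false`, then `3`.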
## Memory Management Research (from Backend Abstraction Design)
- What does `mlx_clear_cache()` release? — Releases the allocator pool cache back to the system. Does NOT touch active memory. Safe mid-generation. (Done 2026-02-19)
- Is `mlx_array_free()` safe on graph-referenced arrays? — Yes, safe. Arrays use `shared_ptr<ArrayDesc>`, so freeing the C handle just decrements the refcount. Graph computation continues normally. (Done 2026-02-19)
- MLX allocator pool behaviour — `mlx_array_free()` returns memory to the internal pool (not the system). The pool reuses allocations, so under sustained inference memory should plateau. Call `mlx_clear_cache()` to release the pool to the system if needed. (Done 2026-02-19)
- Research structured error info — No structured info available; free-form string only. The format is stable: `"<message> at <file>:<line>"`. GoLand Claude should use the return code (0/1) + stored error string pattern. (Done 2026-02-19)
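The allocator behaviour recorded above can be illustrated with a toy caching allocator. In this model, freed buffers go back to a pool, later allocations of the same size reuse them, and clearing the cache hands only cached memory back to the system. This is a hypothetical pure-Go model for intuition only. `toyAllocator` and its methods are invented and do not reflect MLX's real Metal allocator, its size bucketing, or its cache limits.

```go
package main

import "fmt"

// toyAllocator models a caching allocator: freed buffers are kept in a
// per-size free list and reused instead of being returned to the system.
type toyAllocator struct {
	free      map[int]int // size -> number of cached buffers of that size
	active    int         // bytes currently held by live arrays
	cached    int         // bytes sitting in the free lists
	sysAllocs int         // times fresh memory had to come from the "system"
}

func newToyAllocator() *toyAllocator {
	return &toyAllocator{free: make(map[int]int)}
}

// Alloc reuses a cached buffer of the same size when one exists.
func (a *toyAllocator) Alloc(size int) {
	if a.free[size] > 0 {
		a.free[size]--
		a.cached -= size
	} else {
		a.sysAllocs++
	}
	a.active += size
}

// Free returns a buffer to the cache, not to the system.
func (a *toyAllocator) Free(size int) {
	a.active -= size
	a.free[size]++
	a.cached += size
}

// ClearCache drops every cached buffer; active memory is untouched.
func (a *toyAllocator) ClearCache() {
	a.free = make(map[int]int)
	a.cached = 0
}

func main() {
	a := newToyAllocator()
	// 100 simulated decode steps that allocate and free the same temporaries:
	// only the first step needs fresh memory, so usage plateaus.
	for step := 0; step < 100; step++ {
		a.Alloc(4096)
		a.Alloc(1024)
		a.Free(4096)
		a.Free(1024)
	}
	fmt.Println(a.sysAllocs, a.active, a.cached) // 2 0 5120
	a.ClearCache()
	fmt.Println(a.active, a.cached) // 0 0
}
```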
## Standing Tasks
- API gap analysis — When GoLand Claude needs a C function that isn't exposed by mlx-c, document the gap here and research whether upstream mlx-c supports it or a patch is needed.
## On-Demand Tasks (activate when needed)
- mlx-c version bump validation — When upstream mlx-c releases v0.4.2+ or v0.5.0, update `GIT_TAG` in `CMakeLists.txt`, rebuild, and document any API changes, additions, or breaking changes. Check whether new ops could benefit go-mlx (e.g. fused attention variants, new quantisation modes).
- Batch evaluation patterns — Research how mlx-c handles multiple independent forward passes. Does `mlx_eval` with a vector of arrays from separate graphs batch them, or does MLX need explicit batching at the tensor level? Needed for the Phase 5 batch inference API.
- GPU profiling/capture — Document `mlx_metal_start_capture()`/`mlx_metal_stop_capture()` usage for GPU debugging. Research how to generate Metal GPU traces for performance analysis of the inference pipeline.
- Quantised matmul variants — Survey all quantisation-related functions in `ops.h` beyond what Go currently binds. Document supported bit widths (2-bit, 3-bit?), group sizes, and affine vs symmetric modes. Relevant for model quantisation awareness (Phase 5).
- Streaming/async patterns — Research `mlx_async_eval` and multi-stream patterns. Can separate encode/decode streams overlap? Does MLX support concurrent GPU work from multiple goroutines via separate streams?
## Workflow
- GoLand Claude or Virgil writes tasks here
- Pick up in order, mark `[x]` when done
- New findings → `cpp/FINDINGS.md`
- If Go changes are needed → note in FINDINGS.md for GoLand Claude