go-mlx/cpp/TODO.md
Snider c881813872 docs(design): backend abstraction with internal/metal reorganisation
Approved design for restructuring go-mlx:
- Root package becomes clean interface (TextModel, LoadModel, Token)
- All CGO code moves to internal/metal/
- Deterministic memory management (Close + per-step cleanup)
- Error propagation instead of silent logging
- mlxlm/ backend placeholder for Python subprocess support

Includes API breaking change communication in FINDINGS.md and
memory management research tasks in cpp/TODO.md.

See: docs/plans/2026-02-19-backend-abstraction-design.md

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 19:12:04 +00:00

TODO.md — go-mlx C++ Task Queue

Tasks for the CLion Claude session. Written by GoLand Claude or Virgil.


Orientation (First Session)

  • Map the mlx-c API surface — Read all 27 headers in build/_deps/mlx-c-src/mlx/c/. Document which functions the Go side currently binds (cross-reference with Go files) vs which are available but unused. Priority headers: ops.h, fast.h, array.h, transforms.h.
  • Understand the error model — error.h provides mlx_set_error_handler(). The Go side registers a handler that logs to stderr. Research: can we get structured error info (error codes, categories)? Is the error string stable or does it vary?
  • Check memory management patterns — mlx_*_free() functions exist for each type. Verify: is double-free safe? What happens if you free during async eval? Document the findings for the Go finaliser integration.
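
Until the double-free question is answered, the Go side can avoid depending on the answer. The sketch below is a minimal illustration, not the current go-mlx code: `handle` and `release` are hypothetical names, and `release` stands in for whichever mlx_*_free() call matches the wrapped type, so the example runs without cgo.

```go
package main

import (
	"fmt"
	"sync"
)

// handle is a hypothetical stand-in for a Go wrapper around an mlx-c object.
// release represents the matching mlx_*_free() call; it is injected here so
// the sketch runs without cgo.
type handle struct {
	once    sync.Once
	release func()
}

// Close runs release at most once, so an explicit Close followed by a
// finaliser (or a second Close) never reaches the C free function twice.
func (h *handle) Close() {
	h.once.Do(h.release)
}

func main() {
	frees := 0
	h := &handle{release: func() { frees++ }}
	h.Close()
	h.Close() // no-op: the guard absorbs the second call
	fmt.Println("free calls:", frees) // prints "free calls: 1"
}
```

A finaliser registered with runtime.SetFinalizer could call the same Close, which would make explicit and GC-driven cleanup safe to mix. This only guards against double-free from the Go side; whether freeing during async eval is safe still has to be checked in mlx-c.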

Priority Tasks (from GoLand Claude)

  • Find mlx_contiguous or equivalent — Floats()/DataInt32() on non-contiguous arrays (transpose, broadcast, slice views) return wrong data because mlx_array_data_float32 returns the physical buffer, not the logical layout. Need a C function that copies a non-contiguous array to contiguous memory. Check if mlx_contiguous exists in mlx-c headers or if we need mlx_reshape to force a copy. This is a data correctness bug — see FINDINGS.md in project root.
  • Verify mlx_array_data_* eval semantics — Does mlx_array_data_float32() trigger evaluation (like C++ array::data() does), or must we call mlx_eval first? The Go side calls Materialize() before data access but some code paths might skip it.
  • Check if mlx_cumsum exists — The Go TopP (nucleus) sampler in sample/sample.go is a stub because it needs cumulative sum along an axis. Check if mlx_cumsum or equivalent is in ops.h. If so, the GoLand Claude can implement proper TopP sampling.
  • Survey mlx_contiguous / mlx_flatten / mlx_copy — We need a way to force an array into contiguous row-major memory. Check all of: mlx_contiguous, mlx_flatten, mlx_copy, mlx_as_contiguous. Any of these would fix the Floats() bug.
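
To make the Floats() bug concrete, here is a pure-Go sketch of what "copy to contiguous row-major memory" has to do. It is an illustration only: `contiguousCopy` is a hypothetical helper, not an mlx-c function, and it assumes shape and strides are expressed in elements rather than bytes.

```go
package main

import "fmt"

// contiguousCopy walks a strided view of buf in logical row-major order and
// returns a freshly packed slice. shape and strides are given in elements,
// not bytes. All names here are hypothetical; this is not mlx-c API.
func contiguousCopy(buf []float32, shape, strides []int) []float32 {
	n := 1
	for _, d := range shape {
		n *= d
	}
	out := make([]float32, 0, n)
	idx := make([]int, len(shape)) // current logical index, all zeros
	for i := 0; i < n; i++ {
		off := 0
		for d := range idx {
			off += idx[d] * strides[d]
		}
		out = append(out, buf[off])
		// Advance idx like an odometer, last dimension fastest.
		for d := len(idx) - 1; d >= 0; d-- {
			idx[d]++
			if idx[d] < shape[d] {
				break
			}
			idx[d] = 0
		}
	}
	return out
}

func main() {
	// Physical buffer of a 2x3 row-major matrix: [[1 2 3] [4 5 6]].
	buf := []float32{1, 2, 3, 4, 5, 6}
	// Its transpose is a 3x2 view over the same buffer with strides {1, 3}.
	fmt.Println(contiguousCopy(buf, []int{3, 2}, []int{1, 3})) // prints "[1 4 2 5 3 6]"
}
```

Reading `buf` directly would give 1 2 3 4 5 6 for the transposed view, which is the wrong-data symptom described above. Doing the copy inside MLX is still preferable if a suitable C function exists, since it avoids the per-element loop on the Go side.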

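For reference when checking for mlx_cumsum, this is roughly where the cumulative sum enters TopP sampling. The sketch runs on the CPU over a plain slice and uses a hypothetical `topPFilter` helper; it is not the code in sample/sample.go and it assumes the probabilities are already normalised.

```go
package main

import (
	"fmt"
	"sort"
)

// topPFilter returns the indices of the smallest set of tokens whose
// probabilities, taken in descending order, sum to at least p.
// probs is assumed to be already normalised (sums to 1).
func topPFilter(probs []float64, p float64) []int {
	order := make([]int, len(probs))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return probs[order[a]] > probs[order[b]]
	})
	cum := 0.0
	for k, i := range order {
		cum += probs[i] // running cumulative sum over the sorted probabilities
		if cum >= p {
			return order[:k+1]
		}
	}
	return order
}

func main() {
	probs := []float64{0.1, 0.5, 0.3, 0.1}
	fmt.Println(topPFilter(probs, 0.7)) // prints "[1 2]"
}
```

An on-device version would replace the loop with a sort followed by a cumulative sum along the vocabulary axis, which is why the availability of mlx_cumsum matters.
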
Memory Management Research (from Backend Abstraction Design)

  • What does mlx_clear_cache() release? — The Go side needs to call this per decode step to free intermediate arrays (logits, attention, MLP activations). Does it release GPU memory? Does it release the allocator pool? Can it be called safely mid-generation without breaking the computation graph?
  • Is mlx_array_free() safe on graph-referenced arrays? — During generation, intermediate arrays may be inputs to other pending lazy ops. If we free an intermediate after materialising it, does MLX handle the reference correctly or does it segfault? This determines whether we can do per-step deterministic cleanup.
  • MLX allocator pool behaviour — Does mlx_array_free() return memory to the system or to an internal pool? Under sustained inference (1000+ tokens), we need memory to plateau, not grow. Document the allocator's reuse strategy.
  • Research structured error info — The Go side is moving from checkError() (log and swallow) to proper error returns. Can mlx_set_error_handler() give us error codes or categories, or is it always a free-form string? Is the string format stable across mlx-c versions?
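
If the handler turns out to deliver only a free-form string, one possible shape for the Go side is to record the message and surface it as an error from the calling function. The sketch below assumes that design; `errState`, `record`, and `take` are hypothetical names, and the C callback is simulated by calling `record` directly.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errState is a hypothetical holder for the most recent message delivered to
// the error handler. In the real binding the handler would be invoked from C.
type errState struct {
	mu  sync.Mutex
	msg string
}

// record is what the registered handler would call instead of logging.
func (s *errState) record(msg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.msg = msg
}

// take returns the pending message as an error and clears it, or nil if
// nothing was recorded since the last call.
func (s *errState) take() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.msg == "" {
		return nil
	}
	err := errors.New(s.msg)
	s.msg = ""
	return err
}

func main() {
	var st errState
	fmt.Println(st.take()) // prints "<nil>"
	st.record("simulated failure")
	fmt.Println(st.take()) // prints "simulated failure"
	fmt.Println(st.take()) // prints "<nil>"
}
```

A single shared slot can attribute an error to the wrong call when several goroutines use MLX at once, so the answer to the structured-error question also affects whether this needs to be per-thread or per-stream.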

Standing Tasks

  • API gap analysis — When the GoLand Claude needs a C function that isn't exposed by mlx-c, document the gap here and research if upstream mlx-c supports it or if a patch is needed.

Workflow

  1. GoLand Claude or Virgil writes tasks here
  2. Pick up in order, mark [x] when done
  3. New findings → cpp/FINDINGS.md
  4. If Go changes needed → note in FINDINGS.md for GoLand Claude