From f2ca7fe188131fc7840bbddfd330e66d61d02540 Mon Sep 17 00:00:00 2001
From: Snider <snider@host.uk.com>
Date: Thu, 19 Feb 2026 21:01:45 +0000
Subject: [PATCH] docs(cpp): add on-demand research tasks for CLion Claude
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add on-demand research tasks for mlx-c version bump validation, batch
evaluation patterns, GPU profiling, quantised matmul variants, and
async/streaming patterns. Each task activates only when the Go side
needs it.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 cpp/TODO.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/cpp/TODO.md b/cpp/TODO.md
index acdbf88..da73c5e 100644
--- a/cpp/TODO.md
+++ b/cpp/TODO.md
@@ -28,6 +28,14 @@ Tasks for the CLion Claude session. Written by GoLand Claude or Virgil.
 
 - [ ] **API gap analysis** — When the GoLand Claude needs a C function that isn't exposed by mlx-c, document the gap here and research if upstream mlx-c supports it or if a patch is needed.
 
+## On-Demand Tasks (activate when needed)
+
+- [ ] **mlx-c version bump validation** — When upstream mlx-c releases v0.4.2+ or v0.5.0, update the `GIT_TAG` in `CMakeLists.txt`, rebuild, and document any API additions, changes, or removals, flagging breaking changes explicitly. Check whether new ops could benefit go-mlx (e.g. fused attention variants, new quantisation modes).
+- [ ] **Batch evaluation patterns** — Research how mlx-c handles multiple independent forward passes. Does `mlx_eval` with a vector of arrays from separate graphs batch them? Or does MLX need explicit batching at the tensor level? Needed for Phase 5 batch inference API.
+- [ ] **GPU profiling/capture** — Document `mlx_metal_start_capture()` / `mlx_metal_stop_capture()` usage for GPU debugging. Research how to generate Metal GPU traces for performance analysis of the inference pipeline.
+- [ ] **Quantised matmul variants** — Survey all quantisation-related functions in ops.h beyond what Go currently binds. Document supported bit widths (2-bit, 3-bit?), group sizes, and affine vs symmetric modes. Relevant for model quantisation awareness (Phase 5).
+- [ ] **Streaming/async patterns** — Research `mlx_async_eval` and multi-stream patterns. Can separate encode/decode streams overlap? Does MLX support concurrent GPU work from multiple goroutines via separate streams?
+
 ---
 
 ## Workflow