Wraps `mlx::core::contiguous()`. This is the correct fix for the Floats()/DataInt32() non-contiguous bug.
**GoLand Claude action required**: Bind `mlx_contiguous` and call it before `mlx_array_data_*` when the array is non-contiguous. Use `allow_col_major = false` to guarantee row-major layout.
Additionally, there are three **contiguity check** functions:
```c
// Internal but available — checks flags().contiguous
int _mlx_array_is_contiguous(bool* res, const mlx_array arr);
int _mlx_array_is_row_contiguous(bool* res, const mlx_array arr);
int _mlx_array_is_col_contiguous(bool* res, const mlx_array arr);
```
**Recommended pattern** for Go `Floats()`:
1. Call `_mlx_array_is_row_contiguous()` to check
2. If not row-contiguous, call `mlx_contiguous(res, arr, false, stream)` to get a contiguous copy
3. Then read via `mlx_array_data_float32()`
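To illustrate why the check matters, here is a minimal pure-Go sketch of what a row-major copy does to a strided view. `stridedView`, `isRowContiguous` and `rowMajor` are hypothetical names used only for illustration; they are not part of mlx-c or the Go binding, and the real fix should still go through `mlx_contiguous`.

```go
package main

// stridedView is a hypothetical stand-in for an MLX array view:
// a flat backing buffer plus a shape and per-axis element strides.
type stridedView struct {
	data    []float32
	shape   []int
	strides []int
}

// isRowContiguous reports whether reading the buffer linearly visits the
// elements in row-major order (axes of size 1 are ignored).
func (v stridedView) isRowContiguous() bool {
	expected := 1
	for i := len(v.shape) - 1; i >= 0; i-- {
		if v.shape[i] != 1 && v.strides[i] != expected {
			return false
		}
		expected *= v.shape[i]
	}
	return true
}

// rowMajor returns the elements in logical row-major order, copying through
// the strides when the view is not already row-contiguous.
func (v stridedView) rowMajor() []float32 {
	n := 1
	for _, d := range v.shape {
		n *= d
	}
	if v.isRowContiguous() {
		return v.data[:n]
	}
	out := make([]float32, 0, n)
	idx := make([]int, len(v.shape))
	for count := 0; count < n; count++ {
		off := 0
		for ax, i := range idx {
			off += i * v.strides[ax]
		}
		out = append(out, v.data[off])
		// Advance the multi-index like an odometer, last axis fastest.
		for ax := len(idx) - 1; ax >= 0; ax-- {
			idx[ax]++
			if idx[ax] < v.shape[ax] {
				break
			}
			idx[ax] = 0
		}
	}
	return out
}
```

For a 2x3 buffer `[1 2 3 4 5 6]` viewed as its transpose (shape `[3 2]`, strides `[1 3]`), reading the raw buffer yields the untransposed order, whereas `rowMajor()` yields `[1 4 2 5 3 6]`. This is the class of mismatch behind the `Floats()` bug.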
Also available but less ideal:
- `mlx_copy(res, a, s)`: copies the array but may preserve a non-contiguous layout
- `mlx_flatten(res, a, start_axis, end_axis, s)`: flattens dimensions, forces contiguous
- `mlx_reshape(res, a, shape, shape_num, s)`: Go's current workaround, works but is semantically wrong
**GoLand Claude action required**: Bind `mlx_cumsum` and implement proper TopP (nucleus) sampling. For standard TopP: `axis=-1, reverse=false, inclusive=true`.
Related cumulative ops also available: `mlx_cumprod`, `mlx_cummax`, `mlx_cummin`, `mlx_logcumsumexp`.
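As a reference for the logic the bound `mlx_cumsum` would support, the following is a minimal CPU-side sketch of nucleus filtering in pure Go. `cumsumInclusive` and `topPFilter` are hypothetical helpers, not existing binding functions, and the sketch assumes the input is already a probability distribution over a single axis. The convention that the token crossing the threshold is kept is an assumption of this sketch.

```go
package main

import "sort"

// cumsumInclusive mirrors a cumulative sum with reverse=false, inclusive=true:
// out[i] = xs[0] + ... + xs[i].
func cumsumInclusive(xs []float64) []float64 {
	out := make([]float64, len(xs))
	sum := 0.0
	for i, x := range xs {
		sum += x
		out[i] = sum
	}
	return out
}

// topPFilter keeps the smallest set of highest-probability tokens whose
// cumulative mass reaches p, zeroes the rest, and renormalises.
func topPFilter(probs []float64, p float64) []float64 {
	// Token indices sorted by descending probability.
	order := make([]int, len(probs))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return probs[order[a]] > probs[order[b]]
	})

	sorted := make([]float64, len(probs))
	for rank, tok := range order {
		sorted[rank] = probs[tok]
	}
	cum := cumsumInclusive(sorted)

	out := make([]float64, len(probs))
	kept := 0.0
	for rank, tok := range order {
		// Keep a token while the mass before it is still below p, so the
		// token that crosses the threshold is included.
		if cum[rank]-sorted[rank] < p {
			out[tok] = probs[tok]
			kept += probs[tok]
		}
	}
	if kept > 0 {
		for i := range out {
			out[i] /= kept
		}
	}
	return out
}
```

With probabilities `[0.5 0.3 0.15 0.05]` and `p = 0.7`, the first two tokens survive and are renormalised to roughly `0.625` and `0.375`. A GPU implementation would express the same steps with sort, `mlx_cumsum` and a masked select.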
### `mlx_array_data_*` Does NOT Auto-Evaluate
**Source**: `array.cpp:536` calls C++ `mlx_array_get_(arr).data<float>()`, which (`array.h:372`) just does pointer arithmetic into the raw buffer — no implicit evaluation.
The header comment says "Array must be evaluated, otherwise returns NULL", but this is misleading. The C++ `data<T>()` accesses `buffer().raw_ptr()`, which will crash or return garbage if the buffer hasn't been allocated yet (i.e., the array is unscheduled).
**Contrast with `mlx_array_item_*`**: These call C++ `item<T>()` which **does** trigger evaluation internally (`array.h:564-569`).
**GoLand Claude action**: The current `Materialise()` call before data access is correct and essential. Never skip it. Consider adding an assertion using `_mlx_array_is_available()` as a safety check.
The message format is: `"<exception_message> at <file>:<line>"` (hardcoded in `_mlx_error`, `error.cpp:37-49`).
- **No error codes** — just a free-form string
- **No categories** — the string is whatever `e.what()` produces from C++ exceptions
- The `" at <file>:<line>"` suffix is always appended and could be parsed, but the file/line refers to the mlx-c wrapper code, not the original error site
- Default handler calls `printf()` then `exit(-1)` — the Go side MUST register a custom handler
- The `data` parameter with `dtor` allows attaching cleanup state (Go could pass a context pointer here)
**GoLand Claude note**: For the refactor from `checkError()` to proper error returns, the best approach is to have the error handler store the last error string (already done), and the Go wrapper functions check the return code (0=success, 1=error) to decide whether to read the stored error. No structured error info is available.
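One way the return-code approach could look on the Go side is sketched below. `mlxError`, `parseMLXError` and `errorFromStatus` are hypothetical names, and `takeLastError` stands in for however the registered handler exposes its stored string. Parsing the suffix is best-effort only, since the message is free-form.

```go
package main

import (
	"strconv"
	"strings"
)

// mlxError is a hypothetical Go-side error type. mlx-c itself only
// provides the formatted string.
type mlxError struct {
	Msg  string // the C++ e.what() text
	File string // mlx-c wrapper file, not the original error site
	Line int
}

func (e *mlxError) Error() string { return e.Msg }

// parseMLXError splits "<message> at <file>:<line>". If the suffix is
// missing or malformed, the whole string is kept as the message.
func parseMLXError(raw string) *mlxError {
	at := strings.LastIndex(raw, " at ")
	if at < 0 {
		return &mlxError{Msg: raw}
	}
	loc := raw[at+len(" at "):]
	colon := strings.LastIndex(loc, ":")
	if colon < 0 {
		return &mlxError{Msg: raw}
	}
	line, err := strconv.Atoi(loc[colon+1:])
	if err != nil {
		return &mlxError{Msg: raw}
	}
	return &mlxError{Msg: raw[:at], File: loc[:colon], Line: line}
}

// errorFromStatus converts an mlx-c style return code (0 = success) into a
// Go error, reading the stored handler message only on failure.
func errorFromStatus(rc int, takeLastError func() string) error {
	if rc == 0 {
		return nil
	}
	return parseMLXError(takeLastError())
}
```

Each wrapper would then end with something like `return errorFromStatus(rc, takeLastError)` instead of calling `checkError()`.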
### Memory Management — Complete Picture
#### `mlx_array_free()` — Safe on Graph-Referenced Arrays
**Source**: `private/array.h:49-53`
```cpp
inline void mlx_array_free_(mlx_array d) {
  if (d.ctx) {
    delete static_cast<mlx::core::array*>(d.ctx);
  }
}
```
This deletes the C++ `mlx::core::array` object. But `mlx::core::array` uses `std::shared_ptr<ArrayDesc> array_desc_` (`array.h:522`) internally. So:
- **Graph safety**: Freeing a C handle just decrements the refcount. If the computation graph still holds references (via other arrays' input lists), the data survives. **Safe to free intermediates.**
- **Double-free is NOT safe**: Calling `mlx_array_free()` twice on the same handle calls `delete` twice on the same pointer — undefined behaviour. The Go finaliser must only run once per handle.
- **Free during async operations**: Safe because of refcounting. Async computation holds its own shared_ptr references.
- **NULL-safe**: Checks `d.ctx` before delete, so freeing an empty handle (ctx=NULL) is safe.
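The double-free constraint can be enforced on the Go side with a once-only guard. The sketch below is one possible shape, not the binding's current code: `handle` and `newHandle` are hypothetical names, and the injected `free` function stands in for the cgo call to `mlx_array_free()` so the example runs without cgo.

```go
package main

import (
	"runtime"
	"sync"
)

// handle is a hypothetical Go wrapper around a C mlx_array handle.
// The free function is injected so the sketch does not need cgo.
type handle struct {
	once sync.Once
	free func()
}

// newHandle registers a finaliser so the handle is released even if the
// caller forgets to call Free explicitly.
func newHandle(free func()) *handle {
	h := &handle{free: free}
	runtime.SetFinalizer(h, (*handle).Free)
	return h
}

// Free releases the underlying handle at most once, whether it is called
// explicitly, by the finaliser, or both.
func (h *handle) Free() {
	h.once.Do(func() {
		if h.free != nil {
			h.free()
		}
	})
}
```

Because both the explicit call and the finaliser go through `sync.Once`, an explicit `Free()` followed by garbage collection results in a single underlying free.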
#### `mlx_clear_cache()` — Releases Allocator Pool
This releases memory from the allocator's cache pool back to the system. It does NOT release active memory (arrays still in use). Safe to call mid-generation — it only frees allocations that are no longer referenced.
**GoLand Claude note**: Call `mlx_clear_cache()` periodically during generation to prevent memory growth. The allocator pool reuses freed allocations, so under sustained inference, memory should plateau even without explicit cache clears, but clearing helps when switching between different-sized operations.
#### Full Memory API
```c
int mlx_clear_cache(void); // Release cached memory to system
int mlx_get_active_memory(size_t* res); // Currently allocated bytes
int mlx_get_cache_memory(size_t* res); // Cached (reusable) bytes
int mlx_get_peak_memory(size_t* res); // High-water mark
int mlx_reset_peak_memory(void); // Reset high-water mark
int mlx_set_cache_limit(size_t* res, size_t limit); // Max cache size (returns previous)
int mlx_set_memory_limit(size_t* res, size_t limit); // Max total memory (returns previous)
int mlx_set_wired_limit(size_t* res, size_t limit); // Max wired memory (returns previous)
```
**GoLand Claude action**: The Go side should bind `mlx_get_active_memory`, `mlx_get_cache_memory`, `mlx_get_peak_memory`, `mlx_reset_peak_memory`, and `mlx_set_wired_limit` — these are all useful for memory diagnostics. `mlx_set_cache_limit` and `mlx_set_memory_limit` appear to already be bound.
Returns GPU hardware info — architecture name, max buffer size, recommended working set, total memory. Useful for model loading decisions (e.g., choosing model size based on available memory).
**GoLand Claude action**: Consider binding `mlx_metal_device_info()` for automatic model selection.
### Stream API Notes
The Go side uses `mlx_default_gpu_stream_new()`. Also available:
- `mlx_synchronize(stream)`: block until all ops on the stream complete
- `mlx_stream_new_device(dev)`: create a stream on a specific device
- `mlx_default_cpu_stream_new()`: for CPU fallback ops