go-rocm
AMD ROCm GPU inference backend for Linux. Implements inference.Backend and inference.TextModel from core/go-inference using llama.cpp's server mode with HIP/ROCm.
Quick Links
- Environment — Hardware, ROCm, llama-server setup (validated 19 Feb 2026)
- Architecture — How it works, design decisions, file layout
- Interface-Contract — go-inference types this package must implement
- Models — Available GGUF models and VRAM budget
- Fleet-Context — How this repo fits in the wider agent fleet
Status
| Phase | Status | Notes |
|---|---|---|
| Phase 0: Environment | Done (Charon, 19 Feb) | ROCm 7.2, llama-server built, baselines recorded |
| Phase 1: Core Implementation | Pending | GPU detection, server lifecycle, HTTP client, TextModel |
| Phase 2: Robustness | Pending | Crash recovery, graceful shutdown, VRAM monitoring |
| Phase 3: Model Support | Pending | GGUF discovery, chat templates, context sizing |
| Phase 4: Performance | Pending | Benchmarks, flash attention, batch inference |
Module
forge.lthn.ai/core/go-rocm
Depends on: forge.lthn.ai/core/go-inference (shared interfaces, zero deps)