# go-rocm
AMD ROCm GPU inference backend for Linux. It implements the inference.Backend and inference.TextModel interfaces from core/go-inference by driving llama.cpp's server mode (llama-server) over HIP/ROCm.
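A minimal usage sketch of what that could look like from a caller's point of view. The import path and the interface names come from this page; the constructor, method names, and the GGUF path are illustrative placeholders, not the actual API — the authoritative types live on the Interface-Contract page.

```go
package main

import (
	"context"
	"fmt"
	"log"

	// Real module path from this page; the exported API used below
	// (New, LoadText, Generate, Close) is a hypothetical sketch.
	rocm "forge.lthn.ai/core/go-rocm"
)

func main() {
	ctx := context.Background()

	// Hypothetical constructor: detect the ROCm GPU and start a
	// llama-server subprocess in the background.
	backend, err := rocm.New(ctx)
	if err != nil {
		log.Fatalf("rocm backend unavailable: %v", err)
	}
	defer backend.Close()

	// Hypothetical model load from a local GGUF file (placeholder path).
	model, err := backend.LoadText(ctx, "/models/example-7b-q4_k_m.gguf")
	if err != nil {
		log.Fatalf("load model: %v", err)
	}

	// Hypothetical generation call against the inference.TextModel.
	out, err := model.Generate(ctx, "Explain HIP in one sentence.")
	if err != nil {
		log.Fatalf("generate: %v", err)
	}
	fmt.Println(out)
}
```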
## Quick Links
- Environment — Hardware, ROCm, llama-server setup (validated 19 Feb 2026)
- Architecture — How it works, design decisions, file layout
- Interface-Contract — go-inference types this package must implement
- Models — Available GGUF models and VRAM budget
- Fleet-Context — How this repo fits in the wider agent fleet
## Status
| Phase | Status | Notes |
|---|---|---|
| Phase 0: Environment | Done (Charon, 19 Feb) | ROCm 7.2, llama-server built, baselines recorded |
| Phase 1: Core Implementation | Pending | GPU detection, server lifecycle, HTTP client, TextModel |
| Phase 2: Robustness | Pending | Crash recovery, graceful shutdown, VRAM monitoring |
| Phase 3: Model Support | Pending | GGUF discovery, chat templates, context sizing |
| Phase 4: Performance | Pending | Benchmarks, flash attention, batch inference |
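Phase 1's HTTP client will talk to the REST API that llama-server exposes. A sketch of a single blocking request is below, assuming a llama-server instance already listening locally and using its /completion endpoint; the package name, helper name, and error handling are illustrative, and only a minimal subset of the request fields is shown.

```go
// Package rocmclient is a hypothetical name for the Phase 1 HTTP client.
package rocmclient

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// completionRequest mirrors a minimal subset of llama-server's
// /completion JSON body (prompt, n_predict).
type completionRequest struct {
	Prompt   string `json:"prompt"`
	NPredict int    `json:"n_predict"`
}

// completionResponse captures only the generated text field.
type completionResponse struct {
	Content string `json:"content"`
}

// Complete sends one blocking completion request to llama-server.
// baseURL is assumed to be something like "http://127.0.0.1:8080".
func Complete(ctx context.Context, baseURL, prompt string, maxTokens int) (string, error) {
	body, err := json.Marshal(completionRequest{Prompt: prompt, NPredict: maxTokens})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/completion", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("llama-server returned %s", resp.Status)
	}

	var out completionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Content, nil
}
```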
## Module
forge.lthn.ai/core/go-rocm
Depends on: forge.lthn.ai/core/go-inference (shared interfaces, zero deps)
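For orientation, a sketch of what the module file could look like given the module path and the single stated dependency; the Go toolchain version and the go-inference version are placeholders.

```
// go.mod — illustrative; versions are placeholders.
module forge.lthn.ai/core/go-rocm

go 1.23

require forge.lthn.ai/core/go-inference v0.1.0
```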