# go-rocm
AMD ROCm GPU inference backend for Linux. It implements the inference.Backend and inference.TextModel interfaces from core/go-inference by driving llama.cpp's server mode (llama-server) over HIP/ROCm.
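A minimal usage sketch of what that could look like from a caller's point of view. The import path and the interface names come from this page; the constructor, method names, and the GGUF path are illustrative placeholders, not the actual API — the authoritative types live on the Interface-Contract page.

```go
package main

import (
	"context"
	"fmt"
	"log"

	// Real module path from this page; the exported API used below
	// (New, LoadText, Generate, Close) is a hypothetical sketch.
	rocm "forge.lthn.ai/core/go-rocm"
)

func main() {
	ctx := context.Background()

	// Hypothetical constructor: detect the ROCm GPU and start a
	// llama-server subprocess in the background.
	backend, err := rocm.New(ctx)
	if err != nil {
		log.Fatalf("rocm backend unavailable: %v", err)
	}
	defer backend.Close()

	// Hypothetical model load from a local GGUF file (placeholder path).
	model, err := backend.LoadText(ctx, "/models/example-7b-q4_k_m.gguf")
	if err != nil {
		log.Fatalf("load model: %v", err)
	}

	// Hypothetical generation call against the inference.TextModel.
	out, err := model.Generate(ctx, "Explain HIP in one sentence.")
	if err != nil {
		log.Fatalf("generate: %v", err)
	}
	fmt.Println(out)
}
```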
## Quick Links
- Environment — Hardware, ROCm, llama-server setup (validated 19 Feb 2026)
- Architecture — How it works, design decisions, file layout
- Interface-Contract — go-inference types this package must implement
- Models — Available GGUF models and VRAM budget
- Fleet-Context — How this repo fits in the wider agent fleet
## Status
| Phase | Status | Notes |
|---|---|---|
| Phase 0: Environment | Done (Charon, 19 Feb) | ROCm 7.2, llama-server built, baselines recorded |
| Phase 1: Core Implementation | Pending | GPU detection, server lifecycle, HTTP client, TextModel |
| Phase 2: Robustness | Pending | Crash recovery, graceful shutdown, VRAM monitoring |
| Phase 3: Model Support | Pending | GGUF discovery, chat templates, context sizing |
| Phase 4: Performance | Pending | Benchmarks, flash attention, batch inference |
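Phase 1's HTTP client will talk to the REST API that llama-server exposes. A sketch of a single blocking request is below, assuming a llama-server instance already listening locally and using its /completion endpoint; the package name, helper name, and error handling are illustrative, and only a minimal subset of the request fields is shown.

```go
// Package rocmclient is a hypothetical name for the Phase 1 HTTP client.
package rocmclient

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// completionRequest mirrors a minimal subset of llama-server's
// /completion JSON body (prompt, n_predict).
type completionRequest struct {
	Prompt   string `json:"prompt"`
	NPredict int    `json:"n_predict"`
}

// completionResponse captures only the generated text field.
type completionResponse struct {
	Content string `json:"content"`
}

// Complete sends one blocking completion request to llama-server.
// baseURL is assumed to be something like "http://127.0.0.1:8080".
func Complete(ctx context.Context, baseURL, prompt string, maxTokens int) (string, error) {
	body, err := json.Marshal(completionRequest{Prompt: prompt, NPredict: maxTokens})
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/completion", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("llama-server returned %s", resp.Status)
	}

	var out completionResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.Content, nil
}
```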
## Module
forge.lthn.ai/core/go-rocm
Depends on: forge.lthn.ai/core/go-inference (shared interfaces, zero deps)
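For orientation, a sketch of what the module file could look like given the module path and the single stated dependency; the Go toolchain version and the go-inference version are placeholders.

```
// go.mod — illustrative; versions are placeholders.
module forge.lthn.ai/core/go-rocm

go 1.23

require forge.lthn.ai/core/go-inference v0.1.0
```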