# CLAUDE.md
## What This Is

AMD ROCm GPU inference for Linux. Module: `forge.lthn.ai/core/go-rocm`

Implements `inference.Backend` and `inference.TextModel` (from core/go-inference) using llama.cpp compiled with HIP/ROCm. Targets AMD RDNA 3+ GPUs.
## Target Hardware
- GPU: AMD Radeon RX 7800 XT (gfx1100, RDNA 3, 16 GB VRAM) — confirmed gfx1100, not gfx1101
- OS: Ubuntu 24.04 LTS (linux/amd64)
- ROCm: 7.2.0 installed
- Kernel: 6.17.0
## Commands

```sh
go test ./...                                      # Unit tests (no GPU required)
go test -tags rocm ./...                           # Integration tests + benchmarks (GPU required)
go test -tags rocm -v -run TestROCm ./...          # Full GPU tests only
go test -tags rocm -bench=. -benchtime=3x ./...    # Benchmarks
```
## Architecture

See `docs/architecture.md` for full detail.

```
go-rocm/
├── backend.go         inference.Backend (linux && amd64)
├── model.go           inference.TextModel (linux && amd64)
├── server.go          llama-server subprocess lifecycle
├── vram.go            VRAM monitoring via sysfs
├── discover.go        GGUF model discovery
├── register_rocm.go   auto-registers via init() (linux && amd64)
├── rocm_stub.go       stubs for non-linux/non-amd64
└── internal/
    ├── llamacpp/      llama-server HTTP client + health check
    └── gguf/          GGUF v2/v3 binary metadata parser
```
## Critical: iGPU Crash

The Ryzen 9 9950X iGPU appears as ROCm Device 1. llama-server crashes trying to split tensors across it. `serverEnv()` always sets `HIP_VISIBLE_DEVICES=0`. Do not remove or weaken this.
## Building llama-server with ROCm

```sh
cmake -B build \
  -DGGML_HIP=ON \
  -DAMDGPU_TARGETS=gfx1100 \
  -DGGML_HIP_ROCWMMA_FATTN=ON \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel $(nproc) -t llama-server
sudo cp build/bin/llama-server /usr/local/bin/llama-server
```
## Environment Variables

| Variable | Default | Purpose |
|---|---|---|
| `ROCM_LLAMA_SERVER_PATH` | PATH lookup | Path to llama-server binary |
| `HIP_VISIBLE_DEVICES` | overridden to `0` | Always forced to `0` — do not rely on ambient value |
## Coding Standards

- UK English
- Tests: testify `assert`/`require`
- Build tags: `linux && amd64` for GPU code, `rocm` for integration tests
- Errors: `coreerr.E("pkg.Func", "what failed", err)` via `go-log`, never `fmt.Errorf` or `errors.New`
- File I/O: `os` package used directly — `go-io` not imported (its transitive deps are too heavy for a GPU inference module)
- Conventional commits
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
- Licence: EUPL-1.2
## Coordination
- Virgil (core/go) is the orchestrator — writes tasks and reviews PRs
- go-mlx is the sibling — Metal backend on macOS, same interface contract
- go-inference defines the shared TextModel/Backend interfaces both backends implement
- go-ml wraps both backends into the scoring engine
## Documentation

- `docs/architecture.md` — component design, data flow, interface contracts
- `docs/development.md` — prerequisites, test commands, benchmarks, coding standards
- `docs/history.md` — completed phases, commit hashes, known limitations
- `docs/plans/` — phase design documents (read-only reference)