CLAUDE.md

What This Is

AMD ROCm GPU inference for Linux. Module: forge.lthn.ai/core/go-rocm

Implements inference.Backend and inference.TextModel (from core/go-inference) using llama.cpp compiled with HIP/ROCm. Targets AMD RDNA 3+ GPUs.

Target Hardware

  • GPU: AMD Radeon RX 7800 XT (gfx1100, RDNA 3, 16 GB VRAM) — confirmed gfx1100, not gfx1101
  • OS: Ubuntu 24.04 LTS (linux/amd64)
  • ROCm: 7.2.0 installed
  • Kernel: 6.17.0

Commands

go test ./...                       # Unit tests (no GPU required)
go test -tags rocm ./...            # Integration tests + benchmarks (GPU required)
go test -tags rocm -v -run TestROCm ./...   # Full GPU tests only
go test -tags rocm -bench=. -benchtime=3x ./...  # Benchmarks

Architecture

See docs/architecture.md for full detail.

go-rocm/
├── backend.go           inference.Backend (linux && amd64)
├── model.go             inference.TextModel (linux && amd64)
├── server.go            llama-server subprocess lifecycle
├── vram.go              VRAM monitoring via sysfs
├── discover.go          GGUF model discovery
├── register_rocm.go     auto-registers via init() (linux && amd64)
├── rocm_stub.go         stubs for non-linux/non-amd64
└── internal/
    ├── llamacpp/        llama-server HTTP client + health check
    └── gguf/            GGUF v2/v3 binary metadata parser

Critical: iGPU Crash

The Ryzen 9 9950X iGPU appears as ROCm Device 1. llama-server crashes when it tries to split tensors across the iGPU, so serverEnv() always sets HIP_VISIBLE_DEVICES=0. Do not remove or weaken this guard.

Building llama-server with ROCm

cmake -B build \
    -DGGML_HIP=ON \
    -DAMDGPU_TARGETS=gfx1100 \
    -DGGML_HIP_ROCWMMA_FATTN=ON \
    -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel $(nproc) -t llama-server
sudo cp build/bin/llama-server /usr/local/bin/llama-server

Environment Variables

  • ROCM_LLAMA_SERVER_PATH (default: PATH lookup) — path to the llama-server binary
  • HIP_VISIBLE_DEVICES (always overridden to 0) — forced by serverEnv(); do not rely on the ambient value

Coding Standards

  • UK English
  • Tests: testify assert/require
  • Build tags: linux && amd64 for GPU code, rocm for integration tests
  • Errors: coreerr.E("pkg.Func", "what failed", err) via go-log, never fmt.Errorf or errors.New
  • File I/O: os package used directly — go-io not imported (its transitive deps are too heavy for a GPU inference module)
  • Conventional commits
  • Co-Author: Co-Authored-By: Virgil <virgil@lethean.io>
  • Licence: EUPL-1.2

Coordination

  • Virgil (core/go) is the orchestrator — writes tasks and reviews PRs
  • go-mlx is the sibling — Metal backend on macOS, same interface contract
  • go-inference defines the shared TextModel/Backend interfaces both backends implement
  • go-ml wraps both backends into the scoring engine

Documentation

  • docs/architecture.md — component design, data flow, interface contracts
  • docs/development.md — prerequisites, test commands, benchmarks, coding standards
  • docs/history.md — completed phases, commit hashes, known limitations
  • docs/plans/ — phase design documents (read-only reference)