Interface Contract

go-rocm must implement these interfaces from forge.lthn.ai/core/go-inference.

Backend

type Backend interface {
    Name() string                                          // Return "rocm"
    LoadModel(path string, opts ...LoadOption) (TextModel, error)
    Available() bool                                       // Check hardware + binary
}

Available() checks

  1. /dev/kfd exists (the ROCm kernel driver is loaded)
  2. rocm-smi detects a GPU (optional, since shelling out to it may be slow)
  3. the llama-server binary is resolvable (via PATH or the ROCM_LLAMA_SERVER_PATH environment variable)

TextModel

type TextModel interface {
    Generate(ctx context.Context, prompt string, opts ...GenerateOption) iter.Seq[Token]
    Chat(ctx context.Context, messages []Message, opts ...GenerateOption) iter.Seq[Token]
    ModelType() string    // e.g. "gemma3", "qwen3", "llama3"
    Err() error           // Check after iterator stops
    Close() error         // SIGTERM llama-server, wait for exit
}

Key behaviours

  • Generate() and Chat() return iter.Seq[Token], the Go 1.23+ range-over-func iterator type
  • iter.Seq cannot carry errors, so consumers must check Err() after the loop finishes
  • context.Context enables cancellation (close the SSE stream; don't kill the server)
  • Close() sends SIGTERM to the llama-server subprocess and waits for a clean exit
  • ModelType() should be parsed from GGUF metadata or the llama-server /props endpoint

Token

type Token struct {
    ID   int32
    Text string
}

Message

type Message struct {
    Role    string // "system", "user", "assistant"
    Content string
}

GenerateConfig (via options)

type GenerateConfig struct {
    MaxTokens     int       // Default: 256
    Temperature   float32   // Default: 0.0 (greedy)
    TopK          int
    TopP          float32
    StopTokens    []int32
    RepeatPenalty float32
}

Map these to llama-server's OpenAI-compatible API fields:

  • MaxTokens → max_tokens
  • Temperature → temperature
  • TopK → top_k (llama.cpp extension)
  • TopP → top_p
  • RepeatPenalty → repeat_penalty (llama.cpp extension)

LoadConfig (via options)

type LoadConfig struct {
    Backend    string  // "rocm" (or empty for auto)
    ContextLen int     // → --ctx-size (0 = model default)
    GPULayers  int     // → --n-gpu-layers (-1 = all)
}
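The field comments above translate directly into flag building. A minimal sketch, where serverArgs is a hypothetical helper and passing a large layer count for "all" is a common llama.cpp convention that the real go-rocm may handle differently:

```go
package main

import (
	"fmt"
	"strconv"
)

type LoadConfig struct {
	Backend    string // "rocm" (or empty for auto)
	ContextLen int    // → --ctx-size (0 = model default)
	GPULayers  int    // → --n-gpu-layers (-1 = all)
}

// serverArgs turns a LoadConfig into llama-server flags.
func serverArgs(modelPath string, c LoadConfig) []string {
	args := []string{"--model", modelPath}
	if c.ContextLen > 0 { // 0 keeps the model's default context length
		args = append(args, "--ctx-size", strconv.Itoa(c.ContextLen))
	}
	switch {
	case c.GPULayers == -1: // offload every layer
		args = append(args, "--n-gpu-layers", "999")
	case c.GPULayers > 0:
		args = append(args, "--n-gpu-layers", strconv.Itoa(c.GPULayers))
	}
	return args
}

func main() {
	fmt.Println(serverArgs("model.gguf", LoadConfig{ContextLen: 4096, GPULayers: -1}))
}
```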

Registration

Already done in register_rocm.go:

//go:build linux && amd64

func init() {
    inference.Register(&rocmBackend{})
}
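Because init() runs on import (and the build tag restricts it to linux/amd64), consumers only need a blank import of go-rocm to make the backend visible. The pattern in miniature, as a self-contained sketch in which Backend is trimmed down and Register stands in for inference.Register:

```go
package main

import "fmt"

// Backend is trimmed to the two methods needed for this sketch.
type Backend interface {
	Name() string
	Available() bool
}

var backends []Backend

// Register is a stand-in for inference.Register.
func Register(b Backend) { backends = append(backends, b) }

type rocmBackend struct{}

func (rocmBackend) Name() string    { return "rocm" }
func (rocmBackend) Available() bool { return false } // no GPU in this sketch

// init runs on import, which is why a blank import of the backend
// package is enough to register it.
func init() { Register(rocmBackend{}) }

func main() {
	for _, b := range backends {
		fmt.Println(b.Name()) // prints "rocm"
	}
}
```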

Source

The full interface definitions are in inference.go and options.go in the forge.lthn.ai/core/go-inference repository.