Backend Registry

Module: forge.lthn.ai/core/go-inference

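The backend registry manages inference engine implementations. Each backend registers itself from an `init()` function in a file guarded by build tags, so platform-specific GPU acceleration is compiled in only on the targets that support it and the core package does not import any backend directly.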
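
The self-registration pattern described above can be sketched as follows. This is a minimal illustration, not the library's code: the one-method `Backend` interface, the mutex-guarded map, and `metalBackend` are assumptions made for the example, and the build constraint is only described in a comment so the file compiles on any platform.

```go
package main

import (
	"fmt"
	"sync"
)

// Backend is a reduced, hypothetical interface for this sketch; the real
// interface in go-inference is likely to have more methods.
type Backend interface {
	Name() string
}

var (
	mu       sync.RWMutex
	backends = map[string]Backend{}
)

// Register stores a backend under its name. In this sketch a later
// registration with the same name replaces the earlier one.
func Register(b Backend) {
	mu.Lock()
	defer mu.Unlock()
	backends[b.Name()] = b
}

// Get looks a backend up by name.
func Get(name string) (Backend, bool) {
	mu.RLock()
	defer mu.RUnlock()
	b, ok := backends[name]
	return b, ok
}

// metalBackend stands in for a platform-specific implementation. In a real
// backend this type and its init() would live in a file carrying a build
// constraint (for example darwin && arm64), so they only exist in binaries
// built for that platform.
type metalBackend struct{}

func (metalBackend) Name() string { return "metal" }

// init runs before main and adds the backend to the registry.
func init() { Register(metalBackend{}) }

func main() {
	if b, ok := Get("metal"); ok {
		fmt.Println("registered:", b.Name())
	}
	_, ok := Get("rocm")
	fmt.Println("rocm registered:", ok)
}
```

Because registration happens in `init()`, the set of backends a binary offers is decided at build time by which files the build tags include.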

Registry Functions

| Function | Signature | Description |
| --- | --- | --- |
| `Register` | `Register(b Backend)` | Add a backend (called from `init()`) |
| `Get` | `Get(name string) (Backend, bool)` | Retrieve by name |
| `List` | `List() []string` | All registered backend names (sorted) |
| `All` | `All() iter.Seq2[string, Backend]` | Iterator over all backends |
| `Default` | `Default() (Backend, error)` | First available backend (prefers metal > rocm > llama_cpp) |
| `LoadModel` | `LoadModel(path string, opts ...LoadOption) (TextModel, error)` | Load via specified or default backend |

Load Options

| Option | Description |
| --- | --- |
| `WithBackend(name)` | Select specific backend ("metal", "rocm", "llama_cpp") |
| `WithContextLen(n)` | Context window size (0 = model default) |
| `WithGPULayers(n)` | GPU layer offload count (-1 = all, 0 = none) |
| `WithParallelSlots(n)` | Concurrent inference slots |
| `WithAdapterPath(path)` | LoRA adapter directory |
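
The `opts ...LoadOption` parameter suggests the functional-options pattern. The sketch below shows how such options could be defined and folded into a configuration. The `LoadConfig` struct, its field names, the `newLoadConfig` helper, and the starting value of -1 for GPU layers are all assumptions for illustration; only the option names and their meanings come from the table above.

```go
package main

import "fmt"

// LoadConfig is a hypothetical struct collecting the load settings.
type LoadConfig struct {
	Backend       string // "" means: let the registry pick a default
	ContextLen    int    // 0 = model default
	GPULayers     int    // -1 = all layers, 0 = none
	ParallelSlots int
	AdapterPath   string
}

// LoadOption mutates a LoadConfig; each With* function returns one.
type LoadOption func(*LoadConfig)

func WithBackend(name string) LoadOption  { return func(c *LoadConfig) { c.Backend = name } }
func WithContextLen(n int) LoadOption     { return func(c *LoadConfig) { c.ContextLen = n } }
func WithGPULayers(n int) LoadOption      { return func(c *LoadConfig) { c.GPULayers = n } }
func WithParallelSlots(n int) LoadOption  { return func(c *LoadConfig) { c.ParallelSlots = n } }
func WithAdapterPath(p string) LoadOption { return func(c *LoadConfig) { c.AdapterPath = p } }

// newLoadConfig applies options in order on top of the starting values.
// Starting from GPULayers = -1 is an assumption of this sketch.
func newLoadConfig(opts ...LoadOption) LoadConfig {
	cfg := LoadConfig{GPULayers: -1}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newLoadConfig(
		WithBackend("rocm"),
		WithContextLen(8192),
		WithGPULayers(24),
	)
	fmt.Printf("%+v\n", cfg)
}
```

Options are applied in the order given, so when the same option appears twice the later value wins.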

Generate Options

| Option | Description |
| --- | --- |
| `WithMaxTokens(n)` | Cap output length (default 256) |
| `WithTemperature(t)` | Sampling temperature (0 = greedy) |
| `WithTopK(k)` | Top-k sampling (0 = disabled) |
| `WithTopP(p)` | Nucleus sampling threshold |
| `WithRepeatPenalty(p)` | Repetition penalty (1.0 = none) |
| `WithStopTokens(ids...)` | Token IDs that stop generation |
| `WithLogits()` | Return raw logits in ClassifyResult |
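
The sentinel values in the table (temperature 0 = greedy, top-k 0 = disabled, repeat penalty 1.0 = none) can be illustrated with a small sketch of what these settings conventionally mean when applied to a logits vector. The helper functions below are hypothetical and follow a common convention from other inference runtimes; they are not taken from this library, whose backends may sample differently.

```go
package main

import (
	"fmt"
	"sort"
)

// applyRepeatPenalty makes already-seen tokens less likely: positive logits
// are divided by p and negative ones multiplied by p. With p = 1.0 the
// logits are unchanged, which matches "1.0 = none".
func applyRepeatPenalty(logits []float32, seen []int, p float32) {
	for _, id := range seen {
		if logits[id] > 0 {
			logits[id] /= p
		} else {
			logits[id] *= p
		}
	}
}

// topK returns the indices of the k largest logits, highest first.
// k <= 0 disables the filter and returns every index.
func topK(logits []float32, k int) []int {
	idx := make([]int, len(logits))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return logits[idx[a]] > logits[idx[b]]
	})
	if k > 0 && k < len(idx) {
		idx = idx[:k]
	}
	return idx
}

// greedy picks the single most likely token, which is what a temperature
// of 0 conventionally selects.
func greedy(logits []float32) int {
	return topK(logits, 1)[0]
}

func main() {
	logits := []float32{1.0, 3.0, 2.0, -1.0}

	fmt.Println("greedy pick:", greedy(logits))
	fmt.Println("top-2 candidates:", topK(logits, 2))

	// Penalise token 1 as if it had already been generated.
	applyRepeatPenalty(logits, []int{1}, 2.0)
	fmt.Println("greedy pick after penalty:", greedy(logits))
}
```

With a penalty of 2.0 the logit of token 1 drops from 3.0 to 1.5, so the greedy choice moves to token 2.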

Model Discovery

`Discover(baseDir string) iter.Seq[DiscoveredModel]` scans `baseDir` for model directories that contain both `config.json` and `*.safetensors` files. Each `DiscoveredModel` it yields carries the model path, architecture, quantisation info, and file count.