Replaces filename-based guessModelType with GGUF header parsing.
Caps default context at 4096 to prevent VRAM exhaustion on models
with 128K+ native context.
Co-Authored-By: Virgil <virgil@lethean.io>
Replace stub backend with real implementation: Available() checks
/dev/kfd and llama-server presence, LoadModel() wires up server
lifecycle to return a rocmModel. Add guessModelType() for architecture
detection from GGUF filenames (handles hyphenated variants like
Llama-3). Add TestAvailable and TestGuessModelType.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>