go-rocm

Author	SHA1	Message	Date
Claude	72120bb200	feat: pass --parallel N to llama-server for concurrent inference slots Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-19 23:13:19 +00:00
Claude	2c77f6f968	feat: use GGUF metadata for model type and context window auto-detection Replaces filename-based guessModelType with GGUF header parsing. Caps default context at 4096 to prevent VRAM exhaustion on models with 128K+ native context. Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-19 22:23:07 +00:00
Claude	c50a8e9e9b	feat: retry port selection in startServer on process failure Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 21:40:05 +00:00
Claude	2c4966e652	feat: detect server crash before Generate/Chat calls Co-Authored-By: Virgil <virgil@lethean.io>	2026-02-19 21:34:46 +00:00
Claude	1d8d65f55b	feat: Backend Available() and LoadModel() with GPU detection Replace stub backend with real implementation: Available() checks /dev/kfd and llama-server presence, LoadModel() wires up server lifecycle to return a rocmModel. Add guessModelType() for architecture detection from GGUF filenames (handles hyphenated variants like Llama-3). Add TestAvailable and TestGuessModelType. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 21:12:02 +00:00
Claude	9aa7f624ba	feat: server lifecycle and helpers for llama-server subprocess Adds server.go with the process lifecycle layer that manages spawning llama-server, waiting for readiness, and graceful shutdown. Includes three helper functions (findLlamaServer, freePort, serverEnv) and the full startServer/waitReady/stop lifecycle. The serverEnv function critically filters HIP_VISIBLE_DEVICES to mask the Ryzen 9 iGPU which crashes llama-server if not excluded. Co-Authored-By: Virgil <virgil@lethean.io> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 21:08:07 +00:00

6 commits
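
The commit messages above name a few small helpers (freePort, serverEnv) and a context-window cap without showing them. The Go sketch below illustrates how such helpers might look. It is not the repository's code. All signatures, the capContext name, and the choice of device index passed to serverEnv are assumptions made for illustration. The 4096 default and the HIP_VISIBLE_DEVICES variable come from the commit messages.

```go
package main

import (
	"net"
	"strings"
)

// freePort asks the kernel for an unused TCP port by binding to port 0
// on loopback, then releases it. Another process can grab the port before
// the server binds it, which is one reason a retry in startServer is useful.
// Hypothetical signature; the real helper may differ.
func freePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// capContext returns the model's native context length unless it is
// unknown (<= 0) or larger than limit, in which case limit is used.
// Hypothetical helper; the commit log only states that the default
// context is capped at 4096.
func capContext(native, limit int) int {
	if native <= 0 || native > limit {
		return limit
	}
	return native
}

// serverEnv copies base, drops any existing HIP_VISIBLE_DEVICES entry,
// and appends one that exposes only the given device list.
// Assumption: the discrete GPU index is supplied by the caller.
func serverEnv(base []string, devices string) []string {
	env := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if strings.HasPrefix(kv, "HIP_VISIBLE_DEVICES=") {
			continue
		}
		env = append(env, kv)
	}
	return append(env, "HIP_VISIBLE_DEVICES="+devices)
}
```

Under these assumptions, capContext(131072, 4096) yields 4096, and serverEnv(os.Environ(), "0") would produce an environment in which the subprocess sees only device 0.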