Brings rocmModel into compliance with the updated inference.TextModel
interface from go-inference.
- Classify: simulates a prefill-only pass via max_tokens=1, temperature=0
- BatchGenerate: sequential autoregressive generation per prompt via
/v1/completions (both sketched after this list)
- Info: populates ModelInfo from GGUF metadata (architecture, layers, quant)
- Metrics: captures timing + VRAM usage via sysfs after each operation (sysfs sketch below)
- Refactors duplicate server-exit error handling into setServerExitErr()
- Adds timing instrumentation to existing Generate and Chat methods
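
A hedged sketch of how Classify and BatchGenerate might drive llama-server's
/v1/completions endpoint. Only max_tokens=1, temperature=0 and the endpoint
path come from this change; the request struct, helper names and response
handling below are illustrative assumptions.

```go
package rocm

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// completionRequest mirrors the subset of the /v1/completions payload used here.
type completionRequest struct {
	Prompt      string  `json:"prompt"`
	MaxTokens   int     `json:"max_tokens"`
	Temperature float64 `json:"temperature"`
}

// complete posts one completion request and returns the raw response body.
func complete(ctx context.Context, baseURL string, r completionRequest) ([]byte, error) {
	payload, err := json.Marshal(r)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		baseURL+"/v1/completions", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("completion failed: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// classify simulates a prefill-only pass: one deterministic token.
func classify(ctx context.Context, baseURL, prompt string) ([]byte, error) {
	return complete(ctx, baseURL, completionRequest{
		Prompt:      prompt,
		MaxTokens:   1, // prefill plus a single token
		Temperature: 0, // deterministic
	})
}

// batchGenerate runs the prompts sequentially, one completion per prompt.
func batchGenerate(ctx context.Context, baseURL string, prompts []string, maxTokens int) ([][]byte, error) {
	out := make([][]byte, 0, len(prompts))
	for _, p := range prompts {
		body, err := complete(ctx, baseURL, completionRequest{Prompt: p, MaxTokens: maxTokens})
		if err != nil {
			return nil, err
		}
		out = append(out, body)
	}
	return out, nil
}
```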
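
The VRAM figure for Metrics can be sampled from the amdgpu sysfs interface; a
minimal sketch, assuming card0 is the device backing the model (the helper
name and the card selection are assumptions):

```go
package rocm

import (
	"os"
	"strconv"
	"strings"
)

// vramUsedBytes returns the VRAM currently in use as reported by the
// amdgpu driver for card0.
func vramUsedBytes() (uint64, error) {
	raw, err := os.ReadFile("/sys/class/drm/card0/device/mem_info_vram_used")
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
}
```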
Co-Authored-By: Virgil <virgil@lethean.io>
Clear lastErr at the start of each Generate/Chat call so that Err()
reflects the most recent call, not a stale cancellation from a prior one.
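
A minimal sketch of the reset, assuming lastErr is guarded by a mutex on
rocmModel (field and helper names are illustrative):

```go
package rocm

import "sync"

// Subset of rocmModel relevant to error propagation.
type rocmModel struct {
	mu      sync.Mutex
	lastErr error
}

// resetErr runs at the top of Generate and Chat so that Err() describes
// only the call in progress, not an earlier cancellation.
func (m *rocmModel) resetErr() {
	m.mu.Lock()
	m.lastErr = nil
	m.mu.Unlock()
}

// Err reports how the most recent Generate/Chat call ended.
func (m *rocmModel) Err() error {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.lastErr
}
```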
Add two integration tests:
- GracefulShutdown: cancel mid-stream then generate again on the same
model, verifying the server survives cancellation (rough shape below).
- ConcurrentRequests: three goroutines calling Generate() simultaneously,
verifying no panics or deadlocks (llama-server serialises via slots).
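
Rough shape of the GracefulShutdown test, assuming a Generate method that
returns iter.Seq[inference.Token], an Err() accessor, and a hypothetical
loadTestModel helper that boots llama-server for the test:

```go
package rocm_test

import (
	"context"
	"testing"
)

func TestGracefulShutdown(t *testing.T) {
	m := loadTestModel(t) // hypothetical helper, not part of this change

	// Cancel mid-stream: take one token, then cancel the context.
	ctx, cancel := context.WithCancel(context.Background())
	for range m.Generate(ctx, "count slowly to one hundred") {
		cancel()
		break
	}

	// The same model must still answer a fresh request afterwards.
	got := 0
	for range m.Generate(context.Background(), "say hello") {
		got++
	}
	if err := m.Err(); err != nil {
		t.Fatalf("Generate after cancellation: %v", err)
	}
	if got == 0 {
		t.Fatal("expected tokens from the post-cancellation Generate")
	}
}
```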
Co-Authored-By: Virgil <virgil@lethean.io>
rocmModel implements inference.TextModel with Generate() and Chat()
methods that delegate to the llamacpp HTTP client, mapping go-inference
types to llama-server's OpenAI-compatible API. Tokens are streamed via
iter.Seq[inference.Token], with errors propagated under a mutex (sketched below).
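
The streaming shape, sketched under the assumption that errors are stashed
on the model rather than yielded; Token and streamCompletions below are
stand-ins for the go-inference type and the llamacpp client, not their
real definitions:

```go
package rocm

import (
	"context"
	"iter"
	"sync"
)

// Token stands in for inference.Token.
type Token struct{ Text string }

type rocmModel struct {
	mu      sync.Mutex
	lastErr error
}

// setErr records a terminal error for a later Err() call.
func (m *rocmModel) setErr(err error) {
	m.mu.Lock()
	m.lastErr = err
	m.mu.Unlock()
}

// streamCompletions stands in for the llamacpp HTTP client; the real code
// posts to /v1/completions with streaming enabled and yields each chunk.
func streamCompletions(ctx context.Context, prompt string) iter.Seq2[string, error] {
	return func(yield func(string, error) bool) { _, _ = ctx, prompt }
}

// Generate streams tokens to the caller; errors never escape the iterator
// directly but are stored under the mutex for Err().
func (m *rocmModel) Generate(ctx context.Context, prompt string) iter.Seq[Token] {
	return func(yield func(Token) bool) {
		for text, err := range streamCompletions(ctx, prompt) {
			if err != nil {
				m.setErr(err)
				return
			}
			if !yield(Token{Text: text}) {
				return // consumer stopped early, e.g. cancelled context
			}
		}
	}
}
```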
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>