Commit graph

6 commits

Author SHA1 Message Date
Snider
4669cc503d
refactor: replace fmt.Errorf/errors.New with coreerr.E()
Some checks failed
Security Scan / security (push) Successful in 8s
Test / Vet & Build (push) Failing after 23s
Co-Authored-By: Virgil <virgil@lethean.io>
2026-03-16 21:08:52 +00:00
Claude
b03f357f5d
feat: implement Classify, BatchGenerate, Info, Metrics on rocmModel
Some checks failed
Security Scan / security (push) Successful in 10s
Test / Vet & Build (push) Failing after 34s
Brings rocmModel into compliance with the updated inference.TextModel
interface from go-inference.

- Classify: simulates prefill-only via max_tokens=1, temperature=0
- BatchGenerate: generates each prompt autoregressively, one at a time, via /v1/completions
- Info: populates ModelInfo from GGUF metadata (architecture, layers, quant)
- Metrics: captures timing + VRAM usage via sysfs after each operation
- Refactors duplicate server-exit error handling into setServerExitErr()
- Adds timing instrumentation to existing Generate and Chat methods

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-24 18:50:37 +00:00
Claude
a6e647c5b7
test: graceful shutdown and concurrent request integration tests
Clear lastErr at the start of each Generate/Chat call so that Err()
reflects the most recent call, not a stale cancellation from a prior one.

Add two integration tests:
- GracefulShutdown: cancel mid-stream then generate again on the same
  model, verifying the server survives cancellation.
- ConcurrentRequests: three goroutines calling Generate() simultaneously,
  verifying no panics or deadlocks (llama-server serialises via slots).

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:50:47 +00:00
Claude
c07f37afe9
fix: guard nil exitErr wrapping, document concurrency invariant
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:38:01 +00:00
Claude
2c4966e652
feat: detect server crash before Generate/Chat calls
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:34:46 +00:00
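One common way to detect a crashed server before each request is sketched below; it is an assumption about the approach, not the commit's code. A background goroutine blocks on the process's wait function (for example `cmd.Wait` from `os/exec`), records the result, and closes a channel; Generate/Chat then do a non-blocking `select` on that channel and fail fast. `server`, `watch`, and `alive` are hypothetical names, and the process is simulated so the example runs anywhere.

```go
package main

import (
	"errors"
	"fmt"
)

// server tracks a child process's lifetime (hypothetical names).
type server struct {
	exited  chan struct{} // closed once the process has exited
	exitErr error         // written before exited is closed, read only after
}

// watch runs wait (e.g. an exec.Cmd's Wait method) in the background.
// Closing the channel after setting exitErr publishes it safely to readers.
func watch(wait func() error) *server {
	s := &server{exited: make(chan struct{})}
	go func() {
		s.exitErr = wait()
		close(s.exited)
	}()
	return s
}

// alive is a non-blocking check that Generate/Chat would call before sending
// a request, so a dead server yields an error instead of a hung HTTP call.
func (s *server) alive() error {
	select {
	case <-s.exited:
		if s.exitErr != nil {
			return fmt.Errorf("server exited: %w", s.exitErr)
		}
		return errors.New("server exited")
	default:
		return nil
	}
}

func main() {
	crash := make(chan struct{})
	s := watch(func() error { <-crash; return errors.New("signal: segmentation fault") })
	fmt.Println(s.alive()) // <nil>

	close(crash)
	<-s.exited             // wait for the watcher goroutine (demo only)
	fmt.Println(s.alive()) // server exited: signal: segmentation fault
}
```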
Claude
a8c494771d
feat: TextModel implementation wrapping llama-server
rocmModel implements inference.TextModel with Generate() and Chat()
methods that delegate to the llamacpp HTTP client, mapping go-inference
types to llama-server's OpenAI-compatible API. Token streaming via
iter.Seq[inference.Token] with mutex-protected error propagation.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:11:55 +00:00