Commit graph

4 commits

Claude
a6e647c5b7
test: graceful shutdown and concurrent request integration tests
Clear lastErr at the start of each Generate/Chat call so that Err()
reflects the most recent call, not a stale cancellation from a prior one.

Add two integration tests:
- GracefulShutdown: cancel mid-stream then generate again on the same
  model, verifying the server survives cancellation.
- ConcurrentRequests: three goroutines calling Generate() simultaneously,
  verifying no panics or deadlocks (llama-server serialises via slots).

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:50:47 +00:00

Claude
c07f37afe9
fix: guard nil exitErr wrapping, document concurrency invariant
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:38:01 +00:00

Claude
2c4966e652
feat: detect server crash before Generate/Chat calls
Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-19 21:34:46 +00:00

Claude
a8c494771d
feat: TextModel implementation wrapping llama-server
rocmModel implements inference.TextModel with Generate() and Chat()
methods that delegate to the llamacpp HTTP client, mapping go-inference
types to llama-server's OpenAI-compatible API. Token streaming via
iter.Seq[inference.Token] with mutex-protected error propagation.

Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:11:55 +00:00