Add ChatComplete() and Complete() methods to the llamacpp Client,
backed by a shared parseSSE() line parser. Types include ChatMessage,
ChatRequest, CompletionRequest and their chunked response structs.
Tests cover multi-chunk streaming, empty responses, HTTP errors, and
context cancellation — all using httptest SSE servers.
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add internal/llamacpp package with Client type and Health() method.
Client communicates with llama-server via HTTP; Health checks the
/health endpoint and reports readiness. Foundation type for the
streaming methods (Tasks 2-3).
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Token.ID = 0 acceptable for Phase 1 (no consumer uses it)
- StopTokens: ignore in Phase 1 (YAGNI)
- serverEnv() should filter existing HIP_VISIBLE_DEVICES before appending
- guessModelType() fine for now, upgrade to /props endpoint in Phase 2
- Integration test build tag approach approved
Charon, 19 Feb 2026
Co-Authored-By: Virgil <virgil@lethean.io>