Use sync.Once to ensure resp.Body is closed exactly once, preventing
TCP connection leaks when the iterator is never consumed and a
double close when it is iterated twice. Also add the Accept:
text/event-stream header to requests for both SSE endpoints.
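For illustration, the close-once guard described above could look roughly
like the sketch below. It is not the code from this commit: countingBody,
streamLines, and the explicit closeBody function are hypothetical names, and
how the real client triggers the close for an iterator that is never consumed
is not stated here, so the sketch simply assumes the caller can invoke a
close function.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
	"sync"
)

// countingBody is a hypothetical stand-in for resp.Body that records
// how many times Close is called.
type countingBody struct {
	io.Reader
	closes int
}

func (b *countingBody) Close() error {
	b.closes++
	return nil
}

// streamLines (hypothetical) returns an iterator-style function over the
// lines of body plus a close function. Both paths go through the same
// sync.Once, so the body is closed exactly once however often either runs.
func streamLines(body io.ReadCloser) (iterate func(yield func(string) bool), closeBody func()) {
	var once sync.Once
	closeBody = func() {
		once.Do(func() { _ = body.Close() })
	}
	iterate = func(yield func(string) bool) {
		defer closeBody() // runs on normal end and on early stop
		sc := bufio.NewScanner(body)
		for sc.Scan() {
			if !yield(sc.Text()) {
				return
			}
		}
	}
	return iterate, closeBody
}

// closeCounts exercises both cases from the commit message and reports
// how many times each body was closed.
func closeCounts() (iteratedTwice, neverIterated int) {
	a := &countingBody{Reader: strings.NewReader("data: one\ndata: two\n")}
	iterate, _ := streamLines(a)
	for i := 0; i < 2; i++ {
		iterate(func(string) bool { return true })
	}

	b := &countingBody{Reader: strings.NewReader("data: unused\n")}
	_, closeBody := streamLines(b)
	closeBody()
	closeBody()

	return a.closes, b.closes
}

func main() {
	twice, never := closeCounts()
	fmt.Println("closes after iterating twice:", twice)
	fmt.Println("closes without iterating:", never)
}
```

The iterator has the shape func(yield func(string) bool), so on Go 1.23 or
later it could also be consumed with a range loop; the sketch calls it
directly to stay independent of the toolchain version.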
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TestComplete_Streaming (multi-chunk SSE with three tokens) and
TestComplete_HTTPError (400 status propagation) to exercise the
Complete() method alongside the existing chat tests.
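As a rough sketch of what these two cases check, the program below serves
three SSE tokens from one httptest server and a 400 from another. The
fetchTokens helper is a hypothetical stand-in for Complete(), whose real
signature is not shown here, and the payloads are plain strings rather than
the JSON chunks a real server would send. In the repository these checks
would live in a _test.go file using testing.T; a main program is used only
to keep the example self-contained.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetchTokens is a hypothetical stand-in for Complete(): it POSTs to url,
// turns a non-200 status into an error, and collects "data: " payloads.
func fetchTokens(url string) ([]string, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(`{}`))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "text/event-stream")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var tokens []string
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		tokens = append(tokens, payload)
	}
	return tokens, sc.Err()
}

// streamThree writes three SSE events followed by a [DONE] marker.
func streamThree(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	for _, tok := range []string{"one", "two", "three"} {
		fmt.Fprintf(w, "data: %s\n\n", tok)
		if f, ok := w.(http.Flusher); ok {
			f.Flush() // push each chunk out separately
		}
	}
	fmt.Fprint(w, "data: [DONE]\n\n")
}

// runChecks runs the streaming case and the HTTP error case.
func runChecks() (tokens []string, streamErr, statusErr error) {
	ok := httptest.NewServer(http.HandlerFunc(streamThree))
	defer ok.Close()
	bad := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "bad request", http.StatusBadRequest)
	}))
	defer bad.Close()

	tokens, streamErr = fetchTokens(ok.URL)
	_, statusErr = fetchTokens(bad.URL)
	return tokens, streamErr, statusErr
}

func main() {
	tokens, streamErr, statusErr := runChecks()
	fmt.Println("tokens:", tokens, "stream error:", streamErr)
	fmt.Println("error for 400 response:", statusErr)
}
```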
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ChatComplete() and Complete() methods to the llamacpp Client,
backed by a shared parseSSE() line parser. Types include ChatMessage,
ChatRequest, CompletionRequest and their chunked response structs.
Tests cover multi-chunk streaming, empty responses, HTTP errors, and
context cancellation, all using httptest SSE servers.
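A shared line parser of this kind might be sketched as follows. The
signature of parseSSE, the chatChunk struct, and the collectContent helper
are assumptions made for illustration; the chunk layout
(choices[].delta.content) follows the OpenAI-style streaming format and may
not match the structs added in this commit.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chatChunk holds only the fields this sketch reads from an assumed
// OpenAI-style streaming chunk.
type chatChunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// parseSSE (hypothetical signature) scans r line by line and passes each
// "data:" payload to handle until the stream ends or "[DONE]" arrives.
func parseSSE(r io.Reader, handle func(payload string) error) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators, comments, and other fields
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			return nil
		}
		if err := handle(payload); err != nil {
			return err
		}
	}
	return sc.Err()
}

// collectContent joins the delta contents of every chunk in an SSE stream.
func collectContent(stream string) (string, error) {
	var sb strings.Builder
	err := parseSSE(strings.NewReader(stream), func(payload string) error {
		var c chatChunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return err
		}
		for _, ch := range c.Choices {
			sb.WriteString(ch.Delta.Content)
		}
		return nil
	})
	return sb.String(), err
}

// sampleStream is made-up input in the assumed chunk format.
const sampleStream = `data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
`

func main() {
	text, err := collectContent(sampleStream)
	fmt.Printf("content=%q err=%v\n", text, err)
}
```

Keeping the parser unaware of the payload type is one way to let both the
chat and completion methods share it: each method supplies its own handler
that decodes into its own chunk struct.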
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add internal/llamacpp package with Client type and Health() method.
Client communicates with llama-server via HTTP; Health checks the
/health endpoint and reports readiness. Foundation type for the
streaming methods (Tasks 2-3).
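A minimal sketch of such a client is shown below. The field names, the
NewClient constructor, the five-second timeout, and the rule that only HTTP
200 counts as ready are assumptions for illustration, not the package's
actual implementation; healthAgainst is a hypothetical helper that runs the
check against a local httptest server.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Client talks to a llama-server instance over HTTP (field names assumed).
type Client struct {
	baseURL    string
	httpClient *http.Client
}

// NewClient is a hypothetical constructor with an assumed default timeout.
func NewClient(baseURL string) *Client {
	return &Client{
		baseURL:    baseURL,
		httpClient: &http.Client{Timeout: 5 * time.Second},
	}
}

// Health requests /health and returns nil only for HTTP 200. Treating any
// other status as "not ready" is an assumption of this sketch.
func (c *Client) Health(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.baseURL+"/health", nil)
	if err != nil {
		return err
	}
	resp, err := c.httpClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("llama-server not ready: status %d", resp.StatusCode)
	}
	return nil
}

// healthAgainst starts a test server that answers /health with the given
// status and returns the result of Health against it.
func healthAgainst(status int) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/health" {
			http.NotFound(w, r)
			return
		}
		w.WriteHeader(status)
	}))
	defer srv.Close()
	return NewClient(srv.URL).Health(context.Background())
}

func main() {
	fmt.Println("ready server:", healthAgainst(http.StatusOK))
	fmt.Println("loading server:", healthAgainst(http.StatusServiceUnavailable))
}
```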
Co-Authored-By: Virgil <virgil@lethean.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>