# FINDINGS.md — go-rag Research & Discovery

## 2026-02-19: Split from go-ai (Virgil)
### Origin

Extracted from `forge.lthn.ai/core/go-ai/rag/`. Zero internal go-ai dependencies.

### What Was Extracted

- 7 Go files (~1,017 LOC excluding tests)
- 1 test file (`chunk_test.go`)

### Key Finding: Minimal Test Coverage

Only `chunk.go` has tests. The Qdrant and Ollama clients are untested — they depend on external services (Qdrant server, Ollama API), which makes unit testing harder. Consider mock interfaces.

### Consumers

- `go-ai/ai/rag.go` wraps this as a `QueryRAGForTask()` facade
- `go-ai/mcp/tools_rag.go` exposes RAG as MCP tools
## 2026-02-19: Environment Review (Charon)

### go.mod Fix

The `replace` directive was `../core` — should be `../go`. Fixed. Tests now pass.

### Coverage

go-rag: 18.4% coverage (only `chunk.go` tested)
### Infrastructure Status

| Service | Status | Notes |
|---|---|---|
| Qdrant | Not running | Need `docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant` |
| Ollama | Not running locally | M3 has Ollama at `10.69.69.108:11434`, but a local install is preferred for tests |
### Testability Analysis

| File | Lines | Testable Without Services | Notes |
|---|---|---|---|
| `chunk.go` | 205 | Yes — pure functions | 8 tests exist, good coverage |
| `query.go` | 163 | Partially — `FormatResults*` are pure | `Query()` needs Qdrant + Ollama |
| `qdrant.go` | 226 | No — all gRPC calls | Needs live Qdrant or a mock interface |
| `ollama.go` | 120 | Partially — `EmbedDimension` is pure | `Embed()` needs live Ollama |
| `ingest.go` | 217 | No — orchestrates Qdrant + Ollama | Needs mocks or live services |
| `helpers.go` | 89 | Partially — `QueryDocs`/`IngestDirectory` are convenience wrappers | Same deps as query/ingest |
### Recommendation

Phase 1 should focus on pure-function tests (`FormatResults*`, `EmbedDimension`, the `Default*Config` constructors, `valueToGo`). Phase 2 extracts `Embedder` and `VectorStore` interfaces to enable mocked testing for ingest/query. Phase 3+ needs live services.
## 2026-02-20: Phase 1 Pure-Function Tests Complete (go-rag agent)

### Coverage Improvement

- Before: 18.4% (8 tests in `chunk_test.go` only)
- After: 38.8% (66 tests across 4 test files)
### Per-Function Coverage
All targeted pure functions now at 100% coverage:
| Function | File | Coverage |
|---|---|---|
| FormatResultsText | query.go | 100% |
| FormatResultsContext | query.go | 100% |
| FormatResultsJSON | query.go | 100% |
| DefaultQueryConfig | query.go | 100% |
| DefaultOllamaConfig | ollama.go | 100% |
| DefaultQdrantConfig | qdrant.go | 100% |
| DefaultChunkConfig | chunk.go | 100% |
| DefaultIngestConfig | ingest.go | 100% |
| EmbedDimension | ollama.go | 100% |
| Model | ollama.go | 100% |
| valueToGo | qdrant.go | 100% |
| ChunkID | chunk.go | 100% |
| ChunkMarkdown | chunk.go | 97.6% |
| pointIDToString | qdrant.go | 83.3% |
### Discoveries

- `OllamaClient` can be constructed with a nil `client` field for testing pure methods (`EmbedDimension`, `Model`). The struct fields are unexported but accessible within the same package.
- Qdrant protobuf constructors (`NewValueString`, `NewValueInt`, etc.) make it straightforward to build test values for `valueToGo` without needing a live Qdrant connection.
- `pointIDToString` default branch (83.3%) — the uncovered path is a `PointId` with `PointIdOptions` set to an unknown type. This cannot be constructed via the public API (`NewIDNum` and `NewIDUUID` are the only constructors), so 83.3% is the practical maximum without reflection hacks.
- `FormatResultsJSON` output is valid JSON — confirmed by round-tripping through `json.Unmarshal` in tests. The hand-crafted JSON builder in `query.go` correctly handles escaping of special characters.
- `ChunkMarkdown` rune safety — the overlap logic in `chunk.go` correctly uses `[]rune` slicing, confirmed by CJK text tests that would corrupt if byte-level slicing were used.
- Remaining 61.2% untested is entirely in functions that require live Qdrant or Ollama: `NewQdrantClient`, `Search`, `UpsertPoints`, `Embed`, `EmbedBatch`, `Ingest`, `IngestFile`, and the helper wrappers. These are Phase 2 (mock interfaces) and Phase 3 (integration) targets.
### Test Files Created
| File | Tests | What It Covers |
|---|---|---|
| query_test.go | 18 | FormatResultsText, FormatResultsContext, FormatResultsJSON, DefaultQueryConfig |
| ollama_test.go | 8 | DefaultOllamaConfig, EmbedDimension (5 models), Model |
| qdrant_test.go | 24 | DefaultQdrantConfig, pointIDToString, valueToGo (all types + nesting), Point, SearchResult |
| chunk_test.go (extended) | 16 new | Empty input, headers-only, unicode/emoji, long paragraphs, config boundaries, ChunkID edge cases, DefaultChunkConfig, DefaultIngestConfig |
## 2026-02-20: Phase 2 Test Infrastructure Complete (go-rag agent)

### Coverage Improvement

- Before: 38.8% (66 tests across 4 test files)
- After: 69.0% (135 leaf-level tests across 7 test files)

### Interface Extraction

Two interfaces were extracted to decouple business logic from external services:
| Interface | File | Methods | Satisfied By |
|---|---|---|---|
| `Embedder` | `embedder.go` | `Embed`, `EmbedBatch`, `EmbedDimension` | `*OllamaClient` |
| `VectorStore` | `vectorstore.go` | `CreateCollection`, `CollectionExists`, `DeleteCollection`, `UpsertPoints`, `Search` | `*QdrantClient` |
### Signature Changes

The following functions now accept interfaces instead of concrete types:

| Function | Old Signature | New Signature |
|---|---|---|
| `Ingest` | `*QdrantClient, *OllamaClient` | `VectorStore, Embedder` |
| `IngestFile` | `*QdrantClient, *OllamaClient` | `VectorStore, Embedder` |
| `Query` | `*QdrantClient, *OllamaClient` | `VectorStore, Embedder` |

These changes are backwards-compatible because `*QdrantClient` satisfies `VectorStore` and `*OllamaClient` satisfies `Embedder`.
### New Helper Functions

Added interface-accepting helpers that the convenience wrappers now delegate to:

| Function | Purpose |
|---|---|
| `QueryWith` | Query with a provided store + embedder |
| `QueryContextWith` | Query + format as context XML |
| `IngestDirWith` | Ingest a directory with a provided store + embedder |
| `IngestFileWith` | Ingest a single file with a provided store + embedder |
### Per-Function Coverage (Phase 2 targets)
| Function | File | Coverage | Notes |
|---|---|---|---|
| Ingest | ingest.go | 86.8% | Uncovered: filepath.Rel error branch |
| IngestFile | ingest.go | 100% | |
| Query | query.go | 100% | |
| QueryWith | helpers.go | 100% | |
| QueryContextWith | helpers.go | 100% | |
| IngestDirWith | helpers.go | 100% | |
| IngestFileWith | helpers.go | 100% |
### Discoveries

- Interface method signatures must match exactly -- `EmbedDimension()` returns `uint64` (not `int`), and `Search` takes `limit uint64` and `filter map[string]string` (not `limit int, threshold float32`). The task description suggested approximate signatures; the actual code was the source of truth.
- Convenience wrappers cannot be mocked -- `QueryDocs`, `IngestDirectory`, and `IngestSingleFile` construct their own concrete clients internally. Added `*With` variants that accept interfaces for testability. The convenience wrappers now delegate to these.
- `ChunkMarkdown` preserves section headers in chunk text -- small sections that fit within the chunk size include the `## Header` line in the chunk text. Tests must use `Contains` rather than `Equal` when checking chunk text.
- Mock vector store score calculation -- the mock assigns scores as `1.0 - index*0.1`, so the second stored point gets 0.9. Tests using a threshold must account for this.
- Remaining 31% untested is entirely in concrete client implementations (`QdrantClient` methods, `OllamaClient.Embed`/`EmbedBatch`, `NewOllamaClient`, `NewQdrantClient`) and the convenience wrapper functions that construct live clients. These are Phase 3 (integration tests with live services) targets.
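The mock's scoring rule and its interaction with a score threshold can be sketched directly (the `1.0 - index*0.1` rule is from the findings; the helper names are illustrative, not from the repo):

```go
package main

import "fmt"

// mockScores reproduces the mock vector store's rule: result i is scored
// 1.0 - 0.1*i, so the first point gets 1.0 and the second 0.9.
func mockScores(n int) []float32 {
	s := make([]float32, n)
	for i := range s {
		s[i] = 1.0 - 0.1*float32(i)
	}
	return s
}

// aboveThreshold mirrors how a score threshold in Query would interact
// with those mock scores.
func aboveThreshold(scores []float32, threshold float32) []float32 {
	var kept []float32
	for _, s := range scores {
		if s >= threshold {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	fmt.Println(mockScores(3))                       // [1 0.9 0.8]
	fmt.Println(aboveThreshold(mockScores(3), 0.85)) // [1 0.9]
}
```

A test asserting on two results with a 0.85 threshold therefore passes only because the second point's mock score is 0.9.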
### Test Files Created/Modified
| File | New Tests | What It Covers |
|---|---|---|
| mock_test.go | -- | mockEmbedder + mockVectorStore implementations |
| ingest_test.go (new) | 23 | Ingest (17 subtests) + IngestFile (6 subtests) with mocks |
| query_test.go (extended) | 12 | Query function with mocks: embedding, search, threshold, errors, payload extraction |
| helpers_test.go (new) | 16 | QueryWith (4), QueryContextWith (3), IngestDirWith (4), IngestFileWith (5) |
### New Source Files
| File | Purpose |
|---|---|
| embedder.go | Embedder interface definition |
| vectorstore.go | VectorStore interface definition |
## 2026-02-20: Phase 3 Integration Tests with Live Services (go-rag agent)

### Coverage Improvement

- Before: 69.0% (135 tests across 7 test files, mock-based only)
- After: 89.2% (204 tests across 10 test files, includes live Qdrant + Ollama)
### Infrastructure Verified
| Service | Version | Status | Connection |
|---|---|---|---|
| Qdrant | 1.16.3 | Running (Docker) | gRPC localhost:6334, REST localhost:6333 |
| Ollama | native + ROCm | Running | HTTP localhost:11434, model: nomic-embed-text (F16, 274MB) |
### Discoveries

- Qdrant point IDs must be valid UUIDs -- `qdrant.NewID()` wraps the string as a UUID field. Qdrant's server-side UUID parser accepts 32-character hex strings (as produced by `ChunkID` via MD5) but rejects arbitrary strings like `point-alpha`. Error: `Unable to parse UUID: point-alpha`. Integration tests must use `ChunkID()` or MD5 hex format for point IDs.
- Qdrant Go client version warning is benign -- the client library (v1.16.2) logs `WARN Unable to compare versions` and `Client version is not compatible with server version` when connecting to Qdrant v1.16.3. This is a cosmetic mismatch in version parsing — all operations function correctly despite the warning.
- Qdrant indexing latency -- after upserting points, a 500ms sleep is needed before searching to avoid flaky results. For small datasets the indexing is nearly instant, but the sleep provides a safety margin on slower machines.
- Ollama embedding determinism -- embedding the same text twice with `nomic-embed-text` produces bit-identical vectors (at the `float32` level). This is important for idempotent ingest operations.
- Ollama accepts empty strings -- `Embed(ctx, "")` returns a valid 768-dimension vector without error. This is Ollama-specific behaviour and may differ with other embedding providers.
- Semantic similarity works as expected -- when ingesting both programming and cooking documents, a query about "Go functions and closures" correctly ranks the programming document highest. The cosine distance metric in Qdrant combined with nomic-embed-text embeddings provides meaningful semantic differentiation.
- Convenience wrappers (`QueryDocs`, `IngestDirectory`) create their own gRPC connections -- each call to `QueryDocs` or `IngestDirectory` establishes a new Qdrant gRPC connection. In production this is fine for CLI commands, but for high-throughput scenarios the `*With` variants that accept pre-created clients should be preferred.
- Remaining ~11% untested -- the uncovered code is primarily error-handling branches in `NewQdrantClient` (connection failure), `Close()`, and the `filepath.Rel` error branch in `Ingest`. These represent defensive code paths that are difficult to trigger in normal operation.
### Test Files Created
| File | Tests | What It Covers |
|---|---|---|
| qdrant_integration_test.go | 11 | Health check, create/delete/list/info collection, exists check, upsert+search, filter, empty upsert, ID validation, overwrite |
| ollama_integration_test.go | 9 | Verify model, embed single, embed batch, consistency, dimension match, model name, different texts, non-zero values, empty string |
| integration_test.go | 12 | End-to-end ingest+query, format results, IngestFile, QueryWith, QueryContextWith, IngestDirWith, IngestFileWith, QueryDocs, IngestDirectory, recreate flag, semantic similarity |
### Build Tag Strategy

All integration tests use `//go:build rag` to isolate them from CI runs that lack live services:

```sh
go test ./... -count=1           # 135 tests, 69.0% — mock-only, no services needed
go test -tags rag ./... -count=1 # 204 tests, 89.2% — requires Qdrant + Ollama
```
## 2026-02-20: Phase 4 GPU Benchmarks (Charon)

### Hardware

- CPU: AMD Ryzen 9 9950X (32 threads @ 5.7GHz)
- GPU: AMD Radeon RX 7800 XT (ROCm, gfx1100)
- Ollama: native with ROCm, nomic-embed-text (F16, 137M params)
- Qdrant: v1.16.3 (Docker, localhost)
### Benchmark Results
| Operation | Latency | Throughput | Notes |
|---|---|---|---|
| Single embed | 10.3ms | 97/sec | nomic-embed-text via Ollama ROCm |
| Batch embed (10 texts) | 102ms | 98/sec effective | Sequential calls, no batch API |
| Embed 50 chars | ~10ms | — | Text length has negligible impact |
| Embed 2000 chars | ~10ms | — | Tokeniser dominates, not GPU |
| Qdrant search (100 pts) | 111µs | 9,042 QPS | Cosine similarity, top-5 |
| Qdrant search (200 pts) | 152µs | 6,580 QPS | Cosine similarity, top-5 |
| Chunk 50 sections | 11.2µs | 89K/sec | Pure CPU, no I/O |
| Chunk 1000 paragraphs | 107µs | 9.4K/sec | Scales linearly |
### Key Findings

- EmbedBatch is sequential — `EmbedBatch` calls `Embed` in a loop. Ollama's `/api/embed` endpoint accepts a single `input` string. There is no batch API at the HTTP level — each text requires a separate request. Batch throughput equals single throughput.
- Text length barely affects latency — 50-character and 2000-character texts both embed in ~10ms. The tokeniser and model forward pass dominate; HTTP overhead is negligible on localhost.
- Qdrant search is sub-millisecond — even with 200 points, search takes 152µs. The bottleneck in any RAG pipeline will be embedding, not search.
- Pipeline bottleneck is embedding — a full ingest+query cycle for 5 documents takes ~1.5s, with ~95% of that time spent in embedding calls. Optimisation efforts should focus on reducing embedding round-trips.
- Ollama ROCm GPU utilisation — the nomic-embed-text model (137M params, F16) fits easily in 16GB VRAM. GPU utilisation during embedding is brief (~2ms compute per call); the remaining ~8ms is HTTP + serialisation overhead.
### Files Created
| File | Purpose |
|---|---|
| benchmark_test.go | Go benchmarks + throughput tests (build tag: rag) |