Benchmarks for embedding (Ollama ROCm), chunking (pure CPU), and search (Qdrant) latency. Key results: 97 embeds/sec single (10.3ms), Qdrant search 152µs (6.5K QPS), chunking 11µs per 50-section doc. EmbedBatch confirmed sequential — Ollama has no batch API. Co-Authored-By: Charon <developers@lethean.io>
TODO.md — go-rag Task Queue
Dispatched from core/go orchestration. Pick up tasks in phase order.
Phase 0: Environment Setup
- Fix go.mod replace directive — Was `../core`, corrected to `../go`. (Charon, 19 Feb 2026)
- Run Qdrant locally — Docker on localhost:6333/6334, v1.16.3. (Charon, 19 Feb 2026)
- Install Ollama — Native with ROCm on snider-linux. Model: nomic-embed-text (F16). (Charon, 19 Feb 2026)
- Verify both services — Integration tests pass: 32 tests across qdrant/ollama/full pipeline. (Charon, 20 Feb 2026)
Phase 1: Unit Tests (18.4% -> 38.8% coverage)
All pure-function tests complete. Remaining untested functions require live services (Phase 2/3).
Testable Without External Services
- FormatResults tests — FormatResultsText, FormatResultsContext, FormatResultsJSON with known QueryResult inputs. Pure string formatting, no deps. (`acb987a`)
- DefaultConfig tests — Verify DefaultQdrantConfig, DefaultOllamaConfig, DefaultQueryConfig, DefaultChunkConfig, DefaultIngestConfig return expected values. (`acb987a`)
- EmbedDimension tests — OllamaClient.EmbedDimension() for each model name (nomic-embed-text=768, mxbai-embed-large=1024, all-minilm=384, unknown=768). (`acb987a`)
- Point/SearchResult types — Round-trip tests for Point struct and pointIDToString helper. (`acb987a`)
- valueToGo tests — Qdrant value conversion for string, int, double, bool, list, struct, nil. (`acb987a`)
- Additional chunk tests — Empty input, only headers no content, unicode/emoji, very long paragraph. (`acb987a`)
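The EmbedDimension case lends itself to a table-driven check. The sketch below is illustrative only: `embedDimension` is a hypothetical stand-in for `OllamaClient.EmbedDimension()`, and only the model-to-dimension values come from the task list above.

```go
package main

import "fmt"

// embedDimension is a hypothetical stand-in for OllamaClient.EmbedDimension.
// The dimensions are the ones listed in the task above; unknown models
// fall back to 768.
func embedDimension(model string) uint64 {
	switch model {
	case "nomic-embed-text":
		return 768
	case "mxbai-embed-large":
		return 1024
	case "all-minilm":
		return 384
	default:
		return 768
	}
}

func main() {
	// Table of inputs and expected dimensions, in the style of a Go table test.
	cases := []struct {
		model string
		want  uint64
	}{
		{"nomic-embed-text", 768},
		{"mxbai-embed-large", 1024},
		{"all-minilm", 384},
		{"some-unknown-model", 768},
	}
	for _, c := range cases {
		got := embedDimension(c.model)
		fmt.Printf("%-20s want=%d got=%d ok=%t\n", c.model, c.want, got, got == c.want)
	}
}
```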
Require External Services (use build tag //go:build rag)
- Qdrant client tests — Create collection, upsert, search, delete, list, info, filter, overwrite. Skip if Qdrant unavailable. 11 subtests in `qdrant_integration_test.go`. (e90f281)
- Ollama client tests — Embed single text, embed batch, verify model, consistency, dimension check, different texts, non-zero values, empty string. 9 subtests in `ollama_integration_test.go`. (e90f281)
- Full pipeline integration test — Ingest directory, query, format results, all helpers (QueryWith, QueryContextWith, IngestDirWith, IngestFileWith, QueryDocs, IngestDirectory), recreate flag, semantic similarity. 12 subtests in `integration_test.go`. (e90f281)
Phase 2: Test Infrastructure (38.8% -> 69.0% coverage)
- Interface extraction — Extracted `Embedder` interface (embedder.go) and `VectorStore` interface (vectorstore.go). Updated `Ingest`, `IngestFile`, `Query` to accept interfaces. Added `QueryWith`, `QueryContextWith`, `IngestDirWith`, `IngestFileWith` helpers. (a49761b)
- Mock embedder — Returns deterministic 0.1 vectors, tracks all calls, supports error injection and custom embed functions. (`a49761b`)
- Mock vector store — In-memory map, stores points, returns them on search with fake descending scores, supports filtering, tracks all calls. (`a49761b`)
- Re-test with mocks — 69 new mock-based tests across ingest (23), query (12), and helpers (16). Coverage from 38.8% to 69.0%. (`a49761b`)
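A minimal sketch of the interface-plus-mock pattern described above. The `Embedder` method set shown here is an assumption (the real interface in embedder.go may differ), and `mockEmbedder`, `embedAll`, and `errInjected` are hypothetical names used only for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Embedder is an assumed shape for the interface extracted into embedder.go.
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	EmbedDimension() uint64
}

// errInjected is a sample error used to demonstrate error injection.
var errInjected = errors.New("injected failure")

// mockEmbedder returns deterministic 0.1 vectors, records every call, and
// supports error injection and a custom embed function.
type mockEmbedder struct {
	dim     uint64
	calls   []string
	err     error
	embedFn func(text string) ([]float32, error)
}

func (m *mockEmbedder) Embed(_ context.Context, text string) ([]float32, error) {
	m.calls = append(m.calls, text) // track the call before anything can fail
	if m.err != nil {
		return nil, m.err
	}
	if m.embedFn != nil {
		return m.embedFn(text)
	}
	vec := make([]float32, m.dim)
	for i := range vec {
		vec[i] = 0.1
	}
	return vec, nil
}

func (m *mockEmbedder) EmbedDimension() uint64 { return m.dim }

// embedAll is a hypothetical caller that depends only on the interface,
// so it can be tested without a live Ollama instance.
func embedAll(ctx context.Context, e Embedder, texts []string) ([][]float32, error) {
	out := make([][]float32, 0, len(texts))
	for _, t := range texts {
		v, err := e.Embed(ctx, t)
		if err != nil {
			return nil, fmt.Errorf("embed %q: %w", t, err)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	ctx := context.Background()
	m := &mockEmbedder{dim: 4}

	vecs, err := embedAll(ctx, m, []string{"alpha", "beta"})
	fmt.Println("vectors:", len(vecs), "calls:", len(m.calls), "err:", err)

	m.err = errInjected
	_, err = embedAll(ctx, m, []string{"gamma"})
	fmt.Println("with injected error:", err)
}
```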
Phase 3: Enhancements
All tasks are pure Go, testable with existing mocks. No external services needed.
3.1 Chunk Boundary Improvements
- Sentence-aware splitting — When a paragraph exceeds `ChunkConfig.Size`, split at sentence boundaries (`.`, `?`, `!`) instead of adding the whole paragraph as an oversized chunk. Keep current behaviour as fallback when no sentence boundaries exist. (cf26e88)
- Overlap boundary alignment — Current overlap slices by rune count from the end of the previous chunk. Improve by aligning overlap to word boundaries (find the nearest space before the overlap point) to avoid splitting mid-word. (cf26e88)
- Tests — (a) Sentence splitting with 3 sentences > Size, (b) overlap word boundary alignment, (c) existing tests still pass (no regression). (cf26e88)
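The two chunking changes above can be sketched as follows. This is not the code from cf26e88: `splitOversized` and `overlapTail` are hypothetical names, sizes are counted in runes, and a sentence boundary is assumed to be `.`, `?`, or `!` followed by whitespace or end of text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// splitSentences cuts text after '.', '?' or '!' when the mark is followed
// by whitespace or ends the text.
func splitSentences(text string) []string {
	var out []string
	runes := []rune(text)
	start := 0
	for i, r := range runes {
		if r != '.' && r != '?' && r != '!' {
			continue
		}
		if i+1 == len(runes) || unicode.IsSpace(runes[i+1]) {
			if s := strings.TrimSpace(string(runes[start : i+1])); s != "" {
				out = append(out, s)
			}
			start = i + 1
		}
	}
	if tail := strings.TrimSpace(string(runes[start:])); tail != "" {
		out = append(out, tail)
	}
	return out
}

// splitOversized packs whole sentences into chunks of at most size runes.
// A paragraph with no usable sentence boundary is returned unchanged,
// which mirrors the fallback behaviour described above.
func splitOversized(paragraph string, size int) []string {
	if utf8.RuneCountInString(paragraph) <= size {
		return []string{paragraph}
	}
	sentences := splitSentences(paragraph)
	if len(sentences) <= 1 {
		return []string{paragraph}
	}
	var chunks []string
	current := ""
	for _, s := range sentences {
		if current == "" {
			current = s
			continue
		}
		candidate := current + " " + s
		if utf8.RuneCountInString(candidate) > size {
			chunks = append(chunks, current)
			current = s
			continue
		}
		current = candidate
	}
	return append(chunks, current)
}

// overlapTail returns about the last overlap runes of prev, moving the cut
// back to the nearest preceding space so it never starts mid-word.
func overlapTail(prev string, overlap int) string {
	runes := []rune(prev)
	if overlap <= 0 {
		return ""
	}
	if overlap >= len(runes) {
		return prev
	}
	start := len(runes) - overlap
	for start > 0 && !unicode.IsSpace(runes[start-1]) && !unicode.IsSpace(runes[start]) {
		start--
	}
	return strings.TrimSpace(string(runes[start:]))
}

func main() {
	for _, c := range splitOversized("One two. Three four? Five six!", 12) {
		fmt.Printf("%q\n", c)
	}
	fmt.Printf("%q\n", overlapTail("the quick brown fox", 6))
}
```

Note that aligning backwards can make the overlap slightly longer than the configured rune count; aligning forwards would make it shorter instead.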
3.2 Collection Management Helpers
- Create `collections.go` — Helper functions for collection lifecycle:
  - `ListCollections(ctx, store VectorStore) ([]string, error)` — wraps store method
  - `DeleteCollection(ctx, store VectorStore, name string) error` — wraps store method
  - `CollectionStats(ctx, store VectorStore, name string) (*CollectionInfo, error)` — point count, vector size, status. Needs `CollectionInfo` struct (not Qdrant-specific). (cf26e88)
- Add `ListCollections` and `DeleteCollection` to VectorStore interface — Currently these methods exist on `QdrantClient` but NOT on the `VectorStore` interface. Add them and update mock. (cf26e88)
- Tests — Mock-based tests for all helpers, error injection. (cf26e88)
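A rough sketch of how these helpers might sit on top of the interface. The helper signatures follow the task description; the `VectorStore` method signatures, the `CollectionInfo` field names, and the `memStore` type are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sort"
)

// CollectionInfo is a backend-neutral summary; field names are assumed.
type CollectionInfo struct {
	Name       string
	PointCount uint64
	VectorSize uint64
	Status     string
}

// VectorStore lists only the methods these helpers need; signatures are assumed.
type VectorStore interface {
	ListCollections(ctx context.Context) ([]string, error)
	DeleteCollection(ctx context.Context, name string) error
	CollectionInfo(ctx context.Context, name string) (*CollectionInfo, error)
}

// memStore is a hypothetical in-memory store used only for this sketch.
type memStore struct {
	collections map[string]CollectionInfo
}

func (m *memStore) ListCollections(_ context.Context) ([]string, error) {
	names := make([]string, 0, len(m.collections))
	for name := range m.collections {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random, so sort for stable output
	return names, nil
}

func (m *memStore) DeleteCollection(_ context.Context, name string) error {
	if _, ok := m.collections[name]; !ok {
		return fmt.Errorf("collection %q not found", name)
	}
	delete(m.collections, name)
	return nil
}

func (m *memStore) CollectionInfo(_ context.Context, name string) (*CollectionInfo, error) {
	info, ok := m.collections[name]
	if !ok {
		return nil, fmt.Errorf("collection %q not found", name)
	}
	return &info, nil
}

// ListCollections wraps the store method and adds error context.
func ListCollections(ctx context.Context, store VectorStore) ([]string, error) {
	names, err := store.ListCollections(ctx)
	if err != nil {
		return nil, fmt.Errorf("list collections: %w", err)
	}
	return names, nil
}

// DeleteCollection wraps the store method and adds error context.
func DeleteCollection(ctx context.Context, store VectorStore, name string) error {
	if err := store.DeleteCollection(ctx, name); err != nil {
		return fmt.Errorf("delete collection %q: %w", name, err)
	}
	return nil
}

// CollectionStats returns point count, vector size and status for one collection.
func CollectionStats(ctx context.Context, store VectorStore, name string) (*CollectionInfo, error) {
	info, err := store.CollectionInfo(ctx, name)
	if err != nil {
		return nil, fmt.Errorf("collection stats %q: %w", name, err)
	}
	return info, nil
}

func main() {
	ctx := context.Background()
	store := &memStore{collections: map[string]CollectionInfo{
		"docs": {Name: "docs", PointCount: 42, VectorSize: 768, Status: "green"},
	}}
	names, _ := ListCollections(ctx, store)
	stats, _ := CollectionStats(ctx, store, "docs")
	fmt.Println(names, stats.PointCount, stats.VectorSize, stats.Status)
	fmt.Println(DeleteCollection(ctx, store, "missing"))
}
```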
3.3 Keyword Pre-Filter
- Create `keyword.go` — `KeywordFilter(results []QueryResult, keywords []string) []QueryResult` — re-ranks results by boosting scores for results containing query keywords. Pure string matching (case-insensitive `strings.Contains`).
  - Boost formula: `score *= 1.0 + 0.1 * matchCount` (each keyword match adds 10% boost)
  - Re-sort by boosted score descending (cf26e88)
- Add `Keywords bool` to QueryConfig — When true, extract keywords from query text and apply KeywordFilter after vector search. (cf26e88)
- Tests — (a) No keywords (passthrough), (b) single keyword boost, (c) multiple keywords, (d) case insensitive, (e) no matches (scores unchanged). (cf26e88)
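The boost formula above can be sketched as below. The function signature and formula follow the task description; the `QueryResult` fields (`Text`, `Score`) are assumed, and returning a copy instead of mutating the input is a choice made for this sketch, not something the task specifies.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// QueryResult carries only the fields this sketch needs; the real struct
// is assumed to have more.
type QueryResult struct {
	Text  string
	Score float32
}

// KeywordFilter boosts each result by 10% per matching keyword
// (score *= 1.0 + 0.1 * matchCount) and re-sorts by score descending.
// Matching is a case-insensitive substring check.
func KeywordFilter(results []QueryResult, keywords []string) []QueryResult {
	if len(keywords) == 0 {
		return results // passthrough when there is nothing to match
	}
	boosted := make([]QueryResult, len(results))
	copy(boosted, results)
	for i := range boosted {
		text := strings.ToLower(boosted[i].Text)
		matches := 0
		for _, kw := range keywords {
			kw = strings.ToLower(strings.TrimSpace(kw))
			if kw != "" && strings.Contains(text, kw) {
				matches++
			}
		}
		boosted[i].Score *= 1.0 + 0.1*float32(matches)
	}
	// Stable sort keeps the vector-search order for equal scores.
	sort.SliceStable(boosted, func(a, b int) bool {
		return boosted[a].Score > boosted[b].Score
	})
	return boosted
}

func main() {
	results := []QueryResult{
		{Text: "Qdrant stores vectors", Score: 0.80},
		{Text: "Ollama embeds text with ROCm", Score: 0.78},
	}
	for _, r := range KeywordFilter(results, []string{"ollama", "rocm"}) {
		fmt.Printf("%.3f  %s\n", r.Score, r.Text)
	}
}
```

With two matches the second result gets a 20% boost, which is enough to move it ahead of the first despite its lower vector score.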
3.4 Benchmarks
- Create `benchmark_test.go` — No build tag (mock-only):
  - `BenchmarkChunk` — 10KB markdown document, default config
  - `BenchmarkChunkWithOverlap` — Same document, overlap=100
  - `BenchmarkQuery_Mock` — Query with mock embedder + mock store
  - `BenchmarkIngest_Mock` — Ingest 10 files with mock embedder + mock store
  - `BenchmarkFormatResults` — FormatResultsText/Context/JSON with 20 results
  - `BenchmarkKeywordFilter` — 100 results, 5 keywords (cf26e88)
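For orientation, a chunking benchmark of this kind might look like the sketch below. `chunkStandIn` is a hypothetical placeholder for the package's real chunker, and `buildMarkdown` generates a synthetic document of roughly 10KB. The benchmark is driven through `testing.Benchmark` so the example runs as a plain program; in `benchmark_test.go` the `BenchmarkChunk` function would be picked up by `go test -bench` instead.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// buildMarkdown generates a synthetic document with the given number of
// sections; 50 sections come to roughly 10KB.
func buildMarkdown(sections int) string {
	var sb strings.Builder
	for i := 0; i < sections; i++ {
		fmt.Fprintf(&sb, "## Section %d\n\n", i)
		sb.WriteString(strings.Repeat("Lorem ipsum dolor sit amet. ", 7))
		sb.WriteString("\n\n")
	}
	return sb.String()
}

// chunkStandIn is a hypothetical placeholder for the real chunker:
// it simply splits on blank lines and drops empty pieces.
func chunkStandIn(doc string) []string {
	var chunks []string
	for _, p := range strings.Split(doc, "\n\n") {
		if p = strings.TrimSpace(p); p != "" {
			chunks = append(chunks, p)
		}
	}
	return chunks
}

// BenchmarkChunk measures chunking of one document per iteration.
func BenchmarkChunk(b *testing.B) {
	doc := buildMarkdown(50)
	b.SetBytes(int64(len(doc))) // lets the result report throughput in MB/s
	b.ReportAllocs()
	b.ResetTimer() // exclude document construction from the measurement
	for i := 0; i < b.N; i++ {
		if len(chunkStandIn(doc)) == 0 {
			b.Fatal("no chunks produced")
		}
	}
}

func main() {
	res := testing.Benchmark(BenchmarkChunk)
	fmt.Println(res.String(), res.MemString())
}
```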
Phase 4: GPU Embeddings — COMPLETE
- ROCm Ollama — Tested on RX 7800 XT. 97 embeds/sec single, 10.3ms latency. See FINDINGS.md. (Charon, 20 Feb 2026)
- Batch optimisation — Investigated: Ollama has no batch API. EmbedBatch is inherently sequential (one HTTP call per text). No optimisation possible without upstream changes. (Charon, 20 Feb 2026)
- Benchmarks — Go benchmarks added: BenchmarkEmbedSingle, BenchmarkEmbedBatch, BenchmarkEmbedVaryingLength, BenchmarkChunkMarkdown, BenchmarkQdrantSearch, BenchmarkFullPipeline + throughput/latency tests. (Charon, 20 Feb 2026)
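As a sanity check on the figures above, throughput for a strictly sequential client is about the reciprocal of per-call latency, and a sequential batch costs about n times the single-call latency. The helper names below are hypothetical, and the calculation is plain arithmetic on the reported numbers, not a new measurement.

```go
package main

import "fmt"

// perSecond converts a mean per-operation latency in microseconds into
// operations per second for a strictly sequential caller.
func perSecond(latencyMicros float64) float64 {
	return 1e6 / latencyMicros
}

// sequentialBatchMillis estimates wall time for n texts embedded one
// call at a time, ignoring any per-batch overhead.
func sequentialBatchMillis(singleMillis float64, n int) float64 {
	return singleMillis * float64(n)
}

func main() {
	fmt.Printf("embed:  %.1f ops/sec at 10.3ms per call\n", perSecond(10300))
	fmt.Printf("search: %.0f ops/sec at 152µs per call\n", perSecond(152))
	fmt.Printf("batch of 100 texts: about %.0f ms when sequential\n", sequentialBatchMillis(10.3, 100))
}
```

The embedding reciprocal matches the reported 97 embeds/sec, and the search reciprocal lands close to the reported 6.5K QPS.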
Known Issues
- go.mod had wrong replace path — Fixed by Charon.
- Qdrant and Ollama not running on snider-linux — Resolved. Qdrant v1.16.3 (Docker) and Ollama with ROCm + nomic-embed-text now running on localhost.
- No mocks/interfaces — Resolved in Phase 2. `Embedder` and `VectorStore` interfaces extracted; mock implementations in `mock_test.go`.
- log.E returns error — `forge.lthn.ai/core/go/pkg/log.E` wraps errors with component context. This is the framework's logging pattern.
Platform
- OS: Ubuntu (linux/amd64) — snider-linux
- Co-located with: go-rocm, go-p2p
Workflow
- Charon dispatches tasks here after review
- Pick up tasks in phase order
- Mark `[x]` when done, note commit hash
- New discoveries → add notes, flag in FINDINGS.md