go-rag Architecture
Module: forge.lthn.ai/core/go-rag
Overview
go-rag is a Retrieval-Augmented Generation library for Go. It provides document chunking, embedding generation via Ollama, vector storage and search via Qdrant, and formatted context retrieval suitable for injection into LLM prompts. The library is designed around two core interfaces — Embedder and VectorStore — that decouple business logic from external service implementations.
Package Layout
| File | Purpose |
|---|---|
| `embedder.go` | `Embedder` interface definition |
| `vectorstore.go` | `VectorStore` interface + `CollectionInfo` struct |
| `chunk.go` | Markdown chunking — sections, paragraphs, sentences, overlap |
| `ollama.go` | `OllamaClient` — implements `Embedder` via Ollama HTTP API |
| `qdrant.go` | `QdrantClient` — implements `VectorStore` via Qdrant gRPC |
| `ingest.go` | Ingestion pipeline — read files, chunk, embed, batch upsert |
| `query.go` | Query pipeline — embed query, search, threshold filter, format results |
| `keyword.go` | Keyword boosting post-filter for re-ranking search results |
| `collections.go` | Package-level collection management helpers |
| `helpers.go` | Convenience wrappers — `*With` variants and default-client functions |
Core Interfaces
Embedder
```go
type Embedder interface {
	Embed(ctx context.Context, text string) ([]float32, error)
	EmbedBatch(ctx context.Context, texts []string) ([][]float32, error)
	EmbedDimension() uint64
}
```
OllamaClient satisfies this interface. The interface enables mock-based testing without a live Ollama instance.
VectorStore
```go
type VectorStore interface {
	CreateCollection(ctx context.Context, name string, vectorSize uint64) error
	CollectionExists(ctx context.Context, name string) (bool, error)
	DeleteCollection(ctx context.Context, name string) error
	ListCollections(ctx context.Context) ([]string, error)
	CollectionInfo(ctx context.Context, name string) (*CollectionInfo, error)
	UpsertPoints(ctx context.Context, collection string, points []Point) error
	Search(ctx context.Context, collection string, vector []float32, limit uint64, filter map[string]string) ([]SearchResult, error)
}
```
QdrantClient satisfies this interface. CollectionInfo is backend-agnostic (name, point count, vector size, status string).
Qdrant Client
QdrantClient wraps the official github.com/qdrant/go-client gRPC library.
Connection: gRPC on port 6334 (default). Supports TLS and API key authentication.
Collection creation: Uses cosine distance metric (qdrant.Distance_Cosine). Vector dimensionality is derived from the configured embedding model via Embedder.EmbedDimension().
Point IDs: Qdrant requires valid UUIDs. Point IDs are generated by ChunkID() using MD5 of "path:index:text_prefix", producing 32-character hex strings that Qdrant accepts as UUIDs.
Search: Uses Qdrant's QueryPoints API. Payload filters are expressed as Must conditions (logical AND). Results include the similarity score and full payload.
Payload conversion: Qdrant payloads are protobuf Value types. The valueToGo function converts these to native Go types (string, int64, float64, bool, []any, map[string]any).
Version mismatch: The client library (v1.16.2) logs a benign warning when connecting to Qdrant v1.16.3. All operations function correctly.
Ollama Embedding Client
OllamaClient wraps the github.com/ollama/ollama/api HTTP client.
Connection: HTTP on port 11434 (default), 30-second timeout.
Embedding: Calls /api/embed. The Ollama API returns float64 values; these are converted to float32 for Qdrant compatibility.
Batch embedding: EmbedBatch is sequential — it calls Embed in a loop. Ollama has no native batch API endpoint. Batch throughput equals single-embed throughput.
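The float64-to-float32 narrowing mentioned above is a small but necessary step; a sketch (the `toFloat32` helper name is hypothetical):

```go
package main

import "fmt"

// toFloat32 narrows Ollama's float64 embedding values to the float32
// representation that Qdrant stores.
func toFloat32(v []float64) []float32 {
	out := make([]float32, len(v))
	for i, f := range v {
		out[i] = float32(f)
	}
	return out
}

func main() {
	fmt.Println(toFloat32([]float64{0.25, -1.5})) // [0.25 -1.5]
}
```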
Supported models and dimensions:
| Model | Dimensions |
|---|---|
| `nomic-embed-text` (default) | 768 |
| `mxbai-embed-large` | 1024 |
| `all-minilm` | 384 |
| (unknown) | 768 (fallback) |
Determinism: nomic-embed-text produces bit-identical float32 vectors for identical input text, which makes ingest operations idempotent.
Empty strings: Ollama accepts empty string input and returns a valid zero-padded vector without error. This behaviour is Ollama-specific.
Markdown Chunking
ChunkMarkdown(text string, cfg ChunkConfig) []Chunk is the primary chunking function.
ChunkConfig:
```go
type ChunkConfig struct {
	Size    int // Target characters per chunk (default 500)
	Overlap int // Overlap in runes between adjacent chunks (default 50)
}
```
Three-level splitting strategy:
1. Section split — text is first split at `##` header boundaries. The header line is preserved with its section content.
2. Paragraph split — sections larger than `Size` are split at double-newline paragraph boundaries. Multiple consecutive newlines are normalised to double newlines.
3. Sentence split — paragraphs that individually exceed `Size` are split at sentence boundaries (`.`, `?`, `!`). Sentence splitting is applied before paragraph accumulation to avoid oversized chunks. When no sentence boundaries exist, the oversized paragraph is added as-is.
Overlap: When a chunk boundary is crossed, the new chunk begins with the trailing Overlap runes of the previous chunk. The overlap start point is aligned to the nearest word boundary (first space within the overlap slice) to avoid splitting mid-word. Overlap is rune-safe; UTF-8 multi-byte characters are handled correctly.
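The word-aligned overlap can be sketched as follows (the `overlapTail` function name is hypothetical; the real logic lives inside `ChunkMarkdown`):

```go
package main

import "fmt"

// overlapTail returns the trailing n runes of prev, advanced past the
// first space in that slice so the overlap never starts mid-word.
// Operating on runes keeps UTF-8 multi-byte characters intact.
func overlapTail(prev string, n int) string {
	r := []rune(prev)
	if len(r) <= n {
		return prev
	}
	tail := r[len(r)-n:]
	for i, c := range tail {
		if c == ' ' {
			return string(tail[i+1:])
		}
	}
	return string(tail)
}

func main() {
	fmt.Println(overlapTail("hello world foo", 9)) // foo
}
```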
Chunk identity: Each Chunk struct carries Text, Section (the ## header title), and Index (zero-based global counter across all sections in the document).
ChunkID: Deterministic MD5 hash of "path:index:text_prefix" (first 100 runes of text). Used as the Qdrant point ID.
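The ID scheme described above can be sketched like this (the lower-case `chunkID` name is a stand-in for the library's `ChunkID`):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// chunkID hashes "path:index:text_prefix" with MD5, where the prefix is
// the first 100 runes of the chunk text, yielding a deterministic
// 32-character hex string usable as a Qdrant point ID.
func chunkID(path string, index int, text string) string {
	r := []rune(text)
	if len(r) > 100 {
		r = r[:100]
	}
	sum := md5.Sum([]byte(fmt.Sprintf("%s:%d:%s", path, index, string(r))))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(chunkID("docs/arch.md", 0, "go-rag is a RAG library"))
}
```

Because the hash is a pure function of path, index, and text, re-ingesting unchanged documents produces identical point IDs, which is what makes upserts idempotent.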
Category detection: Category(path string) string classifies files by path keywords into categories: ui-component, brand, product-brief, help-doc, task, architecture, documentation. Used as a payload field to enable category-scoped queries.
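A sketch of path-keyword classification, assuming substring matching; the exact keyword-to-category mapping here is invented for illustration and the real table lives in `chunk.go`:

```go
package main

import (
	"fmt"
	"strings"
)

// category classifies a file path into one of the documented categories
// by keyword. The keywords below are assumptions, not the real mapping.
func category(path string) string {
	p := strings.ToLower(path)
	switch {
	case strings.Contains(p, "component"):
		return "ui-component"
	case strings.Contains(p, "brand"):
		return "brand"
	case strings.Contains(p, "brief"):
		return "product-brief"
	case strings.Contains(p, "help"):
		return "help-doc"
	case strings.Contains(p, "task"):
		return "task"
	case strings.Contains(p, "arch"):
		return "architecture"
	default:
		return "documentation"
	}
}

func main() {
	fmt.Println(category("docs/architecture/go-rag.md")) // architecture
}
```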
Accepted file types: .md, .markdown, .txt (checked by ShouldProcess).
Ingestion Pipeline
Ingest and IngestFile accept VectorStore and Embedder interfaces.
Directory ingestion (Ingest):
1. Resolve and validate the source directory.
2. Check whether the target collection exists. If `Recreate` is set and the collection exists, delete it first.
3. Create the collection if it does not exist, using `embedder.EmbedDimension()` for the vector size.
4. Walk the directory recursively, collecting files matching `ShouldProcess`.
5. For each file: read content, call `ChunkMarkdown`, embed each chunk, and build `Point` structs with payload fields (`text`, `source`, `section`, `category`, `chunk_index`).
6. Batch-upsert points to the vector store in slices of `BatchSize` (default 100).
Point payload schema:
| Field | Type | Description |
|---|---|---|
| `text` | string | Raw chunk text |
| `source` | string | Relative file path from the ingestion directory root |
| `section` | string | Markdown section header (may be empty) |
| `category` | string | Category from `Category()` path detection |
| `chunk_index` | int | Chunk position within the document |
Query Pipeline
Query(ctx, store, embedder, query string, cfg QueryConfig) ([]QueryResult, error):
1. Embed the query text using `embedder.Embed`.
2. Construct a payload filter from `cfg.Category` if set.
3. Call `store.Search` with the query vector, limit, and filter.
4. Discard results below `cfg.Threshold` (default 0.5).
5. Deserialise payload fields into `QueryResult` structs. `chunk_index` handles `int64`, `float64`, and `int` types to accommodate JSON unmarshalling differences.
6. Optionally apply keyword boosting (`cfg.Keywords == true`).
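The multi-type `chunk_index` handling in step 5 amounts to a type switch; a sketch (the `asInt` helper name is hypothetical):

```go
package main

import "fmt"

// asInt normalises a payload value to int. Depending on the decode
// path, chunk_index can arrive as int64 (gRPC/protobuf), float64
// (JSON unmarshalling), or a plain int.
func asInt(v any) int {
	switch n := v.(type) {
	case int64:
		return int(n)
	case float64:
		return int(n)
	case int:
		return n
	}
	return 0
}

func main() {
	fmt.Println(asInt(int64(7)), asInt(7.0), asInt(7)) // 7 7 7
}
```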
QueryConfig:
```go
type QueryConfig struct {
	Collection string
	Limit      uint64  // Default 5
	Threshold  float32 // Default 0.5
	Category   string  // Payload filter; empty means no filter
	Keywords   bool    // Enable keyword boosting post-filter
}
```
Keyword Boosting
KeywordFilter(results []QueryResult, keywords []string) []QueryResult re-ranks results after vector search.
Algorithm: For each result, count how many keywords appear (case-insensitive substring match) in the chunk text. Apply a 10% score boost per matching keyword: score *= 1.0 + 0.1 * matchCount. Re-sort by boosted score descending.
Keyword extraction: extractKeywords splits the query on whitespace and discards words shorter than 3 characters.
When enabled: Query calls extractKeywords on the query string and passes the result to KeywordFilter after the threshold filter has been applied.
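The extraction and boosting steps can be sketched together; `boost` is a hypothetical name for the per-result scoring inside `KeywordFilter`:

```go
package main

import (
	"fmt"
	"strings"
)

// extractKeywords keeps whitespace-separated words of 3+ characters,
// matching the behaviour described above.
func extractKeywords(query string) []string {
	var out []string
	for _, w := range strings.Fields(query) {
		if len(w) >= 3 {
			out = append(out, w)
		}
	}
	return out
}

// boost applies a 10% score increase per keyword found in text,
// using case-insensitive substring matching.
func boost(score float32, text string, keywords []string) float32 {
	lower := strings.ToLower(text)
	matches := 0
	for _, k := range keywords {
		if strings.Contains(lower, strings.ToLower(k)) {
			matches++
		}
	}
	return score * (1.0 + 0.1*float32(matches))
}

func main() {
	kws := extractKeywords("go vector search")
	fmt.Println(kws) // [vector search]
	fmt.Printf("%.2f\n", boost(0.80, "Vector search in Go", kws))
}
```

After boosting, results are re-sorted by the adjusted score, so a lower-scoring chunk that matches more query keywords can overtake a nearer neighbour.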
Result Formatting
Three output formats are available:
| Function | Format | Use case |
|---|---|---|
| `FormatResultsText` | Plain text with score/source headers | Human-readable display |
| `FormatResultsContext` | XML `<retrieved_context>` with `<document>` elements | LLM prompt injection |
| `FormatResultsJSON` | Hand-crafted JSON array | Structured consumption |
FormatResultsContext applies html.EscapeString to all attribute values and text content to produce well-formed XML safe for embedding in prompts.
Collection Management
Package-level helpers in collections.go delegate to VectorStore:
```go
ListCollections(ctx, store VectorStore) ([]string, error)
DeleteCollection(ctx, store VectorStore, name string) error
CollectionStats(ctx, store VectorStore, name string) (*CollectionInfo, error)
```
Convenience Helpers
Two tiers of helpers are provided in helpers.go:
Interface-accepting (*With variants) — accept pre-constructed VectorStore and Embedder. Suitable for testing, long-lived processes, and high-throughput use:
```go
QueryWith(ctx, store, embedder, question, collectionName string, topK int) ([]QueryResult, error)
QueryContextWith(ctx, store, embedder, question, collectionName string, topK int) (string, error)
IngestDirWith(ctx, store, embedder, directory, collectionName string, recreate bool) error
IngestFileWith(ctx, store, embedder, filePath, collectionName string) (int, error)
```
Default-client wrappers — construct new QdrantClient and OllamaClient on each call using DefaultQdrantConfig and DefaultOllamaConfig. Each call opens a new gRPC connection. Suitable for CLI commands and infrequent operations:
```go
QueryDocs(ctx, question, collectionName string, topK int) ([]QueryResult, error)
QueryDocsContext(ctx, question, collectionName string, topK int) (string, error)
IngestDirectory(ctx, directory, collectionName string, recreate bool) error
IngestSingleFile(ctx, filePath, collectionName string) (int, error)
```
IngestDirectory and IngestSingleFile additionally run HealthCheck on Qdrant and VerifyModel on Ollama before proceeding.
Performance Characteristics
Measured on AMD Ryzen 9 9950X + RX 7800 XT with ROCm, nomic-embed-text (F16):
| Operation | Latency | Throughput |
|---|---|---|
| Single embed | 10.3ms | 97/sec |
| Batch embed (10 texts) | 102ms | 98/sec effective |
| Qdrant search (100 points) | 111µs | 9,042 QPS |
| Qdrant search (200 points) | 152µs | 6,580 QPS |
| Chunk 50 sections | 11.2µs | 89K/sec |
| Chunk 1000 paragraphs | 107µs | 9.4K/sec |
The embedding step dominates pipeline latency. In a full ingest+query cycle for 5 documents, approximately 95% of elapsed time is in embedding calls. Text length (50 to 2000 characters) has negligible effect on embedding latency because tokenisation and HTTP overhead dominate the GPU compute time (~2ms).
Dependency Graph
```
go-rag
├── forge.lthn.ai/core/go (logging — pkg/log)
├── github.com/ollama/ollama (embedding HTTP client)
├── github.com/qdrant/go-client (vector DB gRPC client)
└── github.com/stretchr/testify (test assertions)
```
Transitive: google.golang.org/grpc, google.golang.org/protobuf, github.com/google/uuid.