---
title: RAG Pipeline
description: Retrieval-augmented generation via Qdrant vector search and Ollama embeddings.
---

# RAG Pipeline

go-ai integrates with the RAG (Retrieval-Augmented Generation) pipeline provided by `go-rag`. This surfaces as three MCP tools for vector search and a high-level facade function for programmatic use.

## Architecture

```
MCP Client                                 Programmatic callers
    |                                               |
    v                                               v
rag_query / rag_ingest / rag_collections    ai.QueryRAGForTask()
    |                                               |
    +------------------- go-rag --------------------+
                 |                   |
                 v                   v
              Qdrant              Ollama
             (vectors)         (embeddings)
```

## MCP Tools

### `rag_query`

Query the vector database for documents relevant to a natural-language question.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | `string` | Yes | Natural-language query |
| `collection` | `string` | No | Qdrant collection name (default: `hostuk-docs`) |
| `limit` | `int` | No | Maximum results to return (default: 3) |
| `threshold` | `float64` | No | Minimum similarity score (default: 0.5) |

The tool embeds the question via Ollama, searches Qdrant with the specified parameters, and returns formatted context with source references.
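
The effect of `limit` and `threshold` can be sketched in isolation. The `Result` type and `filterResults` helper below are illustrative only, not part of the `go-rag` API:

```go
package main

import (
	"fmt"
	"sort"
)

// Result pairs a retrieved chunk with its similarity score.
type Result struct {
	Text  string
	Score float64
}

// filterResults keeps the highest-scoring hits that clear the similarity
// threshold, capped at limit — the shape of rag_query's two tuning knobs.
func filterResults(results []Result, limit int, threshold float64) []Result {
	sort.Slice(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	var out []Result
	for _, r := range results {
		if r.Score >= threshold && len(out) < limit {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	hits := []Result{{"a", 0.9}, {"b", 0.4}, {"c", 0.7}, {"d", 0.6}, {"e", 0.8}}
	for _, r := range filterResults(hits, 3, 0.5) {
		fmt.Println(r.Text, r.Score)
	}
	// prints: a 0.9, e 0.8, c 0.7 (one per line)
}
```

A higher `threshold` trades recall for precision; a lower one returns more, looser context.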

### `rag_ingest`

Ingest a file into the vector database. The file is chunked (for Markdown, chunking respects heading boundaries), each chunk is embedded via Ollama, and the resulting vectors are stored in Qdrant.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `path` | `string` | Yes | Path to the file to ingest (relative to workspace root) |
| `collection` | `string` | No | Target Qdrant collection |

Because it writes to the vector store, this tool is logged at `Security` level.
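
Heading-aware chunking can be illustrated with a minimal sketch. `chunkMarkdown` is a simplified stand-in; the real chunker in `go-rag` presumably also bounds chunk size:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkMarkdown starts a new chunk at each heading line, so a section and
// its body stay together — the boundary rule rag_ingest uses for Markdown.
func chunkMarkdown(doc string) []string {
	var chunks []string
	var cur []string
	for _, line := range strings.Split(doc, "\n") {
		if strings.HasPrefix(line, "#") && len(cur) > 0 {
			chunks = append(chunks, strings.Join(cur, "\n"))
			cur = nil
		}
		cur = append(cur, line)
	}
	if len(cur) > 0 {
		chunks = append(chunks, strings.Join(cur, "\n"))
	}
	return chunks
}

func main() {
	doc := "# Intro\ntext\n# Usage\nmore text"
	fmt.Println(len(chunkMarkdown(doc))) // prints 2
}
```

Keeping a heading with its body means each stored vector represents one coherent topic, which improves retrieval quality over fixed-size splitting.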

### `rag_collections`

List all available collections in the connected Qdrant instance, with point counts and vector dimensions.

## AI Facade: `QueryRAGForTask`

The `ai` package provides a higher-level wrapper for programmatic RAG queries. It is used by agentic task planners to enrich task context without importing `go-rag` directly.

```go
type TaskInfo struct {
	Title       string
	Description string
}

func QueryRAGForTask(task TaskInfo) string {
	query := task.Title + " " + task.Description

	// Truncate to 500 runes to keep the embedding focused.
	runes := []rune(query)
	if len(runes) > 500 {
		query = string(runes[:500])
	}

	qdrantCfg := rag.DefaultQdrantConfig()
	qdrantClient, err := rag.NewQdrantClient(qdrantCfg)
	if err != nil {
		return ""
	}
	defer qdrantClient.Close()

	ollamaCfg := rag.DefaultOllamaConfig()
	ollamaClient, err := rag.NewOllamaClient(ollamaCfg)
	if err != nil {
		return ""
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	results, err := rag.Query(ctx, qdrantClient, ollamaClient, query, rag.QueryConfig{
		Collection: "hostuk-docs",
		Limit:      3,
		Threshold:  0.5,
	})
	if err != nil {
		return ""
	}
	return rag.FormatResultsContext(results)
}
```

Key design decisions:

- The query is capped at **500 runes** to keep the embedding vector focused on the task's core intent
- A **10-second timeout** prevents hanging when services are slow
- The function degrades to an empty string rather than propagating errors, allowing callers to continue without RAG context
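
The rune-based cap is worth isolating: slicing a `[]rune` never cuts a multi-byte UTF-8 character in half, which a byte slice would. `truncateRunes` is a hypothetical helper extracted for illustration:

```go
package main

import "fmt"

// truncateRunes caps s at n runes. Converting to []rune first avoids the
// invalid UTF-8 that s[:n] can produce on multi-byte text.
func truncateRunes(s string, n int) string {
	runes := []rune(s)
	if len(runes) > n {
		return string(runes[:n])
	}
	return s
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 5)) // prints "héllo"
}
```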

## External Service Dependencies

### Qdrant

Vector database storing embedded document chunks.

- Default address: `localhost:6334` (gRPC)
- Configuration: `rag.DefaultQdrantConfig()`

### Ollama

Local LLM server providing embedding generation.

- Default address: `localhost:11434` (HTTP)
- Configuration: `rag.DefaultOllamaConfig()`
- Default embedding model: `nomic-embed-text`

Both services must be running for RAG tools to function. In CI, tests that touch RAG tools are guarded with `skipIfShort(t)`.
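
A cheap preflight before invoking the RAG tools is a TCP dial against the default addresses above. `reachable` is an illustrative helper, not part of go-ai:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// reachable reports whether a TCP endpoint accepts connections within a
// short timeout — enough to tell "service down" from "service slow".
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	fmt.Println("qdrant:", reachable("localhost:6334"))
	fmt.Println("ollama:", reachable("localhost:11434"))
}
```

This only verifies that the ports are open; it does not confirm the embedding model is pulled or the collection exists.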

## Embedding Benchmark

The `cmd/embed-bench/` utility compares embedding models for the OpenBrain knowledge store. It tests how well models separate semantically related vs unrelated agent memory pairs.

```bash
go run ./cmd/embed-bench
go run ./cmd/embed-bench -ollama http://localhost:11434
```

The benchmark evaluates:

- **Cluster separation** -- intra-group vs inter-group similarity
- **Query recall accuracy** -- top-1 and top-3 retrieval precision
- **Embedding throughput** -- milliseconds per memory

Models tested: `nomic-embed-text` and `embeddinggemma`.

## Testing

RAG tool tests cover handler validation (empty question/path fields, default behaviour) and graceful degradation when Qdrant or Ollama are unavailable. Full RAG round-trip tests require live services and are skipped in short mode.