---
title: RAG Pipeline
description: Retrieval-augmented generation via Qdrant vector search and Ollama embeddings.
---

# RAG Pipeline

go-ai integrates with the RAG (Retrieval-Augmented Generation) pipeline provided by `go-rag`. The integration is exposed as three MCP tools for vector search, ingestion, and collection listing, plus a high-level facade function for programmatic use.

## Architecture

```
               MCP Client                 Programmatic callers
                    |                               |
                    v                               v
rag_query / rag_ingest / rag_collections   ai.QueryRAGForTask()
                    |                               |
                    +----------- go-rag ------------+
                            |               |
                            v               v
                          Qdrant          Ollama
                        (vectors)     (embeddings)
```

## MCP Tools

### `rag_query`

Query the vector database for documents relevant to a natural-language question.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | `string` | Yes | Natural-language query |
| `collection` | `string` | No | Qdrant collection name (default: `hostuk-docs`) |
| `limit` | `int` | No | Maximum number of results to return (default: 3) |
| `threshold` | `float64` | No | Minimum similarity score (default: 0.5) |

The tool embeds the question via Ollama, searches Qdrant with the given parameters, and returns formatted context with source references.

### `rag_ingest`

Ingest a file into the vector database. The file is split into chunks (for Markdown files, chunking respects heading boundaries), each chunk is embedded via Ollama, and the resulting vectors are stored in Qdrant.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `path` | `string` | Yes | Path to the file to ingest, relative to the workspace root |
| `collection` | `string` | No | Target Qdrant collection |

Because it writes to the vector database, this tool is logged at `Security` level.

### `rag_collections`

List all collections in the connected Qdrant instance, with their point counts and vector dimensions.

## AI Facade: QueryRAGForTask

The `ai` package provides a higher-level wrapper for programmatic RAG queries.
It is used by agentic task planners to enrich task context without importing `go-rag` directly.

```go
// Excerpt from the ai package. It relies on the context, time, and
// go-rag (imported as rag) packages.

// TaskInfo carries the task fields used to build the RAG query.
type TaskInfo struct {
	Title       string
	Description string
}

// QueryRAGForTask returns formatted RAG context for a task, or "" on any failure.
func QueryRAGForTask(task TaskInfo) string {
	query := task.Title + " " + task.Description

	// Truncate to 500 runes (not bytes) to keep the embedding focused
	// without splitting a multi-byte character.
	runes := []rune(query)
	if len(runes) > 500 {
		query = string(runes[:500])
	}

	qdrantCfg := rag.DefaultQdrantConfig()
	qdrantClient, err := rag.NewQdrantClient(qdrantCfg)
	if err != nil {
		return "" // Qdrant unavailable: continue without RAG context
	}
	defer qdrantClient.Close()

	ollamaCfg := rag.DefaultOllamaConfig()
	ollamaClient, err := rag.NewOllamaClient(ollamaCfg)
	if err != nil {
		return "" // Ollama unavailable: continue without RAG context
	}

	// Bound the whole embed-and-search round trip.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	results, err := rag.Query(ctx, qdrantClient, ollamaClient, query, rag.QueryConfig{
		Collection: "hostuk-docs",
		Limit:      3,
		Threshold:  0.5,
	})
	if err != nil {
		return ""
	}

	return rag.FormatResultsContext(results)
}
```

Key design decisions:

- The query is capped at **500 runes** to keep the embedding vector focused on the task's core intent.
- A **10-second timeout** keeps the call from hanging when Qdrant or Ollama is slow.
- Any failure returns an empty string instead of an error, so callers can continue without RAG context.

## External Service Dependencies

### Qdrant

Vector database that stores the embedded document chunks.

- Default address: `localhost:6334` (gRPC)
- Configuration: `rag.DefaultQdrantConfig()`

### Ollama

Local LLM server that generates the embeddings.

- Default address: `localhost:11434` (HTTP)
- Configuration: `rag.DefaultOllamaConfig()`
- Default embedding model: `nomic-embed-text`

Both services must be running for the RAG tools to work. In CI, tests that touch the RAG tools are guarded with `skipIfShort(t)` so that they are skipped in short mode.

## Embedding Benchmark

The `cmd/embed-bench/` utility compares embedding models for the OpenBrain knowledge store. It measures how well each model separates semantically related agent-memory pairs from unrelated ones.

```bash
# Run with default settings
go run ./cmd/embed-bench

# Specify the Ollama URL explicitly
go run ./cmd/embed-bench -ollama http://localhost:11434
```

The benchmark evaluates:

- **Cluster separation**: intra-group vs inter-group similarity
- **Query recall accuracy**: top-1 and top-3 retrieval precision
- **Embedding throughput**: milliseconds per memory

Models tested: `nomic-embed-text` and `embeddinggemma`.

## Testing

RAG tool tests cover handler validation (empty `question` or `path` fields, default behaviour) and graceful degradation when Qdrant or Ollama is unavailable. Full round-trip RAG tests require live services and are skipped in short mode.