---
title: RAG Pipeline
description: Retrieval-augmented generation via Qdrant vector search and Ollama embeddings.
---

# RAG Pipeline

go-ai integrates with the RAG (Retrieval-Augmented Generation) pipeline provided by `go-rag`. This surfaces as three MCP tools for vector search and a high-level facade function for programmatic use.

## Architecture

```
MCP Client                                 Programmatic callers
    |                                               |
    v                                               v
rag_query / rag_ingest / rag_collections    ai.QueryRAGForTask()
    |                                               |
    +------------------- go-rag --------------------+
                 |                   |
                 v                   v
              Qdrant              Ollama
             (vectors)         (embeddings)
```

## MCP Tools

### `rag_query`

Query the vector database for documents relevant to a natural-language question.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | `string` | Yes | Natural-language query |
| `collection` | `string` | No | Qdrant collection name (default: `hostuk-docs`) |
| `limit` | `int` | No | Maximum results to return (default: 3) |
| `threshold` | `float64` | No | Minimum similarity score (default: 0.5) |

The tool embeds the question via Ollama, searches Qdrant with the specified parameters, and returns formatted context with source references.
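
The effect of `limit` and `threshold` can be sketched in isolation. The `Result` type and `filterResults` helper below are illustrative only, not part of the `go-rag` API:

```go
package main

import (
	"fmt"
	"sort"
)

// Result pairs a retrieved chunk with its similarity score.
type Result struct {
	Text  string
	Score float64
}

// filterResults keeps the highest-scoring hits that clear the similarity
// threshold, capped at limit — the shape of rag_query's two tuning knobs.
func filterResults(results []Result, limit int, threshold float64) []Result {
	sort.Slice(results, func(i, j int) bool { return results[i].Score > results[j].Score })
	var out []Result
	for _, r := range results {
		if r.Score >= threshold && len(out) < limit {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	hits := []Result{{"a", 0.9}, {"b", 0.4}, {"c", 0.7}, {"d", 0.6}, {"e", 0.8}}
	for _, r := range filterResults(hits, 3, 0.5) {
		fmt.Println(r.Text, r.Score)
	}
	// prints: a 0.9, e 0.8, c 0.7 (one per line)
}
```

A higher `threshold` trades recall for precision; a lower one returns more, looser context.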

### `rag_ingest`

Ingest a file into the vector database. The file is chunked (for Markdown, chunking respects heading boundaries), each chunk is embedded via Ollama, and the resulting vectors are stored in Qdrant.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `path` | `string` | Yes | Path to the file to ingest (relative to workspace root) |
| `collection` | `string` | No | Target Qdrant collection |

Because it writes to the vector store, this tool is logged at `Security` level.
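
Heading-aware chunking can be illustrated with a minimal sketch. `chunkMarkdown` is a simplified stand-in; the real chunker in `go-rag` presumably also bounds chunk size:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkMarkdown starts a new chunk at each heading line, so a section and
// its body stay together — the boundary rule rag_ingest uses for Markdown.
func chunkMarkdown(doc string) []string {
	var chunks []string
	var cur []string
	for _, line := range strings.Split(doc, "\n") {
		if strings.HasPrefix(line, "#") && len(cur) > 0 {
			chunks = append(chunks, strings.Join(cur, "\n"))
			cur = nil
		}
		cur = append(cur, line)
	}
	if len(cur) > 0 {
		chunks = append(chunks, strings.Join(cur, "\n"))
	}
	return chunks
}

func main() {
	doc := "# Intro\ntext\n# Usage\nmore text"
	fmt.Println(len(chunkMarkdown(doc))) // prints 2
}
```

Keeping a heading with its body means each stored vector represents one coherent topic, which improves retrieval quality over fixed-size splitting.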

### `rag_collections`

List all available collections in the connected Qdrant instance, with point counts and vector dimensions.

## AI Facade: `QueryRAGForTask`

The `ai` package provides a higher-level wrapper for programmatic RAG queries. It is used by agentic task planners to enrich task context without importing `go-rag` directly.

```go
type TaskInfo struct {
	Title       string
	Description string
}

func QueryRAGForTask(task TaskInfo) string {
	query := task.Title + " " + task.Description

	// Truncate to 500 runes to keep the embedding focused.
	runes := []rune(query)
	if len(runes) > 500 {
		query = string(runes[:500])
	}

	qdrantCfg := rag.DefaultQdrantConfig()
	qdrantClient, err := rag.NewQdrantClient(qdrantCfg)
	if err != nil {
		return ""
	}
	defer qdrantClient.Close()

	ollamaCfg := rag.DefaultOllamaConfig()
	ollamaClient, err := rag.NewOllamaClient(ollamaCfg)
	if err != nil {
		return ""
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	results, err := rag.Query(ctx, qdrantClient, ollamaClient, query, rag.QueryConfig{
		Collection: "hostuk-docs",
		Limit:      3,
		Threshold:  0.5,
	})
	if err != nil {
		return ""
	}
	return rag.FormatResultsContext(results)
}
```

Key design decisions:

- The query is capped at **500 runes** to keep the embedding vector focused on the task's core intent
- A **10-second timeout** prevents hanging when services are slow
- The function degrades to an empty string rather than propagating errors, allowing callers to continue without RAG context
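
The rune-based cap is worth isolating: slicing a `[]rune` never cuts a multi-byte UTF-8 character in half, which a byte slice would. `truncateRunes` is a hypothetical helper extracted for illustration:

```go
package main

import "fmt"

// truncateRunes caps s at n runes. Converting to []rune first avoids the
// invalid UTF-8 that s[:n] can produce on multi-byte text.
func truncateRunes(s string, n int) string {
	runes := []rune(s)
	if len(runes) > n {
		return string(runes[:n])
	}
	return s
}

func main() {
	fmt.Println(truncateRunes("héllo wörld", 5)) // prints "héllo"
}
```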

## External Service Dependencies

### Qdrant

Vector database storing embedded document chunks.

- Default address: `localhost:6334` (gRPC)
- Configuration: `rag.DefaultQdrantConfig()`

### Ollama

Local LLM server providing embedding generation.

- Default address: `localhost:11434` (HTTP)
- Configuration: `rag.DefaultOllamaConfig()`
- Default embedding model: `nomic-embed-text`

Both services must be running for RAG tools to function. In CI, tests that touch RAG tools are guarded with `skipIfShort(t)`.
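
A cheap preflight before invoking the RAG tools is a TCP dial against the default addresses above. `reachable` is an illustrative helper, not part of go-ai:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// reachable reports whether a TCP endpoint accepts connections within a
// short timeout — enough to tell "service down" from "service slow".
func reachable(addr string) bool {
	conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	fmt.Println("qdrant:", reachable("localhost:6334"))
	fmt.Println("ollama:", reachable("localhost:11434"))
}
```

This only verifies that the ports are open; it does not confirm the embedding model is pulled or the collection exists.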

## Embedding Benchmark

The `cmd/embed-bench/` utility compares embedding models for the OpenBrain knowledge store. It tests how well models separate semantically related vs unrelated agent memory pairs.

```bash
go run ./cmd/embed-bench
go run ./cmd/embed-bench -ollama http://localhost:11434
```

The benchmark evaluates:

- **Cluster separation** -- intra-group vs inter-group similarity
- **Query recall accuracy** -- top-1 and top-3 retrieval precision
- **Embedding throughput** -- milliseconds per memory

Models tested: `nomic-embed-text` and `embeddinggemma`.

## Testing

RAG tool tests cover handler validation (empty question/path fields, default behaviour) and graceful degradation when Qdrant or Ollama are unavailable. Full RAG round-trip tests require live services and are skipped in short mode.