
---
title: RAG Pipeline
description: Retrieval-augmented generation via Qdrant vector search and Ollama embeddings.
---

# RAG Pipeline

go-ai integrates with the Retrieval-Augmented Generation (RAG) pipeline provided by `go-rag`. This surfaces as three MCP tools for vector search and a high-level facade function for programmatic use.

## Architecture

```text
MCP Client                           Programmatic callers
    |                                       |
    v                                       v
rag_query / rag_ingest / rag_collections    ai.QueryRAGForTask()
    |                                       |
    +----------- go-rag --------------------+
                    |              |
                    v              v
                Qdrant          Ollama
              (vectors)       (embeddings)
```

## MCP Tools

### rag_query

Query the vector database for documents relevant to a natural-language question.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `question` | string | Yes | Natural-language query |
| `collection` | string | No | Qdrant collection name (default: `hostuk-docs`) |
| `limit` | int | No | Maximum results to return (default: 3) |
| `threshold` | float64 | No | Minimum similarity score (default: 0.5) |

The tool embeds the question via Ollama, searches Qdrant with the specified parameters, and returns formatted context with source references.

### rag_ingest

Ingest a file into the vector database. The file is chunked (for Markdown, this respects heading boundaries), each chunk is embedded via Ollama, and the resulting vectors are stored in Qdrant.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `path` | string | Yes | Path to the file to ingest (relative to workspace root) |
| `collection` | string | No | Target Qdrant collection |

Because this tool writes to the vector store, its invocations are logged at the Security level.

### rag_collections

List all available collections in the connected Qdrant instance, with point counts and vector dimensions.

## AI Facade: `QueryRAGForTask`

The `ai` package provides a higher-level wrapper for programmatic RAG queries. It is used by agentic task planners to enrich task context without importing `go-rag` directly.

```go
type TaskInfo struct {
    Title       string
    Description string
}

func QueryRAGForTask(task TaskInfo) string {
    query := task.Title + " " + task.Description

    // Truncate to 500 runes to keep the embedding focused.
    runes := []rune(query)
    if len(runes) > 500 {
        query = string(runes[:500])
    }

    qdrantCfg := rag.DefaultQdrantConfig()
    qdrantClient, err := rag.NewQdrantClient(qdrantCfg)
    if err != nil {
        return "" // fail soft: no RAG context rather than an error
    }
    defer qdrantClient.Close()

    ollamaCfg := rag.DefaultOllamaConfig()
    ollamaClient, err := rag.NewOllamaClient(ollamaCfg)
    if err != nil {
        return ""
    }

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    results, err := rag.Query(ctx, qdrantClient, ollamaClient, query, rag.QueryConfig{
        Collection: "hostuk-docs",
        Limit:      3,
        Threshold:  0.5,
    })
    if err != nil {
        return ""
    }
    return rag.FormatResultsContext(results)
}
```

Key design decisions:

- The query is capped at 500 runes to keep the embedding vector focused on the task's core intent
- A 10-second timeout prevents hanging when services are slow
- The function degrades to an empty string rather than propagating errors, allowing callers to continue without RAG context

## External Service Dependencies

### Qdrant

Vector database storing embedded document chunks.

- Default address: `localhost:6334` (gRPC)
- Configuration: `rag.DefaultQdrantConfig()`

### Ollama

Local LLM server providing embedding generation.

- Default address: `localhost:11434` (HTTP)
- Configuration: `rag.DefaultOllamaConfig()`
- Default embedding model: `nomic-embed-text`

Both services must be running for the RAG tools to function. In CI, tests that touch RAG tools are guarded with `skipIfShort(t)`.

## Embedding Benchmark

The `cmd/embed-bench/` utility compares embedding models for the OpenBrain knowledge store. It measures how well each model separates semantically related agent-memory pairs from unrelated ones.

```sh
go run ./cmd/embed-bench
go run ./cmd/embed-bench -ollama http://localhost:11434
```

The benchmark evaluates:

- **Cluster separation** -- intra-group vs. inter-group similarity
- **Query recall accuracy** -- top-1 and top-3 retrieval precision
- **Embedding throughput** -- milliseconds per memory

Models tested: `nomic-embed-text` and `embeddinggemma`.

## Testing

RAG tool tests cover handler validation (empty question/path fields, default behaviour) and graceful degradation when Qdrant or Ollama are unavailable. Full RAG round-trip tests require live services and are skipped in short mode.