go/tools/rag
Snider f4ba17b9f5 fix: address PR review comments from CodeRabbit, Copilot, and Gemini
Fixes across 25 files addressing 46+ review comments:

- pkg/ai/metrics.go: handle error from Close() on writable file handle
- pkg/ansible: restore loop vars after loop, restore become settings,
  fix Upload with become=true and no password (use sudo -n), honour
  SSH timeout config, use E() helper for contextual errors, quote git
  refs in checkout commands
- pkg/rag: validate chunk config, guard negative-to-uint64 conversion,
  use E() helper for errors, add context timeout to Ollama HTTP calls
- pkg/deploy/python: fix exec.ExitError type assertion (was os.PathError),
  handle os.UserHomeDir() error
- pkg/build/buildcmd: use cmd.Context() instead of context.Background()
  for proper Ctrl+C cancellation
- install.bat: add curl timeouts, CRLF line endings, use --connect-timeout
  for archive downloads
- install.sh: use absolute path for version check in CI mode
- tools/rag: fix broken ingest.py function def, escape HTML in query.py,
  pin qdrant-client version, add markdown code block languages
- internal/cmd/rag: add chunk size validation, env override handling

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 22:33:43 +00:00

RAG Pipeline for Host UK Documentation

Store documentation in a vector database so Claude (and local LLMs) can retrieve relevant context without being reminded in every conversation.

The Problem This Solves

"The amount of times I've had to re-tell you how to make a Flux button is crazy"

Instead of spending context window on "remember, Flux buttons work like this...", the RAG system works as follows:

  1. All documentation is stored in Qdrant
  2. Claude queries it before answering
  3. Relevant docs are injected automatically
  4. No more re-teaching

Prerequisites

Already running in your lab:

  • Qdrant: linux.snider.dev:6333
  • Ollama: linux.snider.dev:11434 (or local)

Install Python deps:

pip install -r requirements.txt

Ensure the embedding model is available:

ollama pull nomic-embed-text
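
The same check can be scripted. The sketch below asks Ollama which models are installed and looks for the embedding model. It assumes Ollama's `GET /api/tags` endpoint and a `{"models": [{"name": ...}]}` response shape; the helper names `model_available` and `fetch_tags` are illustrative and are not part of `ingest.py` or `query.py`.

```python
import json
import urllib.request


def model_available(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an Ollama /api/tags payload.

    Ollama lists models with a tag suffix (e.g. "nomic-embed-text:latest"),
    so compare against the part before the colon as well as the full name.
    """
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def fetch_tags(host: str = "linux.snider.dev", port: int = 11434,
               timeout: float = 5.0) -> dict:
    """Fetch the installed-model list from Ollama (assumed endpoint: /api/tags)."""
    url = f"http://{host}:{port}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `model_available(fetch_tags(), "nomic-embed-text")` returns `False` when the model still needs to be pulled.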

Quick Start

1. Ingest Documentation

# Ingest recovered Host UK docs
python ingest.py /Users/snider/Code/host-uk/core/tasks/recovered-hostuk \
    --collection hostuk-docs \
    --recreate

# Ingest Flux UI docs separately (higher priority)
python ingest.py /path/to/flux-ui-docs \
    --collection flux-ui-docs \
    --recreate

2. Query the Database

# Search for Flux button docs
python query.py "how to create a Flux button component"

# Filter by category
python query.py "path sandboxing" --category architecture

# Get more results
python query.py "Vi personality" --top 10

# Output as JSON
python query.py "brand voice" --format json

# Output for LLM context injection
python query.py "Flux modal component" --format context
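
The `--format context` output is intended to be dropped into a prompt. As a rough illustration of what that step involves, the sketch below turns retrieved chunks into a single text block. The result fields (`source`, `score`, `text`), the separator layout, and the `max_chars` budget are assumptions made for this example, not the actual output format of `query.py`.

```python
def format_context(results: list[dict], max_chars: int = 4000) -> str:
    """Render retrieved chunks as a plain-text context block for an LLM prompt.

    Each result is assumed to be a dict with "source", "score" and "text" keys.
    Chunks are added best-first until the character budget would be exceeded.
    """
    parts = []
    used = 0
    for hit in sorted(results, key=lambda h: h["score"], reverse=True):
        block = (
            f"--- {hit['source']} (score: {hit['score']:.2f}) ---\n"
            f"{hit['text'].strip()}\n"
        )
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n".join(parts)
```

Capping the block by characters keeps the injected context from crowding out the rest of the prompt; a token-based budget would be more precise but needs a tokenizer.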

3. List Collections

python query.py --list-collections
python query.py --stats --collection flux-ui-docs

Collections Strategy

| Collection | Content | Priority |
|------------|---------|----------|
| `flux-ui-docs` | Flux Pro component docs | High (UI questions) |
| `hostuk-docs` | Recovered implementation docs | Medium |
| `brand-docs` | Vi, brand voice, visual identity | For content generation |
| `lethean-docs` | SASE/dVPN technical docs | Product-specific |

Integration with Claude Code

Option 1: MCP Server (Best)

Create an MCP server that Claude can query:

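// In core CLI (RagServer and Document are placeholder types; needs the "errors" import)
func (s *RagServer) Query(query string) ([]Document, error) {
    // 1. Embed the query text with the Ollama embedding model
    // 2. Search Qdrant for the nearest chunks
    // 3. Return the relevant docs
    return nil, errors.New("rag: Query not implemented")
}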

Then Claude can call rag.query("Flux button") and get docs automatically.

Option 2: CLAUDE.md Instruction

Add to project CLAUDE.md:

## Before Answering UI Questions

When asked about Flux UI components, query the RAG database first:
```bash
python /path/to/query.py "your question" --collection flux-ui-docs --format context
```

Include the retrieved context in your response.


### Option 3: Claude Code Hook

Create a hook that auto-injects context for certain queries.

## Category Taxonomy

The ingestion automatically categorizes files:

| Category | Matches |
|----------|---------|
| `ui-component` | flux, ui/component |
| `brand` | brand, mascot |
| `product-brief` | brief |
| `help-doc` | help, draft |
| `task` | task, plan |
| `architecture` | architecture, migration |
| `documentation` | default |
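
A minimal sketch of how such a table could be applied, assuming case-insensitive substring matching on the file path with the first matching row winning. The keywords come from the table above; the matching logic and rule order are assumptions and may differ from what `ingest.py` does.

```python
# First matching rule wins, so rule order decides ties.
# Keywords mirror the table above; the matching logic itself is an assumption.
CATEGORY_RULES = [
    ("ui-component", ("flux", "ui/component")),
    ("brand", ("brand", "mascot")),
    ("product-brief", ("brief",)),
    ("help-doc", ("help", "draft")),
    ("task", ("task", "plan")),
    ("architecture", ("architecture", "migration")),
]


def categorize(path: str) -> str:
    """Map a file path to a category by case-insensitive substring match."""
    normalized = path.replace("\\", "/").lower()
    for category, keywords in CATEGORY_RULES:
        if any(keyword in normalized for keyword in keywords):
            return category
    return "documentation"
```

With this ordering, a path such as `tasks/flux-migration.md` is classified as `ui-component` because the `flux` rule is checked before `task` and `migration`.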

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_HOST` | linux.snider.dev | Qdrant server |
| `QDRANT_PORT` | 6333 | Qdrant port |
| `EMBEDDING_MODEL` | nomic-embed-text | Ollama model |
| `CHUNK_SIZE` | 500 | Characters per chunk |
| `CHUNK_OVERLAP` | 50 | Overlap between chunks |
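
To show how these settings interact, here is a sketch that reads them with the defaults from the table, rejects chunk settings that could never advance through a document, and splits text accordingly. `RagConfig`, `load_config`, and `chunk_text` are hypothetical names for this example; the real scripts may read and validate the variables differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagConfig:
    qdrant_host: str
    qdrant_port: int
    embedding_model: str
    chunk_size: int
    chunk_overlap: int


def validate_chunking(size: int, overlap: int) -> None:
    """Reject settings where the window would not move forward."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need CHUNK_SIZE > 0 and 0 <= CHUNK_OVERLAP < CHUNK_SIZE")


def load_config(env: Mapping[str, str] = os.environ) -> RagConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    cfg = RagConfig(
        qdrant_host=env.get("QDRANT_HOST", "linux.snider.dev"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        embedding_model=env.get("EMBEDDING_MODEL", "nomic-embed-text"),
        chunk_size=int(env.get("CHUNK_SIZE", "500")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
    )
    validate_chunking(cfg.chunk_size, cfg.chunk_overlap)
    return cfg


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into `size`-character chunks that share `overlap` characters."""
    validate_chunking(size, overlap)
    chunks = []
    start = 0
    step = size - overlap  # how far the window advances each time
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks
```

With the defaults (500 and 50), a 1000-character document yields three chunks starting at offsets 0, 450, and 900.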

## Training a Model vs RAG

**RAG** (what this does):
- Model weights unchanged
- Documents retrieved at query time
- Knowledge updates instantly (re-ingest)
- Good for: facts, API docs, current information

**Fine-tuning** (separate process):
- Model weights updated
- Knowledge baked into model
- Requires retraining to update
- Good for: style, patterns, conventions

**For Flux UI**: RAG is the better fit. The docs and the API change, and you want current information.

**For Vi's voice**: Fine-tuning is the better fit. Style rarely changes, so it should be "baked in" to the model.

## Vector Math (For Understanding)

```text
"How do I make a Flux button?"
    ↓ Embedding
[0.12, -0.45, 0.78, ...768 floats...]
    ↓ Cosine similarity search
Find chunks with similar vectors
    ↓ Results
1. doc/ui/flux/components/button.md (score: 0.89)
2. doc/ui/flux/forms.md (score: 0.76)
3. doc/ui/flux/components/input.md (score: 0.71)
```

The embedding model converts text to "meaning vectors". Similar meanings = similar vectors = found by search.
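
The similarity step can be written out in a few lines. This is a pure-Python sketch of cosine similarity and threshold-based ranking; the three-dimensional vectors and file names are toy values standing in for real 768-float embeddings, and in practice Qdrant performs this search server-side.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank(query_vec, chunks, threshold=0.0):
    """Return (score, name) pairs at or above `threshold`, best match first."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
chunks = {
    "button.md": [1.0, 0.0, 0.0],
    "forms.md": [0.5, 0.5, 0.0],
    "billing.md": [0.0, 0.0, 1.0],
}
for score, name in rank(query, chunks, threshold=0.3):
    print(f"{name}: {score:.2f}")
```

Here `billing.md` is orthogonal to the query and is filtered out by the threshold. Lowering the threshold admits weaker matches, which is why `--threshold` is the first thing to try when a query returns nothing.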

Troubleshooting

"No results found"

  • Lower threshold: --threshold 0.3
  • Check collection has data: --stats
  • Verify Ollama is running: ollama list

"Connection refused"

  • Check Qdrant is running: curl http://linux.snider.dev:6333/collections
  • Check firewall/network
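
The same reachability check can be done from Python, which is handy inside scripts. This sketch issues the same `GET /collections` request as the `curl` command above with an explicit timeout; the function names and the 3-second default are assumptions for illustration.

```python
import http.client
import urllib.request


def collections_url(host: str = "linux.snider.dev", port: int = 6333) -> str:
    """Build the URL of Qdrant's collection-listing endpoint."""
    return f"http://{host}:{port}/collections"


def qdrant_reachable(host: str = "linux.snider.dev", port: int = 6333,
                     timeout: float = 3.0) -> bool:
    """Return True if GET /collections answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(collections_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```

A `False` result only says the endpoint did not answer; it does not tell a stopped Qdrant apart from a firewall or DNS problem.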

"Embedding model not available"

ollama pull nomic-embed-text

Part of the Host UK Core CLI tooling