From a6e4f865e4170de2e18bcfb0e346303c25675345 Mon Sep 17 00:00:00 2001
From: Snider
Date: Tue, 3 Mar 2026 09:22:56 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20OpenBrain=20design=20=E2=80=94=20shared?=
 =?UTF-8?q?=20agent=20knowledge=20graph?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Shared vector-indexed knowledge store accessible by all agents via MCP.
MariaDB for relational metadata, Qdrant for semantic search, Ollama for
embeddings. Four MCP tools: brain_remember, brain_recall, brain_forget,
brain_list. Replaces scattered MEMORY.md files with singular state.

Co-Authored-By: Virgil
---
 docs/plans/2026-03-03-openbrain-design.md | 213 ++++++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 docs/plans/2026-03-03-openbrain-design.md

diff --git a/docs/plans/2026-03-03-openbrain-design.md b/docs/plans/2026-03-03-openbrain-design.md
new file mode 100644
index 0000000..8b0486d
--- /dev/null
+++ b/docs/plans/2026-03-03-openbrain-design.md
@@ -0,0 +1,213 @@
+# OpenBrain Design
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Shared vector-indexed knowledge store that all agents (Virgil, Charon, Darbs, LEM) read/write through MCP, building singular state across sessions.
+
+**Architecture:** MariaDB for relational metadata + Qdrant for vector embeddings. Four MCP tools in php-agentic. Go bridge in go-ai for CLI agents. Ollama for embedding generation.
+
+**Repos:** `forge.lthn.ai/core/php-agentic` (primary), `forge.lthn.ai/core/go-ai` (bridge)
+
+---
+
+## Problem
+
+Agent knowledge is scattered:
+- Virgil's `MEMORY.md` files in `~/.claude/projects/*/memory/` — file-based, single-agent, no semantic search
+- Plans in `docs/plans/` across repos — forgotten after completion
+- Session handoff notes in `agent_sessions.handoff_notes` — JSON blobs, not searchable
+- Research findings lost when context windows compress
+
+When Charon discovers a scoring calibration bug, Virgil only knows about it if explicitly told. There's no shared knowledge graph.
+
+## Concept
+
+**OpenBrain** — "Open" means open protocol (MCP), not open source. All agents on the platform access the same knowledge graph via `brain_*` MCP tools. Data is stored *for agents* — structured for near-native context transfer between sessions and models.
+
+## Data Model
+
+### `brain_memories` table (MariaDB)
+
+| Column | Type | Purpose |
+|--------|------|---------|
+| `id` | UUID | Primary key, also Qdrant point ID |
+| `workspace_id` | FK | Multi-tenant isolation |
+| `agent_id` | string | Who wrote it (virgil, charon, darbs, lem) |
+| `type` | enum | `decision`, `observation`, `convention`, `research`, `plan`, `bug`, `architecture` |
+| `content` | text | The knowledge (markdown) |
+| `tags` | JSON | Topic tags for filtering |
+| `project` | string nullable | Repo/project scope (null = cross-project) |
+| `confidence` | float | 0.0–1.0, how certain the agent is |
+| `supersedes_id` | UUID nullable | FK to older memory this replaces |
+| `expires_at` | timestamp nullable | TTL for session-scoped context |
+| `deleted_at` | timestamp nullable | Soft delete |
+| `created_at` | timestamp | |
+| `updated_at` | timestamp | |
+
+### `openbrain` Qdrant collection
+
+- **Vector dimension:** 768 (nomic-embed-text via Ollama)
+- **Distance metric:** Cosine
+- **Point ID:** MariaDB UUID
+- **Payload:** `workspace_id`, `agent_id`, `type`, `tags`, `project`, `confidence`, `created_at` (for filtered search)
+
+## MCP Tools
+
+### `brain_remember` — Store a memory
+
+```json
+{
+  "content": "LEM emotional_register was blind to negative emotions. Fixed by adding 8 weighted pattern groups.",
+  "type": "bug",
+  "tags": ["scoring", "emotional-register", "lem"],
+  "project": "eaas",
+  "confidence": 0.95,
+  "supersedes": "uuid-of-outdated-memory"
+}
+```
+
+Agent ID injected from MCP session context. Returns the new memory UUID.
+
+**Pipeline:**
+1. Validate input
+2. Embed content via Ollama (`POST /api/embeddings`, model: `nomic-embed-text`)
+3. Insert into MariaDB
+4. Upsert into Qdrant with payload metadata
+5. If `supersedes` set, soft-delete the old memory and remove from Qdrant
+
+### `brain_recall` — Semantic search
+
+```json
+{
+  "query": "How does verdict classification work?",
+  "top_k": 5,
+  "filter": {
+    "project": "eaas",
+    "type": ["decision", "architecture"],
+    "min_confidence": 0.5
+  }
+}
+```
+
+**Pipeline:**
+1. Embed query via Ollama
+2. Search Qdrant with vector + payload filters
+3. Get top-K point IDs with similarity scores
+4. Hydrate from MariaDB (content, tags, supersedes chain)
+5. Return ranked results with scores
+
+Returns only the latest memory in each supersession chain (includes `supersedes_count` so the agent knows earlier versions exist).
+
+### `brain_forget` — Soft-delete or supersede
+
+```json
+{
+  "id": "uuid",
+  "reason": "Superseded by new calibration approach"
+}
+```
+
+Sets `deleted_at` in MariaDB, removes point from Qdrant. Keeps audit trail.
+
+### `brain_list` — Browse (no vectors)
+
+```json
+{
+  "project": "eaas",
+  "type": "decision",
+  "agent_id": "charon",
+  "limit": 20
+}
+```
+
+Pure MariaDB query. For browsing, auditing, bulk export. No embedding needed.
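The `brain_remember`, `brain_recall`, and `brain_forget` pipelines above can be sketched in a few lines. This is a minimal in-memory illustration, not the `BrainService` implementation: the `Brain` class, its method signatures, and the hashed bag-of-words `embed` function are hypothetical stand-ins for MariaDB, Qdrant, and the Ollama embedding call, and the toy vectors have 16 dimensions rather than 768.

```python
import math
import uuid
from datetime import datetime, timezone

DIM = 16  # toy size; the design specifies 768-d nomic-embed-text vectors


def embed(text):
    """Hypothetical stand-in for the Ollama call: hashed bag-of-words, normalised."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[sum(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class Brain:
    """In-memory stand-in: `rows` plays MariaDB, `points` plays Qdrant."""

    def __init__(self):
        self.rows = {}
        self.points = {}

    def remember(self, agent_id, content, mem_type, project=None,
                 confidence=1.0, supersedes=None):
        # 1. Validate input
        if not content or not 0.0 <= confidence <= 1.0:
            raise ValueError("content required, confidence must be in [0, 1]")
        if supersedes is not None and supersedes not in self.rows:
            raise ValueError("unknown memory to supersede")
        # 2. Embed content
        vector = embed(content)
        # 3. Insert the relational row
        mem_id = str(uuid.uuid4())
        payload = {"agent_id": agent_id, "type": mem_type,
                   "project": project, "confidence": confidence}
        self.rows[mem_id] = {**payload, "content": content,
                             "supersedes_id": supersedes, "deleted_at": None}
        # 4. Upsert the vector with its filter payload
        self.points[mem_id] = {"vector": vector, "payload": payload}
        # 5. Retire the superseded memory
        if supersedes is not None:
            self.forget(supersedes)
        return mem_id

    def forget(self, mem_id):
        # Soft-delete the row (audit trail) and drop the searchable point.
        self.rows[mem_id]["deleted_at"] = datetime.now(timezone.utc)
        self.points.pop(mem_id, None)

    def recall(self, query, top_k=5, project=None, min_confidence=0.0):
        q = embed(query)
        hits = []
        for mem_id, point in self.points.items():
            p = point["payload"]
            if project is not None and p["project"] != project:
                continue
            if p["confidence"] < min_confidence:
                continue
            # Non-empty vectors are unit length, so the dot product is cosine similarity.
            score = sum(a * b for a, b in zip(q, point["vector"]))
            hits.append((score, mem_id))
        hits.sort(key=lambda h: h[0], reverse=True)
        # Hydrate content from the relational store.
        return [{"id": m, "score": s, "content": self.rows[m]["content"]}
                for s, m in hits[:top_k]]
```

Because the superseded point is removed from the vector store in step 5, a later recall can only surface the newest version, while the soft-deleted row stays in `rows` for auditing.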
+
+## Architecture
+
+### PHP side (`php-agentic`)
+
+```
+Mcp/Tools/Agent/Brain/
+├── BrainRemember.php
+├── BrainRecall.php
+├── BrainForget.php
+└── BrainList.php
+
+Services/
+└── BrainService.php    # Ollama embeddings + Qdrant client + MariaDB CRUD
+
+Models/
+└── BrainMemory.php     # Eloquent model
+
+Migrations/
+└── XXXX_create_brain_memories_table.php
+```
+
+`BrainService` handles:
+- Ollama HTTP calls for embeddings
+- Qdrant REST API (upsert, search, delete points)
+- MariaDB CRUD via Eloquent
+- Supersession chain management
+
+### Go side (`go-ai`)
+
+Thin bridge tools in the MCP server that proxy `brain_*` calls to Laravel via the existing WebSocket bridge. Same pattern as `ide_chat_send` / `ide_session_create`.
+
+### Data flow
+
+```
+Agent (any Claude)
+    ↓ MCP tool call
+Go MCP server (local, macOS/Linux)
+    ↓ WebSocket bridge
+Laravel php-agentic (lthn.ai, de1)
+    ↓              ↓
+MariaDB         Qdrant
+(relational)    (vectors)
+    ↑
+Ollama (embeddings)
+```
+
+PHP-native agents skip the Go bridge — call `BrainService` directly.
+
+### Infrastructure
+
+- **Qdrant:** New container on de1. Shared between OpenBrain and EaaS scoring (different collections).
+- **Ollama:** Existing instance. `nomic-embed-text` model for 768d embeddings. CPU is fine for the volume (~10K memories).
+- **MariaDB:** Existing instance on de1. New table in the agentic database.
+
+## Integration
+
+### Plans → Brain
+
+On plan completion, agents can extract key decisions/findings and `brain_remember` them. Optional — agents decide what's worth persisting. The plan itself stays in `agent_plans`; lessons learned go to the brain.
+
+### Sessions → Brain
+
+Handoff notes (summary, next_steps, blockers) can auto-persist as memories with `type: observation` and optional TTL. Agents can also manually remember during a session.
+
+### MEMORY.md migration
+
+Seed data: collect all `MEMORY.md` files from `~/.claude/projects/*/memory/` across worktrees. Parse into individual memories, embed, and load into OpenBrain. After migration, `brain_recall` replaces file-based memory.
+
+### EaaS
+
+Same Qdrant instance, different collection (`eaas_scoring` vs `openbrain`). Shared infrastructure, separate concerns.
+
+### LEM
+
+LEM models query the brain for project context during training data curation or benchmark analysis. Same MCP tools, different agent ID.
+
+## What this replaces
+
+- Virgil's `MEMORY.md` files (file-based, single-agent, no search)
+- Scattered `docs/plans/` findings that get forgotten
+- Manual "Charon found X" cross-agent handoffs
+- Session-scoped knowledge that dies with context compression
+
+## What this enables
+
+- Any Claude picks up where another left off — semantically
+- Decisions surface when related code is touched
+- Knowledge graph grows with every session across all agents
+- Near-native context transfer between models and sessions
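For the MEMORY.md migration described under Integration, the design says only "parse into individual memories" and does not specify how. One possible reading, sketched here as an assumption, is one memory per Markdown heading section. The function name `split_memory_md`, the default `observation` type, and the heading-derived tag are all hypothetical choices; the resulting records would still need to be embedded and stored via `brain_remember`.

```python
import re

# Matches ATX headings such as "## Conventions". A "# comment" line inside a
# fenced code block would also match; this sketch does not handle that case.
HEADING = re.compile(r"^#{1,6}\s+(.*\S)\s*$")


def split_memory_md(text, agent_id="virgil", project=None):
    """Split a MEMORY.md document into one record per heading section."""
    memories = []
    heading, lines = None, []

    def flush():
        # Emit the section collected so far; skip sections with no body.
        body = "\n".join(lines).strip()
        if not body:
            return
        memories.append({
            "agent_id": agent_id,
            "type": "observation",  # assumed default; the design leaves this open
            "project": project,
            "tags": [heading.lower()] if heading else [],
            "content": f"{heading}\n\n{body}" if heading else body,
        })

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(1), []
        else:
            lines.append(line)
    flush()  # emit the final section
    return memories
```

Splitting on headings keeps each memory small enough to embed as a single topic, which is what makes the later semantic search useful. A real migration would likely also need to classify each section into the `type` enum instead of defaulting everything to `observation`.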