docs: OpenBrain design — shared agent knowledge graph

Shared vector-indexed knowledge store accessible by all agents via MCP. MariaDB for relational metadata, Qdrant for semantic search, Ollama for embeddings. Four MCP tools: brain_remember, brain_recall, brain_forget, brain_list. Replaces scattered MEMORY.md files with singular state.

Co-Authored-By: Virgil <virgil@lethean.io>

parent `1ead364afe`, commit `a6e4f865e4`

`docs/plans/2026-03-03-openbrain-design.md` (new file, 213 lines)

# OpenBrain Design

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Shared vector-indexed knowledge store that all agents (Virgil, Charon, Darbs, LEM) read from and write to through MCP, building a singular state across sessions.

**Architecture:** MariaDB for relational metadata + Qdrant for vector embeddings. Four MCP tools in php-agentic. Go bridge in go-ai for CLI agents. Ollama for embedding generation.

**Repos:** `forge.lthn.ai/core/php-agentic` (primary), `forge.lthn.ai/core/go-ai` (bridge)

---

## Problem

Agent knowledge is scattered:

- Virgil's `MEMORY.md` files in `~/.claude/projects/*/memory/` — file-based, single-agent, no semantic search
- Plans in `docs/plans/` across repos — forgotten after completion
- Session handoff notes in `agent_sessions.handoff_notes` — JSON blobs, not searchable
- Research findings lost when context windows compress

When Charon discovers a scoring calibration bug, Virgil only knows about it if explicitly told. There's no shared knowledge graph.

## Concept

**OpenBrain** — "Open" means open protocol (MCP), not open source. All agents on the platform access the same knowledge graph via `brain_*` MCP tools. Data is stored *for agents* — structured for near-native context transfer between sessions and models.

## Data Model

### `brain_memories` table (MariaDB)

| Column | Type | Purpose |
|--------|------|---------|
| `id` | UUID | Primary key, also Qdrant point ID |
| `workspace_id` | FK | Multi-tenant isolation |
| `agent_id` | string | Who wrote it (virgil, charon, darbs, lem) |
| `type` | enum | `decision`, `observation`, `convention`, `research`, `plan`, `bug`, `architecture` |
| `content` | text | The knowledge (markdown) |
| `tags` | JSON | Topic tags for filtering |
| `project` | string nullable | Repo/project scope (null = cross-project) |
| `confidence` | float | 0.0–1.0, how certain the agent is |
| `supersedes_id` | UUID nullable | FK to older memory this replaces |
| `expires_at` | timestamp nullable | TTL for session-scoped context |
| `deleted_at` | timestamp nullable | Soft delete |
| `created_at` | timestamp | |
| `updated_at` | timestamp | |
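
One `brain_memories` row can be mirrored as a Go struct, for example on the go-ai side of the bridge. The struct also carries the constraints the table implies: `type` is limited to the listed enum values, `confidence` must lie within 0.0–1.0, and `content` must not be empty. This is a minimal sketch under assumptions. The struct, its JSON field names, `Validate`, and the integer type for `workspace_id` are all hypothetical. The authoritative schema is the Laravel migration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// MemoryType mirrors the `type` enum column in brain_memories.
type MemoryType string

// validTypes lists the enum values from the data model table.
var validTypes = map[MemoryType]bool{
	"decision": true, "observation": true, "convention": true,
	"research": true, "plan": true, "bug": true, "architecture": true,
}

// BrainMemory is a hypothetical Go view of one brain_memories row.
// Nullable columns are pointers so "unset" is distinguishable from a zero value.
type BrainMemory struct {
	ID           string     `json:"id"`           // UUID, also the Qdrant point ID
	WorkspaceID  int64      `json:"workspace_id"` // assumed integer FK
	AgentID      string     `json:"agent_id"`
	Type         MemoryType `json:"type"`
	Content      string     `json:"content"`
	Tags         []string   `json:"tags"`
	Project      *string    `json:"project"` // nil = cross-project
	Confidence   float64    `json:"confidence"`
	SupersedesID *string    `json:"supersedes_id"`
	ExpiresAt    *time.Time `json:"expires_at"`
	DeletedAt    *time.Time `json:"deleted_at"`
	CreatedAt    time.Time  `json:"created_at"`
	UpdatedAt    time.Time  `json:"updated_at"`
}

// Validate applies the constraints implied by the column descriptions.
func (m BrainMemory) Validate() error {
	if m.Content == "" {
		return errors.New("content must not be empty")
	}
	if !validTypes[m.Type] {
		return fmt.Errorf("unknown memory type %q", m.Type)
	}
	if m.Confidence < 0 || m.Confidence > 1 {
		return fmt.Errorf("confidence %.2f outside 0.0-1.0", m.Confidence)
	}
	return nil
}
```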

### `openbrain` Qdrant collection

- **Vector dimension:** 768 (nomic-embed-text via Ollama)
- **Distance metric:** Cosine
- **Point ID:** MariaDB UUID
- **Payload:** `workspace_id`, `agent_id`, `type`, `tags`, `project`, `confidence`, `created_at` (for filtered search)

## MCP Tools

### `brain_remember` — Store a memory

```json
{
  "content": "LEM emotional_register was blind to negative emotions. Fixed by adding 8 weighted pattern groups.",
  "type": "bug",
  "tags": ["scoring", "emotional-register", "lem"],
  "project": "eaas",
  "confidence": 0.95,
  "supersedes": "uuid-of-outdated-memory"
}
```

The agent ID is injected from the MCP session context. Returns the new memory's UUID.

**Pipeline:**

1. Validate input
2. Embed content via Ollama (`POST /api/embeddings`, model: `nomic-embed-text`)
3. Insert into MariaDB
4. Upsert into Qdrant with payload metadata
5. If `supersedes` is set, soft-delete the old memory and remove its point from Qdrant
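
The five steps can be sketched in Go. The side effects are injected as functions, which makes the ordering easy to see and test. `Deps`, `RememberInput` and `Remember` are hypothetical names, not the php-agentic API. The validation is a minimal subset, and `workspace_id` and `created_at` are left out of the payload for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// RememberInput mirrors the brain_remember arguments (agent ID comes from the session).
type RememberInput struct {
	Content    string
	Type       string
	Tags       []string
	Project    string
	Confidence float64
	Supersedes string // optional UUID of the memory being replaced
}

// Deps bundles the side effects so they can be swapped for fakes.
type Deps struct {
	Embed       func(text string) ([]float32, error)
	Insert      func(agentID string, in RememberInput) (string, error)
	SoftDelete  func(id string) error
	UpsertPoint func(id string, vec []float32, payload map[string]any) error
	DeletePoint func(id string) error
}

// Remember runs the pipeline steps in order and returns the new UUID.
func Remember(d Deps, agentID string, in RememberInput) (string, error) {
	// 1. Validate input (minimal subset of checks).
	if in.Content == "" {
		return "", errors.New("content is required")
	}
	if in.Confidence < 0 || in.Confidence > 1 {
		return "", fmt.Errorf("confidence %v out of range", in.Confidence)
	}
	// 2. Embed first, so an Ollama failure leaves no row behind.
	vec, err := d.Embed(in.Content)
	if err != nil {
		return "", fmt.Errorf("embed: %w", err)
	}
	// 3. Insert the row; its UUID doubles as the Qdrant point ID.
	id, err := d.Insert(agentID, in)
	if err != nil {
		return "", fmt.Errorf("insert: %w", err)
	}
	// 4. Upsert the vector with filterable metadata.
	payload := map[string]any{
		"agent_id": agentID, "type": in.Type, "tags": in.Tags,
		"project": in.Project, "confidence": in.Confidence,
	}
	if err := d.UpsertPoint(id, vec, payload); err != nil {
		return "", fmt.Errorf("upsert: %w", err)
	}
	// 5. Retire the superseded memory in both stores.
	if in.Supersedes != "" {
		if err := d.SoftDelete(in.Supersedes); err != nil {
			return "", fmt.Errorf("soft-delete %s: %w", in.Supersedes, err)
		}
		if err := d.DeletePoint(in.Supersedes); err != nil {
			return "", fmt.Errorf("delete point %s: %w", in.Supersedes, err)
		}
	}
	return id, nil
}
```

This sketch does not cover one failure case. If Qdrant fails after step 3, the MariaDB row is left without a point, and a real implementation would need to retry or roll back.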

### `brain_recall` — Semantic search

```json
{
  "query": "How does verdict classification work?",
  "top_k": 5,
  "filter": {
    "project": "eaas",
    "type": ["decision", "architecture"],
    "min_confidence": 0.5
  }
}
```

**Pipeline:**

1. Embed query via Ollama
2. Search Qdrant with vector + payload filters
3. Get top-K point IDs with similarity scores
4. Hydrate from MariaDB (content, tags, supersedes chain)
5. Return ranked results with scores

Only the latest version of a superseded memory is returned. Each result includes a `supersedes_count` so the agent knows that older history exists.

### `brain_forget` — Soft-delete or supersede

```json
{
  "id": "uuid",
  "reason": "Superseded by new calibration approach"
}
```

Sets `deleted_at` in MariaDB and removes the point from Qdrant. Because the row is soft-deleted rather than dropped, the audit trail is kept.

### `brain_list` — Browse (no vectors)

```json
{
  "project": "eaas",
  "type": "decision",
  "agent_id": "charon",
  "limit": 20
}
```

A pure MariaDB query for browsing, auditing and bulk export; no embedding is needed.
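
`brain_list` reduces to a parameterised `SELECT`. The sketch below builds one in which every filter is optional and values are bound as placeholders rather than spliced into the SQL. Several details are assumptions: excluding soft-deleted rows, ordering by `created_at`, and the default limit and cap. An audit view might want deleted rows included.

```go
package main

import (
	"fmt"
	"strings"
)

// ListParams mirrors brain_list arguments; empty fields mean "no filter".
type ListParams struct {
	Project string
	Type    string
	AgentID string
	Limit   int
}

// buildListQuery returns a parameterised MariaDB query and its arguments.
// Values are never spliced into the SQL string, only bound as placeholders.
func buildListQuery(workspaceID int64, p ListParams) (string, []any) {
	where := []string{"workspace_id = ?", "deleted_at IS NULL"}
	args := []any{workspaceID}
	if p.Project != "" {
		where = append(where, "project = ?")
		args = append(args, p.Project)
	}
	if p.Type != "" {
		where = append(where, "type = ?")
		args = append(args, p.Type)
	}
	if p.AgentID != "" {
		where = append(where, "agent_id = ?")
		args = append(args, p.AgentID)
	}
	limit := p.Limit
	if limit <= 0 {
		limit = 20 // hypothetical default; the design does not specify one
	} else if limit > 100 {
		limit = 100 // hypothetical cap
	}
	args = append(args, limit)
	q := fmt.Sprintf(
		"SELECT id, agent_id, type, content, tags, project, confidence, created_at "+
			"FROM brain_memories WHERE %s ORDER BY created_at DESC LIMIT ?",
		strings.Join(where, " AND "))
	return q, args
}
```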

## Architecture

### PHP side (`php-agentic`)

```
Mcp/Tools/Agent/Brain/
├── BrainRemember.php
├── BrainRecall.php
├── BrainForget.php
└── BrainList.php

Services/
└── BrainService.php    # Ollama embeddings + Qdrant client + MariaDB CRUD

Models/
└── BrainMemory.php     # Eloquent model

Migrations/
└── XXXX_create_brain_memories_table.php
```

`BrainService` handles:

- Ollama HTTP calls for embeddings
- Qdrant REST API (upsert, search, delete points)
- MariaDB CRUD via Eloquent
- Supersession chain management

### Go side (`go-ai`)

Thin bridge tools in the MCP server that proxy `brain_*` calls to Laravel via the existing WebSocket bridge. Same pattern as `ide_chat_send` / `ide_session_create`.

### Data flow

```
Agent (any Claude)
    ↓ MCP tool call
Go MCP server (local, macOS/Linux)
    ↓ WebSocket bridge
Laravel php-agentic (lthn.ai, de1)
    ↓                  ↓
 MariaDB            Qdrant
(relational)       (vectors)
                       ↑
              Ollama (embeddings)
```

PHP-native agents skip the Go bridge — call `BrainService` directly.

### Infrastructure

- **Qdrant:** New container on de1. Shared between OpenBrain and EaaS scoring (different collections).
- **Ollama:** Existing instance. `nomic-embed-text` model for 768d embeddings. CPU is fine for the volume (~10K memories).
- **MariaDB:** Existing instance on de1. New table in the agentic database.
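
To put the ~10K-memory volume in perspective, assume vectors are stored as 32-bit floats. The raw vectors then take 10,000 × 768 × 4 = 30,720,000 bytes, roughly 29.3 MiB. This figure excludes the HNSW index, the payloads and the MariaDB rows, so it is a lower bound, not a capacity plan.

```go
package main

// rawVectorBytes estimates storage for the vectors alone: count × dims × 4 bytes,
// assuming float32 storage. Index structures and payloads are not included.
func rawVectorBytes(count, dims int) int {
	const bytesPerFloat32 = 4
	return count * dims * bytesPerFloat32
}

// toMiB converts bytes to mebibytes.
func toMiB(b int) float64 {
	return float64(b) / (1 << 20)
}
```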

## Integration

### Plans → Brain

On plan completion, agents can extract key decisions/findings and `brain_remember` them. Optional — agents decide what's worth persisting. The plan itself stays in `agent_plans`; lessons learned go to the brain.

### Sessions → Brain

Handoff notes (summary, next_steps, blockers) can auto-persist as memories with `type: observation` and optional TTL. Agents can also manually remember during a session.

### MEMORY.md migration

Seed data: collect all `MEMORY.md` files from `~/.claude/projects/*/memory/` across worktrees. Parse into individual memories, embed, and load into OpenBrain. After migration, `brain_recall` replaces file-based memory.
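
The parsing step of the migration can be sketched under an assumed layout. Each top-level `- ` bullet becomes one memory, indented lines continue the current bullet, and the nearest `#` heading becomes a tag. The real `MEMORY.md` files may need a different splitting rule. `SeedMemory`, `parseMemoryMD` and the `memory-md` tag are hypothetical. The project name would presumably come from the `*` directory in `~/.claude/projects/*/memory/`.

```go
package main

import (
	"bufio"
	"strings"
)

// SeedMemory is one parsed entry, ready to embed and brain_remember.
type SeedMemory struct {
	Content string
	Project string
	Tags    []string
}

// parseMemoryMD splits a MEMORY.md file into one memory per top-level "- " bullet.
func parseMemoryMD(src, project string) ([]SeedMemory, error) {
	var out []SeedMemory
	var cur *SeedMemory
	section := ""
	flush := func() {
		if cur != nil {
			out = append(out, *cur)
			cur = nil
		}
	}
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.HasPrefix(line, "#"):
			// A heading closes the current bullet and becomes the section tag.
			flush()
			heading := strings.TrimSpace(strings.TrimLeft(line, "#"))
			section = strings.ToLower(strings.ReplaceAll(heading, " ", "-"))
		case strings.HasPrefix(line, "- "):
			flush()
			tags := []string{"memory-md"}
			if section != "" {
				tags = append(tags, section)
			}
			cur = &SeedMemory{Content: strings.TrimPrefix(line, "- "), Project: project, Tags: tags}
		case cur != nil && strings.TrimSpace(line) != "" &&
			(strings.HasPrefix(line, " ") || strings.HasPrefix(line, "\t")):
			// Indented continuation line of the current bullet.
			cur.Content += "\n" + strings.TrimSpace(line)
		default:
			// Blank lines and free-standing prose end the current bullet;
			// prose outside bullets is skipped in this sketch.
			flush()
		}
	}
	flush()
	return out, sc.Err()
}
```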

### EaaS

Same Qdrant instance, different collection (`eaas_scoring` vs `openbrain`). Shared infrastructure, separate concerns.

### LEM

LEM models query the brain for project context during training data curation or benchmark analysis. Same MCP tools, different agent ID.

## What this replaces

- Virgil's `MEMORY.md` files (file-based, single-agent, no search)
- Scattered `docs/plans/` findings that get forgotten
- Manual "Charon found X" cross-agent handoffs
- Session-scoped knowledge that dies with context compression

## What this enables

- Any Claude picks up where another left off — semantically
- Decisions surface when related code is touched
- Knowledge graph grows with every session across all agents
- Near-native context transfer between models and sessions