# OpenBrain Design

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Shared vector-indexed knowledge store that all agents (Virgil, Charon, Darbs, LEM) read/write through MCP, building singular state across sessions.

**Architecture:** MariaDB for relational metadata + Qdrant for vector embeddings. Four MCP tools in php-agentic. Go bridge in go-ai for CLI agents. Ollama for embedding generation.

**Repos:** `dappco.re/php/agent` (primary), `dappco.re/go/ai` (bridge)

---

## Problem

Agent knowledge is scattered:

- Virgil's `MEMORY.md` files in `~/.claude/projects/*/memory/` — file-based, single-agent, no semantic search
- Plans in `docs/plans/` across repos — forgotten after completion
- Session handoff notes in `agent_sessions.handoff_notes` — JSON blobs, not searchable
- Research findings lost when context windows compress

When Charon discovers a scoring calibration bug, Virgil only knows about it if explicitly told. There's no shared knowledge graph.

## Concept

**OpenBrain** — "Open" means open protocol (MCP), not open source. All agents on the platform access the same knowledge graph via `brain_*` MCP tools. Data is stored *for agents* — structured for near-native context transfer between sessions and models.

## Data Model

### `brain_memories` table (MariaDB)

| Column | Type | Purpose |
|--------|------|---------|
| `id` | UUID | Primary key, also Qdrant point ID |
| `workspace_id` | FK | Multi-tenant isolation |
| `agent_id` | string | Who wrote it (virgil, charon, darbs, lem) |
| `type` | enum | `decision`, `observation`, `convention`, `research`, `plan`, `bug`, `architecture` |
| `content` | text | The knowledge (markdown) |
| `tags` | JSON | Topic tags for filtering |
| `project` | string nullable | Repo/project scope (null = cross-project) |
| `confidence` | float | 0.0–1.0, how certain the agent is |
| `supersedes_id` | UUID nullable | FK to older memory this replaces |
| `expires_at` | timestamp nullable | TTL for session-scoped context |
| `deleted_at` | timestamp nullable | Soft delete |
| `created_at` | timestamp | |
| `updated_at` | timestamp | |
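
Input validation against this schema can be sketched as follows, in Python for brevity (the production code would live on the PHP side; `validate_memory` is a hypothetical helper, not part of the plan):

```python
# Illustrative validation of a brain_memories row against the schema above.
# Field names match the table; the helper itself is an assumption.
VALID_TYPES = {"decision", "observation", "convention", "research",
               "plan", "bug", "architecture"}

def validate_memory(mem: dict) -> list[str]:
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    if not mem.get("content"):
        errors.append("content is required")
    if mem.get("type") not in VALID_TYPES:
        errors.append(f"type must be one of {sorted(VALID_TYPES)}")
    conf = mem.get("confidence", 1.0)
    if not (0.0 <= conf <= 1.0):
        errors.append("confidence must be in [0.0, 1.0]")
    if not isinstance(mem.get("tags", []), list):
        errors.append("tags must be a JSON array")
    return errors

print(validate_memory({"content": "x", "type": "bug", "confidence": 0.95}))  # []
print(validate_memory({"content": "", "type": "hunch", "confidence": 2.0}))
```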

### `openbrain` Qdrant collection

- **Vector dimension:** 768 (nomic-embed-text via Ollama)
- **Distance metric:** Cosine
- **Point ID:** MariaDB UUID
- **Payload:** `workspace_id`, `agent_id`, `type`, `tags`, `project`, `confidence`, `created_at` (for filtered search)
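
Creating the collection is a one-time `PUT /collections/openbrain` against Qdrant's REST API. A minimal sketch of the config, assuming a local Qdrant on its default port (the URL is an assumption):

```python
import json

QDRANT_URL = "http://localhost:6333"  # assumed local instance, not from the plan

def collection_config() -> dict:
    # Matches the spec above: 768-dim vectors, cosine distance.
    return {"vectors": {"size": 768, "distance": "Cosine"}}

# To actually create the collection (requires a running Qdrant):
#   import urllib.request
#   req = urllib.request.Request(
#       f"{QDRANT_URL}/collections/openbrain",
#       data=json.dumps(collection_config()).encode(),
#       headers={"Content-Type": "application/json"}, method="PUT")
#   urllib.request.urlopen(req)
print(json.dumps(collection_config()))
```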

## MCP Tools

### `brain_remember` — Store a memory

```json
{
  "content": "LEM emotional_register was blind to negative emotions. Fixed by adding 8 weighted pattern groups.",
  "type": "bug",
  "tags": ["scoring", "emotional-register", "lem"],
  "project": "eaas",
  "confidence": 0.95,
  "supersedes": "uuid-of-outdated-memory"
}
```

Agent ID injected from MCP session context. Returns the new memory UUID.

**Pipeline:**

1. Validate input
2. Embed content via Ollama (`POST /api/embeddings`, model: `nomic-embed-text`)
3. Insert into MariaDB
4. Upsert into Qdrant with payload metadata
5. If `supersedes` set, soft-delete the old memory and remove from Qdrant
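
Steps 2 and 4 can be sketched as payload builders, in Python for illustration (the real implementation is PHP-side; both helper names are hypothetical):

```python
import uuid

def ollama_embed_request(content: str) -> dict:
    # Body for Ollama's POST /api/embeddings; the response carries
    # {"embedding": [...768 floats...]}.
    return {"model": "nomic-embed-text", "prompt": content}

def qdrant_upsert_payload(memory_id: str, vector: list, mem: dict) -> dict:
    # Body for PUT /collections/openbrain/points; the point ID is the
    # MariaDB UUID, and the payload carries the filterable metadata.
    return {"points": [{
        "id": memory_id,
        "vector": vector,
        "payload": {k: mem[k] for k in
                    ("workspace_id", "agent_id", "type", "tags",
                     "project", "confidence", "created_at") if k in mem},
    }]}

mem = {"agent_id": "charon", "type": "bug", "tags": ["scoring"],
       "project": "eaas", "confidence": 0.95}
payload = qdrant_upsert_payload(str(uuid.uuid4()), [0.0] * 768, mem)
print(len(payload["points"][0]["vector"]))  # 768
```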

### `brain_recall` — Semantic search

```json
{
  "query": "How does verdict classification work?",
  "top_k": 5,
  "filter": {
    "project": "eaas",
    "type": ["decision", "architecture"],
    "min_confidence": 0.5
  }
}
```

**Pipeline:**

1. Embed query via Ollama
2. Search Qdrant with vector + payload filters
3. Get top-K point IDs with similarity scores
4. Hydrate from MariaDB (content, tags, supersedes chain)
5. Return ranked results with scores
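
Step 2's payload filtering can be sketched as a translation from the tool's `filter` object into Qdrant filter clauses (Python for illustration; helper names are hypothetical):

```python
def qdrant_filter(f: dict) -> dict:
    # Map brain_recall's filter keys onto Qdrant must-clauses:
    # exact match, match-any, and a numeric range.
    must = []
    if "project" in f:
        must.append({"key": "project", "match": {"value": f["project"]}})
    if "type" in f:
        must.append({"key": "type", "match": {"any": f["type"]}})
    if "min_confidence" in f:
        must.append({"key": "confidence", "range": {"gte": f["min_confidence"]}})
    return {"must": must}

def search_request(query_vector: list, top_k: int, f: dict) -> dict:
    # Body for POST /collections/openbrain/points/search.
    return {"vector": query_vector, "limit": top_k,
            "filter": qdrant_filter(f), "with_payload": True}

req = search_request([0.0] * 768, 5,
                     {"project": "eaas", "type": ["decision", "architecture"],
                      "min_confidence": 0.5})
print(len(req["filter"]["must"]))  # 3
```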

Returns only the latest version in a supersession chain, with a `supersedes_count` field so the agent knows earlier versions exist.
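
The `supersedes_count` can be computed by walking the `supersedes_id` chain back from the latest memory. A minimal sketch, assuming rows keyed by id:

```python
def supersedes_count(memories: dict, memory_id: str) -> int:
    """Length of the supersession chain behind memory_id.
    memories maps id -> row dict with a 'supersedes_id' key (or None)."""
    count, cur = 0, memories[memory_id].get("supersedes_id")
    while cur is not None:
        count += 1
        cur = memories[cur].get("supersedes_id")
    return count

rows = {
    "m1": {"supersedes_id": None},
    "m2": {"supersedes_id": "m1"},
    "m3": {"supersedes_id": "m2"},
}
print(supersedes_count(rows, "m3"))  # 2
```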

### `brain_forget` — Soft-delete or supersede

```json
{
  "id": "uuid",
  "reason": "Superseded by new calibration approach"
}
```

Sets `deleted_at` in MariaDB, removes point from Qdrant. Keeps audit trail.
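
A sketch of the two effects, soft delete plus point removal (the `delete_reason` column is an assumption; the schema above specifies only `deleted_at`):

```python
import datetime

def forget(row: dict, reason: str) -> tuple:
    """Return (soft-deleted row, Qdrant delete-points request body)."""
    row = {**row,
           "deleted_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
           "delete_reason": reason}  # ASSUMED column, kept for the audit trail
    # Body for POST /collections/openbrain/points/delete
    qdrant_delete = {"points": [row["id"]]}
    return row, qdrant_delete

row, payload = forget({"id": "uuid-123"}, "Superseded by new calibration approach")
print(payload)  # {'points': ['uuid-123']}
```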

### `brain_list` — Browse (no vectors)

```json
{
  "project": "eaas",
  "type": "decision",
  "agent_id": "charon",
  "limit": 20
}
```

Pure MariaDB query. For browsing, auditing, bulk export. No embedding needed.

## Architecture

### PHP side (`php-agentic`)

```
Mcp/Tools/Agent/Brain/
├── BrainRemember.php
├── BrainRecall.php
├── BrainForget.php
└── BrainList.php

Services/
└── BrainService.php          # Ollama embeddings + Qdrant client + MariaDB CRUD

Models/
└── BrainMemory.php           # Eloquent model

Migrations/
└── XXXX_create_brain_memories_table.php
```

`BrainService` handles:

- Ollama HTTP calls for embeddings
- Qdrant REST API (upsert, search, delete points)
- MariaDB CRUD via Eloquent
- Supersession chain management

### Go side (`go-ai`)

Thin bridge tools in the MCP server that proxy `brain_*` calls to Laravel via the existing WebSocket bridge. Same pattern as `ide_chat_send` / `ide_session_create`.

### Data flow

```
Agent (any Claude)
        ↓ MCP tool call
Go MCP server (local, macOS/Linux)
        ↓ WebSocket bridge
Laravel php-agentic (lthn.sh, de1)
        ↓                  ↓
    MariaDB             Qdrant
  (relational)         (vectors)
                          ↑
                Ollama (embeddings)
```

PHP-native agents skip the Go bridge — call `BrainService` directly.

### Infrastructure

- **Qdrant:** New container on de1. Shared between OpenBrain and EaaS scoring (different collections).
- **Ollama:** Existing instance. `nomic-embed-text` model for 768d embeddings. CPU is fine for the volume (~10K memories).
- **MariaDB:** Existing instance on de1. New table in the agentic database.

## Integration

### Plans → Brain

On plan completion, agents can extract key decisions/findings and `brain_remember` them. Optional — agents decide what's worth persisting. The plan itself stays in `agent_plans`; lessons learned go to the brain.

### Sessions → Brain

Handoff notes (summary, next_steps, blockers) can auto-persist as memories with `type: observation` and optional TTL. Agents can also manually remember during a session.
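
A sketch of the auto-persist mapping, with an arbitrary 30-day TTL (the TTL value and the markdown layout of the content are assumptions; the field names come from the handoff notes above):

```python
import datetime

def handoff_to_memory(handoff: dict, ttl_days: int = 30) -> dict:
    """Fold a session handoff into a TTL'd observation memory."""
    now = datetime.datetime.now(datetime.timezone.utc)
    content = "\n".join(
        f"**{k}:** {handoff[k]}"
        for k in ("summary", "next_steps", "blockers") if handoff.get(k))
    return {"content": content,
            "type": "observation",
            "tags": ["handoff"],
            "expires_at": (now + datetime.timedelta(days=ttl_days)).isoformat()}

mem = handoff_to_memory({"summary": "Calibrated scoring", "next_steps": "Ship it"})
print(mem["type"])  # observation
```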

### MEMORY.md migration

Seed data: collect all `MEMORY.md` files from `~/.claude/projects/*/memory/` across worktrees. Parse into individual memories, embed, and load into OpenBrain. After migration, `brain_recall` replaces file-based memory.
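
The parse step could split each file on `## ` headings, one section per memory. A sketch under that assumption (the heading convention and tag choice are not specified by this plan):

```python
import re

def split_memory_md(text: str, project: str) -> list:
    """Split one MEMORY.md into individual memory rows, one per section."""
    memories = []
    for sec in re.split(r"(?m)^## ", text):
        sec = sec.strip()
        if not sec:
            continue
        title, _, body = sec.partition("\n")
        memories.append({"content": body.strip() or title,
                         "type": "observation",
                         "tags": [title.lower()],
                         "project": project})
    return memories

sample = "## Conventions\nUse snake_case.\n\n## Bugs\nScoring drift fixed."
print(len(split_memory_md(sample, "eaas")))  # 2
```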

### EaaS

Same Qdrant instance, different collection (`eaas_scoring` vs `openbrain`). Shared infrastructure, separate concerns.

### LEM

LEM models query the brain for project context during training data curation or benchmark analysis. Same MCP tools, different agent ID.

## What this replaces

- Virgil's `MEMORY.md` files (file-based, single-agent, no search)
- Scattered `docs/plans/` findings that get forgotten
- Manual "Charon found X" cross-agent handoffs
- Session-scoped knowledge that dies with context compression

## What this enables

- Any Claude picks up where another left off — semantically
- Decisions surface when related code is touched
- Knowledge graph grows with every session across all agents
- Near-native context transfer between models and sessions