go-agentic/docs/architecture.md
Snider 6e8ae2a1ec docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:02:20 +00:00

382 lines
16 KiB
Markdown

# go-agentic Architecture
Module: `forge.lthn.ai/core/go-agentic`
This document describes the internal structure of go-agentic: how components relate, what interfaces they implement, and how data flows through the system.
---
## Overview
go-agentic is an AI service lifecycle and task management library. It provides:
- A Core framework service that spawns `claude` subprocess invocations
- Quota enforcement (allowance management) with three storage backends
- An agent registry with two persistence backends
- A task router and dispatcher for multi-agent coordination
- Event hooks for lifecycle notification
- CLI-oriented helper functions for status display, task submission, and log streaming
The package has no dependency on go-ai, go-ml, go-rag, or go-mcp. Its only non-standard imports are `forge.lthn.ai/core/go` (the Core DI framework), `forge.lthn.ai/core/go-store` (SQLite KV), and `github.com/redis/go-redis/v9`.
---
## Service Lifecycle
`Service` wraps `framework.ServiceRuntime[ServiceOptions]` and integrates with the Core dependency injection container.
```go
type Service struct {
*framework.ServiceRuntime[ServiceOptions]
}
```
On startup, `OnStartup` registers a task handler with the Core runtime:
```go
func (s *Service) OnStartup(ctx context.Context) error {
s.Core().RegisterTask(s.handleTask)
return nil
}
```
The handler dispatches on task type:
- `TaskCommit` — runs the embedded `prompts/commit.md` prompt via `claude -p <prompt> --allowedTools Bash,Read,Glob,Grep` in the specified directory. If `CanEdit` is true, `Write` and `Edit` are added to the tool list.
- `TaskPrompt` — runs an arbitrary prompt. Allowed tools default to `ServiceOptions.DefaultTools` (Bash, Read, Glob, Grep) but can be overridden per task. Reports progress via `Core().Progress()` using the task ID.
Both task types spawn `claude` as a subprocess with `os.Stdout`/`os.Stderr`/`os.Stdin` connected directly, so output streams to the terminal.
### ServiceOptions
```go
type ServiceOptions struct {
DefaultTools []string // default tool list for TaskPrompt
AllowEdit bool // global permission for Write/Edit tools
}
```
Default tools are `["Bash", "Read", "Glob", "Grep"]`. Write/Edit require explicit opt-in either globally or per task.
---
## Allowance Management
Allowance management enforces per-agent and per-model token quotas. It is composed of three layers:
1. `AllowanceStore` — persistence interface
2. `AllowanceService` — enforcement logic
3. Three `AllowanceStore` implementations — Memory, SQLite, Redis
### AllowanceStore Interface
```go
type AllowanceStore interface {
GetAllowance(agentID string) (*AgentAllowance, error)
SetAllowance(a *AgentAllowance) error
GetUsage(agentID string) (*UsageRecord, error)
IncrementUsage(agentID string, tokens int64, jobs int) error
DecrementActiveJobs(agentID string) error
ReturnTokens(agentID string, tokens int64) error
ResetUsage(agentID string) error
GetModelQuota(model string) (*ModelQuota, error)
GetModelUsage(model string) (int64, error)
IncrementModelUsage(model string, tokens int64) error
}
```
All three implementations satisfy this interface with identical semantics. The `AgentAllowance` struct holds per-agent limits; `UsageRecord` holds current period usage.
### MemoryStore
In-memory implementation using `sync.RWMutex`. State is lost on process restart. Suitable for single-process use, testing, and development. Uses defensive copy semantics: `Get` and `Set` return copies of stored structs to prevent aliasing.
### SQLiteStore
Persistent single-node implementation using `forge.lthn.ai/core/go-store` (a SQLite KV abstraction over `modernc.org/sqlite`). Data is namespaced into four groups: `allowances`, `usage`, `model_quotas`, `model_usage`. Read-modify-write operations are serialised with a `sync.Mutex`. WAL mode is enabled for concurrent reads. `time.Duration` values are serialised as int64 nanoseconds to avoid locale-dependent string parsing.
Default database path: `~/.config/agentic/allowance.db`.
### RedisStore
Multi-process persistent implementation using `github.com/redis/go-redis/v9`. Suitable for deployments where multiple agent processes share quota state. Atomic read-modify-write is implemented via Lua scripts registered on the Redis server (`EVAL`). Key pattern: `{prefix}:{type}:{id}`. Default prefix: `agentic`. Pings the server at construction time.
### Backend Selection
`NewAllowanceStoreFromConfig(cfg AllowanceConfig)` is the factory:
```go
type AllowanceConfig struct {
StoreBackend string // "memory" | "sqlite" | "redis"
StorePath string // SQLite path (optional)
RedisAddr string // host:port (optional)
}
```
An empty or unrecognised `StoreBackend` defaults to `"memory"`.
### AllowanceService
`AllowanceService` wraps an `AllowanceStore` and enforces quota policy. The check order in `Check(agentID, model string)` is:
1. Model allowlist — rejects if agent's `ModelAllowlist` is non-empty and the model is not in it
2. Daily token limit — rejects if `TokensUsed >= DailyTokenLimit`; emits `EventQuotaWarning` at 80% usage
3. Daily job limit — rejects if `JobsStarted >= DailyJobLimit`
4. Concurrent jobs — rejects if `ActiveJobs >= ConcurrentJobs`
5. Global model budget — rejects if model-level `TokensUsed >= DailyTokenBudget`
When multiple limits are simultaneously exceeded, the first in this order is reported as the reason.
`RecordUsage(report UsageReport)` handles four quota events:
| Event | Effect |
|---|---|
| `QuotaEventJobStarted` | Increments `JobsStarted` and `ActiveJobs` by 1 |
| `QuotaEventJobCompleted` | Adds `TokensIn + TokensOut` to `TokensUsed`; decrements `ActiveJobs`; records model usage |
| `QuotaEventJobFailed` | Records full token usage; returns 50% of tokens; decrements `ActiveJobs` |
| `QuotaEventJobCancelled` | Returns 100% of tokens; decrements `ActiveJobs`; no model usage recorded |
Two fields on `ModelQuota``HourlyRateLimit` and `CostCeiling` — are stored and round-tripped by all backends but are not currently enforced by `AllowanceService.Check`. Enforcement requires a sliding-window `GetHourlyUsage` method on `AllowanceStore`, which would be a breaking interface change. The fields are annotated as reserved in the source.
---
## Agent Registry
`AgentRegistry` tracks the set of known agents and their health state.
### Interface
```go
type AgentRegistry interface {
Register(agent AgentInfo) error
Deregister(id string) error
Get(id string) (AgentInfo, error)
List() []AgentInfo
Heartbeat(id string) error
Reap(ttl time.Duration) []string
}
```
`Reap` transitions agents whose `LastHeartbeat` is older than `ttl` to `AgentOffline` and returns their IDs. `Heartbeat` transitions an `AgentOffline` agent back to `AgentAvailable`.
### MemoryRegistry
In-memory implementation with `sync.RWMutex`. Uses copy-on-read semantics consistent with `MemoryStore`.
### SQLiteRegistry
Persistent implementation backed by a `database/sql` SQLite connection (using `modernc.org/sqlite` directly, not via go-store). Schema:
```sql
CREATE TABLE agents (
id TEXT PRIMARY KEY,
name TEXT NOT NULL DEFAULT '',
capabilities TEXT NOT NULL DEFAULT '[]',
status TEXT NOT NULL DEFAULT 'available',
last_heartbeat DATETIME NOT NULL DEFAULT (datetime('now')),
current_load INTEGER NOT NULL DEFAULT 0,
max_load INTEGER NOT NULL DEFAULT 0,
registered_at DATETIME NOT NULL DEFAULT (datetime('now'))
)
```
`Register` uses `INSERT ... ON CONFLICT(id) DO UPDATE` (UPSERT). WAL mode and `busy_timeout=5000ms` are set at open time. Capabilities are stored as a JSON array string.
### RedisRegistry
TTL-based implementation where each agent is stored as a JSON value at key `{prefix}:agent:{id}`. Natural key expiry serves as the reap mechanism. `Heartbeat` refreshes the key's TTL. `Reap` performs an explicit SCAN for expired keys as a backup to natural expiry. Uses `WithRedisPassword`, `WithRedisDB`, and `WithRedisPrefix` functional options.
### Backend Selection
`NewAgentRegistryFromConfig(cfg RegistryConfig)` mirrors the allowance factory:
```go
type RegistryConfig struct {
RegistryBackend string // "memory" | "sqlite" | "redis"
RegistryPath string // SQLite path (optional)
RegistryRedisAddr string // host:port (optional)
}
```
Default registry path: `~/.config/agentic/registry.db`.
---
## Task Queue and Dispatcher
### Task Model
`Task` is the core data type:
```go
type Task struct {
ID string
Title string
Description string
Priority TaskPriority // critical | high | medium | low
Status TaskStatus // pending | in_progress | completed | blocked | failed
Labels []string // capability requirements for routing
Files []string // relevant files for context building
Dependencies []string // task IDs that must complete first
MaxRetries int // 0 uses DefaultMaxRetries (3)
RetryCount int // failed dispatch attempts so far
LastAttempt *time.Time // when the last dispatch attempt occurred
FailReason string // set when StatusFailed
// ... timestamps, ClaimedBy, Project, etc.
}
```
### TaskRouter
`DefaultRouter` selects an agent using two strategies:
- For `PriorityCritical` tasks: picks the agent with the lowest `CurrentLoad`, tie-broken alphabetically by agent ID.
- For all other tasks: scores agents as `1.0 - (CurrentLoad / MaxLoad)` and picks the highest scorer. Agents with `MaxLoad == 0` score 1.0 (unlimited capacity).
Eligibility filtering runs first: only agents with status `AgentAvailable` or `AgentBusy` with remaining capacity are considered. The task's `Labels` field maps to capability requirements — all labels must be present in the agent's `Capabilities` slice.
Returns `ErrNoEligibleAgent` when no agent qualifies.
### Dispatcher
`Dispatcher` combines `AgentRegistry`, `TaskRouter`, `AllowanceService`, and `Client`:
```
Dispatch(ctx, task):
1. registry.List() — get candidate agents
2. router.Route(task, agents) — select best agent
3. allowance.Check(agentID) — verify quota
4. client.ClaimTask(task.ID) — mark in-progress via API
5. allowance.RecordUsage(JobStarted) — record job start
→ emit EventTaskDispatched
```
Failure paths emit `EventDispatchFailedNoAgent` or `EventDispatchFailedQuota`.
### DispatchLoop
`DispatchLoop(ctx, interval)` polls `client.ListTasks(pending)` at the given interval and dispatches each task. Key behaviours:
- Tasks are sorted by priority (Critical first) then by `CreatedAt` (oldest first) before dispatch.
- Tasks with `RetryCount > 0` are skipped if `LastAttempt + backoff(RetryCount) > now`. Backoff is exponential starting at 5 seconds (5s, 10s, 20s, ...).
- Tasks that fail dispatch have `RetryCount` incremented. Tasks reaching `MaxRetries` (default 3) are updated to `StatusFailed` via the API (dead-lettered) and emit `EventTaskDeadLettered`.
- Transient API errors listing tasks are logged and the loop continues.
- The loop exits when the context is cancelled and returns `ctx.Err()`.
---
## Context Builder
`BuildTaskContext(task, dir)` gathers context for AI collaboration:
1. Reads files explicitly listed in `task.Files` from the working directory.
2. Runs `git status --porcelain` and `git log --oneline -10`.
3. Extracts keywords from the task title and description (stop-word filtered, minimum 3 chars, up to 5 keywords) and runs `git grep -l -i <keyword> -- *.go *.ts *.js *.py` for each.
4. Reads up to 10 matching source files, truncating files over 5,000 bytes.
`TaskContext.FormatContext()` renders the collected data as a Markdown string with sections for task metadata, task files, git status, recent commits, and related code — suitable for inclusion in a prompt sent to an AI model.
`GatherRelatedFiles(task, dir)` is the public sub-function that reads only the explicitly listed files.
---
## CLI Client
`Client` is an HTTP client for the core-agentic REST API. It sends `Bearer` token authentication on every request.
Endpoints used:
| Method | Path | Client function |
|---|---|---|
| GET | /api/tasks | ListTasks |
| GET | /api/tasks/{id} | GetTask |
| POST | /api/tasks | CreateTask |
| POST | /api/tasks/{id}/claim | ClaimTask |
| PATCH | /api/tasks/{id} | UpdateTask |
| POST | /api/tasks/{id}/complete | CompleteTask |
| GET | /api/health | Ping |
`ClaimTask` accepts both `ClaimResponse{"task": ...}` and bare `Task` JSON to handle varying API implementations. `checkResponse` parses error bodies as `APIError{code, message, details}` or falls back to a generic HTTP status message.
Default timeout: 30 seconds. Default base URL: `http://localhost:8080` (override with `AGENTIC_BASE_URL`).
### Configuration
`LoadConfig(dir)` resolves configuration in this order:
1. `.env` file in the specified directory (or current directory if `dir` is empty), reading `AGENTIC_BASE_URL`, `AGENTIC_TOKEN`, `AGENTIC_PROJECT`, `AGENTIC_AGENT_ID`.
2. `~/.core/agentic.yaml` with the same fields in YAML form.
3. Environment variable overrides applied on top of whichever file was loaded.
A missing token always returns an error. `SaveConfig` writes to `~/.core/agentic.yaml`, creating the directory if needed.
---
## Event Hooks
All lifecycle events are published through the `EventEmitter` interface:
```go
type EventEmitter interface {
Emit(ctx context.Context, event Event) error
}
```
`Event` carries a typed string `EventType`, optional `TaskID` and `AgentID`, a `Timestamp`, and an untyped `Payload`.
### Event Types
| Type | Emitter | Trigger |
|---|---|---|
| `task_dispatched` | Dispatcher | Successful routing, quota check, and claim |
| `task_claimed` | Dispatcher | API claim call succeeded |
| `dispatch_failed_no_agent` | Dispatcher | `ErrNoEligibleAgent` from router |
| `dispatch_failed_quota` | Dispatcher | Allowance check denied |
| `task_dead_lettered` | Dispatcher | Task exceeded `MaxRetries` |
| `quota_warning` | AllowanceService | Agent at 80%+ of daily token limit |
| `quota_exceeded` | AllowanceService | Any limit check fails |
| `usage_recorded` | AllowanceService | `JobStarted` or `JobCompleted` events |
### Implementations
`ChannelEmitter` — buffered `chan Event`. Drops events silently if the buffer is full to avoid blocking the dispatch path. Default buffer size: 64. Use `Events()` to receive.
`MultiEmitter` — fans out to a slice of emitters. Failures in one emitter do not stop the others. Thread-safe: emitters can be added with `Add()` at any time.
Both `Dispatcher` and `AllowanceService` accept an emitter via `SetEventEmitter()`. A nil emitter is safe — all emit calls are no-ops.
---
## CLI Backing Functions
These functions are intended to be called by `core agent` CLI commands implemented in the `core/cli` repository.
`GetStatus(ctx, registry, client, allowanceSvc)` — aggregates `registry.List()`, counts of pending and in-progress tasks from the client, and remaining tokens per agent from the allowance service. Any argument can be nil; those sections are skipped. Returns `*StatusSummary`.
`FormatStatus(summary)` — renders a `StatusSummary` as a human-readable table with agent rows sorted alphabetically by ID.
`SubmitTask(ctx, client, title, description, labels, priority)` — validates that `title` is non-empty, constructs a `Task` with `StatusPending` and current timestamp, and calls `client.CreateTask`.
`StreamLogs(ctx, client, taskID, interval, writer)` — polls `client.GetTask` at the given interval, writing timestamped status lines to `writer`. Stops on `StatusCompleted` or `StatusBlocked`. Transient errors are written to the stream but do not stop polling. Returns `ctx.Err()` on cancellation.
---
## Prompt Templates
`embed.go` embeds all files under `prompts/*.md` at compile time. `Prompt(name)` reads a template by name (without `.md`), trims whitespace, and returns an empty string if the file does not exist. Currently one template is provided: `prompts/commit.md`, used by `TaskCommit` to guide Claude through generating a conventional commit.
---
## Dependency Graph
```
go-agentic
forge.lthn.ai/core/go (ServiceRuntime, framework, io, log)
forge.lthn.ai/core/go-store (SQLite KV — SQLiteStore)
github.com/redis/go-redis/v9 (RedisStore, RedisRegistry)
gopkg.in/yaml.v3 (config file parsing)
modernc.org/sqlite (SQLiteRegistry — direct database/sql)
github.com/stretchr/testify (tests only)
```
The module uses Go workspace `replace` directives to reference `../go` and `../go-store` during development.