Replace internal task tracking (TODO.md, FINDINGS.md) with structured documentation in docs/. Trim CLAUDE.md to agent instructions only. Co-Authored-By: Virgil <virgil@lethean.io>
382 lines
16 KiB
Markdown
382 lines
16 KiB
Markdown
# go-agentic Architecture
|
|
|
|
Module: `forge.lthn.ai/core/go-agentic`
|
|
|
|
This document describes the internal structure of go-agentic: how components relate, what interfaces they implement, and how data flows through the system.
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
go-agentic is an AI service lifecycle and task management library. It provides:
|
|
|
|
- A Core framework service that spawns `claude` subprocess invocations
|
|
- Quota enforcement (allowance management) with three storage backends
|
|
- An agent registry with two persistence backends
|
|
- A task router and dispatcher for multi-agent coordination
|
|
- Event hooks for lifecycle notification
|
|
- CLI-oriented helper functions for status display, task submission, and log streaming
|
|
|
|
The package has no dependency on go-ai, go-ml, go-rag, or go-mcp. Its only non-standard imports are `forge.lthn.ai/core/go` (the Core DI framework), `forge.lthn.ai/core/go-store` (SQLite KV), and `github.com/redis/go-redis/v9`.
|
|
|
|
---
|
|
|
|
## Service Lifecycle
|
|
|
|
`Service` wraps `framework.ServiceRuntime[ServiceOptions]` and integrates with the Core dependency injection container.
|
|
|
|
```go
|
|
type Service struct {
|
|
*framework.ServiceRuntime[ServiceOptions]
|
|
}
|
|
```
|
|
|
|
On startup, `OnStartup` registers a task handler with the Core runtime:
|
|
|
|
```go
|
|
func (s *Service) OnStartup(ctx context.Context) error {
|
|
s.Core().RegisterTask(s.handleTask)
|
|
return nil
|
|
}
|
|
```
|
|
|
|
The handler dispatches on task type:
|
|
|
|
- `TaskCommit` — runs the embedded `prompts/commit.md` prompt via `claude -p <prompt> --allowedTools Bash,Read,Glob,Grep` in the specified directory. If `CanEdit` is true, `Write` and `Edit` are added to the tool list.
|
|
- `TaskPrompt` — runs an arbitrary prompt. Allowed tools default to `ServiceOptions.DefaultTools` (Bash, Read, Glob, Grep) but can be overridden per task. Reports progress via `Core().Progress()` using the task ID.
|
|
|
|
Both task types spawn `claude` as a subprocess with `os.Stdout`/`os.Stderr`/`os.Stdin` connected directly, so output streams to the terminal.
|
|
|
|
### ServiceOptions
|
|
|
|
```go
|
|
type ServiceOptions struct {
|
|
DefaultTools []string // default tool list for TaskPrompt
|
|
AllowEdit bool // global permission for Write/Edit tools
|
|
}
|
|
```
|
|
|
|
Default tools are `["Bash", "Read", "Glob", "Grep"]`. Write/Edit require explicit opt-in either globally or per task.
|
|
|
|
---
|
|
|
|
## Allowance Management
|
|
|
|
Allowance management enforces per-agent and per-model token quotas. It is composed of three layers:
|
|
|
|
1. `AllowanceStore` — persistence interface
|
|
2. `AllowanceService` — enforcement logic
|
|
3. Three `AllowanceStore` implementations — Memory, SQLite, Redis
|
|
|
|
### AllowanceStore Interface
|
|
|
|
```go
|
|
type AllowanceStore interface {
|
|
GetAllowance(agentID string) (*AgentAllowance, error)
|
|
SetAllowance(a *AgentAllowance) error
|
|
GetUsage(agentID string) (*UsageRecord, error)
|
|
IncrementUsage(agentID string, tokens int64, jobs int) error
|
|
DecrementActiveJobs(agentID string) error
|
|
ReturnTokens(agentID string, tokens int64) error
|
|
ResetUsage(agentID string) error
|
|
GetModelQuota(model string) (*ModelQuota, error)
|
|
GetModelUsage(model string) (int64, error)
|
|
IncrementModelUsage(model string, tokens int64) error
|
|
}
|
|
```
|
|
|
|
All three implementations satisfy this interface with identical semantics. The `AgentAllowance` struct holds per-agent limits; `UsageRecord` holds current period usage.
|
|
|
|
### MemoryStore
|
|
|
|
In-memory implementation using `sync.RWMutex`. State is lost on process restart. Suitable for single-process use, testing, and development. Uses defensive copy semantics: `Get` and `Set` return copies of stored structs to prevent aliasing.
|
|
|
|
### SQLiteStore
|
|
|
|
Persistent single-node implementation using `forge.lthn.ai/core/go-store` (a SQLite KV abstraction over `modernc.org/sqlite`). Data is namespaced into four groups: `allowances`, `usage`, `model_quotas`, `model_usage`. Read-modify-write operations are serialised with a `sync.Mutex`. WAL mode is enabled for concurrent reads. `time.Duration` values are serialised as int64 nanoseconds to avoid locale-dependent string parsing.
|
|
|
|
Default database path: `~/.config/agentic/allowance.db`.
|
|
|
|
### RedisStore
|
|
|
|
Multi-process persistent implementation using `github.com/redis/go-redis/v9`. Suitable for deployments where multiple agent processes share quota state. Atomic read-modify-write is implemented via Lua scripts registered on the Redis server (`EVAL`). Key pattern: `{prefix}:{type}:{id}`. Default prefix: `agentic`. Pings the server at construction time.
|
|
|
|
### Backend Selection
|
|
|
|
`NewAllowanceStoreFromConfig(cfg AllowanceConfig)` is the factory:
|
|
|
|
```go
|
|
type AllowanceConfig struct {
|
|
StoreBackend string // "memory" | "sqlite" | "redis"
|
|
StorePath string // SQLite path (optional)
|
|
RedisAddr string // host:port (optional)
|
|
}
|
|
```
|
|
|
|
An empty or unrecognised `StoreBackend` defaults to `"memory"`.
|
|
|
|
### AllowanceService
|
|
|
|
`AllowanceService` wraps an `AllowanceStore` and enforces quota policy. The check order in `Check(agentID, model string)` is:
|
|
|
|
1. Model allowlist — rejects if agent's `ModelAllowlist` is non-empty and the model is not in it
|
|
2. Daily token limit — rejects if `TokensUsed >= DailyTokenLimit`; emits `EventQuotaWarning` at 80% usage
|
|
3. Daily job limit — rejects if `JobsStarted >= DailyJobLimit`
|
|
4. Concurrent jobs — rejects if `ActiveJobs >= ConcurrentJobs`
|
|
5. Global model budget — rejects if model-level `TokensUsed >= DailyTokenBudget`
|
|
|
|
When multiple limits are simultaneously exceeded, the first in this order is reported as the reason.
|
|
|
|
`RecordUsage(report UsageReport)` handles four quota events:
|
|
|
|
| Event | Effect |
|
|
|---|---|
|
|
| `QuotaEventJobStarted` | Increments `JobsStarted` and `ActiveJobs` by 1 |
|
|
| `QuotaEventJobCompleted` | Adds `TokensIn + TokensOut` to `TokensUsed`; decrements `ActiveJobs`; records model usage |
|
|
| `QuotaEventJobFailed` | Records full token usage; returns 50% of tokens; decrements `ActiveJobs` |
|
|
| `QuotaEventJobCancelled` | Returns 100% of tokens; decrements `ActiveJobs`; no model usage recorded |
|
|
|
|
Two fields on `ModelQuota` — `HourlyRateLimit` and `CostCeiling` — are stored and round-tripped by all backends but are not currently enforced by `AllowanceService.Check`. Enforcement requires a sliding-window `GetHourlyUsage` method on `AllowanceStore`, which would be a breaking interface change. The fields are annotated as reserved in the source.
|
|
|
|
---
|
|
|
|
## Agent Registry
|
|
|
|
`AgentRegistry` tracks the set of known agents and their health state.
|
|
|
|
### Interface
|
|
|
|
```go
|
|
type AgentRegistry interface {
|
|
Register(agent AgentInfo) error
|
|
Deregister(id string) error
|
|
Get(id string) (AgentInfo, error)
|
|
List() []AgentInfo
|
|
Heartbeat(id string) error
|
|
Reap(ttl time.Duration) []string
|
|
}
|
|
```
|
|
|
|
`Reap` transitions agents whose `LastHeartbeat` is older than `ttl` to `AgentOffline` and returns their IDs. `Heartbeat` transitions an `AgentOffline` agent back to `AgentAvailable`.
|
|
|
|
### MemoryRegistry
|
|
|
|
In-memory implementation with `sync.RWMutex`. Uses copy-on-read semantics consistent with `MemoryStore`.
|
|
|
|
### SQLiteRegistry
|
|
|
|
Persistent implementation backed by a `database/sql` SQLite connection (using `modernc.org/sqlite` directly, not via go-store). Schema:
|
|
|
|
```sql
|
|
CREATE TABLE agents (
|
|
id TEXT PRIMARY KEY,
|
|
name TEXT NOT NULL DEFAULT '',
|
|
capabilities TEXT NOT NULL DEFAULT '[]',
|
|
status TEXT NOT NULL DEFAULT 'available',
|
|
last_heartbeat DATETIME NOT NULL DEFAULT (datetime('now')),
|
|
current_load INTEGER NOT NULL DEFAULT 0,
|
|
max_load INTEGER NOT NULL DEFAULT 0,
|
|
registered_at DATETIME NOT NULL DEFAULT (datetime('now'))
|
|
)
|
|
```
|
|
|
|
`Register` uses `INSERT ... ON CONFLICT(id) DO UPDATE` (UPSERT). WAL mode and `busy_timeout=5000ms` are set at open time. Capabilities are stored as a JSON array string.
|
|
|
|
### RedisRegistry
|
|
|
|
TTL-based implementation where each agent is stored as a JSON value at key `{prefix}:agent:{id}`. Natural key expiry serves as the reap mechanism. `Heartbeat` refreshes the key's TTL. `Reap` performs an explicit SCAN for expired keys as a backup to natural expiry. Uses `WithRedisPassword`, `WithRedisDB`, and `WithRedisPrefix` functional options.
|
|
|
|
### Backend Selection
|
|
|
|
`NewAgentRegistryFromConfig(cfg RegistryConfig)` mirrors the allowance factory:
|
|
|
|
```go
|
|
type RegistryConfig struct {
|
|
RegistryBackend string // "memory" | "sqlite" | "redis"
|
|
RegistryPath string // SQLite path (optional)
|
|
RegistryRedisAddr string // host:port (optional)
|
|
}
|
|
```
|
|
|
|
Default registry path: `~/.config/agentic/registry.db`.
|
|
|
|
---
|
|
|
|
## Task Queue and Dispatcher
|
|
|
|
### Task Model
|
|
|
|
`Task` is the core data type:
|
|
|
|
```go
|
|
type Task struct {
|
|
ID string
|
|
Title string
|
|
Description string
|
|
Priority TaskPriority // critical | high | medium | low
|
|
Status TaskStatus // pending | in_progress | completed | blocked | failed
|
|
Labels []string // capability requirements for routing
|
|
Files []string // relevant files for context building
|
|
Dependencies []string // task IDs that must complete first
|
|
MaxRetries int // 0 uses DefaultMaxRetries (3)
|
|
RetryCount int // failed dispatch attempts so far
|
|
LastAttempt *time.Time // when the last dispatch attempt occurred
|
|
FailReason string // set when StatusFailed
|
|
// ... timestamps, ClaimedBy, Project, etc.
|
|
}
|
|
```
|
|
|
|
### TaskRouter
|
|
|
|
`DefaultRouter` selects an agent using two strategies:
|
|
|
|
- For `PriorityCritical` tasks: picks the agent with the lowest `CurrentLoad`, tie-broken alphabetically by agent ID.
|
|
- For all other tasks: scores agents as `1.0 - (CurrentLoad / MaxLoad)` and picks the highest scorer. Agents with `MaxLoad == 0` score 1.0 (unlimited capacity).
|
|
|
|
Eligibility filtering runs first: only agents with status `AgentAvailable` or `AgentBusy` with remaining capacity are considered. The task's `Labels` field maps to capability requirements — all labels must be present in the agent's `Capabilities` slice.
|
|
|
|
Returns `ErrNoEligibleAgent` when no agent qualifies.
|
|
|
|
### Dispatcher
|
|
|
|
`Dispatcher` combines `AgentRegistry`, `TaskRouter`, `AllowanceService`, and `Client`:
|
|
|
|
```
|
|
Dispatch(ctx, task):
|
|
1. registry.List() — get candidate agents
|
|
2. router.Route(task, agents) — select best agent
|
|
3. allowance.Check(agentID) — verify quota
|
|
4. client.ClaimTask(task.ID) — mark in-progress via API
|
|
5. allowance.RecordUsage(JobStarted) — record job start
|
|
→ emit EventTaskDispatched
|
|
```
|
|
|
|
Failure paths emit `EventDispatchFailedNoAgent` or `EventDispatchFailedQuota`.
|
|
|
|
### DispatchLoop
|
|
|
|
`DispatchLoop(ctx, interval)` polls `client.ListTasks(pending)` at the given interval and dispatches each task. Key behaviours:
|
|
|
|
- Tasks are sorted by priority (Critical first) then by `CreatedAt` (oldest first) before dispatch.
|
|
- Tasks with `RetryCount > 0` are skipped if `LastAttempt + backoff(RetryCount) > now`. Backoff is exponential starting at 5 seconds (5s, 10s, 20s, ...).
|
|
- Tasks that fail dispatch have `RetryCount` incremented. Tasks reaching `MaxRetries` (default 3) are updated to `StatusFailed` via the API (dead-lettered) and emit `EventTaskDeadLettered`.
|
|
- Transient API errors listing tasks are logged and the loop continues.
|
|
- The loop exits when the context is cancelled and returns `ctx.Err()`.
|
|
|
|
---
|
|
|
|
## Context Builder
|
|
|
|
`BuildTaskContext(task, dir)` gathers context for AI collaboration:
|
|
|
|
1. Reads files explicitly listed in `task.Files` from the working directory.
|
|
2. Runs `git status --porcelain` and `git log --oneline -10`.
|
|
3. Extracts keywords from the task title and description (stop-word filtered, minimum 3 chars, up to 5 keywords) and runs `git grep -l -i <keyword> -- *.go *.ts *.js *.py` for each.
|
|
4. Reads up to 10 matching source files, truncating files over 5,000 bytes.
|
|
|
|
`TaskContext.FormatContext()` renders the collected data as a Markdown string with sections for task metadata, task files, git status, recent commits, and related code — suitable for inclusion in a prompt sent to an AI model.
|
|
|
|
`GatherRelatedFiles(task, dir)` is the public sub-function that reads only the explicitly listed files.
|
|
|
|
---
|
|
|
|
## CLI Client
|
|
|
|
`Client` is an HTTP client for the core-agentic REST API. It sends `Bearer` token authentication on every request.
|
|
|
|
Endpoints used:
|
|
|
|
| Method | Path | Client function |
|
|
|---|---|---|
|
|
| GET | /api/tasks | ListTasks |
|
|
| GET | /api/tasks/{id} | GetTask |
|
|
| POST | /api/tasks | CreateTask |
|
|
| POST | /api/tasks/{id}/claim | ClaimTask |
|
|
| PATCH | /api/tasks/{id} | UpdateTask |
|
|
| POST | /api/tasks/{id}/complete | CompleteTask |
|
|
| GET | /api/health | Ping |
|
|
|
|
`ClaimTask` accepts both `ClaimResponse{"task": ...}` and bare `Task` JSON to handle varying API implementations. `checkResponse` parses error bodies as `APIError{code, message, details}` or falls back to a generic HTTP status message.
|
|
|
|
Default timeout: 30 seconds. Default base URL: `http://localhost:8080` (override with `AGENTIC_BASE_URL`).
|
|
|
|
### Configuration
|
|
|
|
`LoadConfig(dir)` resolves configuration in this order:
|
|
|
|
1. `.env` file in the specified directory (or current directory if `dir` is empty), reading `AGENTIC_BASE_URL`, `AGENTIC_TOKEN`, `AGENTIC_PROJECT`, `AGENTIC_AGENT_ID`.
|
|
2. `~/.core/agentic.yaml` with the same fields in YAML form.
|
|
3. Environment variable overrides applied on top of whichever file was loaded.
|
|
|
|
A missing token always returns an error. `SaveConfig` writes to `~/.core/agentic.yaml`, creating the directory if needed.
|
|
|
|
---
|
|
|
|
## Event Hooks
|
|
|
|
All lifecycle events are published through the `EventEmitter` interface:
|
|
|
|
```go
|
|
type EventEmitter interface {
|
|
Emit(ctx context.Context, event Event) error
|
|
}
|
|
```
|
|
|
|
`Event` carries a typed string `EventType`, optional `TaskID` and `AgentID`, a `Timestamp`, and an untyped `Payload`.
|
|
|
|
### Event Types
|
|
|
|
| Type | Emitter | Trigger |
|
|
|---|---|---|
|
|
| `task_dispatched` | Dispatcher | Successful routing, quota check, and claim |
|
|
| `task_claimed` | Dispatcher | API claim call succeeded |
|
|
| `dispatch_failed_no_agent` | Dispatcher | `ErrNoEligibleAgent` from router |
|
|
| `dispatch_failed_quota` | Dispatcher | Allowance check denied |
|
|
| `task_dead_lettered` | Dispatcher | Task exceeded `MaxRetries` |
|
|
| `quota_warning` | AllowanceService | Agent at 80%+ of daily token limit |
|
|
| `quota_exceeded` | AllowanceService | Any limit check fails |
|
|
| `usage_recorded` | AllowanceService | `JobStarted` or `JobCompleted` events |
|
|
|
|
### Implementations
|
|
|
|
`ChannelEmitter` — buffered `chan Event`. Drops events silently if the buffer is full to avoid blocking the dispatch path. Default buffer size: 64. Use `Events()` to receive.
|
|
|
|
`MultiEmitter` — fans out to a slice of emitters. Failures in one emitter do not stop the others. Thread-safe: emitters can be added with `Add()` at any time.
|
|
|
|
Both `Dispatcher` and `AllowanceService` accept an emitter via `SetEventEmitter()`. A nil emitter is safe — all emit calls are no-ops.
|
|
|
|
---
|
|
|
|
## CLI Backing Functions
|
|
|
|
These functions are intended to be called by `core agent` CLI commands implemented in the `core/cli` repository.
|
|
|
|
`GetStatus(ctx, registry, client, allowanceSvc)` — aggregates `registry.List()`, counts of pending and in-progress tasks from the client, and remaining tokens per agent from the allowance service. Any argument can be nil; those sections are skipped. Returns `*StatusSummary`.
|
|
|
|
`FormatStatus(summary)` — renders a `StatusSummary` as a human-readable table with agent rows sorted alphabetically by ID.
|
|
|
|
`SubmitTask(ctx, client, title, description, labels, priority)` — validates that `title` is non-empty, constructs a `Task` with `StatusPending` and current timestamp, and calls `client.CreateTask`.
|
|
|
|
`StreamLogs(ctx, client, taskID, interval, writer)` — polls `client.GetTask` at the given interval, writing timestamped status lines to `writer`. Stops on `StatusCompleted` or `StatusBlocked`. Transient errors are written to the stream but do not stop polling. Returns `ctx.Err()` on cancellation.
|
|
|
|
---
|
|
|
|
## Prompt Templates
|
|
|
|
`embed.go` embeds all files under `prompts/*.md` at compile time. `Prompt(name)` reads a template by name (without `.md`), trims whitespace, and returns an empty string if the file does not exist. Currently one template is provided: `prompts/commit.md`, used by `TaskCommit` to guide Claude through generating a conventional commit.
|
|
|
|
---
|
|
|
|
## Dependency Graph
|
|
|
|
```
|
|
go-agentic
|
|
forge.lthn.ai/core/go (ServiceRuntime, framework, io, log)
|
|
forge.lthn.ai/core/go-store (SQLite KV — SQLiteStore)
|
|
github.com/redis/go-redis/v9 (RedisStore, RedisRegistry)
|
|
gopkg.in/yaml.v3 (config file parsing)
|
|
modernc.org/sqlite (SQLiteRegistry — direct database/sql)
|
|
github.com/stretchr/testify (tests only)
|
|
```
|
|
|
|
The module uses Go workspace `replace` directives to reference `../go` and `../go-store` during development.
|