go-agentic/docs/architecture.md

# go-agentic Architecture

Module: `forge.lthn.ai/core/go-agentic`

This document describes the internal structure of go-agentic: how components relate, what interfaces they implement, and how data flows through the system.

---

## Overview

go-agentic is an AI service lifecycle and task management library. It provides:

- A Core framework service that spawns `claude` subprocess invocations
- Quota enforcement (allowance management) with three storage backends
- An agent registry with two persistence backends
- A task router and dispatcher for multi-agent coordination
- Event hooks for lifecycle notification
- CLI-oriented helper functions for status display, task submission, and log streaming

The package has no dependency on go-ai, go-ml, go-rag, or go-mcp. Its only non-standard imports are `forge.lthn.ai/core/go` (the Core DI framework), `forge.lthn.ai/core/go-store` (SQLite KV), and `github.com/redis/go-redis/v9`.

---

## Service Lifecycle

`Service` wraps `framework.ServiceRuntime[ServiceOptions]` and integrates with the Core dependency injection container.

```go
type Service struct {
    *framework.ServiceRuntime[ServiceOptions]
}
```

On startup, `OnStartup` registers a task handler with the Core runtime:

```go
func (s *Service) OnStartup(ctx context.Context) error {
    s.Core().RegisterTask(s.handleTask)
    return nil
}
```

The handler dispatches on task type:

- `TaskCommit` — runs the embedded `prompts/commit.md` prompt via `claude -p <prompt> --allowedTools Bash,Read,Glob,Grep` in the specified directory. If `CanEdit` is true, `Write` and `Edit` are added to the tool list.
- `TaskPrompt` — runs an arbitrary prompt. Allowed tools default to `ServiceOptions.DefaultTools` (Bash, Read, Glob, Grep) but can be overridden per task. Reports progress via `Core().Progress()` using the task ID.

Both task types spawn `claude` as a subprocess with `os.Stdout`/`os.Stderr`/`os.Stdin` connected directly, so output streams to the terminal.

### ServiceOptions

```go
type ServiceOptions struct {
    DefaultTools []string  // default tool list for TaskPrompt
    AllowEdit    bool      // global permission for Write/Edit tools
}
```

Default tools are `["Bash", "Read", "Glob", "Grep"]`. Write/Edit require explicit opt-in either globally or per task.

---

## Allowance Management

Allowance management enforces per-agent and per-model token quotas. It is composed of three layers:

1. `AllowanceStore` — persistence interface
2. `AllowanceService` — enforcement logic
3. Three `AllowanceStore` implementations — Memory, SQLite, Redis

### AllowanceStore Interface

```go
type AllowanceStore interface {
    GetAllowance(agentID string) (*AgentAllowance, error)
    SetAllowance(a *AgentAllowance) error
    GetUsage(agentID string) (*UsageRecord, error)
    IncrementUsage(agentID string, tokens int64, jobs int) error
    DecrementActiveJobs(agentID string) error
    ReturnTokens(agentID string, tokens int64) error
    ResetUsage(agentID string) error
    GetModelQuota(model string) (*ModelQuota, error)
    GetModelUsage(model string) (int64, error)
    IncrementModelUsage(model string, tokens int64) error
}
```

All three implementations satisfy this interface with identical semantics. The `AgentAllowance` struct holds per-agent limits; `UsageRecord` holds current period usage.

### MemoryStore

In-memory implementation using `sync.RWMutex`. State is lost on process restart. Suitable for single-process use, testing, and development. Uses defensive copy semantics: `Get` and `Set` return copies of stored structs to prevent aliasing.

### SQLiteStore

Persistent single-node implementation using `forge.lthn.ai/core/go-store` (a SQLite KV abstraction over `modernc.org/sqlite`). Data is namespaced into four groups: `allowances`, `usage`, `model_quotas`, `model_usage`. Read-modify-write operations are serialised with a `sync.Mutex`. WAL mode is enabled for concurrent reads. `time.Duration` values are serialised as int64 nanoseconds to avoid locale-dependent string parsing.

Default database path: `~/.config/agentic/allowance.db`.

### RedisStore

Multi-process persistent implementation using `github.com/redis/go-redis/v9`. Suitable for deployments where multiple agent processes share quota state. Atomic read-modify-write is implemented via Lua scripts registered on the Redis server (`EVAL`). Key pattern: `{prefix}:{type}:{id}`. Default prefix: `agentic`. Pings the server at construction time.

### Backend Selection

`NewAllowanceStoreFromConfig(cfg AllowanceConfig)` is the factory:

```go
type AllowanceConfig struct {
    StoreBackend string  // "memory" | "sqlite" | "redis"
    StorePath    string  // SQLite path (optional)
    RedisAddr    string  // host:port (optional)
}
```

An empty or unrecognised `StoreBackend` defaults to `"memory"`.

### AllowanceService

`AllowanceService` wraps an `AllowanceStore` and enforces quota policy. The check order in `Check(agentID, model string)` is:

1. Model allowlist — rejects if agent's `ModelAllowlist` is non-empty and the model is not in it
2. Daily token limit — rejects if `TokensUsed >= DailyTokenLimit`; emits `EventQuotaWarning` at 80% usage
3. Daily job limit — rejects if `JobsStarted >= DailyJobLimit`
4. Concurrent jobs — rejects if `ActiveJobs >= ConcurrentJobs`
5. Global model budget — rejects if model-level `TokensUsed >= DailyTokenBudget`

When multiple limits are simultaneously exceeded, the first in this order is reported as the reason.

`RecordUsage(report UsageReport)` handles four quota events:

| Event | Effect |
|---|---|
| `QuotaEventJobStarted` | Increments `JobsStarted` and `ActiveJobs` by 1 |
| `QuotaEventJobCompleted` | Adds `TokensIn + TokensOut` to `TokensUsed`; decrements `ActiveJobs`; records model usage |
| `QuotaEventJobFailed` | Records full token usage; returns 50% of tokens; decrements `ActiveJobs` |
| `QuotaEventJobCancelled` | Returns 100% of tokens; decrements `ActiveJobs`; no model usage recorded |

Two fields on `ModelQuota` — `HourlyRateLimit` and `CostCeiling` — are stored and round-tripped by all backends but are not currently enforced by `AllowanceService.Check`. Enforcement requires a sliding-window `GetHourlyUsage` method on `AllowanceStore`, which would be a breaking interface change. The fields are annotated as reserved in the source.

---

## Agent Registry

`AgentRegistry` tracks the set of known agents and their health state.

### Interface

```go
type AgentRegistry interface {
    Register(agent AgentInfo) error
    Deregister(id string) error
    Get(id string) (AgentInfo, error)
    List() []AgentInfo
    Heartbeat(id string) error
    Reap(ttl time.Duration) []string
}
```

`Reap` transitions agents whose `LastHeartbeat` is older than `ttl` to `AgentOffline` and returns their IDs. `Heartbeat` transitions an `AgentOffline` agent back to `AgentAvailable`.

### MemoryRegistry

In-memory implementation with `sync.RWMutex`. Uses copy-on-read semantics consistent with `MemoryStore`.

### SQLiteRegistry

Persistent implementation backed by a `database/sql` SQLite connection (using `modernc.org/sqlite` directly, not via go-store). Schema:

```sql
CREATE TABLE agents (
    id             TEXT PRIMARY KEY,
    name           TEXT NOT NULL DEFAULT '',
    capabilities   TEXT NOT NULL DEFAULT '[]',
    status         TEXT NOT NULL DEFAULT 'available',
    last_heartbeat DATETIME NOT NULL DEFAULT (datetime('now')),
    current_load   INTEGER NOT NULL DEFAULT 0,
    max_load       INTEGER NOT NULL DEFAULT 0,
    registered_at  DATETIME NOT NULL DEFAULT (datetime('now'))
)
```

`Register` uses `INSERT ... ON CONFLICT(id) DO UPDATE` (UPSERT). WAL mode and `busy_timeout=5000ms` are set at open time. Capabilities are stored as a JSON array string.

### RedisRegistry

TTL-based implementation where each agent is stored as a JSON value at key `{prefix}:agent:{id}`. Natural key expiry serves as the reap mechanism. `Heartbeat` refreshes the key's TTL. `Reap` performs an explicit SCAN for expired keys as a backup to natural expiry. Uses `WithRedisPassword`, `WithRedisDB`, and `WithRedisPrefix` functional options.

### Backend Selection

`NewAgentRegistryFromConfig(cfg RegistryConfig)` mirrors the allowance factory:

```go
type RegistryConfig struct {
    RegistryBackend   string  // "memory" | "sqlite" | "redis"
    RegistryPath      string  // SQLite path (optional)
    RegistryRedisAddr string  // host:port (optional)
}
```

Default registry path: `~/.config/agentic/registry.db`.

---

## Task Queue and Dispatcher

### Task Model

`Task` is the core data type:

```go
type Task struct {
    ID           string
    Title        string
    Description  string
    Priority     TaskPriority   // critical | high | medium | low
    Status       TaskStatus     // pending | in_progress | completed | blocked | failed
    Labels       []string       // capability requirements for routing
    Files        []string       // relevant files for context building
    Dependencies []string       // task IDs that must complete first
    MaxRetries   int            // 0 uses DefaultMaxRetries (3)
    RetryCount   int            // failed dispatch attempts so far
    LastAttempt  *time.Time     // when the last dispatch attempt occurred
    FailReason   string         // set when StatusFailed
    // ... timestamps, ClaimedBy, Project, etc.
}
```

### TaskRouter

`DefaultRouter` selects an agent using two strategies:

- For `PriorityCritical` tasks: picks the agent with the lowest `CurrentLoad`, tie-broken alphabetically by agent ID.
- For all other tasks: scores agents as `1.0 - (CurrentLoad / MaxLoad)` and picks the highest scorer. Agents with `MaxLoad == 0` score 1.0 (unlimited capacity).

Eligibility filtering runs first: only agents with status `AgentAvailable` or `AgentBusy` with remaining capacity are considered. The task's `Labels` field maps to capability requirements — all labels must be present in the agent's `Capabilities` slice.

Returns `ErrNoEligibleAgent` when no agent qualifies.

### Dispatcher

`Dispatcher` combines `AgentRegistry`, `TaskRouter`, `AllowanceService`, and `Client`:

```
Dispatch(ctx, task):
  1. registry.List()           — get candidate agents
  2. router.Route(task, agents) — select best agent
  3. allowance.Check(agentID)  — verify quota
  4. client.ClaimTask(task.ID) — mark in-progress via API
  5. allowance.RecordUsage(JobStarted) — record job start
  → emit EventTaskDispatched
```

Failure paths emit `EventDispatchFailedNoAgent` or `EventDispatchFailedQuota`.

### DispatchLoop

`DispatchLoop(ctx, interval)` polls `client.ListTasks(pending)` at the given interval and dispatches each task. Key behaviours:

- Tasks are sorted by priority (Critical first) then by `CreatedAt` (oldest first) before dispatch.
- Tasks with `RetryCount > 0` are skipped if `LastAttempt + backoff(RetryCount) > now`. Backoff is exponential starting at 5 seconds (5s, 10s, 20s, ...).
- Tasks that fail dispatch have `RetryCount` incremented. Tasks reaching `MaxRetries` (default 3) are updated to `StatusFailed` via the API (dead-lettered) and emit `EventTaskDeadLettered`.
- Transient API errors listing tasks are logged and the loop continues.
- The loop exits when the context is cancelled and returns `ctx.Err()`.

---

## Context Builder

`BuildTaskContext(task, dir)` gathers context for AI collaboration:

1. Reads files explicitly listed in `task.Files` from the working directory.
2. Runs `git status --porcelain` and `git log --oneline -10`.
3. Extracts keywords from the task title and description (stop-word filtered, minimum 3 chars, up to 5 keywords) and runs `git grep -l -i <keyword> -- *.go *.ts *.js *.py` for each.
4. Reads up to 10 matching source files, truncating files over 5,000 bytes.

`TaskContext.FormatContext()` renders the collected data as a Markdown string with sections for task metadata, task files, git status, recent commits, and related code — suitable for inclusion in a prompt sent to an AI model.

`GatherRelatedFiles(task, dir)` is the public sub-function that reads only the explicitly listed files.

---

## CLI Client

`Client` is an HTTP client for the core-agentic REST API. It sends `Bearer` token authentication on every request.

Endpoints used:

| Method | Path | Client function |
|---|---|---|
| GET | /api/tasks | ListTasks |
| GET | /api/tasks/{id} | GetTask |
| POST | /api/tasks | CreateTask |
| POST | /api/tasks/{id}/claim | ClaimTask |
| PATCH | /api/tasks/{id} | UpdateTask |
| POST | /api/tasks/{id}/complete | CompleteTask |
| GET | /api/health | Ping |

`ClaimTask` accepts both `ClaimResponse{"task": ...}` and bare `Task` JSON to handle varying API implementations. `checkResponse` parses error bodies as `APIError{code, message, details}` or falls back to a generic HTTP status message.

Default timeout: 30 seconds. Default base URL: `http://localhost:8080` (override with `AGENTIC_BASE_URL`).

### Configuration

`LoadConfig(dir)` resolves configuration in this order:

1. `.env` file in the specified directory (or current directory if `dir` is empty), reading `AGENTIC_BASE_URL`, `AGENTIC_TOKEN`, `AGENTIC_PROJECT`, `AGENTIC_AGENT_ID`.
2. `~/.core/agentic.yaml` with the same fields in YAML form.
3. Environment variable overrides applied on top of whichever file was loaded.

A missing token always returns an error. `SaveConfig` writes to `~/.core/agentic.yaml`, creating the directory if needed.

---

## Event Hooks

All lifecycle events are published through the `EventEmitter` interface:

```go
type EventEmitter interface {
    Emit(ctx context.Context, event Event) error
}
```

`Event` carries a typed string `EventType`, optional `TaskID` and `AgentID`, a `Timestamp`, and an untyped `Payload`.

### Event Types

| Type | Emitter | Trigger |
|---|---|---|
| `task_dispatched` | Dispatcher | Successful routing, quota check, and claim |
| `task_claimed` | Dispatcher | API claim call succeeded |
| `dispatch_failed_no_agent` | Dispatcher | `ErrNoEligibleAgent` from router |
| `dispatch_failed_quota` | Dispatcher | Allowance check denied |
| `task_dead_lettered` | Dispatcher | Task exceeded `MaxRetries` |
| `quota_warning` | AllowanceService | Agent at 80%+ of daily token limit |
| `quota_exceeded` | AllowanceService | Any limit check fails |
| `usage_recorded` | AllowanceService | `JobStarted` or `JobCompleted` events |

### Implementations

`ChannelEmitter` — buffered `chan Event`. Drops events silently if the buffer is full to avoid blocking the dispatch path. Default buffer size: 64. Use `Events()` to receive.

`MultiEmitter` — fans out to a slice of emitters. Failures in one emitter do not stop the others. Thread-safe: emitters can be added with `Add()` at any time.

Both `Dispatcher` and `AllowanceService` accept an emitter via `SetEventEmitter()`. A nil emitter is safe — all emit calls are no-ops.

---

## CLI Backing Functions

These functions are intended to be called by `core agent` CLI commands implemented in the `core/cli` repository.

`GetStatus(ctx, registry, client, allowanceSvc)` — aggregates `registry.List()`, counts of pending and in-progress tasks from the client, and remaining tokens per agent from the allowance service. Any argument can be nil; those sections are skipped. Returns `*StatusSummary`.

`FormatStatus(summary)` — renders a `StatusSummary` as a human-readable table with agent rows sorted alphabetically by ID.

`SubmitTask(ctx, client, title, description, labels, priority)` — validates that `title` is non-empty, constructs a `Task` with `StatusPending` and current timestamp, and calls `client.CreateTask`.

`StreamLogs(ctx, client, taskID, interval, writer)` — polls `client.GetTask` at the given interval, writing timestamped status lines to `writer`. Stops on `StatusCompleted` or `StatusBlocked`. Transient errors are written to the stream but do not stop polling. Returns `ctx.Err()` on cancellation.

---

## Prompt Templates

`embed.go` embeds all files under `prompts/*.md` at compile time. `Prompt(name)` reads a template by name (without `.md`), trims whitespace, and returns an empty string if the file does not exist. Currently one template is provided: `prompts/commit.md`, used by `TaskCommit` to guide Claude through generating a conventional commit.

---

## Dependency Graph

```
go-agentic
    forge.lthn.ai/core/go         (ServiceRuntime, framework, io, log)
    forge.lthn.ai/core/go-store   (SQLite KV — SQLiteStore)
    github.com/redis/go-redis/v9  (RedisStore, RedisRegistry)
    gopkg.in/yaml.v3              (config file parsing)
    modernc.org/sqlite            (SQLiteRegistry — direct database/sql)
    github.com/stretchr/testify   (tests only)
```

The module uses Go workspace `replace` directives to reference `../go` and `../go-store` during development.