go-agentic/docs/architecture.md
Snider 6e8ae2a1ec docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
2026-02-20 15:02:20 +00:00

16 KiB

go-agentic Architecture

Module: forge.lthn.ai/core/go-agentic

This document describes the internal structure of go-agentic: how components relate, what interfaces they implement, and how data flows through the system.


Overview

go-agentic is an AI service lifecycle and task management library. It provides:

  • A Core framework service that spawns claude subprocess invocations
  • Quota enforcement (allowance management) with three storage backends
  • An agent registry with two persistence backends
  • A task router and dispatcher for multi-agent coordination
  • Event hooks for lifecycle notification
  • CLI-oriented helper functions for status display, task submission, and log streaming

The package has no dependency on go-ai, go-ml, go-rag, or go-mcp. Its only non-standard imports are forge.lthn.ai/core/go (the Core DI framework), forge.lthn.ai/core/go-store (SQLite KV), and github.com/redis/go-redis/v9.


Service Lifecycle

Service wraps framework.ServiceRuntime[ServiceOptions] and integrates with the Core dependency injection container.

type Service struct {
    *framework.ServiceRuntime[ServiceOptions]
}

On startup, OnStartup registers a task handler with the Core runtime:

func (s *Service) OnStartup(ctx context.Context) error {
    s.Core().RegisterTask(s.handleTask)
    return nil
}

The handler dispatches on task type:

  • TaskCommit — runs the embedded prompts/commit.md prompt via claude -p <prompt> --allowedTools Bash,Read,Glob,Grep in the specified directory. If CanEdit is true, Write and Edit are added to the tool list.
  • TaskPrompt — runs an arbitrary prompt. Allowed tools default to ServiceOptions.DefaultTools (Bash, Read, Glob, Grep) but can be overridden per task. Reports progress via Core().Progress() using the task ID.

Both task types spawn claude as a subprocess with os.Stdout/os.Stderr/os.Stdin connected directly, so output streams to the terminal.

ServiceOptions

type ServiceOptions struct {
    DefaultTools []string  // default tool list for TaskPrompt
    AllowEdit    bool      // global permission for Write/Edit tools
}

Default tools are ["Bash", "Read", "Glob", "Grep"]. Write/Edit require explicit opt-in either globally or per task.


Allowance Management

Allowance management enforces per-agent and per-model token quotas. It is composed of three layers:

  1. AllowanceStore — persistence interface
  2. AllowanceService — enforcement logic
  3. Three AllowanceStore implementations — Memory, SQLite, Redis

AllowanceStore Interface

type AllowanceStore interface {
    GetAllowance(agentID string) (*AgentAllowance, error)
    SetAllowance(a *AgentAllowance) error
    GetUsage(agentID string) (*UsageRecord, error)
    IncrementUsage(agentID string, tokens int64, jobs int) error
    DecrementActiveJobs(agentID string) error
    ReturnTokens(agentID string, tokens int64) error
    ResetUsage(agentID string) error
    GetModelQuota(model string) (*ModelQuota, error)
    GetModelUsage(model string) (int64, error)
    IncrementModelUsage(model string, tokens int64) error
}

All three implementations satisfy this interface with identical semantics. The AgentAllowance struct holds per-agent limits; UsageRecord holds current period usage.

MemoryStore

In-memory implementation using sync.RWMutex. State is lost on process restart. Suitable for single-process use, testing, and development. Uses defensive copy semantics: Get and Set return copies of stored structs to prevent aliasing.

SQLiteStore

Persistent single-node implementation using forge.lthn.ai/core/go-store (a SQLite KV abstraction over modernc.org/sqlite). Data is namespaced into four groups: allowances, usage, model_quotas, model_usage. Read-modify-write operations are serialised with a sync.Mutex. WAL mode is enabled for concurrent reads. time.Duration values are serialised as int64 nanoseconds to avoid locale-dependent string parsing.

Default database path: ~/.config/agentic/allowance.db.

RedisStore

Multi-process persistent implementation using github.com/redis/go-redis/v9. Suitable for deployments where multiple agent processes share quota state. Atomic read-modify-write is implemented via Lua scripts registered on the Redis server (EVAL). Key pattern: {prefix}:{type}:{id}. Default prefix: agentic. Pings the server at construction time.

Backend Selection

NewAllowanceStoreFromConfig(cfg AllowanceConfig) is the factory:

type AllowanceConfig struct {
    StoreBackend string  // "memory" | "sqlite" | "redis"
    StorePath    string  // SQLite path (optional)
    RedisAddr    string  // host:port (optional)
}

An empty or unrecognised StoreBackend defaults to "memory".

AllowanceService

AllowanceService wraps an AllowanceStore and enforces quota policy. The check order in Check(agentID, model string) is:

  1. Model allowlist — rejects if agent's ModelAllowlist is non-empty and the model is not in it
  2. Daily token limit — rejects if TokensUsed >= DailyTokenLimit; emits EventQuotaWarning at 80% usage
  3. Daily job limit — rejects if JobsStarted >= DailyJobLimit
  4. Concurrent jobs — rejects if ActiveJobs >= ConcurrentJobs
  5. Global model budget — rejects if model-level TokensUsed >= DailyTokenBudget

When multiple limits are simultaneously exceeded, the first in this order is reported as the reason.

RecordUsage(report UsageReport) handles four quota events:

Event Effect
QuotaEventJobStarted Increments JobsStarted and ActiveJobs by 1
QuotaEventJobCompleted Adds TokensIn + TokensOut to TokensUsed; decrements ActiveJobs; records model usage
QuotaEventJobFailed Records full token usage; returns 50% of tokens; decrements ActiveJobs
QuotaEventJobCancelled Returns 100% of tokens; decrements ActiveJobs; no model usage recorded

Two fields on ModelQuotaHourlyRateLimit and CostCeiling — are stored and round-tripped by all backends but are not currently enforced by AllowanceService.Check. Enforcement requires a sliding-window GetHourlyUsage method on AllowanceStore, which would be a breaking interface change. The fields are annotated as reserved in the source.


Agent Registry

AgentRegistry tracks the set of known agents and their health state.

Interface

type AgentRegistry interface {
    Register(agent AgentInfo) error
    Deregister(id string) error
    Get(id string) (AgentInfo, error)
    List() []AgentInfo
    Heartbeat(id string) error
    Reap(ttl time.Duration) []string
}

Reap transitions agents whose LastHeartbeat is older than ttl to AgentOffline and returns their IDs. Heartbeat transitions an AgentOffline agent back to AgentAvailable.

MemoryRegistry

In-memory implementation with sync.RWMutex. Uses copy-on-read semantics consistent with MemoryStore.

SQLiteRegistry

Persistent implementation backed by a database/sql SQLite connection (using modernc.org/sqlite directly, not via go-store). Schema:

CREATE TABLE agents (
    id             TEXT PRIMARY KEY,
    name           TEXT NOT NULL DEFAULT '',
    capabilities   TEXT NOT NULL DEFAULT '[]',
    status         TEXT NOT NULL DEFAULT 'available',
    last_heartbeat DATETIME NOT NULL DEFAULT (datetime('now')),
    current_load   INTEGER NOT NULL DEFAULT 0,
    max_load       INTEGER NOT NULL DEFAULT 0,
    registered_at  DATETIME NOT NULL DEFAULT (datetime('now'))
)

Register uses INSERT ... ON CONFLICT(id) DO UPDATE (UPSERT). WAL mode and busy_timeout=5000ms are set at open time. Capabilities are stored as a JSON array string.

RedisRegistry

TTL-based implementation where each agent is stored as a JSON value at key {prefix}:agent:{id}. Natural key expiry serves as the reap mechanism. Heartbeat refreshes the key's TTL. Reap performs an explicit SCAN for expired keys as a backup to natural expiry. Uses WithRedisPassword, WithRedisDB, and WithRedisPrefix functional options.

Backend Selection

NewAgentRegistryFromConfig(cfg RegistryConfig) mirrors the allowance factory:

type RegistryConfig struct {
    RegistryBackend   string  // "memory" | "sqlite" | "redis"
    RegistryPath      string  // SQLite path (optional)
    RegistryRedisAddr string  // host:port (optional)
}

Default registry path: ~/.config/agentic/registry.db.


Task Queue and Dispatcher

Task Model

Task is the core data type:

type Task struct {
    ID           string
    Title        string
    Description  string
    Priority     TaskPriority   // critical | high | medium | low
    Status       TaskStatus     // pending | in_progress | completed | blocked | failed
    Labels       []string       // capability requirements for routing
    Files        []string       // relevant files for context building
    Dependencies []string       // task IDs that must complete first
    MaxRetries   int            // 0 uses DefaultMaxRetries (3)
    RetryCount   int            // failed dispatch attempts so far
    LastAttempt  *time.Time     // when the last dispatch attempt occurred
    FailReason   string         // set when StatusFailed
    // ... timestamps, ClaimedBy, Project, etc.
}

TaskRouter

DefaultRouter selects an agent using two strategies:

  • For PriorityCritical tasks: picks the agent with the lowest CurrentLoad, tie-broken alphabetically by agent ID.
  • For all other tasks: scores agents as 1.0 - (CurrentLoad / MaxLoad) and picks the highest scorer. Agents with MaxLoad == 0 score 1.0 (unlimited capacity).

Eligibility filtering runs first: only agents with status AgentAvailable or AgentBusy with remaining capacity are considered. The task's Labels field maps to capability requirements — all labels must be present in the agent's Capabilities slice.

Returns ErrNoEligibleAgent when no agent qualifies.

Dispatcher

Dispatcher combines AgentRegistry, TaskRouter, AllowanceService, and Client:

Dispatch(ctx, task):
  1. registry.List()           — get candidate agents
  2. router.Route(task, agents) — select best agent
  3. allowance.Check(agentID)  — verify quota
  4. client.ClaimTask(task.ID) — mark in-progress via API
  5. allowance.RecordUsage(JobStarted) — record job start
  → emit EventTaskDispatched

Failure paths emit EventDispatchFailedNoAgent or EventDispatchFailedQuota.

DispatchLoop

DispatchLoop(ctx, interval) polls client.ListTasks(pending) at the given interval and dispatches each task. Key behaviours:

  • Tasks are sorted by priority (Critical first) then by CreatedAt (oldest first) before dispatch.
  • Tasks with RetryCount > 0 are skipped if LastAttempt + backoff(RetryCount) > now. Backoff is exponential starting at 5 seconds (5s, 10s, 20s, ...).
  • Tasks that fail dispatch have RetryCount incremented. Tasks reaching MaxRetries (default 3) are updated to StatusFailed via the API (dead-lettered) and emit EventTaskDeadLettered.
  • Transient API errors listing tasks are logged and the loop continues.
  • The loop exits when the context is cancelled and returns ctx.Err().

Context Builder

BuildTaskContext(task, dir) gathers context for AI collaboration:

  1. Reads files explicitly listed in task.Files from the working directory.
  2. Runs git status --porcelain and git log --oneline -10.
  3. Extracts keywords from the task title and description (stop-word filtered, minimum 3 chars, up to 5 keywords) and runs git grep -l -i <keyword> -- *.go *.ts *.js *.py for each.
  4. Reads up to 10 matching source files, truncating files over 5,000 bytes.

TaskContext.FormatContext() renders the collected data as a Markdown string with sections for task metadata, task files, git status, recent commits, and related code — suitable for inclusion in a prompt sent to an AI model.

GatherRelatedFiles(task, dir) is the public sub-function that reads only the explicitly listed files.


CLI Client

Client is an HTTP client for the core-agentic REST API. It sends Bearer token authentication on every request.

Endpoints used:

Method Path Client function
GET /api/tasks ListTasks
GET /api/tasks/{id} GetTask
POST /api/tasks CreateTask
POST /api/tasks/{id}/claim ClaimTask
PATCH /api/tasks/{id} UpdateTask
POST /api/tasks/{id}/complete CompleteTask
GET /api/health Ping

ClaimTask accepts both ClaimResponse{"task": ...} and bare Task JSON to handle varying API implementations. checkResponse parses error bodies as APIError{code, message, details} or falls back to a generic HTTP status message.

Default timeout: 30 seconds. Default base URL: http://localhost:8080 (override with AGENTIC_BASE_URL).

Configuration

LoadConfig(dir) resolves configuration in this order:

  1. .env file in the specified directory (or current directory if dir is empty), reading AGENTIC_BASE_URL, AGENTIC_TOKEN, AGENTIC_PROJECT, AGENTIC_AGENT_ID.
  2. ~/.core/agentic.yaml with the same fields in YAML form.
  3. Environment variable overrides applied on top of whichever file was loaded.

A missing token always returns an error. SaveConfig writes to ~/.core/agentic.yaml, creating the directory if needed.


Event Hooks

All lifecycle events are published through the EventEmitter interface:

type EventEmitter interface {
    Emit(ctx context.Context, event Event) error
}

Event carries a typed string EventType, optional TaskID and AgentID, a Timestamp, and an untyped Payload.

Event Types

Type Emitter Trigger
task_dispatched Dispatcher Successful routing, quota check, and claim
task_claimed Dispatcher API claim call succeeded
dispatch_failed_no_agent Dispatcher ErrNoEligibleAgent from router
dispatch_failed_quota Dispatcher Allowance check denied
task_dead_lettered Dispatcher Task exceeded MaxRetries
quota_warning AllowanceService Agent at 80%+ of daily token limit
quota_exceeded AllowanceService Any limit check fails
usage_recorded AllowanceService JobStarted or JobCompleted events

Implementations

ChannelEmitter — buffered chan Event. Drops events silently if the buffer is full to avoid blocking the dispatch path. Default buffer size: 64. Use Events() to receive.

MultiEmitter — fans out to a slice of emitters. Failures in one emitter do not stop the others. Thread-safe: emitters can be added with Add() at any time.

Both Dispatcher and AllowanceService accept an emitter via SetEventEmitter(). A nil emitter is safe — all emit calls are no-ops.


CLI Backing Functions

These functions are intended to be called by core agent CLI commands implemented in the core/cli repository.

GetStatus(ctx, registry, client, allowanceSvc) — aggregates registry.List(), counts of pending and in-progress tasks from the client, and remaining tokens per agent from the allowance service. Any argument can be nil; those sections are skipped. Returns *StatusSummary.

FormatStatus(summary) — renders a StatusSummary as a human-readable table with agent rows sorted alphabetically by ID.

SubmitTask(ctx, client, title, description, labels, priority) — validates that title is non-empty, constructs a Task with StatusPending and current timestamp, and calls client.CreateTask.

StreamLogs(ctx, client, taskID, interval, writer) — polls client.GetTask at the given interval, writing timestamped status lines to writer. Stops on StatusCompleted or StatusBlocked. Transient errors are written to the stream but do not stop polling. Returns ctx.Err() on cancellation.


Prompt Templates

embed.go embeds all files under prompts/*.md at compile time. Prompt(name) reads a template by name (without .md), trims whitespace, and returns an empty string if the file does not exist. Currently one template is provided: prompts/commit.md, used by TaskCommit to guide Claude through generating a conventional commit.


Dependency Graph

go-agentic
    forge.lthn.ai/core/go         (ServiceRuntime, framework, io, log)
    forge.lthn.ai/core/go-store   (SQLite KV — SQLiteStore)
    github.com/redis/go-redis/v9  (RedisStore, RedisRegistry)
    gopkg.in/yaml.v3              (config file parsing)
    modernc.org/sqlite            (SQLiteRegistry — direct database/sql)
    github.com/stretchr/testify   (tests only)

The module uses Go workspace replace directives to reference ../go and ../go-store during development.