Replace internal task tracking (TODO.md, FINDINGS.md) with structured documentation in docs/. Trim CLAUDE.md to agent instructions only. Co-Authored-By: Virgil <virgil@lethean.io>
# go-agentic Project History

Module: `forge.lthn.ai/core/go-agentic`

---

## Origin: Extraction from go-ai

**Date**: 19 February 2026
**Commit**: `68c108f feat: extract go-agentic from go-ai as standalone service package`

The package was extracted from `forge.lthn.ai/core/go-ai/agentic/`. The agentic subdirectory in go-ai imported only `forge.lthn.ai/core/go`, `gopkg.in/yaml.v3`, and the standard library, with no coupling to `go-ai/ml`, `go-ai/rag`, `go-ai/mcp`, or any other subpackage. This made it the cleanest extraction candidate in the go-ai monolith.

What was extracted at the split point:

- 14 Go source files (~1,968 lines, excluding tests)
- 5 test files covering allowance, client, completion, config, and context
- 1 embedded prompt template (`prompts/commit.md`)

After extraction, a `go.mod` replace directive was corrected from `../core` to `../go` to match the actual sibling directory name (`af110be`).

---

## Phase 1: Test Coverage

**Commit**: `23aa635 test: achieve 85.6% coverage with 7 new test files`
**Operator**: Charon

Coverage improved from 70.1% to 85.6% with 7 new test files and over 130 tests:

| New file | Purpose |
|---|---|
| `lifecycle_test.go` | Full claim -> process -> complete integration; fail/cancel flows; concurrent agents |
| `allowance_edge_test.go` | Boundary: exact limit, one-over, zero allowance, warning threshold |
| `allowance_error_test.go` | Mock `errorStore` to exercise all error paths in RecordUsage/Check/ResetAgent |
| `embed_test.go` | Prompt() hit/miss and whitespace trimming |
| `service_test.go` | DefaultServiceOptions, TaskPrompt Set/GetTaskID, TaskCommit fields |
| `completion_git_test.go` | AutoCommit, CreateBranch, CommitAndSync, GetDiff using real git repositories |
| `context_git_test.go` | findRelatedCode in git repos: keyword search, 10-file cap, truncation |

**Discovery**: `MemoryStore` correctly uses defensive copies on `Set`/`Get`. Mutations to a struct after `SetAllowance` do not affect the stored data.

**Discovery**: `AllowanceService.Check` enforces limits in a fixed priority order: model allowlist -> daily tokens -> daily jobs -> concurrent jobs -> global model budget. When multiple limits are exceeded simultaneously, the first in this order is reported.
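
The fixed evaluation order can be pictured as an ordered list of checks where the first failing one wins. The following is a minimal sketch only, not the actual `AllowanceService.Check` code: the `usageSnapshot` type, its field names, and the "used >= limit" comparison are assumptions made for illustration.

```go
package main

import "fmt"

// usageSnapshot is a hypothetical view of the values a check would compare.
// The real service reads these from its AllowanceStore.
type usageSnapshot struct {
	ModelAllowed    bool
	TokensUsed      int64
	TokenLimit      int64
	JobsUsed        int
	JobLimit        int
	ActiveJobs      int
	MaxConcurrent   int
	ModelBudgetUsed int64
	ModelBudget     int64
}

// limitCheck pairs a human-readable name with a predicate.
type limitCheck struct {
	name     string
	exceeded func(s usageSnapshot) bool
}

// checkOrder mirrors the documented priority order. A limit is treated as
// exceeded when usage has reached it (an assumption of this sketch).
var checkOrder = []limitCheck{
	{"model allowlist", func(s usageSnapshot) bool { return !s.ModelAllowed }},
	{"daily tokens", func(s usageSnapshot) bool { return s.TokensUsed >= s.TokenLimit }},
	{"daily jobs", func(s usageSnapshot) bool { return s.JobsUsed >= s.JobLimit }},
	{"concurrent jobs", func(s usageSnapshot) bool { return s.ActiveJobs >= s.MaxConcurrent }},
	{"global model budget", func(s usageSnapshot) bool { return s.ModelBudgetUsed >= s.ModelBudget }},
}

// firstExceeded returns the name of the first limit that is exceeded,
// walking the checks in priority order.
func firstExceeded(s usageSnapshot) (string, bool) {
	for _, c := range checkOrder {
		if c.exceeded(s) {
			return c.name, true
		}
	}
	return "", false
}

func main() {
	// Both the daily token limit and the concurrency limit are hit here;
	// only the earlier one in the order is reported.
	s := usageSnapshot{
		ModelAllowed: true,
		TokensUsed:   1000, TokenLimit: 1000,
		JobsUsed: 2, JobLimit: 10,
		ActiveJobs: 3, MaxConcurrent: 3,
		ModelBudgetUsed: 0, ModelBudget: 5000,
	}
	name, blocked := firstExceeded(s)
	fmt.Println(name, blocked)
}
```

Keeping the checks in a slice makes the reporting order explicit and easy to assert in a boundary test.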

A follow-up commit (`5d02695`) pushed coverage further to 96.5%.

---

## Phase 2: Allowance Persistence

**Commit**: `3e43233 feat: Phase 2 — SQLite AllowanceStore backend + config wiring`
**Commit**: `0be744e feat(allowance): add Redis backend for AllowanceStore`

`MemoryStore` lost all state on process restart. Two persistent backends were added:

- `SQLiteStore`: single-node persistence via `forge.lthn.ai/core/go-store` (SQLite KV). Read-modify-write operations are serialised with `sync.Mutex`. `time.Duration` is stored as int64 nanoseconds to avoid locale-dependent string parsing.
- `RedisStore`: multi-process persistence via `github.com/redis/go-redis/v9`. Atomic increment/decrement operations use Lua scripts (`EVAL`) to avoid TOCTOU races.

`AllowanceConfig` and `NewAllowanceStoreFromConfig` were added to `config.go` as the backend selection factory.

---

## Phase 3: Multi-Agent Coordination

**Commit**: `646cc02 feat(coordination): add agent registry, task router, and dispatcher`

Three new files introduced the multi-agent layer:

- `registry.go`: `AgentInfo` struct, `AgentRegistry` interface, `MemoryRegistry` implementation. `Reap(ttl)` marks stale agents offline and returns their IDs.
- `router.go`: `TaskRouter` interface, `DefaultRouter` implementation. Capability matching (task labels must be a subset of agent capabilities), load-based scoring (1 - load/max), least-loaded selection for critical tasks. `ErrNoEligibleAgent` sentinel.
- `dispatcher.go`: `Dispatcher` combining registry, router, allowance service, and API client. `Dispatch` executes the five-step pipeline. `DispatchLoop` polls for pending tasks on a ticker.
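
The routing rules described for `router.go` (labels must be a subset of capabilities, score is `1 - load/max`) can be sketched as below. This is an illustration under assumptions, not the `DefaultRouter` source: the `agent` type, the `hasAll`, `score`, and `pick` helpers, and the handling of a non-positive `MaxLoad` are all hypothetical.

```go
package main

import "fmt"

// agent is a hypothetical, trimmed-down stand-in for AgentInfo.
type agent struct {
	ID           string
	Capabilities []string
	Load         int
	MaxLoad      int
}

// hasAll reports whether every label appears in caps (subset test).
func hasAll(caps, labels []string) bool {
	set := make(map[string]struct{}, len(caps))
	for _, c := range caps {
		set[c] = struct{}{}
	}
	for _, l := range labels {
		if _, ok := set[l]; !ok {
			return false
		}
	}
	return true
}

// score implements 1 - load/max. A non-positive MaxLoad scores 0 here,
// which is an assumption of this sketch.
func score(a agent) float64 {
	if a.MaxLoad <= 0 {
		return 0
	}
	return 1 - float64(a.Load)/float64(a.MaxLoad)
}

// pick returns the eligible agent with the highest score, if any.
func pick(agents []agent, labels []string) (agent, bool) {
	var best agent
	found := false
	for _, a := range agents {
		if !hasAll(a.Capabilities, labels) {
			continue
		}
		if !found || score(a) > score(best) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	agents := []agent{
		{ID: "a", Capabilities: []string{"go"}, Load: 3, MaxLoad: 4},
		{ID: "b", Capabilities: []string{"go", "test"}, Load: 1, MaxLoad: 4},
		{ID: "c", Capabilities: []string{"python"}, Load: 0, MaxLoad: 4},
	}
	if chosen, ok := pick(agents, []string{"go"}); ok {
		fmt.Println("chosen:", chosen.ID)
	}
	if _, ok := pick(agents, []string{"rust"}); !ok {
		fmt.Println("no eligible agent")
	}
}
```

In the package itself, the no-match case is signalled with the `ErrNoEligibleAgent` sentinel rather than a boolean.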

---

## Phase 4: CLI Backing Functions

**Commit**: `ef81db7 feat(cli): add status summary, task submission, and log streaming`

Three files were added to serve the `core agent` CLI commands (implemented separately in `core/cli`), and one existing file was extended:

- `status.go`: `StatusSummary`, `GetStatus`, `FormatStatus`. Aggregates registry, task counts, and remaining allowance. All components are optional (nil-safe).
- `submit.go`: `SubmitTask`. Validates title, sets `StatusPending` and `CreatedAt`, delegates to `client.CreateTask`.
- `logs.go`: `StreamLogs`. Polls `GetTask` at an interval, writes timestamped status lines to an `io.Writer`, stops on terminal states.
- `client.go` gained `CreateTask` (POST /api/tasks).

---

## Phase 5: Persistent Agent Registry

**Commit**: `ce502c0 feat(registry): Phase 5 — persistent agent registry (SQLite + Redis + config factory)`

`MemoryRegistry` lost all agent registrations on restart. The same persistence pattern from Phase 2 was applied to the registry:

- `registry_sqlite.go`: `SQLiteRegistry` using `database/sql` with `modernc.org/sqlite` directly. Schema: `agents` table with UPSERT on Register, WAL mode, `busy_timeout=5000ms`.
- `registry_redis.go`: `RedisRegistry` with TTL-based natural expiry serving as the reap mechanism. SCAN-based `Reap` as a backup.
- `RegistryConfig` and `NewAgentRegistryFromConfig` added to `config.go`.

---

## Phase 6: Dead Code Cleanup

**Commit**: `779132a fix(config): change DefaultBaseURL to localhost, annotate reserved fields`

Two issues addressed:

- `DefaultBaseURL` was `api.core-agentic.dev` (a non-existent host). Changed to `http://localhost:8080`. Production deployments must set `AGENTIC_BASE_URL`.
- `HourlyRateLimit` and `CostCeiling` on `ModelQuota` were stored but never enforced in `AllowanceService.Check`. Enforcement would require `AllowanceStore.GetHourlyUsage` (a sliding window query), which would be a breaking interface change. The fields are retained and annotated as reserved. All three backends correctly store and round-trip both values.

---

## Phase 7: Priority-Ordered Dispatch and Retry

**Commit**: `ba8c19d feat(dispatch): Phase 7 — priority-ordered dispatch with retry backoff and dead-letter`

`DispatchLoop` previously dispatched tasks in arbitrary API order with no retry handling. Two improvements:

- **Priority sorting**: tasks are sorted by `priorityRank` (Critical=0, High=1, Medium=2, Low=3) then by `CreatedAt` ascending (oldest first for tasks of equal priority). `sort.SliceStable` is used for determinism.
- **Exponential backoff and dead-letter**: tasks with `RetryCount > 0` are skipped until their backoff window has elapsed, that is, for as long as `now` is earlier than `LastAttempt + backoffDuration(RetryCount)`. Backoff starts at 5 seconds and doubles per retry (5s, 10s, 20s, ...). Tasks reaching `MaxRetries` (default 3) are updated to `StatusFailed` via `client.UpdateTask` and the failure reason is set to `"max retries exceeded"`. `MaxRetries` and `RetryCount` fields were added to the `Task` type.

---

## Phase 8: Event Hooks

**Commit**: `a29ded5 feat(events): Phase 8 — event hooks for task lifecycle and quota notifications`

Production orchestration required external notification of lifecycle transitions. Three new constructs:

- `events.go`: `Event` struct, `EventType` (8 constants), `EventEmitter` interface, `ChannelEmitter` (buffered channel, drops on overflow), `MultiEmitter` (fan-out, failure-tolerant).
- `Dispatcher.SetEventEmitter` and `emit` helper: emits `task_dispatched`, `task_claimed`, `dispatch_failed_no_agent`, `dispatch_failed_quota`, `task_dead_lettered`.
- `AllowanceService.SetEventEmitter` and `emitEvent` helper: emits `quota_warning` (at 80% of daily token limit), `quota_exceeded` (five distinct check paths), `usage_recorded` (on job started and completed).

Both emit helpers are nil-safe: emission is a no-op when no emitter has been set via `SetEventEmitter`.
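
The "drops on overflow" and nil-safe behaviours can be sketched with a non-blocking channel send. This is a minimal illustration, not the contents of `events.go`: the `event`, `emitter`, `channelEmitter`, and `dispatcher` types below are hypothetical stand-ins, and the boolean return value is added here only to make the drop visible.

```go
package main

import "fmt"

// event is a hypothetical, trimmed-down stand-in for the Event struct.
type event struct {
	Type   string
	TaskID string
}

// emitter is a stand-in for the EventEmitter interface.
type emitter interface {
	emit(ev event) bool
}

// channelEmitter queues events on a buffered channel.
type channelEmitter struct {
	ch chan event
}

func newChannelEmitter(size int) *channelEmitter {
	return &channelEmitter{ch: make(chan event, size)}
}

// emit never blocks: when the buffer is full the event is dropped
// and false is returned.
func (e *channelEmitter) emit(ev event) bool {
	select {
	case e.ch <- ev:
		return true
	default:
		return false
	}
}

// dispatcher holds an optional emitter.
type dispatcher struct {
	events emitter
}

// emit is a no-op when no emitter has been configured.
func (d *dispatcher) emit(ev event) {
	if d.events == nil {
		return
	}
	d.events.emit(ev)
}

func main() {
	em := newChannelEmitter(1)
	fmt.Println(em.emit(event{Type: "task_dispatched", TaskID: "t1"})) // true
	fmt.Println(em.emit(event{Type: "task_claimed", TaskID: "t1"}))    // false, buffer full

	// No emitter set: the call returns without doing anything.
	d := &dispatcher{}
	d.emit(event{Type: "task_dispatched", TaskID: "t2"})

	d.events = em
	d.emit(event{Type: "task_dispatched", TaskID: "t3"}) // dropped, buffer still full
	fmt.Println(len(em.ch))                              // 1
}
```

Dropping rather than blocking keeps a slow consumer from stalling dispatch, at the cost of lossy delivery.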

12 integration tests verify all emission points in `events_integration_test.go`.

---

## Known Limitations

The following limitations were documented during development and have not yet been addressed.

### service.go Coverage Gap

`NewService`, `OnStartup`, `handleTask`, `doCommit`, and `doPrompt` are at 0% test coverage. These functions require a full `framework.Core` DI container and spawn a `claude` subprocess. A mock subprocess approach (`exec.Command` with a test binary) is possible but was deferred to a later phase. A minimal mock binary approach was explored in `service_test.go` (commit `9636cdb`) for `HandleTask` tests.

### completion.go CreatePR

`CreatePR` calls `gh pr create` as a subprocess. It is at 14.3% coverage. Testing requires `gh` installed and authenticated against a real or stub GitHub API. Not suitable for unit tests.

### HourlyRateLimit and CostCeiling Not Enforced

Both fields on `ModelQuota` are stored correctly by all three backends but are never checked in `AllowanceService.Check`. Enforcement would require a new `AllowanceStore.GetHourlyUsage(agentID string, since time.Time) (int64, error)` method, a breaking interface change that would require updates to all three implementations plus their tests. This was deferred indefinitely.

### DispatchLoop Task State Not Re-fetched

After a failed dispatch that increments `RetryCount` and sets `LastAttempt`, the loop modifies the local copy of the task in the `tasks` slice. It does not call `client.UpdateTask` to persist `RetryCount`/`LastAttempt`. If the process restarts, those values are lost and the task will be dispatched again from a zero retry count. Fixing this requires either persisting retry state via the API or a separate task state store.

### No Heartbeat Loop

`AgentRegistry.Heartbeat` and `Reap` exist but there is no built-in background goroutine to call them. Callers must schedule heartbeats and reaping externally.

---

## Future Considerations

Items identified during development but not yet scoped:

- **Interface extraction for `service.go`**: Extract the subprocess invocation into a `ClaudeRunner` interface to enable unit testing without a real `claude` binary.
- **`AllowanceStore.GetHourlyUsage`**: Required to enforce `HourlyRateLimit`. Would enable per-hour rate limiting across all three backends.
- **Persist retry state via API**: `DispatchLoop` should call `client.UpdateTask` to persist `RetryCount` and `LastAttempt` so that retries survive process restarts.
- **Built-in heartbeat loop**: A `StartHeartbeat(ctx, registry, agentID, interval)` helper to call `Heartbeat` on a ticker and `Reap` on a longer interval.
- **WebSocket event emitter**: A `WebSocketEmitter` backed by `forge.lthn.ai/core/go/pkg/ws` for real-time event streaming to external consumers.
- **Allowance daily reset scheduler**: A background goroutine that calls `AllowanceService.ResetAgent` at midnight UTC for each registered agent.