docs: add Phases 5-8 to TODO — registry persistence, rate enforcement, retry, events

Phase 5: SQLite + Redis AgentRegistry (mirrors AllowanceStore pattern)
Phase 6: Enforce or remove dead HourlyRateLimit/CostCeiling fields
Phase 7: Priority-ordered dispatch with retry backoff and dead-letter
Phase 8: EventEmitter interface for task lifecycle notifications

Co-Authored-By: Virgil <virgil@lethean.io>
Snider 2026-02-20 11:35:14 +00:00
parent 092f19e7a0
commit e86cb1c6a3

TODO.md (77 lines changed)

@@ -60,6 +60,83 @@ Phase 4 provides the data-fetching and formatting functions that `core agent` CL
- [x] **Create `logs.go`** — `StreamLogs(ctx, client, taskID, writer) error` polls task updates and writes progress to an `io.Writer`
- [x] **Tests** — mock client with progress updates, context cancellation
## Phase 5: Persistent Agent Registry
The `AgentRegistry` interface has only one implementation, `MemoryRegistry`, so a restart drops all agent registrations. Adding persistent backends follows the same progression as `AllowanceStore`: memory → SQLite → Redis.
### 5.1 SQLite Registry
- [ ] **Create `registry_sqlite.go`** — `SQLiteRegistry` implementing the `AgentRegistry` interface
- [ ] Schema: `agents` table (id TEXT PK, name TEXT, capabilities TEXT JSON, status INT, last_heartbeat DATETIME, current_load INT, max_load INT, registered_at DATETIME)
- [ ] Use `modernc.org/sqlite` (already a transitive dep via go-store) with WAL mode
- [ ] `Register` → UPSERT, `Deregister` → DELETE, `Get` → SELECT, `List` → SELECT all, `Heartbeat` → UPDATE last_heartbeat, `Reap(ttl)` → UPDATE status=Offline WHERE last_heartbeat < now-ttl RETURNING id
- [ ] **Tests** — full parity with `registry_test.go` using `:memory:` SQLite, concurrent access under `-race`
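The `Reap(ttl)` mapping above could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: it uses only `database/sql` (the caller would register the `modernc.org/sqlite` driver), it assumes `last_heartbeat` is stored as Unix seconds rather than a formatted `DATETIME` string, and `statusOffline` and `cutoffUnix` are hypothetical names.

```go
package main

import (
	"context"
	"database/sql"
	"time"
)

// statusOffline is a hypothetical integer value for the Offline status.
const statusOffline = 0

// reapSQL marks stale agents offline and returns their IDs in one statement.
// RETURNING requires SQLite 3.35 or newer.
const reapSQL = `UPDATE agents SET status = ? WHERE last_heartbeat < ? RETURNING id`

// cutoffUnix returns the oldest heartbeat (Unix seconds) still considered live.
func cutoffUnix(nowUnix int64, ttl time.Duration) int64 {
	return nowUnix - int64(ttl/time.Second)
}

// Reap marks agents whose last heartbeat is older than ttl as offline
// and returns the IDs of the rows it updated.
func Reap(ctx context.Context, db *sql.DB, ttl time.Duration) ([]string, error) {
	cutoff := cutoffUnix(time.Now().Unix(), ttl)
	rows, err := db.QueryContext(ctx, reapSQL, statusOffline, cutoff)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var ids []string
	for rows.Next() {
		var id string
		if err := rows.Scan(&id); err != nil {
			return nil, err
		}
		ids = append(ids, id)
	}
	return ids, rows.Err()
}
```

Because the statement as written also matches agents that are already offline, repeated calls return the same IDs; adding `AND status != ?` would restrict the result to newly reaped agents if that is the desired contract.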
### 5.2 Redis Registry
- [ ] **Create `registry_redis.go`** — `RedisRegistry` implementing `AgentRegistry` with TTL-based reaping
- [ ] Key pattern: `{prefix}:agent:{id}` → JSON AgentInfo, with TTL = heartbeat interval * 3
- [ ] `Heartbeat` → re-SET with TTL refresh (natural expiry = auto-reap)
- [ ] `List` → SCAN `{prefix}:agent:*`, `Reap` → explicit scan for expired (backup to natural TTL)
- [ ] **Tests** — skip-if-no-Redis pattern, unique prefix per test
### 5.3 Config Factory
- [ ] **Add `RegistryConfig`** to `config.go` — `RegistryBackend string` (memory/sqlite/redis), `RegistryPath string`, `RegistryRedisAddr string`
- [ ] **`NewAgentRegistryFromConfig(cfg) (AgentRegistry, error)`** — factory mirroring `NewAllowanceStoreFromConfig`
- [ ] **Tests** — all backends, unknown backend error
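One possible shape for the factory is sketched below. The `AgentRegistry` and `MemoryRegistry` types are empty placeholders for the real ones, the SQLite and Redis branches return a hypothetical `ErrNotImplemented` instead of constructing a backend, and treating an empty backend string as `memory` is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// AgentRegistry stands in for the real interface; its methods are omitted here.
type AgentRegistry interface{}

// MemoryRegistry is a placeholder for the existing in-memory implementation.
type MemoryRegistry struct{}

// RegistryConfig carries the three fields named in the TODO.
type RegistryConfig struct {
	RegistryBackend   string // "memory", "sqlite", or "redis"
	RegistryPath      string // SQLite database path
	RegistryRedisAddr string // Redis host:port
}

// ErrNotImplemented marks backends this sketch does not construct.
var ErrNotImplemented = errors.New("registry backend not implemented in this sketch")

// NewAgentRegistryFromConfig selects a backend and validates its settings.
func NewAgentRegistryFromConfig(cfg RegistryConfig) (AgentRegistry, error) {
	switch cfg.RegistryBackend {
	case "", "memory":
		return &MemoryRegistry{}, nil
	case "sqlite":
		if cfg.RegistryPath == "" {
			return nil, errors.New("sqlite registry requires RegistryPath")
		}
		return nil, ErrNotImplemented // would construct a SQLiteRegistry here
	case "redis":
		if cfg.RegistryRedisAddr == "" {
			return nil, errors.New("redis registry requires RegistryRedisAddr")
		}
		return nil, ErrNotImplemented // would construct a RedisRegistry here
	default:
		return nil, fmt.Errorf("unknown registry backend %q", cfg.RegistryBackend)
	}
}
```

Validating the path and address inside the factory keeps misconfiguration errors at startup rather than at the first registry call.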
## Phase 6: Dead Code Cleanup + Rate Enforcement
`HourlyRateLimit` and `CostCeiling` on `ModelQuota` are stored but never enforced in `AllowanceService.Check`. Either implement or remove.
### 6.1 Enforce or Remove Dead Fields
- [ ] **Audit `HourlyRateLimit` and `CostCeiling`** in `allowance_service.go` — these fields exist in `ModelQuota` but `Check()` never evaluates them
- [ ] If keeping: add hourly sliding window check in `AllowanceService.Check` before daily limit. Add `CostCeiling` as cumulative spend cap. Add `HourlyUsage` tracking to `AllowanceStore` interface (new method: `GetHourlyUsage(agentID, model string) (int64, error)`)
- [ ] If removing: delete fields from `ModelQuota`, update tests, document decision in FINDINGS.md
- [ ] **Tests** — hourly rate exceeded, cost ceiling exceeded, both combined with existing daily limits
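If the fields are kept, the check order described above (hourly window, then cost ceiling, then the existing daily limit) might look like the sketch below. The field types, the `DailyLimit` name, the error values, and the convention that zero means "unlimited" are all assumptions; the real `ModelQuota` and `Check` may differ.

```go
package main

import "errors"

// ModelQuota mirrors the fields named in the TODO. The types and the
// DailyLimit name are assumptions made for this sketch.
type ModelQuota struct {
	DailyLimit      int64
	HourlyRateLimit int64 // 0 means unlimited (assumption)
	CostCeiling     int64 // cumulative spend cap; 0 means unlimited (assumption)
}

var (
	ErrHourlyRate  = errors.New("hourly rate limit exceeded")
	ErrCostCeiling = errors.New("cost ceiling exceeded")
	ErrDailyLimit  = errors.New("daily limit exceeded")
)

// hourlyCount counts request timestamps (Unix seconds) that fall inside
// the sliding window (now-3600, now].
func hourlyCount(timestamps []int64, now int64) int64 {
	var n int64
	for _, ts := range timestamps {
		if ts > now-3600 && ts <= now {
			n++
		}
	}
	return n
}

// checkQuota evaluates the hourly window first, then the cost ceiling,
// then the daily limit, and returns the first violation found.
func checkQuota(q ModelQuota, hourly, daily, spend int64) error {
	switch {
	case q.HourlyRateLimit > 0 && hourly >= q.HourlyRateLimit:
		return ErrHourlyRate
	case q.CostCeiling > 0 && spend >= q.CostCeiling:
		return ErrCostCeiling
	case q.DailyLimit > 0 && daily >= q.DailyLimit:
		return ErrDailyLimit
	}
	return nil
}
```

Here `hourly` would come from the proposed `GetHourlyUsage` method; `hourlyCount` only illustrates how a store could derive that number from raw timestamps.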
### 6.2 Fix DefaultBaseURL
- [ ] **`DefaultBaseURL`** in `config.go` points to `api.core-agentic.dev`, which does not exist. Change it to an empty string (requiring explicit config) or to `http://localhost:8080` for local dev
- [ ] **Test** — verify default config doesn't silently fail
## Phase 7: Priority-Ordered Dispatch + Retry
`DispatchLoop` dispatches tasks in arbitrary order with no retry backoff.
### 7.1 Priority Sorting
- [ ] **Sort pending tasks** in `DispatchLoop` by `Priority` (Critical > High > Normal > Low) before dispatching
- [ ] **Tie-break** by `CreatedAt` (oldest first within same priority)
- [ ] **Tests** — 5 tasks with mixed priorities dispatched in correct order
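The ordering rule above can be sketched with a stable sort. The `Priority` constants and the trimmed `Task` struct are stand-ins whose names and numeric values are assumed; `exampleOrder` is a hypothetical helper that mirrors the five-task test case.

```go
package main

import (
	"sort"
	"time"
)

// Priority values are assumed to increase with urgency.
type Priority int

const (
	PriorityLow Priority = iota
	PriorityNormal
	PriorityHigh
	PriorityCritical
)

// Task is reduced to the fields the ordering needs.
type Task struct {
	ID        string
	Priority  Priority
	CreatedAt time.Time
}

// sortForDispatch orders tasks by priority (highest first) and breaks
// ties by CreatedAt (oldest first).
func sortForDispatch(tasks []Task) {
	sort.SliceStable(tasks, func(i, j int) bool {
		if tasks[i].Priority != tasks[j].Priority {
			return tasks[i].Priority > tasks[j].Priority
		}
		return tasks[i].CreatedAt.Before(tasks[j].CreatedAt)
	})
}

// exampleOrder sorts five tasks with mixed priorities and returns their IDs.
func exampleOrder() []string {
	base := time.Date(2026, 1, 1, 0, 0, 0, 0, time.UTC)
	tasks := []Task{
		{ID: "a", Priority: PriorityNormal, CreatedAt: base},
		{ID: "b", Priority: PriorityCritical, CreatedAt: base.Add(2 * time.Minute)},
		{ID: "c", Priority: PriorityLow, CreatedAt: base.Add(1 * time.Minute)},
		{ID: "d", Priority: PriorityCritical, CreatedAt: base.Add(1 * time.Minute)},
		{ID: "e", Priority: PriorityHigh, CreatedAt: base.Add(3 * time.Minute)},
	}
	sortForDispatch(tasks)
	ids := make([]string, 0, len(tasks))
	for _, t := range tasks {
		ids = append(ids, t.ID)
	}
	return ids // d, b, e, a, c
}
```

Both Critical tasks come first, with `d` ahead of `b` because it was created earlier.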
### 7.2 Retry Backoff
- [ ] **Add `MaxRetries` and `RetryCount` fields** to `Task` type
- [ ] **Exponential backoff** in `DispatchLoop` — skip tasks where `RetryCount > 0` and `LastAttempt + backoff(RetryCount) > now`
- [ ] **Dead-letter** — tasks exceeding `MaxRetries` (default 3) get status `TaskFailed` with reason "max retries exceeded"
- [ ] **Tests** — retry delay respected, dead-letter after max retries, backoff calculation
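A rough sketch of the backoff and dead-letter rules follows. The 5-second base delay and 5-minute cap are illustrative values that do not come from the TODO, the `Task` struct is trimmed to the fields involved, and `LastAttempt` is assumed to be a `time.Time` field on `Task`. "Exceeding `MaxRetries`" is read here as `RetryCount > MaxRetries`.

```go
package main

import "time"

// Task is reduced to the retry-related fields.
type Task struct {
	RetryCount  int
	MaxRetries  int
	LastAttempt time.Time
}

// Illustrative values; the TODO does not specify them.
const (
	baseDelay = 5 * time.Second
	maxDelay  = 5 * time.Minute
)

// backoff doubles the delay for each retry after the first and caps it
// at maxDelay. It returns 0 for tasks that have not been retried.
func backoff(retryCount int) time.Duration {
	if retryCount <= 0 {
		return 0
	}
	d := baseDelay
	for i := 1; i < retryCount; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// shouldSkip reports whether the dispatch loop should pass over the task
// because its backoff period has not yet elapsed.
func shouldSkip(t Task, now time.Time) bool {
	return t.RetryCount > 0 && t.LastAttempt.Add(backoff(t.RetryCount)).After(now)
}

// isDeadLetter reports whether the task has exceeded its retry budget.
func isDeadLetter(t Task) bool {
	return t.RetryCount > t.MaxRetries
}
```

With these values the delays run 5s, 10s, 20s, 40s and so on until the cap; checking the cap inside the loop also keeps the doubling from overflowing for large retry counts.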
## Phase 8: Event Hooks
Production orchestration needs event notifications for task lifecycle transitions.
### 8.1 EventEmitter Interface
- [ ] **Create `events.go`** — `Event` struct (Type string, TaskID string, AgentID string, Timestamp time.Time, Payload any)
- [ ] **`EventEmitter` interface** — `Emit(ctx context.Context, event Event) error`
- [ ] **`ChannelEmitter`** — in-process `chan Event` for local subscribers (buffered, non-blocking)
- [ ] **`MultiEmitter`** — fans out to multiple emitters
- [ ] **Tests** — emit and receive, buffer overflow drops, multi-emitter fan-out
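The interface and the two emitters listed above might be sketched like this. The constructor, the `Events` accessor, and the `exampleFanOut` helper are hypothetical names, and silently dropping events when the buffer is full (returning `nil`) is one assumed reading of "non-blocking"; returning an error on drop would be an equally valid design.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// Event carries the fields listed in the TODO.
type Event struct {
	Type      string
	TaskID    string
	AgentID   string
	Timestamp time.Time
	Payload   any
}

// EventEmitter is the interface named in the TODO.
type EventEmitter interface {
	Emit(ctx context.Context, event Event) error
}

// ChannelEmitter delivers events to a buffered channel without blocking.
type ChannelEmitter struct {
	ch chan Event
}

func NewChannelEmitter(buffer int) *ChannelEmitter {
	return &ChannelEmitter{ch: make(chan Event, buffer)}
}

// Events exposes the receive side for local subscribers.
func (c *ChannelEmitter) Events() <-chan Event { return c.ch }

// Emit enqueues the event, or drops it if the buffer is full.
func (c *ChannelEmitter) Emit(ctx context.Context, event Event) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	select {
	case c.ch <- event:
	default: // buffer full: drop rather than block the dispatcher
	}
	return nil
}

// MultiEmitter fans each event out to every wrapped emitter.
type MultiEmitter struct {
	emitters []EventEmitter
}

// Emit calls every emitter and joins any errors (nil if none failed).
func (m *MultiEmitter) Emit(ctx context.Context, event Event) error {
	var errs []error
	for _, e := range m.emitters {
		if err := e.Emit(ctx, event); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// exampleFanOut emits three events to emitters with buffers of 1 and 4
// and returns how many events each one retained.
func exampleFanOut() (small, large int) {
	a := NewChannelEmitter(1)
	b := NewChannelEmitter(4)
	multi := &MultiEmitter{emitters: []EventEmitter{a, b}}
	for i := 0; i < 3; i++ {
		_ = multi.Emit(context.Background(), Event{Type: "task_dispatched", Timestamp: time.Now()})
	}
	return len(a.Events()), len(b.Events())
}
```

In `exampleFanOut` the one-slot emitter keeps only the first event and drops the other two, while the four-slot emitter keeps all three. `errors.Join` requires Go 1.20 or newer.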
### 8.2 Dispatcher Integration
- [ ] **Wire `EventEmitter`** into `Dispatcher` — emit on: task_dispatched, task_claimed, dispatch_failed (no agent), dispatch_failed (quota)
- [ ] **Wire into `AllowanceService`** — emit on: quota_warning (80%), quota_exceeded, usage_recorded
- [ ] **Tests** — verify events emitted at correct lifecycle points
---
## Workflow