diff --git a/TODO.md b/TODO.md
index 3a59e05..5d5b36a 100644
--- a/TODO.md
+++ b/TODO.md
@@ -60,6 +60,83 @@ Phase 4 provides the data-fetching and formatting functions that `core agent` CL
 - [x] **Create `logs.go`** — `StreamLogs(ctx, client, taskID, writer) error` — polls task updates and writes progress to io.Writer
 - [x] **Tests** — mock client with progress updates, context cancellation
 
+## Phase 5: Persistent Agent Registry
+
+The `AgentRegistry` interface only has `MemoryRegistry` — a restart drops all agent registrations. This mirrors the AllowanceStore pattern: memory → SQLite → Redis.
+
+### 5.1 SQLite Registry
+
+- [ ] **Create `registry_sqlite.go`** — `SQLiteRegistry` implementing `AgentRegistry` interface
+- [ ] Schema: `agents` table (id TEXT PK, name TEXT, capabilities TEXT JSON, status INT, last_heartbeat DATETIME, current_load INT, max_load INT, registered_at DATETIME)
+- [ ] Use `modernc.org/sqlite` (already a transitive dep via go-store) with WAL mode
+- [ ] `Register` → UPSERT, `Deregister` → DELETE, `Get` → SELECT, `List` → SELECT all, `Heartbeat` → UPDATE last_heartbeat, `Reap(ttl)` → UPDATE status=Offline WHERE last_heartbeat < now-ttl RETURNING id
+- [ ] **Tests** — full parity with `registry_test.go` using `:memory:` SQLite, concurrent access under `-race`
+
+### 5.2 Redis Registry
+
+- [ ] **Create `registry_redis.go`** — `RedisRegistry` implementing `AgentRegistry` with TTL-based reaping
+- [ ] Key pattern: `{prefix}:agent:{id}` → JSON AgentInfo, with TTL = heartbeat interval * 3
+- [ ] `Heartbeat` → re-SET with TTL refresh (natural expiry = auto-reap)
+- [ ] `List` → SCAN `{prefix}:agent:*`, `Reap` → explicit scan for expired (backup to natural TTL)
+- [ ] **Tests** — skip-if-no-Redis pattern, unique prefix per test
+
+### 5.3 Config Factory
+
+- [ ] **Add `RegistryConfig`** to `config.go` — `RegistryBackend string` (memory/sqlite/redis), `RegistryPath string`, `RegistryRedisAddr string`
+- [ ] **`NewAgentRegistryFromConfig(cfg) (AgentRegistry, error)`** — factory mirroring `NewAllowanceStoreFromConfig`
+- [ ] **Tests** — all backends, unknown backend error
+
+## Phase 6: Dead Code Cleanup + Rate Enforcement
+
+`HourlyRateLimit` and `CostCeiling` on `ModelQuota` are stored but never enforced in `AllowanceService.Check`. Either implement or remove.
+
+### 6.1 Enforce or Remove Dead Fields
+
+- [ ] **Audit `HourlyRateLimit` and `CostCeiling`** in `allowance_service.go` — these fields exist in `ModelQuota` but `Check()` never evaluates them
+- [ ] If keeping: add hourly sliding window check in `AllowanceService.Check` before daily limit. Add `CostCeiling` as cumulative spend cap. Add `HourlyUsage` tracking to `AllowanceStore` interface (new method: `GetHourlyUsage(agentID, model string) (int64, error)`)
+- [ ] If removing: delete fields from `ModelQuota`, update tests, document decision in FINDINGS.md
+- [ ] **Tests** — hourly rate exceeded, cost ceiling exceeded, both combined with existing daily limits
+
+### 6.2 Fix DefaultBaseURL
+
+- [ ] **`DefaultBaseURL`** in `config.go` points to `api.core-agentic.dev` which doesn't exist. Change to empty string (require explicit config) or `http://localhost:8080` for local dev
+- [ ] **Test** — verify default config doesn't silently fail
+
+## Phase 7: Priority-Ordered Dispatch + Retry
+
+`DispatchLoop` dispatches tasks in arbitrary order with no retry backoff.
+
+### 7.1 Priority Sorting
+
+- [ ] **Sort pending tasks** in `DispatchLoop` by `Priority` (Critical > High > Normal > Low) before dispatching
+- [ ] **Tie-break** by `CreatedAt` (oldest first within same priority)
+- [ ] **Tests** — 5 tasks with mixed priorities dispatched in correct order
+
+### 7.2 Retry Backoff
+
+- [ ] **Add `MaxRetries` and `RetryCount` fields** to `Task` type
+- [ ] **Exponential backoff** in `DispatchLoop` — skip tasks where `RetryCount > 0` and `LastAttempt + backoff(RetryCount) > now`
+- [ ] **Dead-letter** — tasks exceeding `MaxRetries` (default 3) get status `TaskFailed` with reason "max retries exceeded"
+- [ ] **Tests** — retry delay respected, dead-letter after max retries, backoff calculation
+
+## Phase 8: Event Hooks
+
+Production orchestration needs event notifications for task lifecycle transitions.
+
+### 8.1 EventEmitter Interface
+
+- [ ] **Create `events.go`** — `Event` struct (Type string, TaskID string, AgentID string, Timestamp time.Time, Payload any)
+- [ ] **`EventEmitter` interface** — `Emit(ctx context.Context, event Event) error`
+- [ ] **`ChannelEmitter`** — in-process `chan Event` for local subscribers (buffered, non-blocking)
+- [ ] **`MultiEmitter`** — fans out to multiple emitters
+- [ ] **Tests** — emit and receive, buffer overflow drops, multi-emitter fan-out
+
+### 8.2 Dispatcher Integration
+
+- [ ] **Wire `EventEmitter`** into `Dispatcher` — emit on: task_dispatched, task_claimed, dispatch_failed (no agent), dispatch_failed (quota)
+- [ ] **Wire into `AllowanceService`** — emit on: quota_warning (80%), quota_exceeded, usage_recorded
+- [ ] **Tests** — verify events emitted at correct lifecycle points
+
 ---
 
 ## Workflow