docs: add TODO.md and FINDINGS.md for fleet delegation

Co-Authored-By: Virgil <virgil@lethean.io>
Authored by Virgil, 2026-02-19 21:36:13 +00:00; committed by Snider
parent 9b53632e3d
commit fb531af79a
2 changed files with 50 additions and 0 deletions

FINDINGS.md (new file, 23 additions)

@@ -0,0 +1,23 @@
# FINDINGS.md -- go-ratelimit
## 2026-02-19: Split from core/go (Virgil)
### Origin
Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
### Architecture
- Sliding window rate limiter (1-minute window)
- Daily request caps per model
- Token counting via Google `CountTokens` API
- Model-specific quota configuration
### Gemini-Specific Defaults
- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
### Tests
- One test file covers the sliding window and quota enforcement

TODO.md (new file, 27 additions)

@@ -0,0 +1,27 @@
# TODO.md -- go-ratelimit
## Phase 1: Generalise Beyond Gemini
- [ ] Hardcoded model quotas are Gemini-specific -- abstract to provider-agnostic config
- [ ] Add quota profiles for OpenAI, Anthropic, and local (Ollama/MLX) backends
- [ ] Make default quotas configurable via YAML or environment variables
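A provider-agnostic quota file might take a shape like the sketch below. The schema and every non-Gemini number are assumptions for illustration; only the `gemini-3-pro-preview` figures come from the current hardcoded defaults.

```yaml
# Hypothetical quota profile schema (Phase 1 sketch) -- key names
# and all non-Gemini values are illustrative, not shipped defaults.
providers:
  gemini:
    gemini-3-pro-preview:
      rpm: 150        # requests per minute
      tpm: 1000000    # tokens per minute
      rpd: 1000       # requests per day
  openai:
    default:
      rpm: 60         # placeholder values
      tpm: 400000
      rpd: 5000
  local:
    ollama:
      rpm: 0          # 0 = unlimited for local backends
```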
## Phase 2: Persistent State
- [ ] State is currently stored in a YAML file -- not safe for multi-process access
- [ ] Consider SQLite for concurrent read/write safety (WAL mode)
- [ ] Add state recovery on restart (reload sliding window from persisted data)
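If the SQLite option is taken, these are the pragmas a WAL-backed state store would typically set at open time (a sketch of standard SQLite settings, not an implemented migration):

```sql
-- Standard pragmas for concurrent read/write access (Phase 2 sketch):
PRAGMA journal_mode = WAL;   -- readers no longer block the single writer
PRAGMA busy_timeout = 5000;  -- wait up to 5s on a locked database
PRAGMA synchronous = NORMAL; -- common pairing with WAL for throughput
```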
## Phase 3: Integration
- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
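The metrics counters mentioned above could start as simple atomic counters before wiring up an exporter. This is a minimal sketch; the `metrics` type and `record` method are hypothetical names, not an existing API in this repo.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metrics holds the counters Phase 3 calls for (hypothetical type).
type metrics struct {
	requests   atomic.Int64 // total requests seen
	tokens     atomic.Int64 // total tokens counted
	rejections atomic.Int64 // requests rejected by the limiter
}

// record updates the counters for one request; safe for concurrent use.
func (m *metrics) record(tokens int64, rejected bool) {
	m.requests.Add(1)
	m.tokens.Add(tokens)
	if rejected {
		m.rejections.Add(1)
	}
}

func main() {
	var m metrics
	m.record(120, false)
	m.record(80, true)
	fmt.Println(m.requests.Load(), m.tokens.Load(), m.rejections.Load()) // 2 200 1
}
```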
---
## Workflow
1. Virgil, working in core/go, writes tasks here after research
2. This repo's dedicated session picks up tasks in phase order
3. Mark tasks `[x]` when done and note the commit hash