docs: add TODO.md and FINDINGS.md for fleet delegation

Co-Authored-By: Virgil <virgil@lethean.io>
Authored by Virgil, 2026-02-19 21:36:13 +00:00; committed by Snider
parent 9b53632e3d
commit fb531af79a
2 changed files with 50 additions and 0 deletions

FINDINGS.md (new file, 23 additions)

@@ -0,0 +1,23 @@
# FINDINGS.md -- go-ratelimit
## 2026-02-19: Split from core/go (Virgil)
### Origin
Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
### Architecture
- Sliding window rate limiter (1-minute window)
- Daily request caps per model
- Token counting via Google `CountTokens` API
- Model-specific quota configuration
### Gemini-Specific Defaults
- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
### Tests
- One test file covers the sliding window and quota enforcement

TODO.md (new file, 27 additions)

@@ -0,0 +1,27 @@
# TODO.md -- go-ratelimit
## Phase 1: Generalise Beyond Gemini
- [ ] Hardcoded model quotas are Gemini-specific -- abstract to provider-agnostic config
- [ ] Add quota profiles for OpenAI, Anthropic, and local (Ollama/MLX) backends
- [ ] Make default quotas configurable via YAML or environment variables
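A provider-agnostic quota file might take a shape like the sketch below. The schema and every non-Gemini number are assumptions for illustration; only the `gemini-3-pro-preview` figures come from the current hardcoded defaults.

```yaml
# Hypothetical quota profile schema (Phase 1 sketch) -- key names
# and all non-Gemini values are illustrative, not shipped defaults.
providers:
  gemini:
    gemini-3-pro-preview:
      rpm: 150        # requests per minute
      tpm: 1000000    # tokens per minute
      rpd: 1000       # requests per day
  openai:
    default:
      rpm: 60         # placeholder values
      tpm: 400000
      rpd: 5000
  local:
    ollama:
      rpm: 0          # 0 = unlimited for local backends
```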
## Phase 2: Persistent State
- [ ] State is currently stored in a YAML file -- not safe for multi-process access
- [ ] Consider SQLite for concurrent read/write safety (WAL mode)
- [ ] Add state recovery on restart (reload sliding window from persisted data)
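If the SQLite option is taken, these are the pragmas a WAL-backed state store would typically set at open time (a sketch of standard SQLite settings, not an implemented migration):

```sql
-- Standard pragmas for concurrent read/write access (Phase 2 sketch):
PRAGMA journal_mode = WAL;   -- readers no longer block the single writer
PRAGMA busy_timeout = 5000;  -- wait up to 5s on a locked database
PRAGMA synchronous = NORMAL; -- common pairing with WAL for throughput
```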
## Phase 3: Integration
- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
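The metrics counters mentioned above could start as simple atomic counters before wiring up an exporter. This is a minimal sketch; the `metrics` type and `record` method are hypothetical names, not an existing API in this repo.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// metrics holds the counters Phase 3 calls for (hypothetical type).
type metrics struct {
	requests   atomic.Int64 // total requests seen
	tokens     atomic.Int64 // total tokens counted
	rejections atomic.Int64 // requests rejected by the limiter
}

// record updates the counters for one request; safe for concurrent use.
func (m *metrics) record(tokens int64, rejected bool) {
	m.requests.Add(1)
	m.tokens.Add(tokens)
	if rejected {
		m.rejections.Add(1)
	}
}

func main() {
	var m metrics
	m.record(120, false)
	m.record(80, true)
	fmt.Println(m.requests.Load(), m.tokens.Load(), m.rejections.Load()) // 2 200 1
}
```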
---
## Workflow
1. Virgil, working in core/go, writes tasks here after research
2. This repo's dedicated session picks up tasks in phase order
3. Mark tasks `[x]` when done and note the commit hash