docs: graduate TODO/FINDINGS into production documentation
Replace internal task tracking (TODO.md, FINDINGS.md) with structured documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
parent
1afb1d636a
commit
cde6443e4c
6 changed files with 705 additions and 203 deletions
21
CLAUDE.md
|
|
@@ -1,19 +1,28 @@
|
|||
# CLAUDE.md
|
||||
|
||||
## What This Is
|
||||
Token counting, model quotas, and sliding window rate limiter.
|
||||
|
||||
Token counting, model quotas, and sliding window rate limiter. Module: `forge.lthn.ai/core/go-ratelimit`
|
||||
Module: `forge.lthn.ai/core/go-ratelimit`
|
||||
|
||||
## Commands
|
||||
|
||||
```bash
|
||||
go test ./... # Run all tests
|
||||
go test -v -run Name # Run single test
|
||||
go test ./... # run all tests
|
||||
go test -race ./... # race detector (required before commit)
|
||||
go test -v -run Name ./... # single test
|
||||
go vet ./... # vet check
|
||||
```
|
||||
|
||||
## Coding Standards
|
||||
## Standards
|
||||
|
||||
- UK English
|
||||
- `go test ./...` must pass before commit
|
||||
- `go test -race ./...` and `go vet ./...` must pass before commit
|
||||
- Conventional commits: `type(scope): description`
|
||||
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
|
||||
- Coverage must not drop below 95%
|
||||
|
||||
## Docs
|
||||
|
||||
- `docs/architecture.md` — sliding window algorithm, provider quotas, YAML/SQLite backends
|
||||
- `docs/development.md` — prerequisites, test patterns, coding standards
|
||||
- `docs/history.md` — completed phases with commit hashes, known limitations
|
||||
|
|
|
|||
106
FINDINGS.md
|
|
@@ -1,106 +0,0 @@
|
|||
# FINDINGS.md -- go-ratelimit
|
||||
|
||||
## 2026-02-19: Split from core/go (Virgil)
|
||||
|
||||
### Origin
|
||||
|
||||
Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
|
||||
|
||||
### Architecture
|
||||
|
||||
- Sliding window rate limiter (1-minute window)
|
||||
- Daily request caps per model
|
||||
- Token counting via Google `CountTokens` API
|
||||
- Model-specific quota configuration
|
||||
|
||||
### Gemini-Specific Defaults
|
||||
|
||||
- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
|
||||
- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
|
||||
|
||||
### Tests
|
||||
|
||||
- 1 test file covering sliding window and quota enforcement
|
||||
|
||||
---
|
||||
|
||||
## 2026-02-20: Phase 0 -- Hardening (Charon)
|
||||
|
||||
### Coverage: 77.1% -> 95.1%
|
||||
|
||||
Rewrote test suite with testify assert/require. Table-driven subtests throughout.
|
||||
|
||||
#### Tests added
|
||||
|
||||
- **CanSend boundaries**: exact RPM/TPM/RPD limits, RPM-only, TPM-only, zero-token estimates, unknown models, unlimited models
|
||||
- **Prune**: keeps recent entries, prunes old ones, daily reset at 24h, boundary-exact timestamps, noop on non-existent model
|
||||
- **RecordUsage**: fresh state, accumulation, existing state
|
||||
- **Reset**: single model, all models (empty string), non-existent model
|
||||
- **WaitForCapacity**: immediate capacity, context cancellation, pre-cancelled context, unknown model
|
||||
- **Stats/AllStats**: known/unknown/quota-only models, pruning in AllStats, daily reset in AllStats
|
||||
- **Persist/Load**: round-trip, non-existent file, corrupt YAML, unreadable file, nested directory creation, unwritable directory
|
||||
- **Concurrency**: 20 goroutines x 50 ops (CanSend + RecordUsage + Stats), concurrent Reset + RecordUsage + AllStats
|
||||
- **Benchmarks**: BenchmarkCanSend (1000-entry window), BenchmarkRecordUsage, BenchmarkCanSendConcurrent
|
||||
|
||||
#### Remaining uncovered (5%)
|
||||
|
||||
- `CountTokens` success path: hardcoded Google URL prevents unit testing without URL injection. Only the connection-error path is covered.
|
||||
- `yaml.Marshal` error in `Persist()`: virtually impossible to trigger with valid structs.
|
||||
- `os.UserHomeDir` error in `NewWithConfig()`: only fails when `$HOME` is unset.
|
||||
|
||||
### Race detector
|
||||
|
||||
`go test -race ./...` passes clean. The `sync.RWMutex` correctly guards all shared state.
|
||||
|
||||
### go vet
|
||||
|
||||
No warnings.
|
||||
|
||||
---
|
||||
|
||||
## 2026-02-20: Phase 1 -- Generalisation (Charon)
|
||||
|
||||
### Problem
|
||||
|
||||
Hardcoded Gemini-specific quotas in `New()`. No way to configure for other providers.
|
||||
|
||||
### Solution
|
||||
|
||||
Introduced provider-agnostic configuration without breaking existing API.
|
||||
|
||||
#### New types
|
||||
|
||||
- `Provider` -- string type with constants: `ProviderGemini`, `ProviderOpenAI`, `ProviderAnthropic`, `ProviderLocal`
|
||||
- `ProviderProfile` -- bundles provider identity with model quotas map
|
||||
- `Config` -- construction config with `FilePath`, `Providers` list, `Quotas` map
|
||||
|
||||
#### New functions
|
||||
|
||||
- `DefaultProfiles()` -- returns pre-configured profiles for all four providers
|
||||
- `NewWithConfig(Config)` -- creates limiter from explicit configuration
|
||||
- `SetQuota(model, quota)` -- runtime quota modification
|
||||
- `AddProvider(provider)` -- loads all default quotas for a provider at runtime
|
||||
|
||||
#### Provider defaults (Feb 2026)
|
||||
|
||||
| Provider | Models | RPM | TPM | RPD |
|
||||
|----------|--------|-----|-----|-----|
|
||||
| Gemini | gemini-3-pro-preview, gemini-3-flash-preview, gemini-2.5-pro | 150 | 1M | 1000 |
|
||||
| Gemini | gemini-2.0-flash | 150 | 1M | unlimited |
|
||||
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
|
||||
| OpenAI | gpt-4o, gpt-4-turbo, o1 | 500 | 30K | unlimited |
|
||||
| OpenAI | gpt-4o-mini, o1-mini, o3-mini | 500 | 200K | unlimited |
|
||||
| Anthropic | claude-opus-4, claude-sonnet-4 | 50 | 40K | unlimited |
|
||||
| Anthropic | claude-haiku-3.5 | 50 | 50K | unlimited |
|
||||
| Local | (none by default) | -- | -- | -- |
|
||||
|
||||
#### Backward compatibility
|
||||
|
||||
`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Verified by `TestNewBackwardCompatibility` which asserts exact parity with the original hardcoded values.
|
||||
|
||||
#### Design notes
|
||||
|
||||
- Explicit quotas in `Config.Quotas` override provider defaults (merge-on-top pattern)
|
||||
- Local provider has no default quotas -- users add per-model limits for hardware throttling
|
||||
- `AddProvider()` is additive -- calling it does not remove existing quotas
|
||||
- All new methods are mutex-protected and safe for concurrent use
|
||||
91
TODO.md
|
|
@@ -1,91 +0,0 @@
|
|||
# TODO.md -- go-ratelimit
|
||||
|
||||
Dispatched from core/go orchestration. Pick up tasks in order.
|
||||
|
||||
---
|
||||
|
||||
## Phase 0: Hardening & Test Coverage
|
||||
|
||||
- [x] **Expand test coverage** -- `ratelimit_test.go` rewritten with testify. Tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines, `WaitForCapacity()` timeout and immediate-capacity paths, `prune()` sliding window edge cases, daily reset logic (24h boundary), YAML persistence (save + reload), corrupt/unreadable state file recovery, `Reset()` single/all/nonexistent, `Stats()` known/unknown/quota-only models, `AllStats()` with pruning and daily reset.
|
||||
- [x] **Race condition test** -- `go test -race ./...` with 20 goroutines calling `CanSend()` + `RecordUsage()` + `Stats()` concurrently. Additional tests: concurrent `Reset()` + `RecordUsage()` + `AllStats()`, concurrent multi-model access (5 models), concurrent `Persist()` + `Load()` filesystem race, concurrent `AllStats()` + `RecordUsage()`, concurrent `WaitForCapacity()` + `RecordUsage()`. All pass clean.
|
||||
- [x] **Benchmark** -- 7 benchmarks: `BenchmarkCanSend` (1000-entry window), `BenchmarkRecordUsage`, `BenchmarkCanSendConcurrent` (parallel), `BenchmarkCanSendWithPrune` (500 old + 500 new), `BenchmarkStats` (1000 entries), `BenchmarkAllStats` (5 models x 200 entries), `BenchmarkPersist` (YAML I/O). Zero allocs on hot paths.
|
||||
- [x] **`go vet ./...` clean** -- No warnings.
|
||||
- **Coverage: 95.1%** (up from 77.1%). Remaining uncovered: `CountTokens` success path (hardcoded Google URL), `yaml.Marshal` error path in `Persist()`, `os.UserHomeDir` error path in `NewWithConfig`.
|
||||
|
||||
## Phase 1: Generalise Beyond Gemini
|
||||
|
||||
- [x] **Provider-agnostic config** -- Added `Provider` type, `ProviderProfile`, `Config` struct, `NewWithConfig()` constructor. Quotas are no longer hardcoded in `New()`.
|
||||
- [x] **Quota profiles** -- `DefaultProfiles()` returns pre-configured profiles for Gemini, OpenAI (gpt-4o, o1, o3-mini), Anthropic (claude-opus-4, claude-sonnet-4, claude-haiku-3.5), and Local (empty, user-configurable).
|
||||
- [x] **Configurable defaults** -- `Config` struct accepts `FilePath`, `Providers` list, and explicit `Quotas` map. Explicit quotas override provider defaults. YAML-serialisable.
|
||||
- [x] **Backward compatibility** -- `New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Existing API unchanged. Test `TestNewBackwardCompatibility` verifies exact parity.
|
||||
- [x] **Runtime configuration** -- `SetQuota()` and `AddProvider()` allow modifying quotas after construction. Both are mutex-protected.
|
||||
|
||||
## Phase 2: SQLite Persistent State
|
||||
|
||||
Current YAML persistence is single-process only. Phase 2 adds multi-process safe SQLite storage following the go-store pattern (`modernc.org/sqlite`, pure Go, no CGO).
|
||||
|
||||
### 2.1 SQLite Backend
|
||||
|
||||
- [x] **Add `modernc.org/sqlite` dependency** — `go get modernc.org/sqlite`. Pure Go, compiles everywhere.
|
||||
- [x] **Create `sqlite.go`** — Internal SQLite persistence layer:
|
||||
- `type sqliteStore struct { db *sql.DB }` — wraps database/sql connection
|
||||
- `func newSQLiteStore(dbPath string) (*sqliteStore, error)` — Open DB, set `PRAGMA journal_mode=WAL`, `PRAGMA busy_timeout=5000`, `db.SetMaxOpenConns(1)`. Create schema:
|
||||
```sql
|
||||
CREATE TABLE IF NOT EXISTS quotas (
|
||||
model TEXT PRIMARY KEY,
|
||||
max_rpm INTEGER NOT NULL DEFAULT 0,
|
||||
max_tpm INTEGER NOT NULL DEFAULT 0,
|
||||
max_rpd INTEGER NOT NULL DEFAULT 0
|
||||
);
|
||||
CREATE TABLE IF NOT EXISTS requests (
|
||||
model TEXT NOT NULL,
|
||||
ts INTEGER NOT NULL -- UnixNano
|
||||
);
|
||||
CREATE TABLE IF NOT EXISTS tokens (
|
||||
model TEXT NOT NULL,
|
||||
ts INTEGER NOT NULL, -- UnixNano
|
||||
count INTEGER NOT NULL
|
||||
);
|
||||
CREATE TABLE IF NOT EXISTS daily (
|
||||
model TEXT PRIMARY KEY,
|
||||
day_start INTEGER NOT NULL,
|
||||
day_count INTEGER NOT NULL DEFAULT 0
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
|
||||
CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
|
||||
```
|
||||
- `func (s *sqliteStore) saveQuotas(quotas map[string]ModelQuota) error` — UPSERT all quotas
|
||||
- `func (s *sqliteStore) loadQuotas() (map[string]ModelQuota, error)` — SELECT all quotas
|
||||
- `func (s *sqliteStore) saveState(state map[string]*UsageStats) error` — Transaction: DELETE old + INSERT requests/tokens/daily for each model
|
||||
- `func (s *sqliteStore) loadState() (map[string]*UsageStats, error)` — SELECT and reconstruct UsageStats map
|
||||
- `func (s *sqliteStore) close() error` — Close DB connection
|
||||
|
||||
### 2.2 Wire Into RateLimiter
|
||||
|
||||
- [x] **Add `Backend` field to Config** — `Backend string` with values `"yaml"` (default), `"sqlite"`. Default `""` maps to `"yaml"` for backward compat.
|
||||
- [x] **Update `Persist()` and `Load()`** — Check internal backend type. If SQLite, use `sqliteStore`; otherwise use existing YAML. Keep both paths working.
|
||||
- [x] **Add `NewWithSQLite(dbPath string) (*RateLimiter, error)`** — Convenience constructor that creates a SQLite-backed limiter. Sets backend type, initialises DB.
|
||||
- [x] **Graceful close** — Add `Close() error` method that closes SQLite DB if open. No-op for YAML backend.
|
||||
|
||||
### 2.3 Tests
|
||||
|
||||
- [x] **SQLite basic tests** — newSQLiteStore, saveQuotas/loadQuotas round-trip, saveState/loadState round-trip, close.
|
||||
- [x] **SQLite integration** — NewWithSQLite, RecordUsage → Persist → Load → verify state preserved. Same test matrix as existing YAML tests but with SQLite backend.
|
||||
- [x] **Concurrent SQLite** — 10 goroutines x 20 ops (RecordUsage + CanSend + Persist). Race-clean.
|
||||
- [x] **YAML backward compat** — Existing tests pass unchanged (still default to YAML).
|
||||
- [x] **Migration helper** — `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — reads YAML state, writes to SQLite. Test with sample YAML.
|
||||
- [x] **Corrupt DB recovery** — Truncated DB file → graceful error, fresh start.
|
||||
|
||||
## Phase 3: Integration
|
||||
|
||||
- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
|
||||
- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
|
||||
- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
|
||||
|
||||
---
|
||||
|
||||
## Workflow
|
||||
|
||||
1. Virgil in core/go writes tasks here after research
|
||||
2. This repo's dedicated session picks up tasks in phase order
|
||||
3. Mark `[x]` when done, note commit hash
|
||||
286
docs/architecture.md
Normal file
|
|
@@ -0,0 +1,286 @@
|
|||
# Architecture
|
||||
|
||||
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
|
||||
three independent quota dimensions per model — requests per minute (RPM), tokens
|
||||
per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
|
||||
that can be persisted across process restarts via YAML or SQLite.
|
||||
|
||||
Module path: `forge.lthn.ai/core/go-ratelimit`
|
||||
|
||||
---
|
||||
|
||||
## Sliding Window Algorithm
|
||||
|
||||
The limiter maintains per-model `UsageStats` structs in memory:
|
||||
|
||||
```go
|
||||
type UsageStats struct {
|
||||
Requests []time.Time // timestamps of recent requests (1-minute window)
|
||||
Tokens []TokenEntry // token counts with timestamps (1-minute window)
|
||||
DayStart time.Time // when the current daily window started
|
||||
DayCount int // total requests recorded since DayStart
|
||||
}
|
||||
```
|
||||
|
||||
Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
|
||||
slices and discards entries older than `now - 1 minute`. Pruning is done
|
||||
in-place to avoid allocation on the hot path:
|
||||
|
||||
```go
|
||||
validReqs := 0
|
||||
for _, t := range stats.Requests {
|
||||
if t.After(window) {
|
||||
stats.Requests[validReqs] = t
|
||||
validReqs++
|
||||
}
|
||||
}
|
||||
stats.Requests = stats.Requests[:validReqs]
|
||||
```
|
||||
|
||||
The same loop runs for token entries. After pruning, `CanSend()` checks each
|
||||
quota dimension in priority order: RPD first (cheapest check), then RPM, then
|
||||
TPM. A zero value for any dimension means that dimension is unlimited. If all
|
||||
three are zero the model is treated as fully unlimited and the check short-circuits
|
||||
before touching any state.
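The check order described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the `canSend` helper and its parameters are hypothetical, it assumes the window has already been pruned, and the exact comparison operators (reject at the limit for requests, reject above the limit for tokens) are assumptions. Only the `ModelQuota` field names come from this document.

```go
package main

import "fmt"

// ModelQuota mirrors the quota type documented in this file; 0 means unlimited.
type ModelQuota struct {
	MaxRPM int
	MaxTPM int
	MaxRPD int
}

// canSend is a hypothetical helper. It assumes the caller has already pruned
// the window, so windowRequests and windowTokens cover the last minute only.
func canSend(q ModelQuota, dayCount, windowRequests, windowTokens, estimatedTokens int) bool {
	// Fully unlimited model: short-circuit before looking at any usage.
	if q.MaxRPM == 0 && q.MaxTPM == 0 && q.MaxRPD == 0 {
		return true
	}
	// RPD first: a single integer comparison.
	if q.MaxRPD > 0 && dayCount >= q.MaxRPD {
		return false
	}
	// RPM next: requests already inside the one-minute window.
	if q.MaxRPM > 0 && windowRequests >= q.MaxRPM {
		return false
	}
	// TPM last: tokens already used plus the estimate for this request.
	if q.MaxTPM > 0 && windowTokens+estimatedTokens > q.MaxTPM {
		return false
	}
	return true
}

func main() {
	q := ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}
	fmt.Println("under every limit:", canSend(q, 10, 149, 5000, 1500))
	fmt.Println("RPM window full:", canSend(q, 10, 150, 5000, 1500))
	fmt.Println("daily cap reached:", canSend(q, 1000, 0, 0, 0))
	fmt.Println("unlimited model:", canSend(ModelQuota{}, 9999, 9999, 9999, 9999))
}
```

Passing counts rather than the state slices keeps the sketch independent of how `UsageStats` is stored; the real method works on the pruned slices directly.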
|
||||
|
||||
### Daily Reset
|
||||
|
||||
The daily counter resets automatically inside `prune()`. When
|
||||
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
|
||||
to the current time. This means the daily window is a rolling 24-hour period
|
||||
anchored to the first request of the day, not a calendar boundary.
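A minimal sketch of the rolling reset, assuming a hypothetical `resetIfExpired` helper and a cut-down `dailyState` struct (neither name exists in the package). It shows that a window opened at 14:32 resets at 14:32 the next day, not at midnight.

```go
package main

import (
	"fmt"
	"time"
)

// dailyState holds only the fields relevant to the daily window.
type dailyState struct {
	DayStart time.Time
	DayCount int
}

// resetIfExpired is a hypothetical helper illustrating the rolling reset:
// once 24 hours have passed since DayStart, the counter starts again from now.
func resetIfExpired(s *dailyState, now time.Time) bool {
	if now.Sub(s.DayStart) >= 24*time.Hour {
		s.DayStart = now
		s.DayCount = 0
		return true
	}
	return false
}

// demoReset checks the state 23 hours and then 24 hours after DayStart and
// reports whether each check reset the counter, plus the final count.
func demoReset() (bool, bool, int) {
	start := time.Date(2026, time.February, 20, 14, 32, 0, 0, time.UTC)
	s := &dailyState{DayStart: start, DayCount: 42}
	early := resetIfExpired(s, start.Add(23*time.Hour))
	late := resetIfExpired(s, start.Add(24*time.Hour))
	return early, late, s.DayCount
}

func main() {
	early, late, count := demoReset()
	fmt.Println("reset after 23h:", early)
	fmt.Println("reset after 24h:", late)
	fmt.Println("count after reset:", count)
}
```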
|
||||
|
||||
### Concurrency
|
||||
|
||||
All reads and writes are protected by a single `sync.RWMutex`. Methods that
write state (`RecordUsage()`, `Reset()`, `Load()`) acquire the full write lock.
`CanSend()` and `Stats()` also acquire the write lock, even though they look
read-only, because they call `prune()`, which mutates the state slices.
`Persist()` and `AllStats()` acquire a read lock where possible.
|
||||
|
||||
`go test -race ./...` passes clean with 20 goroutines performing concurrent
|
||||
`CanSend()`, `RecordUsage()`, and `Stats()` calls.
|
||||
|
||||
---
|
||||
|
||||
## Provider and Quota Configuration
|
||||
|
||||
### Types
|
||||
|
||||
```go
|
||||
type Provider string // "gemini", "openai", "anthropic", "local"
|
||||
|
||||
type ModelQuota struct {
|
||||
MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
|
||||
MaxTPM int `yaml:"max_tpm"`
|
||||
MaxRPD int `yaml:"max_rpd"`
|
||||
}
|
||||
|
||||
type Config struct {
|
||||
FilePath string // default: ~/.core/ratelimits.yaml
|
||||
Backend string // "yaml" (default) or "sqlite"
|
||||
Quotas map[string]ModelQuota // explicit per-model overrides
|
||||
Providers []Provider // provider profiles to load
|
||||
}
|
||||
```
|
||||
|
||||
### Quota Resolution
|
||||
|
||||
1. Provider profiles are loaded first (from `DefaultProfiles()`).
|
||||
2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
|
||||
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
|
||||
|
||||
`SetQuota()` and `AddProvider()` allow runtime modification; both are
|
||||
mutex-protected. `AddProvider()` is additive — it does not remove existing
|
||||
quotas for models outside the new provider's profile.
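The three resolution steps can be sketched as a plain map merge. The names `resolveQuotas` and `sampleProfiles` are hypothetical, the return type of the real `DefaultProfiles()` is not shown in this document, and the sketch assumes an explicit entry replaces the whole `ModelQuota` for that model, not individual fields.

```go
package main

import "fmt"

// Provider and ModelQuota follow the types documented above.
type Provider string

type ModelQuota struct {
	MaxRPM int
	MaxTPM int
	MaxRPD int
}

// sampleProfiles is a hypothetical stand-in for DefaultProfiles(); only two
// models are listed to keep the example short.
func sampleProfiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		"gemini":    {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}},
		"anthropic": {"claude-opus-4": {MaxRPM: 50, MaxTPM: 40000}},
	}
}

// resolveQuotas is a hypothetical helper that applies the three steps in order.
func resolveQuotas(profiles map[Provider]map[string]ModelQuota, providers []Provider, explicit map[string]ModelQuota) map[string]ModelQuota {
	// Step 3: with nothing specified, fall back to the Gemini profile.
	if len(providers) == 0 && len(explicit) == 0 {
		providers = []Provider{"gemini"}
	}
	out := make(map[string]ModelQuota)
	// Step 1: load provider defaults.
	for _, p := range providers {
		for model, q := range profiles[p] {
			out[model] = q
		}
	}
	// Step 2: explicit quotas win over any provider default for the same model.
	for model, q := range explicit {
		out[model] = q
	}
	return out
}

func main() {
	fallback := resolveQuotas(sampleProfiles(), nil, nil)
	fmt.Println("fallback models:", len(fallback))

	override := map[string]ModelQuota{"claude-opus-4": {MaxRPM: 10, MaxTPM: 40000}}
	merged := resolveQuotas(sampleProfiles(), []Provider{"anthropic"}, override)
	fmt.Println("claude-opus-4 MaxRPM:", merged["claude-opus-4"].MaxRPM)
}
```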
|
||||
|
||||
### Default Quotas (as of February 2026)
|
||||
|
||||
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|
||||
|-----------|------------------------|-----------|-----------|-----------|
|
||||
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
|
||||
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
|
||||
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
|
||||
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
|
||||
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
|
||||
| OpenAI | gpt-4o, gpt-4-turbo | 500 | 30,000 | unlimited |
|
||||
| OpenAI | gpt-4o-mini, o1-mini | 500 | 200,000 | unlimited |
|
||||
| OpenAI | o1 | 500 | 30,000 | unlimited |
| OpenAI | o3-mini | 500 | 200,000 | unlimited |
|
||||
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
|
||||
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
|
||||
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
|
||||
| Local | (none by default) | user-defined | user-defined | user-defined |
|
||||
|
||||
The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
|
||||
where the throttle limit is hardware rather than an API quota. No defaults are
|
||||
provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
|
||||
|
||||
---
|
||||
|
||||
## YAML Persistence (Legacy)
|
||||
|
||||
The default backend serialises the entire `RateLimiter` struct — both the
|
||||
`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
|
||||
|
||||
```yaml
|
||||
quotas:
|
||||
gemini-3-pro-preview:
|
||||
max_rpm: 150
|
||||
max_tpm: 1000000
|
||||
max_rpd: 1000
|
||||
state:
|
||||
gemini-3-pro-preview:
|
||||
requests:
|
||||
- 2026-02-20T14:32:01.123456789Z
|
||||
tokens:
|
||||
- time: 2026-02-20T14:32:01.123456789Z
|
||||
count: 1500
|
||||
day_start: 2026-02-20T00:00:00Z
|
||||
day_count: 42
|
||||
```
|
||||
|
||||
`Persist()` creates parent directories with `os.MkdirAll` before writing.
|
||||
`Load()` treats a missing file as an empty state (no error). Corrupt or
|
||||
unreadable files return an error.
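The file-handling behaviour described above can be sketched with the standard library alone. The helpers `readStateFile` and `writeStateFile` are hypothetical, the permission modes are assumptions, and YAML decoding is left out so the example needs no third-party import.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// readStateFile is a hypothetical helper: a missing file yields (nil, nil) so
// the caller starts with empty state; any other read failure is returned.
func readStateFile(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("ratelimit.readStateFile: failed to read file: %w", err)
	}
	return data, nil
}

// writeStateFile creates missing parent directories before writing, as
// Persist() is described as doing. The modes 0o755 and 0o600 are assumptions.
func writeStateFile(path string, data []byte) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return fmt.Errorf("ratelimit.writeStateFile: failed to create directory: %w", err)
	}
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return fmt.Errorf("ratelimit.writeStateFile: failed to write file: %w", err)
	}
	return nil
}

// demoRoundTrip reads a missing file, writes it into a nested directory that
// does not exist yet, and reads it back.
func demoRoundTrip() (bool, string, error) {
	dir, err := os.MkdirTemp("", "ratelimit-sketch-")
	if err != nil {
		return false, "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "nested", "ratelimits.yaml")

	data, err := readStateFile(path)
	if err != nil {
		return false, "", err
	}
	missingOK := data == nil

	if err := writeStateFile(path, []byte("quotas: {}\n")); err != nil {
		return false, "", err
	}
	data, err = readStateFile(path)
	if err != nil {
		return false, "", err
	}
	return missingOK, string(data), nil
}

func main() {
	missingOK, content, err := demoRoundTrip()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("missing file tolerated: %v, reloaded: %q\n", missingOK, content)
}
```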
|
||||
|
||||
**Limitations of YAML backend:**
|
||||
- Single-process only. Concurrent writes from multiple processes corrupt the
|
||||
file because the write is not atomic at the OS level.
|
||||
- The entire state is serialised on every `Persist()` call, which grows linearly
|
||||
with the number of tracked models and entries.
|
||||
- Timestamps are serialised as RFC3339 strings; nanosecond precision is
|
||||
preserved by Go's time marshaller but depends on the YAML library.
|
||||
|
||||
---
|
||||
|
||||
## SQLite Backend
|
||||
|
||||
The SQLite backend was added in Phase 2 to support multi-process scenarios and
|
||||
provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
|
||||
Go port of SQLite that compiles without CGO.
|
||||
|
||||
### Connection Settings
|
||||
|
||||
```go
|
||||
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
|
||||
db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers
|
||||
db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout
|
||||
```
|
||||
|
||||
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
|
||||
timeout prevents immediate failure when a second process is mid-commit. A single
|
||||
`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
|
||||
at the file level; multiple Go connections to the same file from a single
|
||||
process would not add throughput but would complicate locking.
|
||||
|
||||
### Schema
|
||||
|
||||
```sql
|
||||
CREATE TABLE IF NOT EXISTS quotas (
|
||||
model TEXT PRIMARY KEY,
|
||||
max_rpm INTEGER NOT NULL DEFAULT 0,
|
||||
max_tpm INTEGER NOT NULL DEFAULT 0,
|
||||
max_rpd INTEGER NOT NULL DEFAULT 0
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS requests (
|
||||
model TEXT NOT NULL,
|
||||
ts INTEGER NOT NULL -- UnixNano
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS tokens (
|
||||
model TEXT NOT NULL,
|
||||
ts INTEGER NOT NULL, -- UnixNano
|
||||
count INTEGER NOT NULL
|
||||
);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS daily (
|
||||
model TEXT PRIMARY KEY,
|
||||
day_start INTEGER NOT NULL, -- UnixNano
|
||||
day_count INTEGER NOT NULL DEFAULT 0
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
|
||||
CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
|
||||
```
|
||||
|
||||
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
|
||||
precision without relying on SQLite's text date format, and allows efficient
|
||||
range queries using the composite indices.
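A short sketch of the timestamp round trip, assuming hypothetical helpers `toColumn`, `fromColumn`, and `windowStart`; the query shape in the comment is also an assumption. Note that `UnixNano` only represents dates between the years 1678 and 2262, which is ample for rate-limit state.

```go
package main

import (
	"fmt"
	"time"
)

// toColumn converts a timestamp into the INTEGER value stored in a ts column.
func toColumn(t time.Time) int64 {
	return t.UnixNano()
}

// fromColumn rebuilds a time.Time from a stored UnixNano value.
func fromColumn(ns int64) time.Time {
	return time.Unix(0, ns)
}

// windowStart returns the lower bound for a one-minute range query, for
// example: SELECT ts FROM requests WHERE model = ? AND ts > ?
// (the query text is illustrative, not taken from the package).
func windowStart(now time.Time) int64 {
	return now.Add(-time.Minute).UnixNano()
}

// demoTimestamps reports whether a nanosecond-precision timestamp survives
// the round trip, and the width of the query window in nanoseconds.
func demoTimestamps() (bool, int64) {
	t := time.Date(2026, time.February, 20, 14, 32, 1, 123456789, time.UTC)
	back := fromColumn(toColumn(t))
	return back.Equal(t), toColumn(t) - windowStart(t)
}

func main() {
	same, width := demoTimestamps()
	fmt.Println("round trip preserved the instant:", same)
	fmt.Println("window width in nanoseconds:", width)
}
```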
|
||||
|
||||
### Save Strategy
|
||||
|
||||
`saveState()` uses a delete-then-insert pattern inside a single transaction.
|
||||
All three state tables are truncated and rewritten atomically:
|
||||
|
||||
```go
|
||||
tx.Exec("DELETE FROM requests")
|
||||
tx.Exec("DELETE FROM tokens")
|
||||
tx.Exec("DELETE FROM daily")
|
||||
// then INSERT for every model in state
|
||||
tx.Commit()
|
||||
```
|
||||
|
||||
`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
|
||||
existing quota rows are updated in place without deleting unrelated models.
|
||||
|
||||
### Constructors
|
||||
|
||||
```go
|
||||
// YAML backend (default)
|
||||
rl, err := ratelimit.New()
|
||||
rl, err := ratelimit.NewWithConfig(cfg)
|
||||
|
||||
// SQLite backend
|
||||
rl, err := ratelimit.NewWithSQLite(dbPath)
|
||||
rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
|
||||
|
||||
defer rl.Close() // releases the database connection
|
||||
```
|
||||
|
||||
`Close()` is a no-op on YAML-backed limiters.
|
||||
|
||||
---
|
||||
|
||||
## Migration Path
|
||||
|
||||
`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
|
||||
state file and writes all quotas and usage state to a new SQLite database. The
|
||||
function is idempotent — running it again on the same YAML file overwrites the
|
||||
SQLite database state.
|
||||
|
||||
Typical one-time migration:
|
||||
|
||||
```go
|
||||
err := ratelimit.MigrateYAMLToSQLite(
|
||||
filepath.Join(home, ".core", "ratelimits.yaml"),
|
||||
filepath.Join(home, ".core", "ratelimits.db"),
|
||||
)
|
||||
```
|
||||
|
||||
After migration, switch the constructor:
|
||||
|
||||
```go
|
||||
// Before
|
||||
rl, _ := ratelimit.New()
|
||||
|
||||
// After
|
||||
rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
|
||||
defer rl.Close()
|
||||
```
|
||||
|
||||
The YAML file can be kept as a backup; the two backends do not share state.
|
||||
|
||||
---
|
||||
|
||||
## CountTokens
|
||||
|
||||
`CountTokens(apiKey, model, text string) (int, error)` calls the Google
|
||||
Generative Language API to obtain an exact token count for a prompt string. It
|
||||
is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
|
||||
endpoint. The URL is not configurable, which prevents unit testing of the
|
||||
success path without network access.
|
||||
|
||||
For other providers, callers must supply `estimatedTokens` directly to
|
||||
`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
|
||||
in API response metadata after a call completes.
|
||||
207
docs/development.md
Normal file
|
|
@@ -0,0 +1,207 @@
|
|||
# Development Guide
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Go 1.25.5 or later (the module declares `go 1.25.5`, which the Go toolchain treats as a minimum version)
|
||||
- No CGO required — `modernc.org/sqlite` is a pure Go port
|
||||
|
||||
No C toolchain, no system SQLite library, no external build tools. A plain
|
||||
`go build ./...` is sufficient.
|
||||
|
||||
---
|
||||
|
||||
## Build and Test
|
||||
|
||||
```bash
|
||||
# Run all tests
|
||||
go test ./...
|
||||
|
||||
# Run all tests with the race detector (required before every commit)
|
||||
go test -race ./...
|
||||
|
||||
# Run a single test by name
|
||||
go test -v -run TestCanSend ./...
|
||||
|
||||
# Run a single subtest
|
||||
go test -v -run "TestCanSend/RPM_at_exact_limit_is_rejected" ./...
|
||||
|
||||
# Run benchmarks
|
||||
go test -bench=. -benchmem ./...
|
||||
|
||||
# Run a specific benchmark
|
||||
go test -bench=BenchmarkCanSend -benchmem ./...
|
||||
|
||||
# Check for vet issues
|
||||
go vet ./...
|
||||
|
||||
# Tidy dependencies
|
||||
go mod tidy
|
||||
```
|
||||
|
||||
All three commands (`go test -race ./...`, `go vet ./...`, and `go mod tidy`)
|
||||
must produce no errors or warnings before a commit is pushed.
|
||||
|
||||
---
|
||||
|
||||
## Test Patterns
|
||||
|
||||
### File Organisation
|
||||
|
||||
- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
|
||||
- `sqlite_test.go` — Phase 2 (SQLite backend)
|
||||
|
||||
Both files are in `package ratelimit` (white-box tests) so they can access
|
||||
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
|
||||
|
||||
### Naming Convention
|
||||
|
||||
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
|
||||
|
||||
- `_Good` — happy path
|
||||
- `_Bad` — expected error conditions (invalid paths, corrupt input)
|
||||
- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
|
||||
|
||||
Core logic tests use plain descriptive names without suffixes, grouped by
|
||||
method with table-driven subtests.
|
||||
|
||||
### Test Helpers
|
||||
|
||||
`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
|
||||
redirects the YAML file path into `t.TempDir()`:
|
||||
|
||||
```go
|
||||
func newTestLimiter(t *testing.T) *RateLimiter {
|
||||
t.Helper()
|
||||
rl, err := New()
|
||||
require.NoError(t, err)
|
||||
rl.filePath = filepath.Join(t.TempDir(), "ratelimits.yaml")
|
||||
return rl
|
||||
}
|
||||
```
|
||||
|
||||
Use `t.TempDir()` for all file paths in tests. Go cleans these up automatically
|
||||
after each test completes.
|
||||
|
||||
### Testify Usage
|
||||
|
||||
Tests use `github.com/stretchr/testify` exclusively:
|
||||
|
||||
- `require.NoError(t, err)` — fail immediately on setup errors
|
||||
- `assert.NoError(t, err)` — record failure but continue
|
||||
- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
|
||||
- `assert.True / assert.False` — for boolean checks
|
||||
- `assert.Empty / assert.Len` — for slice length checks
|
||||
- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
|
||||
|
||||
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
|
||||
|
||||
### Race Tests
|
||||
|
||||
Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
|
||||
assert anything beyond absence of data races (the race detector does the work):
|
||||
|
||||
```go
|
||||
var wg sync.WaitGroup
|
||||
for i := 0; i < 20; i++ {
|
||||
wg.Add(1)
|
||||
go func() {
|
||||
defer wg.Done()
|
||||
// concurrent operations
|
||||
}()
|
||||
}
|
||||
wg.Wait()
|
||||
```
|
||||
|
||||
Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
|
||||
clean.
|
||||
|
||||
### Coverage
|
||||
|
||||
Current coverage: 95.1%. The remaining 4.9% consists of three paths that cannot
|
||||
be covered in unit tests without modifying the production code:
|
||||
|
||||
1. `CountTokens` success path — hardcoded Google API URL requires network access
|
||||
2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
|
||||
3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
|
||||
|
||||
Do not lower coverage below 95% without a documented reason.
|
||||
|
||||
---
|
||||
|
||||
## Coding Standards
|
||||
|
||||
### Language
|
||||
|
||||
UK English throughout: colour, organisation, serialise, initialise, behaviour.
|
||||
Do not use American spellings in identifiers, comments, or documentation.
|
||||
|
||||
### Go Style
|
||||
|
||||
- All exported types, functions, and fields must have doc comments
|
||||
- Error strings must be lowercase and not end with punctuation (Go convention)
|
||||
- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
|
||||
prefix `ratelimit.` is included so errors identify their origin clearly
|
||||
- No `init()` functions
|
||||
- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
|
||||
map on each call)
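The error conventions in the list above can be illustrated with a small sketch. The function `lookupQuota` and the sentinel `errQuotaMissing` are hypothetical and exist only to show the message format and `%w` wrapping.

```go
package main

import (
	"errors"
	"fmt"
)

// errQuotaMissing is a hypothetical sentinel: lowercase, no trailing punctuation.
var errQuotaMissing = errors.New("no quota configured")

// lookupQuota is a hypothetical function. Its error carries the
// "package.Function: what: cause" shape and wraps the cause with %w so that
// callers can still match it with errors.Is.
func lookupQuota(quotas map[string]int, model string) (int, error) {
	q, ok := quotas[model]
	if !ok {
		return 0, fmt.Errorf("ratelimit.lookupQuota: model %q: %w", model, errQuotaMissing)
	}
	return q, nil
}

// demoWrap returns the formatted message and whether the sentinel is still
// reachable through the wrapped error.
func demoWrap() (string, bool) {
	_, err := lookupQuota(map[string]int{"gemini-2.5-pro": 150}, "unknown-model")
	return err.Error(), errors.Is(err, errQuotaMissing)
}

func main() {
	msg, matched := demoWrap()
	fmt.Println(msg)
	fmt.Println("sentinel matched:", matched)
}
```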
|
||||
|
||||
### Mutex Discipline
|
||||
|
||||
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
|
||||
|
||||
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
|
||||
even if they appear read-only, because `prune()` mutates slices
|
||||
- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
|
||||
snapshot of state
|
||||
- Lock acquisition always happens at the top of the public method, never inside
|
||||
a helper — helpers document "Caller must hold the lock"
|
||||
- Never call a public method from inside another public method while holding
|
||||
the lock (deadlock risk)
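The rules above can be sketched on a stripped-down type. Everything here is hypothetical: `limiter`, its methods, and the use of plain integers in place of timestamps are stand-ins for the real `RateLimiter`, chosen to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a stripped-down stand-in for RateLimiter.
type limiter struct {
	mu      sync.RWMutex
	entries map[string][]int // per-model entries; ints stand in for timestamps
}

// Record mutates state, so it takes the write lock at the top of the method.
func (l *limiter) Record(model string, v int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries[model] = append(l.entries[model], v)
}

// Count looks read-only but prunes first, so it also takes the write lock.
func (l *limiter) Count(model string, cutoff int) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.prune(model, cutoff)
	return len(l.entries[model])
}

// Snapshot only reads, so the read lock is enough.
func (l *limiter) Snapshot() map[string]int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	out := make(map[string]int, len(l.entries))
	for model, e := range l.entries {
		out[model] = len(e)
	}
	return out
}

// prune drops entries at or below cutoff in place.
// Caller must hold the write lock.
func (l *limiter) prune(model string, cutoff int) {
	e := l.entries[model]
	kept := 0
	for _, v := range e {
		if v > cutoff {
			e[kept] = v
			kept++
		}
	}
	l.entries[model] = e[:kept]
}

// demoLocks records the values 1..20 from 20 goroutines, then prunes
// everything at or below 10 and reads the result through both paths.
func demoLocks() (int, int) {
	l := &limiter{entries: make(map[string][]int)}
	var wg sync.WaitGroup
	for i := 1; i <= 20; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			l.Record("m", v)
		}(i)
	}
	wg.Wait()
	counted := l.Count("m", 10)
	return counted, l.Snapshot()["m"]
}

func main() {
	counted, snapshot := demoLocks()
	fmt.Println("entries after prune:", counted)
	fmt.Println("entries in snapshot:", snapshot)
}
```

Because `prune` never locks, the public methods can call it freely without risking a self-deadlock on the non-reentrant mutex.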
|
||||
|
||||
### Dependencies
|
||||
|
||||
Direct dependencies are intentionally minimal:
|
||||
|
||||
| Dependency | Purpose |
|
||||
|------------|---------|
|
||||
| `gopkg.in/yaml.v3` | YAML serialisation for legacy backend |
|
||||
| `modernc.org/sqlite` | Pure Go SQLite for persistent backend |
|
||||
| `github.com/stretchr/testify` | Test assertions (test-only) |
|
||||
|
||||
Do not add `database/sql` drivers beyond `modernc.org/sqlite`. Do not add HTTP
|
||||
client libraries; the existing `CountTokens` function uses the standard library.
|
||||
|
||||
---
|
||||
|
||||
## Licence
|
||||
|
||||
EUPL-1.2. Every new source file must carry the standard header if the project
|
||||
adopts per-file headers in future. Confirm with the project lead before adding
|
||||
files under a different licence.
|
||||
|
||||
---
|
||||
|
||||
## Commit Convention

Format: `type(scope): description`

Common types: `feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`

Common scopes: `ratelimit`, `sqlite`, `persist`, `config`

Every commit must include:

```
Co-Authored-By: Virgil <virgil@lethean.io>
```

Example:

```
feat(sqlite): add WAL-mode SQLite backend with migration helper

Co-Authored-By: Virgil <virgil@lethean.io>
```

Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
pass. `go mod tidy` must produce no changes.
197
docs/history.md
Normal file
@@ -0,0 +1,197 @@
# Project History

## Origin

go-ratelimit was extracted from the `pkg/ratelimit` package inside
`forge.lthn.ai/core/go` on 19 February 2026. The extraction gave the package
its own module path, repository, and independent development cadence.

Initial commit: `fa1a6fc` — `feat: extract go-ratelimit from core/go pkg/ratelimit`

At extraction the package implemented:

- Sliding window rate limiter with 1-minute window
- Daily request caps per model
- Token counting via Google `CountTokens` API
- Hardcoded Gemini quota defaults (`gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD)
- YAML persistence to `~/.core/ratelimits.yaml`
- Single test file with basic sliding window and quota enforcement tests

---

## Phase 0 — Hardening and Test Coverage

Commit: `3c63b10` — `feat(ratelimit): generalise beyond Gemini with provider profiles and push coverage to 95%`

Supplementary commit: `db958f2` — `test: expand race coverage and benchmarks`

Coverage increased from 77.1% to 95.1%. The test suite was rewritten using
testify with table-driven subtests throughout.

### Tests added

- `TestCanSend` — boundary conditions at exact RPM, TPM, and RPD limits;
  RPM-only and TPM-only quotas; zero-token estimates; unknown and unlimited models
- `TestPrune` — pruning of old entries, retention of recent entries, daily reset
  at 24-hour boundary, no-op on non-existent model, boundary-exact timestamps
- `TestRecordUsage` — fresh state, accumulation, insertion into existing state
- `TestReset` — single model, all models (empty string argument), non-existent model
- `TestWaitForCapacity` — context cancellation, pre-cancelled context,
  immediate capacity, unknown model
- `TestStats` / `TestAllStats` — known, unknown, and quota-only models; pruning
  and daily reset inside `AllStats()`
- `TestPersistAndLoad` — round-trip, missing file, corrupt YAML, unreadable file,
  nested directory creation, unwritable directory
- `TestConcurrentAccess` — 20 goroutines x 50 ops each (CanSend + RecordUsage + Stats)
- `TestConcurrentResetAndRecord` — concurrent Reset + RecordUsage + AllStats
- `TestConcurrentMultipleModels` — 5 models, concurrent access
- `TestConcurrentPersistAndLoad` — filesystem race between Persist and Load
- `TestConcurrentWaitForCapacityAndRecordUsage` — WaitForCapacity racing RecordUsage

### Benchmarks added

- `BenchmarkCanSend` — 1,000-entry sliding window
- `BenchmarkRecordUsage`
- `BenchmarkCanSendConcurrent` — parallel goroutines
- `BenchmarkCanSendWithPrune` — 500 old + 500 new entries
- `BenchmarkStats` — 1,000-entry window
- `BenchmarkAllStats` — 5 models x 200 entries
- `BenchmarkPersist` — YAML I/O

### Remaining uncovered paths (5%)

These three paths are structurally impossible to cover in unit tests without
modifying production code:

1. `CountTokens` success path — the Google API URL is hardcoded; unit tests
   cannot intercept the HTTP call without URL injection support
2. `yaml.Marshal` error path in `Persist()` — `yaml.Marshal` does not fail on
   valid Go structs; the error branch exists for correctness only
3. `os.UserHomeDir()` error path in `NewWithConfig()` — triggered only when
   `$HOME` is unset, which test infrastructure prevents

`go test -race ./...` passed clean. `go vet ./...` produced no warnings.

---

## Phase 1 — Generalisation Beyond Gemini

Commit: `3c63b10` — included in the same commit as Phase 0

The hardcoded Gemini quotas in `New()` were replaced with a provider-agnostic
configuration system without breaking the existing API.

### New types and functions

- `Provider` string type with constants: `ProviderGemini`, `ProviderOpenAI`,
  `ProviderAnthropic`, `ProviderLocal`
- `ProviderProfile` — bundles a provider identifier with its model quota map
- `Config` — construction configuration accepting `FilePath`, `Backend`,
  `Providers`, and `Quotas` fields
- `DefaultProfiles()` — returns fresh pre-configured profiles for all four providers
- `NewWithConfig(Config)` — creates a limiter from explicit configuration
- `SetQuota(model, quota)` — runtime quota modification, mutex-protected
- `AddProvider(provider)` — loads all default quotas for a provider at runtime,
  additive, mutex-protected

### Backward compatibility

`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`.
`TestNewBackwardCompatibility` asserts exact parity with the original hardcoded
values. No existing call sites required modification.

### Design decision: merge-on-top

Explicit `Config.Quotas` override provider profile defaults. This allows callers
to use a provider profile for most models while customising specific model limits
without forking the entire profile.

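The merge-on-top rule can be sketched roughly as below. The `Quota` field names and the `mergeQuotas` helper are assumptions made for this illustration; the package's real types may be shaped differently.

```go
package main

import "fmt"

// Quota is a hypothetical shape for per-model limits; the field names are
// assumptions for this sketch.
type Quota struct {
	RPM, TPM, RPD int
}

// mergeQuotas copies the provider defaults first, then lets explicit entries
// win for any model that appears in both maps. Neither input is mutated.
func mergeQuotas(defaults, explicit map[string]Quota) map[string]Quota {
	merged := make(map[string]Quota, len(defaults)+len(explicit))
	for model, q := range defaults {
		merged[model] = q
	}
	for model, q := range explicit {
		merged[model] = q
	}
	return merged
}

func main() {
	defaults := map[string]Quota{
		"gemini-3-pro-preview": {RPM: 150, TPM: 1_000_000, RPD: 1000},
	}
	// The caller lowers RPM for one model and leaves every other default alone.
	explicit := map[string]Quota{
		"gemini-3-pro-preview": {RPM: 60, TPM: 1_000_000, RPD: 1000},
	}
	fmt.Println(mergeQuotas(defaults, explicit)["gemini-3-pro-preview"].RPM)
}
```

Applying the explicit map last is what makes it authoritative: any model it names replaces the profile's entry, and models it does not name keep their defaults.
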
---

## Phase 2 — SQLite Persistent State

Commit: `1afb1d6` — `feat(persist): Phase 2 — SQLite backend with WAL mode`

The YAML backend serialises the full state on every `Persist()` call and is
not safe for concurrent multi-process access. Phase 2 added a SQLite backend
using `modernc.org/sqlite` (pure Go, no CGO) following the go-store pattern
established elsewhere in the ecosystem.

### New constructors

- `NewWithSQLite(dbPath string)` — SQLite-backed limiter with Gemini defaults
- `NewWithSQLiteConfig(dbPath string, cfg Config)` — SQLite-backed with custom config
- `Close() error` — releases the database connection; no-op on YAML-backed limiters

### Migration

- `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — one-shot migration
  helper that reads an existing YAML state file and writes all quotas and usage
  state to a new SQLite database

### SQLite connection settings

- `PRAGMA journal_mode=WAL` — enables concurrent reads alongside a single writer
- `PRAGMA busy_timeout=5000` — 5-second wait on lock contention before returning an error
- `db.SetMaxOpenConns(1)` — single connection for PRAGMA consistency

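A minimal sketch of applying these settings is shown below. The `execer` interface, the `applyPragmas` helper, and the `recorder` fake are hypothetical names introduced here so the example runs without a SQLite driver; a real `*sql.DB` satisfies `execer`.

```go
package main

import (
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB this sketch needs. It is an assumption made
// for illustration so the code can be exercised without a driver.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// applyPragmas issues the two PRAGMA statements listed above, in order.
func applyPragmas(db execer) error {
	for _, stmt := range []string{
		"PRAGMA journal_mode=WAL",
		"PRAGMA busy_timeout=5000",
	} {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("apply %q: %w", stmt, err)
		}
	}
	return nil
}

// recorder is a fake execer that only remembers the statements it was given.
type recorder struct{ stmts []string }

func (r *recorder) Exec(query string, _ ...any) (sql.Result, error) {
	r.stmts = append(r.stmts, query)
	return nil, nil
}

func main() {
	// With a real *sql.DB, db.SetMaxOpenConns(1) would be called before
	// applyPragmas so that both settings land on the only connection in use.
	r := &recorder{}
	if err := applyPragmas(r); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.stmts)
}
```

`busy_timeout` is a per-connection setting in SQLite, which is one plausible reason for pinning the pool to a single connection: a second pooled connection would not inherit it.
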
### Tests added (sqlite_test.go)

- `TestNewSQLiteStore_Good / _Bad` — creation and invalid path handling
- `TestSQLiteQuotasRoundTrip_Good` — save/load round-trip
- `TestSQLiteQuotasUpsert_Good` — upsert replaces existing rows
- `TestSQLiteStateRoundTrip_Good` — multi-model state with nanosecond precision
- `TestSQLiteStateOverwrite_Good` — delete-then-insert atomicity
- `TestSQLiteEmptyState_Good` — fresh database returns empty maps
- `TestNewWithSQLite_Good / TestNewWithSQLiteConfig_Good` — constructor tests
- `TestSQLitePersistAndLoad_Good` — full persist + reload cycle
- `TestSQLitePersistMultipleModels_Good` — multi-provider persistence
- `TestSQLiteConcurrent_Good` — 10 goroutines x 20 ops, race-clean
- `TestYAMLBackwardCompat_Good` — existing YAML tests pass unchanged
- `TestMigrateYAMLToSQLite_Good / _Bad` — migration round-trip and error paths
- `TestSQLiteCorruptDB_Ugly / TestSQLiteTruncatedDB_Ugly` — graceful corrupt DB recovery
- `TestSQLiteEndToEnd_Good` — full two-session scenario

---

## Phase 3 — Integration (Planned)

Not yet implemented. Intended downstream integrations:

- Wire into `go-ml` backends so rate limiting is enforced automatically on
  inference calls without caller involvement
- Wire into the `go-ai` facade so all providers share a single rate limit layer
- Export metrics (requests/minute, tokens/minute, rejection counts) for
  monitoring dashboards

---

## Known Limitations

**CountTokens URL is hardcoded.** The `CountTokens` helper calls
`generativelanguage.googleapis.com` directly. There is no way to override the
base URL, which prevents testing the success path in unit tests and prevents
use with Gemini-compatible proxies. A future refactor would accept a base URL
parameter or an `http.Client`.

**saveState is a full table replace.** On every `Persist()` call, the `requests`,
`tokens`, and `daily` tables are truncated and rewritten. For a limiter tracking
many models with high RPM, this means writing hundreds of rows on every persist
call. A future optimisation would use incremental writes (insert-only, with
periodic vacuuming of expired rows).

**No TTL on SQLite rows.** Entries older than one minute are pruned from the
in-memory `UsageStats` on every operation, and whatever remains is written
wholesale to SQLite on `Persist()`. The database does not grow unboundedly
between persist cycles because `saveState` replaces all rows, but if `Persist()`
is called frequently the WAL file can grow transiently.

**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate
for RPM-scale limits but is coarse for sub-second limits. If a caller needs
finer-grained waiting (e.g., smoothing requests within a minute), they must
implement their own loop.

**No automatic persistence.** `Persist()` must be called explicitly. If a
process exits without calling `Persist()`, any usage recorded since the last
persist is lost. Callers are responsible for calling `Persist()` at appropriate
intervals (e.g., after each `RecordUsage()` call, or on a ticker).
