docs: graduate TODO/FINDINGS into production documentation

Replace internal task tracking (TODO.md, FINDINGS.md) with structured
documentation in docs/. Trim CLAUDE.md to agent instructions only.

Co-Authored-By: Virgil <virgil@lethean.io>
Snider 2026-02-20 15:01:55 +00:00
parent 1afb1d636a
commit cde6443e4c
6 changed files with 705 additions and 203 deletions


@@ -1,19 +1,28 @@
# CLAUDE.md
## What This Is
Token counting, model quotas, and sliding window rate limiter.
Token counting, model quotas, and sliding window rate limiter. Module: `forge.lthn.ai/core/go-ratelimit`
Module: `forge.lthn.ai/core/go-ratelimit`
## Commands
```bash
go test ./... # Run all tests
go test -v -run Name # Run single test
go test ./... # run all tests
go test -race ./... # race detector (required before commit)
go test -v -run Name ./... # single test
go vet ./... # vet check
```
## Coding Standards
## Standards
- UK English
- `go test ./...` must pass before commit
- `go test -race ./...` and `go vet ./...` must pass before commit
- Conventional commits: `type(scope): description`
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
- Coverage must not drop below 95%
## Docs
- `docs/architecture.md` — sliding window algorithm, provider quotas, YAML/SQLite backends
- `docs/development.md` — prerequisites, test patterns, coding standards
- `docs/history.md` — completed phases with commit hashes, known limitations


@@ -1,106 +0,0 @@
# FINDINGS.md -- go-ratelimit
## 2026-02-19: Split from core/go (Virgil)
### Origin
Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
### Architecture
- Sliding window rate limiter (1-minute window)
- Daily request caps per model
- Token counting via Google `CountTokens` API
- Model-specific quota configuration
### Gemini-Specific Defaults
- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
### Tests
- 1 test file covering sliding window and quota enforcement
---
## 2026-02-20: Phase 0 -- Hardening (Charon)
### Coverage: 77.1% -> 95.1%
Rewrote test suite with testify assert/require. Table-driven subtests throughout.
#### Tests added
- **CanSend boundaries**: exact RPM/TPM/RPD limits, RPM-only, TPM-only, zero-token estimates, unknown models, unlimited models
- **Prune**: keeps recent entries, prunes old ones, daily reset at 24h, boundary-exact timestamps, noop on non-existent model
- **RecordUsage**: fresh state, accumulation, existing state
- **Reset**: single model, all models (empty string), non-existent model
- **WaitForCapacity**: immediate capacity, context cancellation, pre-cancelled context, unknown model
- **Stats/AllStats**: known/unknown/quota-only models, pruning in AllStats, daily reset in AllStats
- **Persist/Load**: round-trip, non-existent file, corrupt YAML, unreadable file, nested directory creation, unwritable directory
- **Concurrency**: 20 goroutines x 50 ops (CanSend + RecordUsage + Stats), concurrent Reset + RecordUsage + AllStats
- **Benchmarks**: BenchmarkCanSend (1000-entry window), BenchmarkRecordUsage, BenchmarkCanSendConcurrent
#### Remaining uncovered (5%)
- `CountTokens` success path: hardcoded Google URL prevents unit testing without URL injection. Only the connection-error path is covered.
- `yaml.Marshal` error in `Persist()`: virtually impossible to trigger with valid structs.
- `os.UserHomeDir` error in `NewWithConfig()`: only fails when `$HOME` is unset.
### Race detector
`go test -race ./...` passes clean. The `sync.RWMutex` correctly guards all shared state.
### go vet
No warnings.
---
## 2026-02-20: Phase 1 -- Generalisation (Charon)
### Problem
Hardcoded Gemini-specific quotas in `New()`. No way to configure for other providers.
### Solution
Introduced provider-agnostic configuration without breaking existing API.
#### New types
- `Provider` -- string type with constants: `ProviderGemini`, `ProviderOpenAI`, `ProviderAnthropic`, `ProviderLocal`
- `ProviderProfile` -- bundles provider identity with model quotas map
- `Config` -- construction config with `FilePath`, `Providers` list, `Quotas` map
#### New functions
- `DefaultProfiles()` -- returns pre-configured profiles for all four providers
- `NewWithConfig(Config)` -- creates limiter from explicit configuration
- `SetQuota(model, quota)` -- runtime quota modification
- `AddProvider(provider)` -- loads all default quotas for a provider at runtime
#### Provider defaults (Feb 2026)
| Provider | Models | RPM | TPM | RPD |
|----------|--------|-----|-----|-----|
| Gemini | gemini-3-pro-preview, gemini-3-flash-preview, gemini-2.5-pro | 150 | 1M | 1000 |
| Gemini | gemini-2.0-flash | 150 | 1M | unlimited |
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
| OpenAI | gpt-4o, gpt-4-turbo, o1 | 500 | 30K | unlimited |
| OpenAI | gpt-4o-mini, o1-mini, o3-mini | 500 | 200K | unlimited |
| Anthropic | claude-opus-4, claude-sonnet-4 | 50 | 40K | unlimited |
| Anthropic | claude-haiku-3.5 | 50 | 50K | unlimited |
| Local | (none by default) | -- | -- | -- |
#### Backward compatibility
`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Verified by `TestNewBackwardCompatibility` which asserts exact parity with the original hardcoded values.
#### Design notes
- Explicit quotas in `Config.Quotas` override provider defaults (merge-on-top pattern)
- Local provider has no default quotas -- users add per-model limits for hardware throttling
- `AddProvider()` is additive -- calling it does not remove existing quotas
- All new methods are mutex-protected and safe for concurrent use

TODO.md

@@ -1,91 +0,0 @@
# TODO.md -- go-ratelimit
Dispatched from core/go orchestration. Pick up tasks in order.
---
## Phase 0: Hardening & Test Coverage
- [x] **Expand test coverage** -- `ratelimit_test.go` rewritten with testify. Tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines, `WaitForCapacity()` timeout and immediate-capacity paths, `prune()` sliding window edge cases, daily reset logic (24h boundary), YAML persistence (save + reload), corrupt/unreadable state file recovery, `Reset()` single/all/nonexistent, `Stats()` known/unknown/quota-only models, `AllStats()` with pruning and daily reset.
- [x] **Race condition test** -- `go test -race ./...` with 20 goroutines calling `CanSend()` + `RecordUsage()` + `Stats()` concurrently. Additional tests: concurrent `Reset()` + `RecordUsage()` + `AllStats()`, concurrent multi-model access (5 models), concurrent `Persist()` + `Load()` filesystem race, concurrent `AllStats()` + `RecordUsage()`, concurrent `WaitForCapacity()` + `RecordUsage()`. All pass clean.
- [x] **Benchmark** -- 7 benchmarks: `BenchmarkCanSend` (1000-entry window), `BenchmarkRecordUsage`, `BenchmarkCanSendConcurrent` (parallel), `BenchmarkCanSendWithPrune` (500 old + 500 new), `BenchmarkStats` (1000 entries), `BenchmarkAllStats` (5 models x 200 entries), `BenchmarkPersist` (YAML I/O). Zero allocs on hot paths.
- [x] **`go vet ./...` clean** -- No warnings.
- **Coverage: 95.1%** (up from 77.1%). Remaining uncovered: `CountTokens` success path (hardcoded Google URL), `yaml.Marshal` error path in `Persist()`, `os.UserHomeDir` error path in `NewWithConfig`.
## Phase 1: Generalise Beyond Gemini
- [x] **Provider-agnostic config** -- Added `Provider` type, `ProviderProfile`, `Config` struct, `NewWithConfig()` constructor. Quotas are no longer hardcoded in `New()`.
- [x] **Quota profiles** -- `DefaultProfiles()` returns pre-configured profiles for Gemini, OpenAI (gpt-4o, o1, o3-mini), Anthropic (claude-opus-4, claude-sonnet-4, claude-haiku-3.5), and Local (empty, user-configurable).
- [x] **Configurable defaults** -- `Config` struct accepts `FilePath`, `Providers` list, and explicit `Quotas` map. Explicit quotas override provider defaults. YAML-serialisable.
- [x] **Backward compatibility** -- `New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Existing API unchanged. Test `TestNewBackwardCompatibility` verifies exact parity.
- [x] **Runtime configuration** -- `SetQuota()` and `AddProvider()` allow modifying quotas after construction. Both are mutex-protected.
## Phase 2: SQLite Persistent State
Current YAML persistence is single-process only. Phase 2 adds multi-process safe SQLite storage following the go-store pattern (`modernc.org/sqlite`, pure Go, no CGO).
### 2.1 SQLite Backend
- [x] **Add `modernc.org/sqlite` dependency** -- `go get modernc.org/sqlite`. Pure Go, compiles everywhere.
- [x] **Create `sqlite.go`** — Internal SQLite persistence layer:
- `type sqliteStore struct { db *sql.DB }` — wraps database/sql connection
- `func newSQLiteStore(dbPath string) (*sqliteStore, error)` — Open DB, set `PRAGMA journal_mode=WAL`, `PRAGMA busy_timeout=5000`, `db.SetMaxOpenConns(1)`. Create schema:
```sql
CREATE TABLE IF NOT EXISTS quotas (
    model   TEXT PRIMARY KEY,
    max_rpm INTEGER NOT NULL DEFAULT 0,
    max_tpm INTEGER NOT NULL DEFAULT 0,
    max_rpd INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS requests (
    model TEXT NOT NULL,
    ts    INTEGER NOT NULL -- UnixNano
);
CREATE TABLE IF NOT EXISTS tokens (
    model TEXT NOT NULL,
    ts    INTEGER NOT NULL, -- UnixNano
    count INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS daily (
    model     TEXT PRIMARY KEY,
    day_start INTEGER NOT NULL,
    day_count INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
```
- `func (s *sqliteStore) saveQuotas(quotas map[string]ModelQuota) error` — UPSERT all quotas
- `func (s *sqliteStore) loadQuotas() (map[string]ModelQuota, error)` — SELECT all quotas
- `func (s *sqliteStore) saveState(state map[string]*UsageStats) error` — Transaction: DELETE old + INSERT requests/tokens/daily for each model
- `func (s *sqliteStore) loadState() (map[string]*UsageStats, error)` — SELECT and reconstruct UsageStats map
- `func (s *sqliteStore) close() error` — Close DB connection
### 2.2 Wire Into RateLimiter
- [x] **Add `Backend` field to Config** -- `Backend string` with values `"yaml"` (default), `"sqlite"`. Default `""` maps to `"yaml"` for backward compat.
- [x] **Update `Persist()` and `Load()`** — Check internal backend type. If SQLite, use `sqliteStore`; otherwise use existing YAML. Keep both paths working.
- [x] **Add `NewWithSQLite(dbPath string) (*RateLimiter, error)`** — Convenience constructor that creates a SQLite-backed limiter. Sets backend type, initialises DB.
- [x] **Graceful close** — Add `Close() error` method that closes SQLite DB if open. No-op for YAML backend.
### 2.3 Tests
- [x] **SQLite basic tests** — newSQLiteStore, saveQuotas/loadQuotas round-trip, saveState/loadState round-trip, close.
- [x] **SQLite integration** — NewWithSQLite, RecordUsage → Persist → Load → verify state preserved. Same test matrix as existing YAML tests but with SQLite backend.
- [x] **Concurrent SQLite** — 10 goroutines x 20 ops (RecordUsage + CanSend + Persist). Race-clean.
- [x] **YAML backward compat** — Existing tests pass unchanged (still default to YAML).
- [x] **Migration helper** -- `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads YAML state and writes it to SQLite. Test with sample YAML.
- [x] **Corrupt DB recovery** — Truncated DB file → graceful error, fresh start.
## Phase 3: Integration
- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
---
## Workflow
1. Virgil in core/go writes tasks here after research
2. This repo's dedicated session picks up tasks in phase order
3. Mark `[x]` when done, note commit hash

docs/architecture.md

@@ -0,0 +1,286 @@
# Architecture
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
three independent quota dimensions per model — requests per minute (RPM), tokens
per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
that can be persisted across process restarts via YAML or SQLite.
Module path: `forge.lthn.ai/core/go-ratelimit`
---
## Sliding Window Algorithm
The limiter maintains per-model `UsageStats` structs in memory:
```go
type UsageStats struct {
	Requests []time.Time  // timestamps of recent requests (1-minute window)
	Tokens   []TokenEntry // token counts with timestamps (1-minute window)
	DayStart time.Time    // when the current daily window started
	DayCount int          // total requests recorded since DayStart
}
```
Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
slices and discards entries older than `now - 1 minute`. Pruning is done
in-place to avoid allocation on the hot path:
```go
validReqs := 0
for _, t := range stats.Requests {
	if t.After(window) {
		stats.Requests[validReqs] = t
		validReqs++
	}
}
stats.Requests = stats.Requests[:validReqs]
```
The same loop runs for token entries. After pruning, `CanSend()` checks each
quota dimension in priority order: RPD first (cheapest check), then RPM, then
TPM. A zero value for any dimension means that dimension is unlimited. If all
three are zero the model is treated as fully unlimited and the check short-circuits
before touching any state.
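The prune-then-check flow described above can be sketched as a small standalone program. This is a minimal illustration, not the package's implementation: the `quota`, `usage` and `tokenEntry` types, the `usageWithRequests` helper, and the exact comparison operators are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// quota mirrors the three dimensions described above; zero means unlimited.
type quota struct{ maxRPM, maxTPM, maxRPD int }

// tokenEntry and usage are simplified, hypothetical stand-ins for the
// package's TokenEntry and UsageStats types.
type tokenEntry struct {
	at    time.Time
	count int
}

type usage struct {
	requests []time.Time
	tokens   []tokenEntry
	dayCount int
}

// prune drops entries at or before the cutoff, reusing the backing arrays.
func prune(u *usage, cutoff time.Time) {
	keptReqs := 0
	for _, t := range u.requests {
		if t.After(cutoff) {
			u.requests[keptReqs] = t
			keptReqs++
		}
	}
	u.requests = u.requests[:keptReqs]

	keptToks := 0
	for _, e := range u.tokens {
		if e.at.After(cutoff) {
			u.tokens[keptToks] = e
			keptToks++
		}
	}
	u.tokens = u.tokens[:keptToks]
}

// canSend checks RPD, then RPM, then TPM. The comparison operators are
// assumptions made for this sketch.
func canSend(q quota, u *usage, estimatedTokens int, now time.Time) bool {
	if q.maxRPM == 0 && q.maxTPM == 0 && q.maxRPD == 0 {
		return true // fully unlimited: no state is touched
	}
	prune(u, now.Add(-time.Minute))
	if q.maxRPD > 0 && u.dayCount >= q.maxRPD {
		return false
	}
	if q.maxRPM > 0 && len(u.requests) >= q.maxRPM {
		return false
	}
	if q.maxTPM > 0 {
		used := 0
		for _, e := range u.tokens {
			used += e.count
		}
		if used+estimatedTokens > q.maxTPM {
			return false
		}
	}
	return true
}

// now is a fixed reference time so the example is deterministic.
var now = time.Date(2026, 2, 20, 12, 0, 0, 0, time.UTC)

// usageWithRequests builds a usage whose requests happened the given number
// of seconds before the reference time.
func usageWithRequests(agesInSeconds ...int) *usage {
	u := &usage{}
	for _, age := range agesInSeconds {
		u.requests = append(u.requests, now.Add(-time.Duration(age)*time.Second))
	}
	return u
}

func main() {
	u := usageWithRequests(90, 30, 5) // the 90-second-old entry is outside the window
	allowed := canSend(quota{maxRPM: 2}, u, 0, now)
	fmt.Println("allowed:", allowed, "requests in window:", len(u.requests))
}
```

Under these assumptions the 90-second-old request is pruned, leaving two requests in the window, so a model with `maxRPM: 2` is rejected at the exact limit.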
### Daily Reset
The daily counter resets automatically inside `prune()`. When
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
to the current time. This means the daily window is a rolling 24-hour period
anchored to the first request of the day, not a calendar boundary.
### Concurrency
All reads and writes are protected by a single `sync.RWMutex`. Methods that
write state (`CanSend()`, `RecordUsage()`, `Reset()`, `Load()`) acquire the
full write lock. `CanSend()`, `Stats()`, and `AllStats()` look read-only but
call `prune()`, which mutates the state slices, so they need the write lock as
well. `Persist()` only reads a snapshot of state and takes the read lock.
`go test -race ./...` passes clean with 20 goroutines performing concurrent
`CanSend()`, `RecordUsage()`, and `Stats()` calls.
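The locking pattern can be illustrated with a reduced model. This is a sketch under stated assumptions: the `limiter` type here keeps a plain counter per model instead of timestamp slices, and `bumpLocked`, `record`, `count` and `hammer` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a simplified stand-in: one counter per model.
type limiter struct {
	mu       sync.RWMutex
	requests map[string]int
}

func newLimiter() *limiter { return &limiter{requests: make(map[string]int)} }

// bumpLocked mutates state. Caller must hold l.mu for writing.
func (l *limiter) bumpLocked(model string) { l.requests[model]++ }

// record takes the write lock at the top of the public method and only then
// calls the helper.
func (l *limiter) record(model string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.bumpLocked(model)
}

// count never mutates anything, so the read lock is enough.
func (l *limiter) count(model string) int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.requests[model]
}

// hammer runs the given number of goroutines, each recording and reading
// ops times, and returns the final count.
func hammer(l *limiter, workers, ops int) int {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < ops; j++ {
				l.record("model-a")
				_ = l.count("model-a")
			}
		}()
	}
	wg.Wait()
	return l.count("model-a")
}

func main() {
	fmt.Println(hammer(newLimiter(), 20, 50))
}
```

With the write lock around every mutation, no increments are lost; running the same program under `go run -race` should report no data races.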
---
## Provider and Quota Configuration
### Types
```go
type Provider string // "gemini", "openai", "anthropic", "local"

type ModelQuota struct {
	MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
	MaxTPM int `yaml:"max_tpm"`
	MaxRPD int `yaml:"max_rpd"`
}

type Config struct {
	FilePath  string                // default: ~/.core/ratelimits.yaml
	Backend   string                // "yaml" (default) or "sqlite"
	Quotas    map[string]ModelQuota // explicit per-model overrides
	Providers []Provider            // provider profiles to load
}
```
### Quota Resolution
1. Provider profiles are loaded first (from `DefaultProfiles()`).
2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
`SetQuota()` and `AddProvider()` allow runtime modification; both are
mutex-protected. `AddProvider()` is additive — it does not remove existing
quotas for models outside the new provider's profile.
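The three resolution steps can be sketched as a single merge function. The `profiles` and `resolve` functions below are hypothetical and trimmed down for illustration; the real `DefaultProfiles()` return type may differ. The quota values are taken from the defaults table in this document.

```go
package main

import "fmt"

type Provider string

type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

// profiles is a hypothetical, trimmed-down stand-in for DefaultProfiles().
// It returns a fresh map on each call.
func profiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		"gemini":    {"gemini-3-pro-preview": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}},
		"anthropic": {"claude-opus-4": {MaxRPM: 50, MaxTPM: 40000}},
	}
}

// resolve loads provider profiles first, then merges explicit quotas on top.
// With neither specified it falls back to the Gemini profile.
func resolve(providers []Provider, explicit map[string]ModelQuota) map[string]ModelQuota {
	if len(providers) == 0 && len(explicit) == 0 {
		providers = []Provider{"gemini"}
	}
	out := make(map[string]ModelQuota)
	all := profiles()
	for _, p := range providers {
		for model, q := range all[p] {
			out[model] = q
		}
	}
	for model, q := range explicit {
		out[model] = q // explicit entries win over profile defaults
	}
	return out
}

func main() {
	quotas := resolve(
		[]Provider{"anthropic"},
		map[string]ModelQuota{"claude-opus-4": {MaxRPM: 10, MaxTPM: 40000}},
	)
	fmt.Println(quotas["claude-opus-4"].MaxRPM)
}
```

Note that an explicit entry replaces the whole `ModelQuota` for that model in this sketch, so any dimension left at zero in the override becomes unlimited.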
### Default Quotas (as of February 2026)
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|-----------|------------------------|-----------|-----------|-----------|
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
| OpenAI | gpt-4o, gpt-4-turbo | 500 | 30,000 | unlimited |
| OpenAI | gpt-4o-mini, o1-mini | 500 | 200,000 | unlimited |
| OpenAI | o1, o3-mini | 500 | varies | unlimited |
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
| Local | (none by default) | user-defined | user-defined | user-defined |
The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
where the throttle limit is hardware rather than an API quota. No defaults are
provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
---
## YAML Persistence (Legacy)
The default backend serialises the entire `RateLimiter` struct — both the
`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
```yaml
quotas:
  gemini-3-pro-preview:
    max_rpm: 150
    max_tpm: 1000000
    max_rpd: 1000
state:
  gemini-3-pro-preview:
    requests:
      - 2026-02-20T14:32:01.123456789Z
    tokens:
      - time: 2026-02-20T14:32:01.123456789Z
        count: 1500
    day_start: 2026-02-20T00:00:00Z
    day_count: 42
```
`Persist()` creates parent directories with `os.MkdirAll` before writing.
`Load()` treats a missing file as an empty state (no error). Corrupt or
unreadable files return an error.
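The file-handling behaviour described here (create parent directories on write, treat a missing file as empty on read) can be sketched with the standard library alone. The `loadBytes`, `persistBytes` and `demo` functions are hypothetical, the file permissions are assumptions, and YAML decoding is deliberately left out so the example has no external dependencies.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// loadBytes returns (nil, nil) when the file does not exist, so the caller
// starts from empty state. Any other read failure is returned as an error.
func loadBytes(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil, nil
	}
	if err != nil {
		return nil, fmt.Errorf("ratelimit.loadBytes: read %s: %w", path, err)
	}
	return data, nil
}

// persistBytes creates missing parent directories before writing.
// The 0o755 and 0o600 modes are assumptions for this sketch.
func persistBytes(path string, data []byte) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return fmt.Errorf("ratelimit.persistBytes: create directory: %w", err)
	}
	if err := os.WriteFile(path, data, 0o600); err != nil {
		return fmt.Errorf("ratelimit.persistBytes: write %s: %w", path, err)
	}
	return nil
}

// demo exercises both helpers in a throwaway directory and returns the
// content read back after persisting.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "ratelimit-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "nested", "ratelimits.yaml")
	data, err := loadBytes(path) // the file does not exist yet
	if err != nil || data != nil {
		return "", fmt.Errorf("expected empty state, got %q, %v", data, err)
	}
	if err := persistBytes(path, []byte("quotas: {}\n")); err != nil {
		return "", err
	}
	data, err = loadBytes(path)
	return string(data), err
}

func main() {
	got, err := demo()
	fmt.Printf("%q %v\n", got, err)
}
```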
**Limitations of YAML backend:**
- Single-process only. Concurrent writes from multiple processes can corrupt
the file because the write is not atomic at the OS level.
- The entire state is serialised on every `Persist()` call, which grows linearly
with the number of tracked models and entries.
- Timestamps are serialised as RFC 3339 strings; nanosecond precision is
preserved by Go's time marshaller, but round-tripping depends on the YAML library.
---
## SQLite Backend
The SQLite backend was added in Phase 2 to support multi-process scenarios and
provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
Go port of SQLite that compiles without CGO.
### Connection Settings
```go
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers
db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout
```
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
timeout prevents immediate failure when a second process is mid-commit. A single
`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
at the file level; multiple Go connections to the same file through a single
process would not add throughput but would complicate locking.
### Schema
```sql
CREATE TABLE IF NOT EXISTS quotas (
    model   TEXT PRIMARY KEY,
    max_rpm INTEGER NOT NULL DEFAULT 0,
    max_tpm INTEGER NOT NULL DEFAULT 0,
    max_rpd INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS requests (
    model TEXT NOT NULL,
    ts    INTEGER NOT NULL -- UnixNano
);
CREATE TABLE IF NOT EXISTS tokens (
    model TEXT NOT NULL,
    ts    INTEGER NOT NULL, -- UnixNano
    count INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS daily (
    model     TEXT PRIMARY KEY,
    day_start INTEGER NOT NULL, -- UnixNano
    day_count INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
```
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
precision without relying on SQLite's text date format, and allows efficient
range queries using the composite indices.
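The conversion between `time.Time` and the `INTEGER` column is a plain round trip through `UnixNano()`. The helper names below (`toColumn`, `fromColumn`, `windowCutoff`) are hypothetical; the sketch only shows that nanosecond precision survives and how a one-minute cutoff value for a range query could be computed.

```go
package main

import (
	"fmt"
	"time"
)

// toColumn converts a timestamp to the integer stored in the ts column.
func toColumn(t time.Time) int64 { return t.UnixNano() }

// fromColumn rebuilds a timestamp from the stored integer.
func fromColumn(ns int64) time.Time { return time.Unix(0, ns).UTC() }

// windowCutoff returns the lower bound for a one-minute range query such as
// "WHERE model = ? AND ts > ?" (the query text is an assumption).
func windowCutoff(now time.Time) int64 { return now.Add(-time.Minute).UnixNano() }

// roundTrips reports whether a timestamp survives storage unchanged.
func roundTrips(t time.Time) bool { return fromColumn(toColumn(t)).Equal(t) }

// sample matches the timestamp used in the YAML example above.
var sample = time.Date(2026, 2, 20, 14, 32, 1, 123456789, time.UTC)

func main() {
	fmt.Println(toColumn(sample))
	fmt.Println(roundTrips(sample))
	fmt.Println(fromColumn(toColumn(sample)).Nanosecond())
}
```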
### Save Strategy
`saveState()` uses a delete-then-insert pattern inside a single transaction.
All three state tables are truncated and rewritten atomically:
```go
tx.Exec("DELETE FROM requests")
tx.Exec("DELETE FROM tokens")
tx.Exec("DELETE FROM daily")
// then INSERT for every model in state
tx.Commit()
```
`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
existing quota rows are updated in place without deleting unrelated models.
### Constructors
```go
// YAML backend (default)
rl, err := ratelimit.New()
rl, err := ratelimit.NewWithConfig(cfg)
// SQLite backend
rl, err := ratelimit.NewWithSQLite(dbPath)
rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
defer rl.Close() // releases the database connection
```
`Close()` is a no-op on YAML-backed limiters.
---
## Migration Path
`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
state file and writes all quotas and usage state to a new SQLite database. The
function is idempotent — running it again on the same YAML file overwrites the
SQLite database state.
Typical one-time migration:
```go
err := ratelimit.MigrateYAMLToSQLite(
	filepath.Join(home, ".core", "ratelimits.yaml"),
	filepath.Join(home, ".core", "ratelimits.db"),
)
```
After migration, switch the constructor:
```go
// Before
rl, _ := ratelimit.New()
// After
rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
defer rl.Close()
```
The YAML file can be kept as a backup; the two backends do not share state.
---
## CountTokens
`CountTokens(apiKey, model, text string) (int, error)` calls the Google
Generative Language API to obtain an exact token count for a prompt string. It
is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
endpoint. The URL is not configurable, which prevents unit testing of the
success path without network access.
For other providers, callers must supply `estimatedTokens` directly to
`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
in API response metadata after a call completes.

docs/development.md

@@ -0,0 +1,207 @@
# Development Guide
## Prerequisites
- Go 1.25.5 or later (the module declares `go 1.25.5`)
- No CGO required — `modernc.org/sqlite` is a pure Go port
No C toolchain, no system SQLite library, no external build tools. A plain
`go build ./...` is sufficient.
---
## Build and Test
```bash
# Run all tests
go test ./...
# Run all tests with the race detector (required before every commit)
go test -race ./...
# Run a single test by name
go test -v -run TestCanSend ./...
# Run a single subtest
go test -v -run "TestCanSend/RPM_at_exact_limit_is_rejected" ./...
# Run benchmarks
go test -bench=. -benchmem ./...
# Run a specific benchmark
go test -bench=BenchmarkCanSend -benchmem ./...
# Check for vet issues
go vet ./...
# Tidy dependencies
go mod tidy
```
Before a commit is pushed, `go test -race ./...` and `go vet ./...` must produce
no errors or warnings, and `go mod tidy` must leave `go.mod` and `go.sum` unchanged.
---
## Test Patterns
### File Organisation
- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
- `sqlite_test.go` — Phase 2 (SQLite backend)
Both files are in `package ratelimit` (white-box tests) so they can access
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
### Naming Convention
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
- `_Good` — happy path
- `_Bad` — expected error conditions (invalid paths, corrupt input)
- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
Core logic tests use plain descriptive names without suffixes, grouped by
method with table-driven subtests.
### Test Helpers
`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
redirects the YAML file path into `t.TempDir()`:
```go
func newTestLimiter(t *testing.T) *RateLimiter {
	t.Helper()
	rl, err := New()
	require.NoError(t, err)
	rl.filePath = filepath.Join(t.TempDir(), "ratelimits.yaml")
	return rl
}
```
Use `t.TempDir()` for all file paths in tests. Go cleans these up automatically
after each test completes.
### Testify Usage
Tests use `github.com/stretchr/testify` exclusively:
- `require.NoError(t, err)` — fail immediately on setup errors
- `assert.NoError(t, err)` — record failure but continue
- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
- `assert.True / assert.False` — for boolean checks
- `assert.Empty / assert.Len` — for slice length checks
- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
### Race Tests
Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
assert anything beyond absence of data races (the race detector does the work):
```go
var wg sync.WaitGroup
for i := 0; i < 20; i++ {
	wg.Add(1)
	go func() {
		defer wg.Done()
		// concurrent operations
	}()
}
wg.Wait()
```
Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
clean.
### Coverage
Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
be covered in unit tests without modifying the production code:
1. `CountTokens` success path — hardcoded Google API URL requires network access
2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
Do not lower coverage below 95% without a documented reason.
---
## Coding Standards
### Language
UK English throughout: colour, organisation, serialise, initialise, behaviour.
Do not use American spellings in identifiers, comments, or documentation.
### Go Style
- All exported types, functions, and fields must have doc comments
- Error strings must be lowercase and not end with punctuation (Go convention)
- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
prefix `ratelimit.` is included so errors identify their origin clearly
- No `init()` functions
- No global mutable state; `DefaultProfiles()` returns a fresh map on each call
rather than exposing a shared package-level variable
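The error-wrapping convention can be illustrated as follows. This is a sketch only: `errQuotaExceeded`, `check` and `isQuotaError` are hypothetical names, not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// errQuotaExceeded is a hypothetical sentinel; its text is lowercase and
// carries no trailing punctuation.
var errQuotaExceeded = errors.New("quota exceeded")

// check wraps the sentinel with a "package.Function: what" prefix and %w so
// callers can still match it with errors.Is.
func check(model string) error {
	return fmt.Errorf("ratelimit.check: model %s: %w", model, errQuotaExceeded)
}

// isQuotaError reports whether err wraps the sentinel anywhere in its chain.
func isQuotaError(err error) bool { return errors.Is(err, errQuotaExceeded) }

func main() {
	err := check("gpt-4o")
	fmt.Println(err)
	fmt.Println(isQuotaError(err))
}
```

Using `%w` rather than `%v` is what keeps the original error matchable after the prefix is added.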
### Mutex Discipline
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
even if they appear read-only, because `prune()` mutates slices
- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
snapshot of state
- Lock acquisition always happens at the top of the public method, never inside
a helper — helpers document "Caller must hold the lock"
- Never call a public method from inside another public method while holding
the lock (deadlock risk)
### Dependencies
Direct dependencies are intentionally minimal:
| Dependency | Purpose |
|------------|---------|
| `gopkg.in/yaml.v3` | YAML serialisation for legacy backend |
| `modernc.org/sqlite` | Pure Go SQLite for persistent backend |
| `github.com/stretchr/testify` | Test assertions (test-only) |
Do not add `database/sql` drivers beyond `modernc.org/sqlite`. Do not add HTTP
client libraries; the existing `CountTokens` function uses the standard library.
---
## Licence
EUPL-1.2. Every new source file must carry the standard header if the project
adopts per-file headers in future. Confirm with the project lead before adding
files under a different licence.
---
## Commit Convention
Format: `type(scope): description`
Common types: `feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`
Common scopes: `ratelimit`, `sqlite`, `persist`, `config`
Every commit must include:
```
Co-Authored-By: Virgil <virgil@lethean.io>
```
Example:
```
feat(sqlite): add WAL-mode SQLite backend with migration helper
Co-Authored-By: Virgil <virgil@lethean.io>
```
Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
pass. `go mod tidy` must produce no changes.

docs/history.md

@@ -0,0 +1,197 @@
# Project History
## Origin
go-ratelimit was extracted from the `pkg/ratelimit` package inside
`forge.lthn.ai/core/go` on 19 February 2026. The extraction gave the package
its own module path, repository, and independent development cadence.
Initial commit: `fa1a6fc` (`feat: extract go-ratelimit from core/go pkg/ratelimit`)
At extraction the package implemented:
- Sliding window rate limiter with 1-minute window
- Daily request caps per model
- Token counting via Google `CountTokens` API
- Hardcoded Gemini quota defaults (`gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD)
- YAML persistence to `~/.core/ratelimits.yaml`
- Single test file with basic sliding window and quota enforcement tests
---
## Phase 0 — Hardening and Test Coverage
Commit: `3c63b10` (`feat(ratelimit): generalise beyond Gemini with provider profiles and push coverage to 95%`)
Supplementary commit: `db958f2` (`test: expand race coverage and benchmarks`)
Coverage increased from 77.1% to 95.1%. The test suite was rewritten using
testify with table-driven subtests throughout.
### Tests added
- `TestCanSend` — boundary conditions at exact RPM, TPM, and RPD limits;
RPM-only and TPM-only quotas; zero-token estimates; unknown and unlimited models
- `TestPrune` — pruning of old entries, retention of recent entries, daily reset
at 24-hour boundary, no-op on non-existent model, boundary-exact timestamps
- `TestRecordUsage` — fresh state, accumulation, insertion into existing state
- `TestReset` — single model, all models (empty string argument), non-existent model
- `TestWaitForCapacity` — context cancellation, pre-cancelled context,
immediate capacity, unknown model
- `TestStats` / `TestAllStats` — known, unknown, and quota-only models; pruning
and daily reset inside `AllStats()`
- `TestPersistAndLoad` — round-trip, missing file, corrupt YAML, unreadable file,
nested directory creation, unwritable directory
- `TestConcurrentAccess` — 20 goroutines x 50 ops each (CanSend + RecordUsage + Stats)
- `TestConcurrentResetAndRecord` — concurrent Reset + RecordUsage + AllStats
- `TestConcurrentMultipleModels` — 5 models, concurrent access
- `TestConcurrentPersistAndLoad` — filesystem race between Persist and Load
- `TestConcurrentWaitForCapacityAndRecordUsage` — WaitForCapacity racing RecordUsage
### Benchmarks added
- `BenchmarkCanSend` — 1,000-entry sliding window
- `BenchmarkRecordUsage`
- `BenchmarkCanSendConcurrent` — parallel goroutines
- `BenchmarkCanSendWithPrune` — 500 old + 500 new entries
- `BenchmarkStats` — 1,000-entry window
- `BenchmarkAllStats` — 5 models x 200 entries
- `BenchmarkPersist` — YAML I/O
### Remaining uncovered paths (5%)
These three paths are structurally impossible to cover in unit tests without
modifying production code:
1. `CountTokens` success path — the Google API URL is hardcoded; unit tests
cannot intercept the HTTP call without URL injection support
2. `yaml.Marshal` error path in `Persist()`: `yaml.Marshal` does not fail on
valid Go structs; the error branch exists for correctness only
3. `os.UserHomeDir()` error path in `NewWithConfig()` — triggered only when
`$HOME` is unset, which test infrastructure prevents
`go test -race ./...` passed clean. `go vet ./...` produced no warnings.
---
## Phase 1 — Generalisation Beyond Gemini
Commit: `3c63b10` — included in the same commit as Phase 0
The hardcoded Gemini quotas in `New()` were replaced with a provider-agnostic
configuration system without breaking the existing API.
### New types and functions
- `Provider` string type with constants: `ProviderGemini`, `ProviderOpenAI`,
`ProviderAnthropic`, `ProviderLocal`
- `ProviderProfile` — bundles a provider identifier with its model quota map
- `Config` — construction configuration accepting `FilePath`, `Backend`,
`Providers`, and `Quotas` fields
- `DefaultProfiles()` — returns fresh pre-configured profiles for all four providers
- `NewWithConfig(Config)` — creates a limiter from explicit configuration
- `SetQuota(model, quota)` — runtime quota modification, mutex-protected
- `AddProvider(provider)` — loads all default quotas for a provider at runtime,
additive, mutex-protected
### Backward compatibility
`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`.
`TestNewBackwardCompatibility` asserts exact parity with the original hardcoded
values. No existing call sites required modification.
### Design decision: merge-on-top
Explicit `Config.Quotas` override provider profile defaults. This allows callers
to use a provider profile for most models while customising specific model limits
without forking the entire profile.
---
## Phase 2 — SQLite Persistent State
Commit: `1afb1d6` (`feat(persist): Phase 2, SQLite backend with WAL mode`)
The YAML backend serialises the full state on every `Persist()` call and is
not safe for concurrent multi-process access. Phase 2 added a SQLite backend
using `modernc.org/sqlite` (pure Go, no CGO) following the go-store pattern
established elsewhere in the ecosystem.
### New constructors
- `NewWithSQLite(dbPath string)` — SQLite-backed limiter with Gemini defaults
- `NewWithSQLiteConfig(dbPath string, cfg Config)` — SQLite-backed with custom config
- `Close() error` — releases the database connection; no-op on YAML-backed limiters
### Migration
- `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — one-shot migration
helper that reads an existing YAML state file and writes all quotas and usage
state to a new SQLite database
### SQLite connection settings
- `PRAGMA journal_mode=WAL` — enables concurrent reads alongside a single writer
- `PRAGMA busy_timeout=5000` — 5-second wait on lock contention before returning an error
- `db.SetMaxOpenConns(1)` — single connection for PRAGMA consistency
### Tests added (sqlite_test.go)
- `TestNewSQLiteStore_Good / _Bad` — creation and invalid path handling
- `TestSQLiteQuotasRoundTrip_Good` — save/load round-trip
- `TestSQLiteQuotasUpsert_Good` — upsert replaces existing rows
- `TestSQLiteStateRoundTrip_Good` — multi-model state with nanosecond precision
- `TestSQLiteStateOverwrite_Good` — delete-then-insert atomicity
- `TestSQLiteEmptyState_Good` — fresh database returns empty maps
- `TestNewWithSQLite_Good / TestNewWithSQLiteConfig_Good` — constructor tests
- `TestSQLitePersistAndLoad_Good` — full persist + reload cycle
- `TestSQLitePersistMultipleModels_Good` — multi-provider persistence
- `TestSQLiteConcurrent_Good` — 10 goroutines x 20 ops, race-clean
- `TestYAMLBackwardCompat_Good` — existing YAML tests pass unchanged
- `TestMigrateYAMLToSQLite_Good / _Bad` — migration round-trip and error paths
- `TestSQLiteCorruptDB_Ugly / TestSQLiteTruncatedDB_Ugly` — graceful corrupt DB recovery
- `TestSQLiteEndToEnd_Good` — full two-session scenario
---
## Phase 3 — Integration (Planned)
Not yet implemented. Intended downstream integrations:
- Wire into `go-ml` backends so rate limiting is enforced automatically on
inference calls without caller involvement
- Wire into the `go-ai` facade so all providers share a single rate limit layer
- Export metrics (requests/minute, tokens/minute, rejection counts) for
monitoring dashboards
---
## Known Limitations
**CountTokens URL is hardcoded.** The `CountTokens` helper calls
`generativelanguage.googleapis.com` directly. There is no way to override the
base URL, which prevents testing the success path in unit tests and prevents
use with Gemini-compatible proxies. A future refactor would accept a base URL
parameter or an `http.Client`.
**saveState is a full table replace.** On every `Persist()` call, the `requests`,
`tokens`, and `daily` tables are truncated and rewritten. For a limiter tracking
many models with high RPM, this means writing hundreds of rows on every persist
call. A future optimisation would use incremental writes (insert-only, with
periodic vacuuming of expired rows).
**No TTL on SQLite rows.** Historical rows older than one minute are pruned from
the in-memory `UsageStats` on every operation but are written wholesale to
SQLite on `Persist()`. The database does not grow unboundedly between persist
cycles because `saveState` replaces all rows, but if `Persist()` is called
frequently the WAL file can grow transiently.
**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate
for RPM-scale limits but is coarse for sub-second limits. If a caller needs
finer-grained waiting (e.g., smoothing requests within a minute), they must
implement their own loop.
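A caller-side loop with a configurable interval might look like the sketch below. `waitFor` is a hypothetical helper, not part of the package; in real use the `ready` callback would wrap a call to the limiter's `CanSend()`, whose exact signature is not shown in this document.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFor polls ready at the given interval until it returns true or the
// context is done. It checks once immediately before the first tick.
func waitFor(ctx context.Context, interval time.Duration, ready func() bool) error {
	if ready() {
		return nil
	}
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			if ready() {
				return nil
			}
		}
	}
}

// demo simulates capacity becoming available on the third poll.
func demo() (polls int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	err = waitFor(ctx, 10*time.Millisecond, func() bool {
		polls++
		return polls >= 3
	})
	return polls, err
}

// demoCancelled shows that a pre-cancelled context returns an error when
// capacity is not immediately available.
func demoCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return waitFor(ctx, 10*time.Millisecond, func() bool { return false })
}

func main() {
	polls, err := demo()
	fmt.Println(polls, err)
	fmt.Println(demoCancelled())
}
```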
**No automatic persistence.** `Persist()` must be called explicitly. If a
process exits without calling `Persist()`, any usage recorded since the last
persist is lost. Callers are responsible for calling `Persist()` at appropriate
intervals (e.g., after each `RecordUsage()` call, or on a ticker).
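One way to persist on a ticker is sketched below. The `persister` interface, `persistEvery` and `countingPersister` are hypothetical; the sketch assumes the limiter exposes `Persist() error`, which is consistent with the error paths described in this document but is not confirmed here.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// persister is the only behaviour this loop needs. A *RateLimiter is assumed
// to satisfy it through its Persist method.
type persister interface{ Persist() error }

// persistEvery flushes state on every tick and once more when the context is
// cancelled, so usage recorded just before shutdown is not lost.
func persistEvery(ctx context.Context, p persister, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return p.Persist() // final flush on shutdown
		case <-ticker.C:
			if err := p.Persist(); err != nil {
				return fmt.Errorf("ratelimit.persistEvery: periodic persist: %w", err)
			}
		}
	}
}

// countingPersister is a test double that counts calls.
type countingPersister struct{ calls int }

func (c *countingPersister) Persist() error {
	c.calls++
	return nil
}

// demo runs the loop for a short time and returns how often Persist ran.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 60*time.Millisecond)
	defer cancel()
	p := &countingPersister{}
	_ = persistEvery(ctx, p, 10*time.Millisecond)
	return p.calls
}

func main() {
	fmt.Println("persist calls:", demo())
}
```

In a long-running service this loop would typically run in its own goroutine with a context tied to process shutdown; the interval trades write volume against how much usage can be lost on a crash.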