diff --git a/CLAUDE.md b/CLAUDE.md
index a3bff6a..4c5a6dd 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,19 +1,28 @@
 # CLAUDE.md

-## What This Is
+Token counting, model quotas, and sliding window rate limiter.

-Token counting, model quotas, and sliding window rate limiter. Module: `forge.lthn.ai/core/go-ratelimit`
+Module: `forge.lthn.ai/core/go-ratelimit`

 ## Commands

 ```bash
-go test ./... # Run all tests
-go test -v -run Name # Run single test
+go test ./... # run all tests
+go test -race ./... # race detector (required before commit)
+go test -v -run Name ./... # single test
+go vet ./... # vet check
 ```

-## Coding Standards
+## Standards

 - UK English
-- `go test ./...` must pass before commit
+- `go test -race ./...` and `go vet ./...` must pass before commit
 - Conventional commits: `type(scope): description`
 - Co-Author: `Co-Authored-By: Virgil `
+- Coverage must not drop below 95%
+
+## Docs
+
+- `docs/architecture.md` — sliding window algorithm, provider quotas, YAML/SQLite backends
+- `docs/development.md` — prerequisites, test patterns, coding standards
+- `docs/history.md` — completed phases with commit hashes, known limitations
diff --git a/FINDINGS.md b/FINDINGS.md
deleted file mode 100644
index 0c5d194..0000000
--- a/FINDINGS.md
+++ /dev/null
@@ -1,106 +0,0 @@
-# FINDINGS.md -- go-ratelimit
-
-## 2026-02-19: Split from core/go (Virgil)
-
-### Origin
-
-Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
-
-### Architecture
-
-- Sliding window rate limiter (1-minute window)
-- Daily request caps per model
-- Token counting via Google `CountTokens` API
-- Model-specific quota configuration
-
-### Gemini-Specific Defaults
-
-- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
-- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
-
-### Tests
-
-- 1 test file covering sliding window and quota enforcement
-
----
-
-## 2026-02-20: Phase 0 -- Hardening (Charon)
-
-### Coverage: 77.1% -> 95.1%
-
-Rewrote test suite with testify assert/require. Table-driven subtests throughout.
-
-#### Tests added
-
-- **CanSend boundaries**: exact RPM/TPM/RPD limits, RPM-only, TPM-only, zero-token estimates, unknown models, unlimited models
-- **Prune**: keeps recent entries, prunes old ones, daily reset at 24h, boundary-exact timestamps, noop on non-existent model
-- **RecordUsage**: fresh state, accumulation, existing state
-- **Reset**: single model, all models (empty string), non-existent model
-- **WaitForCapacity**: immediate capacity, context cancellation, pre-cancelled context, unknown model
-- **Stats/AllStats**: known/unknown/quota-only models, pruning in AllStats, daily reset in AllStats
-- **Persist/Load**: round-trip, non-existent file, corrupt YAML, unreadable file, nested directory creation, unwritable directory
-- **Concurrency**: 20 goroutines x 50 ops (CanSend + RecordUsage + Stats), concurrent Reset + RecordUsage + AllStats
-- **Benchmarks**: BenchmarkCanSend (1000-entry window), BenchmarkRecordUsage, BenchmarkCanSendConcurrent
-
-#### Remaining uncovered (5%)
-
-- `CountTokens` success path: hardcoded Google URL prevents unit testing without URL injection. Only the connection-error path is covered.
-- `yaml.Marshal` error in `Persist()`: virtually impossible to trigger with valid structs.
-- `os.UserHomeDir` error in `NewWithConfig()`: only fails when `$HOME` is unset.
-
-### Race detector
-
-`go test -race ./...` passes clean. The `sync.RWMutex` correctly guards all shared state.
-
-### go vet
-
-No warnings.
-
----
-
-## 2026-02-20: Phase 1 -- Generalisation (Charon)
-
-### Problem
-
-Hardcoded Gemini-specific quotas in `New()`. No way to configure for other providers.
-
-### Solution
-
-Introduced provider-agnostic configuration without breaking existing API.
-
-#### New types
-
-- `Provider` -- string type with constants: `ProviderGemini`, `ProviderOpenAI`, `ProviderAnthropic`, `ProviderLocal`
-- `ProviderProfile` -- bundles provider identity with model quotas map
-- `Config` -- construction config with `FilePath`, `Providers` list, `Quotas` map
-
-#### New functions
-
-- `DefaultProfiles()` -- returns pre-configured profiles for all four providers
-- `NewWithConfig(Config)` -- creates limiter from explicit configuration
-- `SetQuota(model, quota)` -- runtime quota modification
-- `AddProvider(provider)` -- loads all default quotas for a provider at runtime
-
-#### Provider defaults (Feb 2026)
-
-| Provider | Models | RPM | TPM | RPD |
-|----------|--------|-----|-----|-----|
-| Gemini | gemini-3-pro-preview, gemini-3-flash-preview, gemini-2.5-pro | 150 | 1M | 1000 |
-| Gemini | gemini-2.0-flash | 150 | 1M | unlimited |
-| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
-| OpenAI | gpt-4o, gpt-4-turbo, o1 | 500 | 30K | unlimited |
-| OpenAI | gpt-4o-mini, o1-mini, o3-mini | 500 | 200K | unlimited |
-| Anthropic | claude-opus-4, claude-sonnet-4 | 50 | 40K | unlimited |
-| Anthropic | claude-haiku-3.5 | 50 | 50K | unlimited |
-| Local | (none by default) | -- | -- | -- |
-
-#### Backward compatibility
-
-`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Verified by `TestNewBackwardCompatibility` which asserts exact parity with the original hardcoded values.
-
-#### Design notes
-
-- Explicit quotas in `Config.Quotas` override provider defaults (merge-on-top pattern)
-- Local provider has no default quotas -- users add per-model limits for hardware throttling
-- `AddProvider()` is additive -- calling it does not remove existing quotas
-- All new methods are mutex-protected and safe for concurrent use
diff --git a/TODO.md b/TODO.md
deleted file mode 100644
index bb8b103..0000000
--- a/TODO.md
+++ /dev/null
@@ -1,91 +0,0 @@
-# TODO.md -- go-ratelimit
-
-Dispatched from core/go orchestration. Pick up tasks in order.
-
----
-
-## Phase 0: Hardening & Test Coverage
-
-- [x] **Expand test coverage** -- `ratelimit_test.go` rewritten with testify. Tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines, `WaitForCapacity()` timeout and immediate-capacity paths, `prune()` sliding window edge cases, daily reset logic (24h boundary), YAML persistence (save + reload), corrupt/unreadable state file recovery, `Reset()` single/all/nonexistent, `Stats()` known/unknown/quota-only models, `AllStats()` with pruning and daily reset.
-- [x] **Race condition test** -- `go test -race ./...` with 20 goroutines calling `CanSend()` + `RecordUsage()` + `Stats()` concurrently. Additional tests: concurrent `Reset()` + `RecordUsage()` + `AllStats()`, concurrent multi-model access (5 models), concurrent `Persist()` + `Load()` filesystem race, concurrent `AllStats()` + `RecordUsage()`, concurrent `WaitForCapacity()` + `RecordUsage()`. All pass clean.
-- [x] **Benchmark** -- 7 benchmarks: `BenchmarkCanSend` (1000-entry window), `BenchmarkRecordUsage`, `BenchmarkCanSendConcurrent` (parallel), `BenchmarkCanSendWithPrune` (500 old + 500 new), `BenchmarkStats` (1000 entries), `BenchmarkAllStats` (5 models x 200 entries), `BenchmarkPersist` (YAML I/O). Zero allocs on hot paths.
-- [x] **`go vet ./...` clean** -- No warnings.
-- **Coverage: 95.1%** (up from 77.1%). Remaining uncovered: `CountTokens` success path (hardcoded Google URL), `yaml.Marshal` error path in `Persist()`, `os.UserHomeDir` error path in `NewWithConfig`.
-
-## Phase 1: Generalise Beyond Gemini
-
-- [x] **Provider-agnostic config** -- Added `Provider` type, `ProviderProfile`, `Config` struct, `NewWithConfig()` constructor. Quotas are no longer hardcoded in `New()`.
-- [x] **Quota profiles** -- `DefaultProfiles()` returns pre-configured profiles for Gemini, OpenAI (gpt-4o, o1, o3-mini), Anthropic (claude-opus-4, claude-sonnet-4, claude-haiku-3.5), and Local (empty, user-configurable).
-- [x] **Configurable defaults** -- `Config` struct accepts `FilePath`, `Providers` list, and explicit `Quotas` map. Explicit quotas override provider defaults. YAML-serialisable.
-- [x] **Backward compatibility** -- `New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Existing API unchanged. Test `TestNewBackwardCompatibility` verifies exact parity.
-- [x] **Runtime configuration** -- `SetQuota()` and `AddProvider()` allow modifying quotas after construction. Both are mutex-protected.
-
-## Phase 2: SQLite Persistent State
-
-Current YAML persistence is single-process only. Phase 2 adds multi-process safe SQLite storage following the go-store pattern (`modernc.org/sqlite`, pure Go, no CGO).
-
-### 2.1 SQLite Backend
-
-- [x] **Add `modernc.org/sqlite` dependency** — `go get modernc.org/sqlite`. Pure Go, compiles everywhere.
-- [x] **Create `sqlite.go`** — Internal SQLite persistence layer:
-  - `type sqliteStore struct { db *sql.DB }` — wraps database/sql connection
-  - `func newSQLiteStore(dbPath string) (*sqliteStore, error)` — Open DB, set `PRAGMA journal_mode=WAL`, `PRAGMA busy_timeout=5000`, `db.SetMaxOpenConns(1)`. Create schema:
-    ```sql
-    CREATE TABLE IF NOT EXISTS quotas (
-        model TEXT PRIMARY KEY,
-        max_rpm INTEGER NOT NULL DEFAULT 0,
-        max_tpm INTEGER NOT NULL DEFAULT 0,
-        max_rpd INTEGER NOT NULL DEFAULT 0
-    );
-    CREATE TABLE IF NOT EXISTS requests (
-        model TEXT NOT NULL,
-        ts INTEGER NOT NULL -- UnixNano
-    );
-    CREATE TABLE IF NOT EXISTS tokens (
-        model TEXT NOT NULL,
-        ts INTEGER NOT NULL, -- UnixNano
-        count INTEGER NOT NULL
-    );
-    CREATE TABLE IF NOT EXISTS daily (
-        model TEXT PRIMARY KEY,
-        day_start INTEGER NOT NULL,
-        day_count INTEGER NOT NULL DEFAULT 0
-    );
-    CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
-    CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
-    ```
-  - `func (s *sqliteStore) saveQuotas(quotas map[string]ModelQuota) error` — UPSERT all quotas
-  - `func (s *sqliteStore) loadQuotas() (map[string]ModelQuota, error)` — SELECT all quotas
-  - `func (s *sqliteStore) saveState(state map[string]*UsageStats) error` — Transaction: DELETE old + INSERT requests/tokens/daily for each model
-  - `func (s *sqliteStore) loadState() (map[string]*UsageStats, error)` — SELECT and reconstruct UsageStats map
-  - `func (s *sqliteStore) close() error` — Close DB connection
-
-### 2.2 Wire Into RateLimiter
-
-- [x] **Add `Backend` field to Config** — `Backend string` with values `"yaml"` (default), `"sqlite"`. Default `""` maps to `"yaml"` for backward compat.
-- [x] **Update `Persist()` and `Load()`** — Check internal backend type. If SQLite, use `sqliteStore`; otherwise use existing YAML. Keep both paths working.
-- [x] **Add `NewWithSQLite(dbPath string) (*RateLimiter, error)`** — Convenience constructor that creates a SQLite-backed limiter. Sets backend type, initialises DB.
-- [x] **Graceful close** — Add `Close() error` method that closes SQLite DB if open. No-op for YAML backend.
-
-### 2.3 Tests
-
-- [x] **SQLite basic tests** — newSQLiteStore, saveQuotas/loadQuotas round-trip, saveState/loadState round-trip, close.
-- [x] **SQLite integration** — NewWithSQLite, RecordUsage → Persist → Load → verify state preserved. Same test matrix as existing YAML tests but with SQLite backend.
-- [x] **Concurrent SQLite** — 10 goroutines x 20 ops (RecordUsage + CanSend + Persist). Race-clean.
-- [x] **YAML backward compat** — Existing tests pass unchanged (still default to YAML).
-- [x] **Migration helper** — `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — reads YAML state, writes to SQLite. Test with sample YAML.
-- [x] **Corrupt DB recovery** — Truncated DB file → graceful error, fresh start.
-
-## Phase 3: Integration
-
-- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
-- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
-- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
-
----
-
-## Workflow
-
-1. Virgil in core/go writes tasks here after research
-2. This repo's dedicated session picks up tasks in phase order
-3. Mark `[x]` when done, note commit hash
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..4cefb7a
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,286 @@
+# Architecture
+
+go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
+three independent quota dimensions per model — requests per minute (RPM), tokens
+per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
+that can be persisted across process restarts via YAML or SQLite.
+
+Module path: `forge.lthn.ai/core/go-ratelimit`
+
+---
+
+## Sliding Window Algorithm
+
+The limiter maintains per-model `UsageStats` structs in memory:
+
+```go
+type UsageStats struct {
+    Requests []time.Time // timestamps of recent requests (1-minute window)
+    Tokens []TokenEntry // token counts with timestamps (1-minute window)
+    DayStart time.Time // when the current daily window started
+    DayCount int // total requests recorded since DayStart
+}
+```
+
+Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
+slices and discards entries older than `now - 1 minute`. Pruning is done
+in-place to avoid allocation on the hot path:
+
+```go
+validReqs := 0
+for _, t := range stats.Requests {
+    if t.After(window) {
+        stats.Requests[validReqs] = t
+        validReqs++
+    }
+}
+stats.Requests = stats.Requests[:validReqs]
+```
+
+The same loop runs for token entries. After pruning, `CanSend()` checks each
+quota dimension in priority order: RPD first (cheapest check), then RPM, then
+TPM. A zero value for any dimension means that dimension is unlimited. If all
+three are zero the model is treated as fully unlimited and the check short-circuits
+before touching any state.
+
+### Daily Reset
+
+The daily counter resets automatically inside `prune()`. When
+`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
+to the current time. This means the daily window is a rolling 24-hour period
+anchored to the first request of the day, not a calendar boundary.
+
+### Concurrency
+
+All reads and writes are protected by a single `sync.RWMutex`. Methods that
+write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
+full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
+where possible. The `CanSend()` method acquires a write lock because it calls
+`prune()`, which mutates the state slices.
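
The locking rule above, where `CanSend()` takes the write lock because `prune()` mutates the slices, can be illustrated with a minimal single-model sketch. The `limiter` type, the `canSend` method, the `demoCanSend` helper, and the exact rejection comparisons (`>=` for request counts, `>` for the token sum) are assumptions made for illustration; the package's real internals may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenEntry pairs a token count with the time it was recorded.
type TokenEntry struct {
	Time  time.Time
	Count int
}

// UsageStats mirrors the struct described in the architecture notes.
type UsageStats struct {
	Requests []time.Time
	Tokens   []TokenEntry
	DayStart time.Time
	DayCount int
}

// ModelQuota holds the three limits; zero means unlimited.
type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

// limiter is a hypothetical single-model stand-in for RateLimiter.
type limiter struct {
	mu    sync.RWMutex
	quota ModelQuota
	stats UsageStats
}

// prune drops entries older than one minute, in place, and resets the
// daily counter after 24 hours. Caller must hold the write lock.
func (l *limiter) prune(now time.Time) {
	window := now.Add(-time.Minute)
	n := 0
	for _, t := range l.stats.Requests {
		if t.After(window) {
			l.stats.Requests[n] = t
			n++
		}
	}
	l.stats.Requests = l.stats.Requests[:n]
	m := 0
	for _, e := range l.stats.Tokens {
		if e.Time.After(window) {
			l.stats.Tokens[m] = e
			m++
		}
	}
	l.stats.Tokens = l.stats.Tokens[:m]
	if now.Sub(l.stats.DayStart) >= 24*time.Hour {
		l.stats.DayStart = now
		l.stats.DayCount = 0
	}
}

// canSend takes the write lock (not the read lock) because prune mutates
// state, then checks RPD, RPM, and TPM in that order.
func (l *limiter) canSend(estimatedTokens int, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	q := l.quota
	if q.MaxRPM == 0 && q.MaxTPM == 0 && q.MaxRPD == 0 {
		return true // fully unlimited: skip pruning entirely
	}
	l.prune(now)
	if q.MaxRPD > 0 && l.stats.DayCount >= q.MaxRPD {
		return false
	}
	if q.MaxRPM > 0 && len(l.stats.Requests) >= q.MaxRPM {
		return false
	}
	if q.MaxTPM > 0 {
		used := 0
		for _, e := range l.stats.Tokens {
			used += e.Count
		}
		if used+estimatedTokens > q.MaxTPM {
			return false
		}
	}
	return true
}

// demoCanSend builds a limiter with one stale and one recent entry and
// returns two check results plus the request count left after pruning.
func demoCanSend() (first, second bool, kept int) {
	now := time.Now()
	old, recent := now.Add(-90*time.Second), now.Add(-10*time.Second)
	l := &limiter{
		quota: ModelQuota{MaxRPM: 2, MaxTPM: 1000, MaxRPD: 10},
		stats: UsageStats{
			Requests: []time.Time{old, recent},
			Tokens:   []TokenEntry{{old, 900}, {recent, 400}},
			DayStart: now.Add(-time.Hour),
			DayCount: 2,
		},
	}
	first = l.canSend(500, now)  // 400 live tokens + 500 fits under 1000
	second = l.canSend(700, now) // 400 live tokens + 700 exceeds 1000
	return first, second, len(l.stats.Requests)
}

func main() {
	fmt.Println(demoCanSend())
}
```

In this sketch the stale 90-second-old entries are discarded before any limit is evaluated, which is why the first check passes even though the raw slices held 1,300 tokens.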
+
+`go test -race ./...` passes clean with 20 goroutines performing concurrent
+`CanSend()`, `RecordUsage()`, and `Stats()` calls.
+
+---
+
+## Provider and Quota Configuration
+
+### Types
+
+```go
+type Provider string // "gemini", "openai", "anthropic", "local"
+
+type ModelQuota struct {
+    MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
+    MaxTPM int `yaml:"max_tpm"`
+    MaxRPD int `yaml:"max_rpd"`
+}
+
+type Config struct {
+    FilePath string // default: ~/.core/ratelimits.yaml
+    Backend string // "yaml" (default) or "sqlite"
+    Quotas map[string]ModelQuota // explicit per-model overrides
+    Providers []Provider // provider profiles to load
+}
+```
+
+### Quota Resolution
+
+1. Provider profiles are loaded first (from `DefaultProfiles()`).
+2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
+3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
+
+`SetQuota()` and `AddProvider()` allow runtime modification; both are
+mutex-protected. `AddProvider()` is additive — it does not remove existing
+quotas for models outside the new provider's profile.
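
The three resolution rules above can be sketched as a small standalone function. `resolveQuotas` and `sampleProfiles` are hypothetical names, the profile contents are trimmed from the defaults table, and the sketch assumes an explicit quota replaces the whole `ModelQuota` value rather than merging field by field; the package may behave differently.

```go
package main

import "fmt"

type Provider string

const (
	ProviderGemini Provider = "gemini"
	ProviderOpenAI Provider = "openai"
)

// ModelQuota holds the three limits; zero means unlimited.
type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

// Config is reduced to the two fields that matter for resolution.
type Config struct {
	Quotas    map[string]ModelQuota
	Providers []Provider
}

// sampleProfiles is a trimmed, hypothetical stand-in for DefaultProfiles().
// It returns a fresh map on every call.
func sampleProfiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		ProviderGemini: {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}},
		ProviderOpenAI: {"gpt-4o": {MaxRPM: 500, MaxTPM: 30000}},
	}
}

// resolveQuotas applies the rules in order: provider profiles first,
// explicit quotas on top, Gemini defaults when nothing is specified.
func resolveQuotas(cfg Config) map[string]ModelQuota {
	providers := cfg.Providers
	if len(providers) == 0 && len(cfg.Quotas) == 0 {
		providers = []Provider{ProviderGemini}
	}
	profiles := sampleProfiles()
	out := make(map[string]ModelQuota)
	for _, p := range providers {
		for model, q := range profiles[p] {
			out[model] = q
		}
	}
	// Merge-on-top: an explicit entry wins over any profile entry.
	for model, q := range cfg.Quotas {
		out[model] = q
	}
	return out
}

func main() {
	cfg := Config{
		Providers: []Provider{ProviderOpenAI},
		Quotas: map[string]ModelQuota{
			"gpt-4o":      {MaxRPM: 100}, // overrides the profile value
			"llama-local": {MaxRPM: 10},  // model with no profile entry
		},
	}
	fmt.Println(resolveQuotas(cfg))
	fmt.Println(resolveQuotas(Config{})) // falls back to Gemini defaults
}
```

Because the override is applied last, a caller can load a whole provider profile and adjust a single model without copying the rest of the profile.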
+
+### Default Quotas (as of February 2026)
+
+| Provider  | Model                  | MaxRPM    | MaxTPM    | MaxRPD    |
+|-----------|------------------------|-----------|-----------|-----------|
+| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-3-flash-preview | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-2.5-pro         | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-2.0-flash       | 150       | 1,000,000 | unlimited |
+| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited | unlimited |
+| OpenAI    | gpt-4o, gpt-4-turbo    | 500       | 30,000    | unlimited |
+| OpenAI    | gpt-4o-mini, o1-mini   | 500       | 200,000   | unlimited |
+| OpenAI    | o1, o3-mini            | 500       | varies    | unlimited |
+| Anthropic | claude-opus-4          | 50        | 40,000    | unlimited |
+| Anthropic | claude-sonnet-4        | 50        | 40,000    | unlimited |
+| Anthropic | claude-haiku-3.5       | 50        | 50,000    | unlimited |
+| Local     | (none by default)      | user-defined |
+
+The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
+where the throttle limit is hardware rather than an API quota. No defaults are
+provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
+
+---
+
+## YAML Persistence (Legacy)
+
+The default backend serialises the entire `RateLimiter` struct — both the
+`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
+
+```yaml
+quotas:
+  gemini-3-pro-preview:
+    max_rpm: 150
+    max_tpm: 1000000
+    max_rpd: 1000
+state:
+  gemini-3-pro-preview:
+    requests:
+      - 2026-02-20T14:32:01.123456789Z
+    tokens:
+      - time: 2026-02-20T14:32:01.123456789Z
+        count: 1500
+    day_start: 2026-02-20T00:00:00Z
+    day_count: 42
+```
+
+`Persist()` creates parent directories with `os.MkdirAll` before writing.
+`Load()` treats a missing file as an empty state (no error). Corrupt or
+unreadable files return an error.
+
+**Limitations of YAML backend:**
+- Single-process only. Concurrent writes from multiple processes corrupt the
+  file because the write is not atomic at the OS level.
+- The entire state is serialised on every `Persist()` call, which grows linearly
+  with the number of tracked models and entries.
+- Timestamps are serialised as RFC3339 strings; nanosecond precision is
+  preserved by Go's time marshaller but depends on the YAML library.
+
+---
+
+## SQLite Backend
+
+The SQLite backend was added in Phase 2 to support multi-process scenarios and
+provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
+Go port of SQLite that compiles without CGO.
+
+### Connection Settings
+
+```go
+db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
+db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers
+db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout
+```
+
+WAL mode allows one writer and multiple concurrent readers. The 5-second busy
+timeout prevents immediate failure when a second process is mid-commit. A single
+`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
+at the file level; multiple Go connections to the same file through a single
+process would not add throughput but would complicate locking.
+
+### Schema
+
+```sql
+CREATE TABLE IF NOT EXISTS quotas (
+    model TEXT PRIMARY KEY,
+    max_rpm INTEGER NOT NULL DEFAULT 0,
+    max_tpm INTEGER NOT NULL DEFAULT 0,
+    max_rpd INTEGER NOT NULL DEFAULT 0
+);
+
+CREATE TABLE IF NOT EXISTS requests (
+    model TEXT NOT NULL,
+    ts INTEGER NOT NULL -- UnixNano
+);
+
+CREATE TABLE IF NOT EXISTS tokens (
+    model TEXT NOT NULL,
+    ts INTEGER NOT NULL, -- UnixNano
+    count INTEGER NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS daily (
+    model TEXT PRIMARY KEY,
+    day_start INTEGER NOT NULL, -- UnixNano
+    day_count INTEGER NOT NULL DEFAULT 0
+);
+
+CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
+CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
+```
+
+Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
+precision without relying on SQLite's text date format, and allows efficient
+range queries using the composite indices.
+
+### Save Strategy
+
+`saveState()` uses a delete-then-insert pattern inside a single transaction.
+All three state tables are truncated and rewritten atomically:
+
+```go
+tx.Exec("DELETE FROM requests")
+tx.Exec("DELETE FROM tokens")
+tx.Exec("DELETE FROM daily")
+// then INSERT for every model in state
+tx.Commit()
+```
+
+`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
+existing quota rows are updated in place without deleting unrelated models.
+
+### Constructors
+
+```go
+// YAML backend (default)
+rl, err := ratelimit.New()
+rl, err := ratelimit.NewWithConfig(cfg)
+
+// SQLite backend
+rl, err := ratelimit.NewWithSQLite(dbPath)
+rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
+
+defer rl.Close() // releases the database connection
+```
+
+`Close()` is a no-op on YAML-backed limiters.
+
+---
+
+## Migration Path
+
+`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
+state file and writes all quotas and usage state to a new SQLite database. The
+function is idempotent — running it again on the same YAML file overwrites the
+SQLite database state.
+
+Typical one-time migration:
+
+```go
+err := ratelimit.MigrateYAMLToSQLite(
+    filepath.Join(home, ".core", "ratelimits.yaml"),
+    filepath.Join(home, ".core", "ratelimits.db"),
+)
+```
+
+After migration, switch the constructor:
+
+```go
+// Before
+rl, _ := ratelimit.New()
+
+// After
+rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
+defer rl.Close()
+```
+
+The YAML file can be kept as a backup; the two backends do not share state.
+
+---
+
+## CountTokens
+
+`CountTokens(apiKey, model, text string) (int, error)` calls the Google
+Generative Language API to obtain an exact token count for a prompt string. It
+is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
+endpoint. The URL is not configurable, which prevents unit testing of the
+success path without network access.
+
+For other providers, callers must supply `estimatedTokens` directly to
+`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
+in API response metadata after a call completes.
diff --git a/docs/development.md b/docs/development.md
new file mode 100644
index 0000000..471d1dc
--- /dev/null
+++ b/docs/development.md
@@ -0,0 +1,207 @@
+# Development Guide
+
+## Prerequisites
+
+- Go 1.25 or later (the module declares `go 1.25.5`)
+- No CGO required — `modernc.org/sqlite` is a pure Go port
+
+No C toolchain, no system SQLite library, no external build tools. A plain
+`go build ./...` is sufficient.
+
+---
+
+## Build and Test
+
+```bash
+# Run all tests
+go test ./...
+
+# Run all tests with the race detector (required before every commit)
+go test -race ./...
+
+# Run a single test by name
+go test -v -run TestCanSend ./...
+
+# Run a single subtest
+go test -v -run "TestCanSend/RPM_at_exact_limit_is_rejected" ./...
+
+# Run benchmarks
+go test -bench=. -benchmem ./...
+
+# Run a specific benchmark
+go test -bench=BenchmarkCanSend -benchmem ./...
+
+# Check for vet issues
+go vet ./...
+
+# Tidy dependencies
+go mod tidy
+```
+
+All three commands (`go test -race ./...`, `go vet ./...`, and `go mod tidy`)
+must produce no errors or warnings before a commit is pushed.
+
+---
+
+## Test Patterns
+
+### File Organisation
+
+- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
+- `sqlite_test.go` — Phase 2 (SQLite backend)
+
+Both files are in `package ratelimit` (white-box tests) so they can access
+unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
+
+### Naming Convention
+
+SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
+
+- `_Good` — happy path
+- `_Bad` — expected error conditions (invalid paths, corrupt input)
+- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
+
+Core logic tests use plain descriptive names without suffixes, grouped by
+method with table-driven subtests.
+
+### Test Helpers
+
+`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
+redirects the YAML file path into `t.TempDir()`:
+
+```go
+func newTestLimiter(t *testing.T) *RateLimiter {
+    t.Helper()
+    rl, err := New()
+    require.NoError(t, err)
+    rl.filePath = filepath.Join(t.TempDir(), "ratelimits.yaml")
+    return rl
+}
+```
+
+Use `t.TempDir()` for all file paths in tests. Go cleans these up automatically
+after each test completes.
+
+### Testify Usage
+
+Tests use `github.com/stretchr/testify` exclusively:
+
+- `require.NoError(t, err)` — fail immediately on setup errors
+- `assert.NoError(t, err)` — record failure but continue
+- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
+- `assert.True / assert.False` — for boolean checks
+- `assert.Empty / assert.Len` — for slice length checks
+- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
+
+Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
+
+### Race Tests
+
+Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
+assert anything beyond absence of data races (the race detector does the work):
+
+```go
+var wg sync.WaitGroup
+for i := 0; i < 20; i++ {
+    wg.Add(1)
+    go func() {
+        defer wg.Done()
+        // concurrent operations
+    }()
+}
+wg.Wait()
+```
+
+Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
+clean.
+
+### Coverage
+
+Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
+be covered in unit tests without modifying the production code:
+
+1. `CountTokens` success path — hardcoded Google API URL requires network access
+2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
+3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
+
+Do not lower coverage below 95% without a documented reason.
+
+---
+
+## Coding Standards
+
+### Language
+
+UK English throughout: colour, organisation, serialise, initialise, behaviour.
+Do not use American spellings in identifiers, comments, or documentation.
+
+### Go Style
+
+- All exported types, functions, and fields must have doc comments
+- Error strings must be lowercase and not end with punctuation (Go convention)
+- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
+  prefix `ratelimit.` is included so errors identify their origin clearly
+- No `init()` functions
+- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
+  map on each call)
+
+### Mutex Discipline
+
+The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
+
+- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
+  even if they appear read-only, because `prune()` mutates slices
+- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
+  snapshot of state
+- Lock acquisition always happens at the top of the public method, never inside
+  a helper — helpers document "Caller must hold the lock"
+- Never call a public method from inside another public method while holding
+  the lock (deadlock risk)
+
+### Dependencies
+
+Direct dependencies are intentionally minimal:
+
+| Dependency | Purpose |
+|------------|---------|
+| `gopkg.in/yaml.v3` | YAML serialisation for legacy backend |
+| `modernc.org/sqlite` | Pure Go SQLite for persistent backend |
+| `github.com/stretchr/testify` | Test assertions (test-only) |
+
+Do not add `database/sql` drivers beyond `modernc.org/sqlite`. Do not add HTTP
+client libraries; the existing `CountTokens` function uses the standard library.
+
+---
+
+## Licence
+
+EUPL-1.2. Every new source file must carry the standard header if the project
+adopts per-file headers in future. Confirm with the project lead before adding
+files under a different licence.
+
+---
+
+## Commit Convention
+
+Format: `type(scope): description`
+
+Common types: `feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`
+
+Common scopes: `ratelimit`, `sqlite`, `persist`, `config`
+
+Every commit must include:
+
+```
+Co-Authored-By: Virgil
+```
+
+Example:
+
+```
+feat(sqlite): add WAL-mode SQLite backend with migration helper
+
+Co-Authored-By: Virgil
+```
+
+Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
+pass. `go mod tidy` must produce no changes.
diff --git a/docs/history.md b/docs/history.md
new file mode 100644
index 0000000..78de23e
--- /dev/null
+++ b/docs/history.md
@@ -0,0 +1,197 @@
+# Project History
+
+## Origin
+
+go-ratelimit was extracted from the `pkg/ratelimit` package inside
+`forge.lthn.ai/core/go` on 19 February 2026. The extraction gave the package
+its own module path, repository, and independent development cadence.
+
+Initial commit: `fa1a6fc` — `feat: extract go-ratelimit from core/go pkg/ratelimit`
+
+At extraction the package implemented:
+
+- Sliding window rate limiter with 1-minute window
+- Daily request caps per model
+- Token counting via Google `CountTokens` API
+- Hardcoded Gemini quota defaults (`gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD)
+- YAML persistence to `~/.core/ratelimits.yaml`
+- Single test file with basic sliding window and quota enforcement tests
+
+---
+
+## Phase 0 — Hardening and Test Coverage
+
+Commit: `3c63b10` — `feat(ratelimit): generalise beyond Gemini with provider profiles and push coverage to 95%`
+
+Supplementary commit: `db958f2` — `test: expand race coverage and benchmarks`
+
+Coverage increased from 77.1% to 95.1%. The test suite was rewritten using
+testify with table-driven subtests throughout.
+
+### Tests added
+
+- `TestCanSend` — boundary conditions at exact RPM, TPM, and RPD limits;
+  RPM-only and TPM-only quotas; zero-token estimates; unknown and unlimited models
+- `TestPrune` — pruning of old entries, retention of recent entries, daily reset
+  at 24-hour boundary, no-op on non-existent model, boundary-exact timestamps
+- `TestRecordUsage` — fresh state, accumulation, insertion into existing state
+- `TestReset` — single model, all models (empty string argument), non-existent model
+- `TestWaitForCapacity` — context cancellation, pre-cancelled context,
+  immediate capacity, unknown model
+- `TestStats` / `TestAllStats` — known, unknown, and quota-only models; pruning
+  and daily reset inside `AllStats()`
+- `TestPersistAndLoad` — round-trip, missing file, corrupt YAML, unreadable file,
+  nested directory creation, unwritable directory
+- `TestConcurrentAccess` — 20 goroutines x 50 ops each (CanSend + RecordUsage + Stats)
+- `TestConcurrentResetAndRecord` — concurrent Reset + RecordUsage + AllStats
+- `TestConcurrentMultipleModels` — 5 models, concurrent access
+- `TestConcurrentPersistAndLoad` — filesystem race between Persist and Load
+- `TestConcurrentWaitForCapacityAndRecordUsage` — WaitForCapacity racing RecordUsage
+
+### Benchmarks added
+
+- `BenchmarkCanSend` — 1,000-entry sliding window
+- `BenchmarkRecordUsage`
+- `BenchmarkCanSendConcurrent` — parallel goroutines
+- `BenchmarkCanSendWithPrune` — 500 old + 500 new entries
+- `BenchmarkStats` — 1,000-entry window
+- `BenchmarkAllStats` — 5 models x 200 entries
+- `BenchmarkPersist` — YAML I/O
+
+### Remaining uncovered paths (5%)
+
+These three paths are structurally impossible to cover in unit tests without
+modifying production code:
+
+1. `CountTokens` success path — the Google API URL is hardcoded; unit tests
+   cannot intercept the HTTP call without URL injection support
+2. `yaml.Marshal` error path in `Persist()` — `yaml.Marshal` does not fail on
+   valid Go structs; the error branch exists for correctness only
+3. `os.UserHomeDir()` error path in `NewWithConfig()` — triggered only when
+   `$HOME` is unset, which test infrastructure prevents
+
+`go test -race ./...` passed clean. `go vet ./...` produced no warnings.
+
+---
+
+## Phase 1 — Generalisation Beyond Gemini
+
+Commit: `3c63b10` — included in the same commit as Phase 0
+
+The hardcoded Gemini quotas in `New()` were replaced with a provider-agnostic
+configuration system without breaking the existing API.
+
+### New types and functions
+
+- `Provider` string type with constants: `ProviderGemini`, `ProviderOpenAI`,
+  `ProviderAnthropic`, `ProviderLocal`
+- `ProviderProfile` — bundles a provider identifier with its model quota map
+- `Config` — construction configuration accepting `FilePath`, `Backend`,
+  `Providers`, and `Quotas` fields
+- `DefaultProfiles()` — returns fresh pre-configured profiles for all four providers
+- `NewWithConfig(Config)` — creates a limiter from explicit configuration
+- `SetQuota(model, quota)` — runtime quota modification, mutex-protected
+- `AddProvider(provider)` — loads all default quotas for a provider at runtime,
+  additive, mutex-protected
+
+### Backward compatibility
+
+`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`.
+`TestNewBackwardCompatibility` asserts exact parity with the original hardcoded
+values. No existing call sites required modification.
+
+### Design decision: merge-on-top
+
+Explicit `Config.Quotas` override provider profile defaults. This allows callers
+to use a provider profile for most models while customising specific model limits
+without forking the entire profile.
+ +--- + +## Phase 2 — SQLite Persistent State + +Commit: `1afb1d6` — `feat(persist): Phase 2 — SQLite backend with WAL mode` + +The YAML backend serialises the full state on every `Persist()` call and is +not safe for concurrent multi-process access. Phase 2 added a SQLite backend +using `modernc.org/sqlite` (pure Go, no CGO) following the go-store pattern +established elsewhere in the ecosystem. + +### New constructors + +- `NewWithSQLite(dbPath string)` — SQLite-backed limiter with Gemini defaults +- `NewWithSQLiteConfig(dbPath string, cfg Config)` — SQLite-backed with custom config +- `Close() error` — releases the database connection; no-op on YAML-backed limiters + +### Migration + +- `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — one-shot migration + helper that reads an existing YAML state file and writes all quotas and usage + state to a new SQLite database + +### SQLite connection settings + +- `PRAGMA journal_mode=WAL` — enables concurrent reads alongside a single writer +- `PRAGMA busy_timeout=5000` — 5-second wait on lock contention before returning an error +- `db.SetMaxOpenConns(1)` — single connection for PRAGMA consistency + +### Tests added (sqlite_test.go) + +- `TestNewSQLiteStore_Good / _Bad` — creation and invalid path handling +- `TestSQLiteQuotasRoundTrip_Good` — save/load round-trip +- `TestSQLiteQuotasUpsert_Good` — upsert replaces existing rows +- `TestSQLiteStateRoundTrip_Good` — multi-model state with nanosecond precision +- `TestSQLiteStateOverwrite_Good` — delete-then-insert atomicity +- `TestSQLiteEmptyState_Good` — fresh database returns empty maps +- `TestNewWithSQLite_Good / TestNewWithSQLiteConfig_Good` — constructor tests +- `TestSQLitePersistAndLoad_Good` — full persist + reload cycle +- `TestSQLitePersistMultipleModels_Good` — multi-provider persistence +- `TestSQLiteConcurrent_Good` — 10 goroutines x 20 ops, race-clean +- `TestYAMLBackwardCompat_Good` — existing YAML tests pass unchanged +- 
`TestMigrateYAMLToSQLite_Good / _Bad` — migration round-trip and error paths +- `TestSQLiteCorruptDB_Ugly / TestSQLiteTruncatedDB_Ugly` — graceful corrupt DB recovery +- `TestSQLiteEndToEnd_Good` — full two-session scenario + +--- + +## Phase 3 — Integration (Planned) + +Not yet implemented. Intended downstream integrations: + +- Wire into `go-ml` backends so rate limiting is enforced automatically on + inference calls without caller involvement +- Wire into the `go-ai` facade so all providers share a single rate limit layer +- Export metrics (requests/minute, tokens/minute, rejection counts) for + monitoring dashboards + +--- + +## Known Limitations + +**CountTokens URL is hardcoded.** The `CountTokens` helper calls +`generativelanguage.googleapis.com` directly. There is no way to override the +base URL, which prevents testing the success path in unit tests and prevents +use with Gemini-compatible proxies. A future refactor would accept a base URL +parameter or an `http.Client`. + +**saveState is a full table replace.** On every `Persist()` call, the `requests`, +`tokens`, and `daily` tables are truncated and rewritten. For a limiter tracking +many models with high RPM, this means writing hundreds of rows on every persist +call. A future optimisation would use incremental writes (insert-only, with +periodic vacuuming of expired rows). + +**No TTL on SQLite rows.** Historical rows older than one minute are pruned from +the in-memory `UsageStats` on every operation but are written wholesale to +SQLite on `Persist()`. The database does not grow unboundedly between persist +cycles because `saveState` replaces all rows, but if `Persist()` is called +frequently the WAL file can grow transiently. + +**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate +for RPM-scale limits but is coarse for sub-second limits. If a caller needs +finer-grained waiting (e.g., smoothing requests within a minute), they must +implement their own loop. 
+ +**No automatic persistence.** `Persist()` must be called explicitly. If a +process exits without calling `Persist()`, any usage recorded since the last +persist is lost. Callers are responsible for calling `Persist()` at appropriate +intervals (e.g., after each `RecordUsage()` call, or on a ticker).