diff --git a/docs/architecture.md b/docs/architecture.md index 4cefb7a..e3d9f42 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -1,81 +1,72 @@ +--- +title: Architecture +description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model. +--- + # Architecture go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces -three independent quota dimensions per model — requests per minute (RPM), tokens -per minute (TPM), and requests per day (RPD) — using an in-memory sliding window +three independent quota dimensions per model -- requests per minute (RPM), tokens +per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window that can be persisted across process restarts via YAML or SQLite. Module path: `forge.lthn.ai/core/go-ratelimit` --- -## Sliding Window Algorithm +## Key Types -The limiter maintains per-model `UsageStats` structs in memory: +### RateLimiter + +The central struct. Holds the quota definitions, current usage state, a mutex for +thread safety, and an optional SQLite backend reference. + +```go +type RateLimiter struct { + mu sync.RWMutex + Quotas map[string]ModelQuota // per-model quota definitions + State map[string]*UsageStats // per-model sliding window state + filePath string // YAML file path (ignored when SQLite is active) + sqlite *sqliteStore // non-nil when using SQLite backend +} +``` + +### ModelQuota + +Defines the rate limits for a single model. A zero value in any field means that +dimension is unlimited. + +```go +type ModelQuota struct { + MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited) + MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited) + MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited) +} +``` + +### UsageStats + +Tracks the sliding window state for a single model. 
```go type UsageStats struct { Requests []time.Time // timestamps of recent requests (1-minute window) Tokens []TokenEntry // token counts with timestamps (1-minute window) - DayStart time.Time // when the current daily window started - DayCount int // total requests recorded since DayStart + DayStart time.Time // when the current 24-hour window started + DayCount int // total requests since DayStart +} + +type TokenEntry struct { + Time time.Time + Count int // prompt + output tokens for this request } ``` -Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both -slices and discards entries older than `now - 1 minute`. Pruning is done -in-place to avoid allocation on the hot path: +### Config + +Controls `RateLimiter` initialisation. ```go -validReqs := 0 -for _, t := range stats.Requests { - if t.After(window) { - stats.Requests[validReqs] = t - validReqs++ - } -} -stats.Requests = stats.Requests[:validReqs] -``` - -The same loop runs for token entries. After pruning, `CanSend()` checks each -quota dimension in priority order: RPD first (cheapest check), then RPM, then -TPM. A zero value for any dimension means that dimension is unlimited. If all -three are zero the model is treated as fully unlimited and the check short-circuits -before touching any state. - -### Daily Reset - -The daily counter resets automatically inside `prune()`. When -`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set -to the current time. This means the daily window is a rolling 24-hour period -anchored to the first request of the day, not a calendar boundary. - -### Concurrency - -All reads and writes are protected by a single `sync.RWMutex`. Methods that -write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a -full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock -where possible. The `CanSend()` method acquires a write lock because it calls -`prune()`, which mutates the state slices. 
- -`go test -race ./...` passes clean with 20 goroutines performing concurrent -`CanSend()`, `RecordUsage()`, and `Stats()` calls. - ---- - -## Provider and Quota Configuration - -### Types - -```go -type Provider string // "gemini", "openai", "anthropic", "local" - -type ModelQuota struct { - MaxRPM int `yaml:"max_rpm"` // 0 = unlimited - MaxTPM int `yaml:"max_tpm"` - MaxRPD int `yaml:"max_rpd"` -} - type Config struct { FilePath string // default: ~/.core/ratelimits.yaml Backend string // "yaml" (default) or "sqlite" @@ -84,32 +75,110 @@ type Config struct { } ``` -### Quota Resolution +### Provider -1. Provider profiles are loaded first (from `DefaultProfiles()`). -2. Explicit `Config.Quotas` are merged on top, overriding any matching model. +A string type identifying an LLM provider. Four constants are defined: + +```go +type Provider string + +const ( + ProviderGemini Provider = "gemini" + ProviderOpenAI Provider = "openai" + ProviderAnthropic Provider = "anthropic" + ProviderLocal Provider = "local" +) +``` + +--- + +## Sliding Window Algorithm + +Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes +entries older than one minute from the `Requests` and `Tokens` slices. Pruning +is done in-place using `slices.DeleteFunc` to minimise allocations: + +```go +window := now.Add(-1 * time.Minute) + +stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool { + return t.Before(window) +}) +stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool { + return t.Time.Before(window) +}) +``` + +After pruning, `CanSend()` checks each quota dimension. If all three limits +(RPM, TPM, RPD) are zero, the model is treated as fully unlimited and the +check short-circuits before touching any state. + +The check order is: RPD, then RPM, then TPM. RPD is checked first because it +is the cheapest comparison (a single integer). TPM is checked last because it +requires summing the token counts in the sliding window. 
+ +### Daily Reset + +The daily counter resets automatically inside `prune()`. When +`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is +updated to the current time. The daily window is a rolling 24-hour period +anchored to the first request of the day, not a calendar boundary. + +### Background Pruning + +`BackgroundPrune(interval)` starts a goroutine that periodically prunes all +model states on a configurable interval. It returns a cancel function to stop +the pruner: + +```go +stop := rl.BackgroundPrune(30 * time.Second) +defer stop() +``` + +This prevents memory growth in long-running processes where some models may +accumulate stale entries between calls to `CanSend()`. + +### Memory Cleanup + +When `prune()` empties both the `Requests` and `Tokens` slices for a model, +and `DayCount` is also zero, the entire `UsageStats` entry is deleted from +the `State` map. This prevents memory leaks from models that were used once +and never again. + +--- + +## Provider and Quota Configuration + +### Quota Resolution Order + +1. Provider profiles are loaded first from `DefaultProfiles()`. +2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any + matching model. 3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used. -`SetQuota()` and `AddProvider()` allow runtime modification; both are -mutex-protected. `AddProvider()` is additive — it does not remove existing -quotas for models outside the new provider's profile. +`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the +write lock. `AddProvider()` is additive -- it does not remove existing quotas for +models outside the new provider's profile. 
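
The resolution order above can be sketched as follows. This is an illustration under stated assumptions: `resolveQuotas` and `profiles` are hypothetical names, the return shape of the real `DefaultProfiles()` is not shown in this document, and only two models are included, with limits taken from the project's default quota table.

```go
package main

import (
	"fmt"
	"maps"
)

type Provider string

type ModelQuota struct {
	MaxRPM, MaxTPM, MaxRPD int
}

// profiles stands in for DefaultProfiles(); its shape is an assumption.
// It returns a fresh map on each call, as the coding standards require.
func profiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		"gemini":    {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1_000_000, MaxRPD: 1_000}},
		"anthropic": {"claude-opus-4": {MaxRPM: 50, MaxTPM: 40_000}},
	}
}

// resolveQuotas is a hypothetical helper that applies the three documented steps.
func resolveQuotas(providers []Provider, overrides map[string]ModelQuota) map[string]ModelQuota {
	// Step 3: with nothing specified, fall back to Gemini defaults.
	if len(providers) == 0 && len(overrides) == 0 {
		providers = []Provider{"gemini"}
	}

	out := make(map[string]ModelQuota)
	all := profiles()

	// Step 1: provider profiles are loaded first.
	for _, p := range providers {
		maps.Copy(out, all[p])
	}
	// Step 2: explicit quotas are merged on top and win on conflict.
	maps.Copy(out, overrides)
	return out
}

func main() {
	q := resolveQuotas(
		[]Provider{"anthropic"},
		map[string]ModelQuota{"claude-opus-4": {MaxRPM: 10}},
	)
	fmt.Printf("%+v\n", q["claude-opus-4"])
}
```

Because `maps.Copy` replaces whole map values, an override in this sketch replaces the entire `ModelQuota` for that model; fields left at zero in the override become unlimited rather than inheriting the profile's value.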
### Default Quotas (as of February 2026) -| Provider | Model | MaxRPM | MaxTPM | MaxRPD | -|-----------|------------------------|-----------|-----------|-----------| -| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 | -| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 | -| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 | -| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited | -| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited | -| OpenAI | gpt-4o, gpt-4-turbo | 500 | 30,000 | unlimited | -| OpenAI | gpt-4o-mini, o1-mini | 500 | 200,000 | unlimited | -| OpenAI | o1, o3-mini | 500 | varies | unlimited | -| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited | -| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited | -| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited | -| Local | (none by default) | user-defined | +| Provider | Model | MaxRPM | MaxTPM | MaxRPD | +|-----------|------------------------|-----------|-------------|-----------| +| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 | +| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 | +| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 | +| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited | +| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited | +| OpenAI | gpt-4o | 500 | 30,000 | unlimited | +| OpenAI | gpt-4o-mini | 500 | 200,000 | unlimited | +| OpenAI | gpt-4-turbo | 500 | 30,000 | unlimited | +| OpenAI | o1 | 500 | 30,000 | unlimited | +| OpenAI | o1-mini | 500 | 200,000 | unlimited | +| OpenAI | o3-mini | 500 | 200,000 | unlimited | +| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited | +| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited | +| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited | +| Local | (none by default) | user-defined | The Local provider exists for local inference backends (Ollama, MLX, llama.cpp) where the throttle limit is hardware rather than an API quota. 
No defaults are @@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`. --- -## YAML Persistence (Legacy) +## Constructors -The default backend serialises the entire `RateLimiter` struct — both the -`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`. +| Function | Backend | Default Provider | +|----------|---------|------------------| +| `New()` | YAML | Gemini | +| `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) | +| `NewWithSQLite(dbPath)` | SQLite | Gemini | +| `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) | + +`Close()` releases the database connection for SQLite-backed limiters. It is a +no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`) +when using the SQLite backend. + +--- + +## Data Flow + +A typical request lifecycle: + +``` +1. CanSend(model, estimatedTokens) + |-- acquires write lock + |-- looks up ModelQuota for the model + |-- if unknown model or all-zero quota: returns true (allowed) + |-- calls prune(model) to discard stale entries + |-- checks RPD, RPM, TPM against the pruned state + '-- returns true/false + +2. (caller makes the API call) + +3. RecordUsage(model, promptTokens, outputTokens) + |-- acquires write lock + |-- calls prune(model) + |-- appends to Requests and Tokens slices + '-- increments DayCount + +4. Persist() + |-- acquires write lock, clones state, releases lock + |-- YAML: marshals to file + '-- SQLite: saves quotas and state in transactions +``` + +--- + +## YAML Persistence + +The default backend serialises both the `Quotas` map and the `State` map to a +YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`). ```yaml quotas: @@ -143,35 +256,30 @@ state: `Load()` treats a missing file as an empty state (no error). Corrupt or unreadable files return an error. -**Limitations of YAML backend:** +**Limitations of the YAML backend:** + - Single-process only. 
Concurrent writes from multiple processes corrupt the file because the write is not atomic at the OS level. -- The entire state is serialised on every `Persist()` call, which grows linearly - with the number of tracked models and entries. -- Timestamps are serialised as RFC3339 strings; sub-nanosecond precision is - preserved by Go's time marshaller but depends on the YAML library. +- The entire state is serialised on every `Persist()` call. +- Timestamps are serialised as RFC 3339 strings. --- ## SQLite Backend -The SQLite backend was added in Phase 2 to support multi-process scenarios and -provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure -Go port of SQLite that compiles without CGO. +The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`, +a pure Go port of SQLite that compiles without CGO. ### Connection Settings ```go -db.SetMaxOpenConns(1) // single connection for PRAGMA consistency -db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers -db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout +db.SetMaxOpenConns(1) // single connection for PRAGMA consistency +db.Exec("PRAGMA journal_mode=WAL") // concurrent readers alongside a single writer +db.Exec("PRAGMA busy_timeout=5000") // 5-second wait on lock contention ``` WAL mode allows one writer and multiple concurrent readers. The 5-second busy -timeout prevents immediate failure when a second process is mid-commit. A single -`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency -at the file level; multiple Go connections to the same file through a single -process would not add throughput but would complicate locking. +timeout prevents immediate failure when a second process is mid-commit. 
### Schema @@ -196,7 +304,7 @@ CREATE TABLE IF NOT EXISTS tokens ( CREATE TABLE IF NOT EXISTS daily ( model TEXT PRIMARY KEY, - day_start INTEGER NOT NULL, -- UnixNano + day_start INTEGER NOT NULL, -- UnixNano day_count INTEGER NOT NULL DEFAULT 0 ); @@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts); ``` Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond -precision without relying on SQLite's text date format, and allows efficient -range queries using the composite indices. +precision and allows efficient range queries using the composite indices. ### Save Strategy -`saveState()` uses a delete-then-insert pattern inside a single transaction. -All three state tables are truncated and rewritten atomically: - -```go -tx.Exec("DELETE FROM requests") -tx.Exec("DELETE FROM tokens") -tx.Exec("DELETE FROM daily") -// then INSERT for every model in state -tx.Commit() -``` - -`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so -existing quota rows are updated in place without deleting unrelated models. - -### Constructors - -```go -// YAML backend (default) -rl, err := ratelimit.New() -rl, err := ratelimit.NewWithConfig(cfg) - -// SQLite backend -rl, err := ratelimit.NewWithSQLite(dbPath) -rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg) - -defer rl.Close() // releases the database connection -``` - -`Close()` is a no-op on YAML-backed limiters. +- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota + rows are updated in place without deleting unrelated models. +- **State**: Delete-then-insert inside a single transaction. All three state + tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically. --- ## Migration Path -`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML -state file and writes all quotas and usage state to a new SQLite database. 
The -function is idempotent — running it again on the same YAML file overwrites the -SQLite database state. - -Typical one-time migration: +`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file +and writes all quotas and usage state to a new SQLite database. The function is +idempotent -- running it again overwrites the SQLite database state. ```go err := ratelimit.MigrateYAMLToSQLite( @@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite( ) ``` -After migration, switch the constructor: +After migration, switch the constructor from `New()` to `NewWithSQLite()`. The +YAML file can be kept as a backup; the two backends do not share state. + +--- + +## Iterators + +Two Go 1.26+ iterators are provided for inspecting the limiter state: + +- `Models() iter.Seq[string]` -- returns a sorted sequence of all model names + (from both `Quotas` and `State` maps, deduplicated). +- `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired + with their current `ModelStats` snapshot. ```go -// Before -rl, _ := ratelimit.New() - -// After -rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db")) -defer rl.Close() +for model, stats := range rl.Iter() { + fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n", + model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM) +} ``` -The YAML file can be kept as a backup; the two backends do not share state. - --- ## CountTokens -`CountTokens(apiKey, model, text string) (int, error)` calls the Google -Generative Language API to obtain an exact token count for a prompt string. It -is Gemini-specific and hardcodes the `generativelanguage.googleapis.com` -endpoint. The URL is not configurable, which prevents unit testing of the -success path without network access. +`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API +to obtain an exact token count for a prompt string. It is Gemini-specific and +hardcodes the `generativelanguage.googleapis.com` endpoint. 
For other providers, callers must supply `estimatedTokens` directly to -`CanSend()` and `RecordUsage()`. Accurate token counts are typically available -in API response metadata after a call completes. +`CanSend()`. Accurate token counts are typically available in API response +metadata after a call completes. + +--- + +## Concurrency Model + +All reads and writes are protected by a single `sync.RWMutex` on the +`RateLimiter` struct. + +| Method | Lock type | Reason | +|--------|-----------|--------| +| `CanSend()` | Write | Calls `prune()`, which mutates state slices | +| `RecordUsage()` | Write | Appends to state slices | +| `Reset()` | Write | Deletes state entries | +| `Load()` | Write | Replaces in-memory state | +| `SetQuota()` | Write | Modifies quota map | +| `AddProvider()` | Write | Modifies quota map | +| `Persist()` | Write (brief) | Clones state, then releases lock before I/O | +| `Stats()` | Write | Calls `prune()` | +| `AllStats()` | Write | Prunes inline | +| `Models()` | Read | Reads keys only | + +`Persist()` minimises lock contention by cloning the state under a write lock, +then performing I/O after releasing the lock. The test suite passes clean under +`go test -race ./...` with 20 goroutines performing concurrent operations. diff --git a/docs/development.md b/docs/development.md index 8d656a2..1f5ceff 100644 --- a/docs/development.md +++ b/docs/development.md @@ -1,9 +1,14 @@ +--- +title: Development Guide +description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions. +--- + # Development Guide ## Prerequisites -- Go 1.25 or later (the module declares `go 1.25.5`) -- No CGO required — `modernc.org/sqlite` is a pure Go port +- **Go 1.26** or later (the module declares `go 1.26.0`) +- No CGO required -- `modernc.org/sqlite` is a pure Go port No C toolchain, no system SQLite library, no external build tools. A plain `go build ./...` is sufficient. 
@@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./... # Check for vet issues go vet ./... +# Lint (requires golangci-lint) +golangci-lint run ./... + # Tidy dependencies go mod tidy ``` @@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed. --- -## Test Patterns +## Test Organisation -### File Organisation +### File Layout -- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles) -- `sqlite_test.go` — Phase 2 (SQLite backend) +| File | Scope | +|------|-------| +| `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks | +| `sqlite_test.go` | SQLite backend, migration, concurrent persistence | +| `error_test.go` | SQLite and YAML error paths | +| `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases | -Both files are in `package ratelimit` (white-box tests) so they can access +All test files are in `package ratelimit` (white-box tests), giving access to unexported fields and methods such as `prune()`, `filePath`, and `sqlite`. ### Naming Convention SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern: -- `_Good` — happy path -- `_Bad` — expected error conditions (invalid paths, corrupt input) -- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files) +- `_Good` -- happy path +- `_Bad` -- expected error conditions (invalid paths, corrupt input) +- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files) -Core logic tests use plain descriptive names without suffixes, grouped by -method with table-driven subtests. +Core logic tests use plain descriptive names without suffixes, grouped by method +with table-driven subtests. 
-### Test Helpers +### Test Helper -`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and -redirects the YAML file path into `t.TempDir()`: +`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects +the YAML file path into `t.TempDir()`: ```go func newTestLimiter(t *testing.T) *RateLimiter { @@ -86,43 +98,64 @@ after each test completes. Tests use `github.com/stretchr/testify` exclusively: -- `require.NoError(t, err)` — fail immediately on setup errors -- `assert.NoError(t, err)` — record failure but continue -- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons -- `assert.True / assert.False` — for boolean checks -- `assert.Empty / assert.Len` — for slice length checks -- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors +- `require.NoError(t, err)` -- fail immediately on setup errors +- `assert.NoError(t, err)` -- record failure but continue +- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons +- `assert.True` / `assert.False` -- for boolean checks +- `assert.Empty` / `assert.Len` -- for slice length checks +- `assert.ErrorIs(t, err, target)` -- for sentinel errors Do not use `t.Error`, `t.Fatal`, or `t.Log` directly. -### Race Tests +### Concurrency Tests -Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not -assert anything beyond absence of data races (the race detector does the work): +Race tests spin up goroutines and use `sync.WaitGroup`. Some assert specific +outcomes (e.g., correct RPD count after concurrent recordings), while others +rely solely on the race detector to catch data races: ```go var wg sync.WaitGroup -for i := range 20 { - wg.Add(1) - go func() { - defer wg.Done() - // concurrent operations - }() +for range 20 { + wg.Go(func() { + for range 50 { + rl.CanSend(model, 10) + rl.RecordUsage(model, 5, 5) + rl.Stats(model) + } + }) } wg.Wait() ``` -Run every concurrency test with `-race`. 
The CI baseline is `go test -race ./...` -clean. +Always run concurrency tests with `-race`. + +### Benchmarks + +The following benchmarks are included: + +| Benchmark | What it measures | +|-----------|------------------| +| `BenchmarkCanSend` | CanSend with a 1,000-entry sliding window | +| `BenchmarkRecordUsage` | Recording usage on a single model | +| `BenchmarkCanSendConcurrent` | Parallel CanSend across goroutines | +| `BenchmarkCanSendWithPrune` | CanSend with 500 old + 500 new entries | +| `BenchmarkStats` | Stats retrieval with a 1,000-entry window | +| `BenchmarkAllStats` | AllStats across 5 models x 200 entries each | +| `BenchmarkPersist` | YAML persistence I/O | +| `BenchmarkSQLitePersist` | SQLite persistence I/O | +| `BenchmarkSQLiteLoad` | SQLite state loading | ### Coverage -Current coverage: 95.1%. The remaining 5% consists of three paths that cannot -be covered in unit tests without modifying the production code: +Current coverage: 95.1%. The remaining paths cannot be covered in unit tests +without modifying production code: -1. `CountTokens` success path — hardcoded Google API URL requires network access -2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs -3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME` +1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests + cannot intercept the HTTP call without URL injection support. +2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on + valid Go structs. +3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when + `$HOME` is unset, which test infrastructure prevents. Do not lower coverage below 95% without a documented reason. @@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation. 
### Go Style -- All exported types, functions, and fields must have doc comments -- Error strings must be lowercase and not end with punctuation (Go convention) -- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the - prefix `ratelimit.` is included so errors identify their origin clearly -- No `init()` functions -- No global mutable state outside of `DefaultProfiles()` (which returns a fresh - map on each call) +- All exported types, functions, and fields must have doc comments. +- Error strings must be lowercase and not end with punctuation (Go convention). +- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so + errors identify their origin clearly. +- No `init()` functions. +- No global mutable state. `DefaultProfiles()` returns a fresh map on each call. ### Mutex Discipline The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules: -- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), - even if they appear read-only, because `prune()` mutates slices -- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a - snapshot of state +- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even + if they appear read-only, because `prune()` mutates state slices. +- `Persist()` acquires the write lock briefly to clone state, then releases it + before performing I/O. - Lock acquisition always happens at the top of the public method, never inside - a helper — helpers document "Caller must hold the lock" -- Never call a public method from inside another public method while holding - the lock (deadlock risk) + a helper. Helpers document "Caller must hold the lock". +- Never call a public method from inside another public method while holding the + lock (deadlock risk). ### Dependencies @@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library. ## Licence -EUPL-1.2. 
Every new source file must carry the standard header if the project -adopts per-file headers in future. Confirm with the project lead before adding -files under a different licence. +EUPL-1.2. Confirm with the project lead before adding files under a different +licence. --- @@ -205,3 +236,17 @@ Co-Authored-By: Virgil Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both pass. `go mod tidy` must produce no changes. + +--- + +## Linting + +The project uses `golangci-lint` with the following enabled linters (see +`.golangci.yml`): + +- `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple` +- `ineffassign`, `typecheck`, `gocritic`, `gofmt` + +Disabled linters: `exhaustive`, `wrapcheck`. + +Run `golangci-lint run ./...` to check before committing. diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 0000000..c73cc13 --- /dev/null +++ b/docs/index.md @@ -0,0 +1,117 @@ +--- +title: go-ratelimit +description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends. +--- + +# go-ratelimit + +**Module**: `forge.lthn.ai/core/go-ratelimit` +**Licence**: EUPL-1.2 +**Go version**: 1.26+ + +go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and +requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding +window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and +a local inference provider. State persists across process restarts via YAML +(single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite +migration helper is included. + +## Quick Start + +```go +import "forge.lthn.ai/core/go-ratelimit" + +// Create a limiter with Gemini defaults (YAML backend). +rl, err := ratelimit.New() +if err != nil { + log.Fatal(err) +} + +// Check capacity before sending. +if rl.CanSend("gemini-2.0-flash", 1500) { + // Make the API call... 
+ rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens +} + +// Persist state to disk for recovery across restarts. +if err := rl.Persist(); err != nil { + log.Printf("persist failed: %v", err) +} +``` + +### Multi-provider configuration + +```go +rl, err := ratelimit.NewWithConfig(ratelimit.Config{ + Providers: []ratelimit.Provider{ + ratelimit.ProviderGemini, + ratelimit.ProviderAnthropic, + }, + Quotas: map[string]ratelimit.ModelQuota{ + // Override a specific model's limits. + "gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200}, + // Add a custom model not in any profile. + "llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0}, + }, +}) +``` + +### SQLite backend (multi-process safe) + +```go +rl, err := ratelimit.NewWithSQLite("~/.core/ratelimits.db") +if err != nil { + log.Fatal(err) +} +defer rl.Close() + +// Load persisted state. +if err := rl.Load(); err != nil { + log.Fatal(err) +} + +// Use exactly as the YAML backend -- CanSend, RecordUsage, Persist, etc. +``` + +### Blocking until capacity is available + +```go +ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second) +defer cancel() + +if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil { + log.Printf("timed out waiting for capacity: %v", err) + return +} +// Capacity is available; proceed with the API call. +``` + +## Package Layout + +The module is a single package with no sub-packages. 
+ +| File | Purpose | +|------|---------| +| `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper | +| `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations | +| `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks | +| `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery | +| `error_test.go` | Tests for SQLite and YAML error paths | +| `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases | + +## Dependencies + +| Dependency | Purpose | Category | +|------------|---------|----------| +| `gopkg.in/yaml.v3` | YAML serialisation for the legacy persistence backend | Direct | +| `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct | +| `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only | + +All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain +or system SQLite library is needed. + +## Further Reading + +- [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model +- [Development](development.md) -- build commands, test patterns, coding standards, commit conventions +- [History](history.md) -- completed phases with commit hashes, known limitations