docs: add human-friendly documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider 2026-03-11 13:02:40 +00:00
parent ae2cb96d38
commit 9572425e89
3 changed files with 475 additions and 204 deletions


@ -1,81 +1,72 @@
---
title: Architecture
description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model.
---
# Architecture
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
three independent quota dimensions per model -- requests per minute (RPM), tokens
per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window
that can be persisted across process restarts via YAML or SQLite.
Module path: `forge.lthn.ai/core/go-ratelimit`
---
## Key Types
### RateLimiter
The central struct. Holds the quota definitions, current usage state, a mutex for
thread safety, and an optional SQLite backend reference.
```go
type RateLimiter struct {
mu sync.RWMutex
Quotas map[string]ModelQuota // per-model quota definitions
State map[string]*UsageStats // per-model sliding window state
filePath string // YAML file path (ignored when SQLite is active)
sqlite *sqliteStore // non-nil when using SQLite backend
}
```
### ModelQuota
Defines the rate limits for a single model. A zero value in any field means that
dimension is unlimited.
```go
type ModelQuota struct {
MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
}
```
### UsageStats
Tracks the sliding window state for a single model.
```go ```go
type UsageStats struct { type UsageStats struct {
Requests []time.Time // timestamps of recent requests (1-minute window) Requests []time.Time // timestamps of recent requests (1-minute window)
Tokens []TokenEntry // token counts with timestamps (1-minute window) Tokens []TokenEntry // token counts with timestamps (1-minute window)
DayStart time.Time // when the current daily window started DayStart time.Time // when the current 24-hour window started
DayCount int // total requests recorded since DayStart DayCount int // total requests since DayStart
}
type TokenEntry struct {
Time time.Time
Count int // prompt + output tokens for this request
} }
``` ```
Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both ### Config
slices and discards entries older than `now - 1 minute`. Pruning is done
in-place to avoid allocation on the hot path: Controls `RateLimiter` initialisation.
```go ```go
validReqs := 0
for _, t := range stats.Requests {
if t.After(window) {
stats.Requests[validReqs] = t
validReqs++
}
}
stats.Requests = stats.Requests[:validReqs]
```
The same loop runs for token entries. After pruning, `CanSend()` checks each
quota dimension in priority order: RPD first (cheapest check), then RPM, then
TPM. A zero value for any dimension means that dimension is unlimited. If all
three are zero the model is treated as fully unlimited and the check short-circuits
before touching any state.
### Daily Reset
The daily counter resets automatically inside `prune()`. When
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
to the current time. This means the daily window is a rolling 24-hour period
anchored to the first request of the day, not a calendar boundary.
### Concurrency
All reads and writes are protected by a single `sync.RWMutex`. Methods that
write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
where possible. The `CanSend()` method acquires a write lock because it calls
`prune()`, which mutates the state slices.
`go test -race ./...` passes clean with 20 goroutines performing concurrent
`CanSend()`, `RecordUsage()`, and `Stats()` calls.
---
## Provider and Quota Configuration
### Types
```go
type Provider string // "gemini", "openai", "anthropic", "local"
type ModelQuota struct {
MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
MaxTPM int `yaml:"max_tpm"`
MaxRPD int `yaml:"max_rpd"`
}
type Config struct {
	FilePath string // default: ~/.core/ratelimits.yaml
	Backend  string // "yaml" (default) or "sqlite"
@ -84,32 +75,110 @@ type Config struct {
}
```
### Provider
A string type identifying an LLM provider. Four constants are defined:
```go
type Provider string
const (
ProviderGemini Provider = "gemini"
ProviderOpenAI Provider = "openai"
ProviderAnthropic Provider = "anthropic"
ProviderLocal Provider = "local"
)
```
---
## Sliding Window Algorithm
Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes
entries older than one minute from the `Requests` and `Tokens` slices. Pruning
is done in-place using `slices.DeleteFunc` to minimise allocations:
```go
// Entries strictly older than the start of the one-minute window are dropped.
window := now.Add(-1 * time.Minute)
stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool {
	return t.Before(window)
})
stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool {
	return t.Time.Before(window)
})
```
After pruning, `CanSend()` checks each quota dimension. If all three limits
(RPM, TPM, RPD) are zero, the model is treated as fully unlimited and the
check short-circuits before touching any state.
The check order is: RPD, then RPM, then TPM. RPD is checked first because it
is the cheapest comparison (a single integer). TPM is checked last because it
requires summing the token counts in the sliding window.
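The check order described above can be sketched as follows. This is a minimal illustration, not the library's actual code: `canSend` is a hypothetical helper, the types are trimmed copies of the documented structs, and the exact comparison operators (`>=` for request counts, `>` for the token sum) are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// ModelQuota mirrors the documented struct; zero means unlimited.
type ModelQuota struct {
	MaxRPM, MaxTPM, MaxRPD int
}

// TokenEntry and UsageStats are trimmed copies of the documented types.
type TokenEntry struct {
	Time  time.Time
	Count int
}

type UsageStats struct {
	Requests []time.Time
	Tokens   []TokenEntry
	DayCount int
}

// canSend is a hypothetical stand-in for the check inside CanSend().
// It assumes stats has already been pruned to the one-minute window.
func canSend(q ModelQuota, stats *UsageStats, estimatedTokens int) bool {
	// All-zero quota: unlimited, so return before touching any state.
	if q.MaxRPM == 0 && q.MaxTPM == 0 && q.MaxRPD == 0 {
		return true
	}
	if stats == nil {
		return true // nothing recorded yet for this model
	}
	// RPD first: a single integer comparison.
	if q.MaxRPD > 0 && stats.DayCount >= q.MaxRPD {
		return false
	}
	// RPM next: the length of the pruned request slice.
	if q.MaxRPM > 0 && len(stats.Requests) >= q.MaxRPM {
		return false
	}
	// TPM last: requires summing the window.
	if q.MaxTPM > 0 {
		used := 0
		for _, e := range stats.Tokens {
			used += e.Count
		}
		if used+estimatedTokens > q.MaxTPM {
			return false
		}
	}
	return true
}

func main() {
	now := time.Now()
	q := ModelQuota{MaxRPM: 2, MaxTPM: 100, MaxRPD: 10}
	stats := &UsageStats{
		Requests: []time.Time{now},
		Tokens:   []TokenEntry{{Time: now, Count: 60}},
		DayCount: 1,
	}
	fmt.Println("40 tokens allowed:", canSend(q, stats, 40))
	fmt.Println("41 tokens allowed:", canSend(q, stats, 41))
}
```

Ordering the checks from cheapest to most expensive means a model that has exhausted its daily quota is rejected without iterating over the token window.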
### Daily Reset
The daily counter resets automatically inside `prune()`. When
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is
updated to the current time. The daily window is a rolling 24-hour period
anchored to the first request of the day, not a calendar boundary.
### Background Pruning
`BackgroundPrune(interval)` starts a goroutine that periodically prunes all
model states on a configurable interval. It returns a cancel function to stop
the pruner:
```go
stop := rl.BackgroundPrune(30 * time.Second)
defer stop()
```
This prevents memory growth in long-running processes where some models may
accumulate stale entries between calls to `CanSend()`.
### Memory Cleanup
When `prune()` empties both the `Requests` and `Tokens` slices for a model,
and `DayCount` is also zero, the entire `UsageStats` entry is deleted from
the `State` map. This prevents memory leaks from models that were used once
and never again.
---
## Provider and Quota Configuration
### Quota Resolution Order
1. Provider profiles are loaded first from `DefaultProfiles()`.
2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any
matching model.
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the
write lock. `AddProvider()` is additive -- it does not remove existing quotas for
models outside the new provider's profile.
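The resolution order can be sketched with `maps.Copy`. Everything below is illustrative: `resolveQuotas` and `defaultProfiles` are hypothetical names, and the profile data is a two-model excerpt of the defaults table rather than the library's full set.

```go
package main

import (
	"fmt"
	"maps"
)

type Provider string

type ModelQuota struct {
	MaxRPM, MaxTPM, MaxRPD int
}

// defaultProfiles returns a fresh map on each call (hypothetical excerpt).
func defaultProfiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		"gemini":    {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}},
		"anthropic": {"claude-opus-4": {MaxRPM: 50, MaxTPM: 40000}},
	}
}

// resolveQuotas applies the documented order: provider profiles first,
// explicit quotas on top, Gemini defaults when nothing is specified.
func resolveQuotas(providers []Provider, explicit map[string]ModelQuota) map[string]ModelQuota {
	profiles := defaultProfiles()
	quotas := make(map[string]ModelQuota)
	if len(providers) == 0 && len(explicit) == 0 {
		maps.Copy(quotas, profiles["gemini"])
		return quotas
	}
	for _, p := range providers {
		maps.Copy(quotas, profiles[p]) // unknown provider: nil map, copies nothing
	}
	maps.Copy(quotas, explicit) // explicit entries override matching models
	return quotas
}

func main() {
	q := resolveQuotas(
		[]Provider{"anthropic"},
		map[string]ModelQuota{"claude-opus-4": {MaxRPM: 10}},
	)
	fmt.Printf("%+v\n", q["claude-opus-4"])
}
```

Because `maps.Copy` overwrites existing keys, the order of the two copy steps is what gives explicit quotas precedence over provider profiles.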
### Default Quotas (as of February 2026)
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|-----------|------------------------|-----------|-------------|-----------|
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
| OpenAI | gpt-4o | 500 | 30,000 | unlimited |
| OpenAI | gpt-4o-mini | 500 | 200,000 | unlimited |
| OpenAI | gpt-4-turbo | 500 | 30,000 | unlimited |
| OpenAI | o1 | 500 | 30,000 | unlimited |
| OpenAI | o1-mini | 500 | 200,000 | unlimited |
| OpenAI | o3-mini | 500 | 200,000 | unlimited |
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
| Local | (none by default) | user-defined | user-defined | user-defined |
The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
where the throttle limit is hardware rather than an API quota. No defaults are
@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
---
## Constructors
| Function | Backend | Default Provider |
|----------|---------|------------------|
| `New()` | YAML | Gemini |
| `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) |
| `NewWithSQLite(dbPath)` | SQLite | Gemini |
| `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) |
`Close()` releases the database connection for SQLite-backed limiters. It is a
no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`)
when using the SQLite backend.
---
## Data Flow
A typical request lifecycle:
```
1. CanSend(model, estimatedTokens)
|-- acquires write lock
|-- looks up ModelQuota for the model
|-- if unknown model or all-zero quota: returns true (allowed)
|-- calls prune(model) to discard stale entries
|-- checks RPD, RPM, TPM against the pruned state
'-- returns true/false
2. (caller makes the API call)
3. RecordUsage(model, promptTokens, outputTokens)
|-- acquires write lock
|-- calls prune(model)
|-- appends to Requests and Tokens slices
'-- increments DayCount
4. Persist()
|-- acquires write lock, clones state, releases lock
|-- YAML: marshals to file
'-- SQLite: saves quotas and state in transactions
```
---
## YAML Persistence
The default backend serialises both the `Quotas` map and the `State` map to a
YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`).
```yaml
quotas:
@ -143,35 +256,30 @@ state:
`Load()` treats a missing file as an empty state (no error). Corrupt or
unreadable files return an error.
**Limitations of the YAML backend:**
- Single-process only. Concurrent writes from multiple processes corrupt the
file because the write is not atomic at the OS level.
- The entire state is serialised on every `Persist()` call, so the cost grows
linearly with the number of tracked models and entries.
- Timestamps are serialised as RFC 3339 strings.
--- ---
## SQLite Backend
The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`,
a pure Go port of SQLite that compiles without CGO.
### Connection Settings
```go
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
db.Exec("PRAGMA journal_mode=WAL") // concurrent readers alongside a single writer
db.Exec("PRAGMA busy_timeout=5000") // 5-second wait on lock contention
```
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
timeout prevents immediate failure when a second process is mid-commit. A single
`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
at the file level; multiple Go connections to the same file through a single
process would not add throughput but would complicate locking.
### Schema ### Schema
@ -196,7 +304,7 @@ CREATE TABLE IF NOT EXISTS tokens (
CREATE TABLE IF NOT EXISTS daily (
    model TEXT PRIMARY KEY,
    day_start INTEGER NOT NULL, -- UnixNano
    day_count INTEGER NOT NULL DEFAULT 0
);
@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
``` ```
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
precision and allows efficient range queries using the composite indices.
### Save Strategy
- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota
rows are updated in place without deleting unrelated models.
- **State**: Delete-then-insert inside a single transaction. All three state
tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically.
### Constructors
```go
// YAML backend (default)
rl, err := ratelimit.New()
rl, err := ratelimit.NewWithConfig(cfg)
// SQLite backend
rl, err := ratelimit.NewWithSQLite(dbPath)
rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
defer rl.Close() // releases the database connection
```
`Close()` is a no-op on YAML-backed limiters.
--- ---
## Migration Path
`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file
and writes all quotas and usage state to a new SQLite database. The function is
idempotent -- running it again overwrites the SQLite database state.
```go
err := ratelimit.MigrateYAMLToSQLite(
@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite(
)
```
After migration, switch the constructor from `New()` to `NewWithSQLite()`. The
YAML file can be kept as a backup; the two backends do not share state.
---
## Iterators
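Two range-over-func iterators are provided for inspecting the limiter state. The
`iter` package they rely on has been in the standard library since Go 1.23; this
module itself requires Go 1.26+: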
- `Models() iter.Seq[string]` -- returns a sorted sequence of all model names
(from both `Quotas` and `State` maps, deduplicated).
- `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired
with their current `ModelStats` snapshot.
```go
for model, stats := range rl.Iter() {
	fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n",
		model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM)
}
```
--- ---
## CountTokens
`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API
to obtain an exact token count for a prompt string. It is Gemini-specific and
hardcodes the `generativelanguage.googleapis.com` endpoint.
For other providers, callers must supply `estimatedTokens` directly to
`CanSend()`. Accurate token counts are typically available in API response
metadata after a call completes.
---
## Concurrency Model
All reads and writes are protected by a single `sync.RWMutex` on the
`RateLimiter` struct.
| Method | Lock type | Reason |
|--------|-----------|--------|
| `CanSend()` | Write | Calls `prune()`, which mutates state slices |
| `RecordUsage()` | Write | Appends to state slices |
| `Reset()` | Write | Deletes state entries |
| `Load()` | Write | Replaces in-memory state |
| `SetQuota()` | Write | Modifies quota map |
| `AddProvider()` | Write | Modifies quota map |
| `Persist()` | Write (brief) | Clones state, then releases lock before I/O |
| `Stats()` | Write | Calls `prune()` |
| `AllStats()` | Write | Prunes inline |
| `Models()` | Read | Reads keys only |
`Persist()` minimises lock contention by cloning the state under a write lock,
then performing I/O after releasing the lock. The test suite passes clean under
`go test -race ./...` with 20 goroutines performing concurrent operations.


@ -1,9 +1,14 @@
---
title: Development Guide
description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions.
---
# Development Guide
## Prerequisites
- **Go 1.26** or later (the module declares `go 1.26.0`)
- No CGO required -- `modernc.org/sqlite` is a pure Go port
No C toolchain, no system SQLite library, no external build tools. A plain
`go build ./...` is sufficient.
@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./...
# Check for vet issues
go vet ./...
# Lint (requires golangci-lint)
golangci-lint run ./...
# Tidy dependencies
go mod tidy
```
@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed.
--- ---
## Test Organisation
### File Layout
| File | Scope |
|------|-------|
| `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks |
| `sqlite_test.go` | SQLite backend, migration, concurrent persistence |
| `error_test.go` | SQLite and YAML error paths |
| `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases |
All test files are in `package ratelimit` (white-box tests), giving access to
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
### Naming Convention
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
- `_Good` -- happy path
- `_Bad` -- expected error conditions (invalid paths, corrupt input)
- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files)
Core logic tests use plain descriptive names without suffixes, grouped by method
with table-driven subtests.
### Test Helper
`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects
the YAML file path into `t.TempDir()`:
```go
func newTestLimiter(t *testing.T) *RateLimiter {
@ -86,43 +98,64 @@ after each test completes.
Tests use `github.com/stretchr/testify` exclusively:
- `require.NoError(t, err)` -- fail immediately on setup errors
- `assert.NoError(t, err)` -- record failure but continue
- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons
- `assert.True` / `assert.False` -- for boolean checks
- `assert.Empty` / `assert.Len` -- for slice length checks
- `assert.ErrorIs(t, err, target)` -- for sentinel errors
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
### Concurrency Tests
Race tests spin up goroutines and use `sync.WaitGroup`. Some assert specific
outcomes (e.g., correct RPD count after concurrent recordings), while others
rely solely on the race detector to catch data races:
```go
var wg sync.WaitGroup
for range 20 {
	wg.Go(func() {
		for range 50 {
			rl.CanSend(model, 10)
			rl.RecordUsage(model, 5, 5)
			rl.Stats(model)
		}
	})
}
wg.Wait()
```
Always run concurrency tests with `-race`.
### Benchmarks
The following benchmarks are included:
| Benchmark | What it measures |
|-----------|------------------|
| `BenchmarkCanSend` | CanSend with a 1,000-entry sliding window |
| `BenchmarkRecordUsage` | Recording usage on a single model |
| `BenchmarkCanSendConcurrent` | Parallel CanSend across goroutines |
| `BenchmarkCanSendWithPrune` | CanSend with 500 old + 500 new entries |
| `BenchmarkStats` | Stats retrieval with a 1,000-entry window |
| `BenchmarkAllStats` | AllStats across 5 models x 200 entries each |
| `BenchmarkPersist` | YAML persistence I/O |
| `BenchmarkSQLitePersist` | SQLite persistence I/O |
| `BenchmarkSQLiteLoad` | SQLite state loading |
### Coverage
Current coverage: 95.1%. The remaining paths cannot be covered in unit tests
without modifying production code:
1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests
cannot intercept the HTTP call without URL injection support.
2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on
valid Go structs.
3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when
`$HOME` is unset, which test infrastructure prevents.
Do not lower coverage below 95% without a documented reason.
@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation.
### Go Style
- All exported types, functions, and fields must have doc comments.
- Error strings must be lowercase and not end with punctuation (Go convention).
- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so
errors identify their origin clearly.
- No `init()` functions.
- No global mutable state. `DefaultProfiles()` returns a fresh map on each call.
### Mutex Discipline
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even
if they appear read-only, because `prune()` mutates state slices.
- `Persist()` acquires the write lock briefly to clone state, then releases it
before performing I/O.
- Lock acquisition always happens at the top of the public method, never inside
a helper. Helpers document "Caller must hold the lock".
- Never call a public method from inside another public method while holding the
lock (deadlock risk).
### Dependencies ### Dependencies
@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library.
## Licence ## Licence
EUPL-1.2. Every new source file must carry the standard header if the project EUPL-1.2. Confirm with the project lead before adding files under a different
adopts per-file headers in future. Confirm with the project lead before adding licence.
files under a different licence.
--- ---
@ -205,3 +236,17 @@ Co-Authored-By: Virgil <virgil@lethean.io>
Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
pass. `go mod tidy` must produce no changes.
---
## Linting
The project uses `golangci-lint` with the following enabled linters (see
`.golangci.yml`):
- `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple`
- `ineffassign`, `typecheck`, `gocritic`, `gofmt`
Disabled linters: `exhaustive`, `wrapcheck`.
Run `golangci-lint run ./...` to check before committing.

docs/index.md Normal file

@ -0,0 +1,117 @@
---
title: go-ratelimit
description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends.
---
# go-ratelimit
**Module**: `forge.lthn.ai/core/go-ratelimit`
**Licence**: EUPL-1.2
**Go version**: 1.26+
go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and
requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding
window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and
a local inference provider. State persists across process restarts via YAML
(single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite
migration helper is included.
## Quick Start
```go
import "forge.lthn.ai/core/go-ratelimit"
// Create a limiter with Gemini defaults (YAML backend).
rl, err := ratelimit.New()
if err != nil {
log.Fatal(err)
}
// Check capacity before sending.
if rl.CanSend("gemini-2.0-flash", 1500) {
// Make the API call...
rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens
}
// Persist state to disk for recovery across restarts.
if err := rl.Persist(); err != nil {
log.Printf("persist failed: %v", err)
}
```
### Multi-provider configuration
```go
rl, err := ratelimit.NewWithConfig(ratelimit.Config{
	Providers: []ratelimit.Provider{
		ratelimit.ProviderGemini,
		ratelimit.ProviderAnthropic,
	},
	Quotas: map[string]ratelimit.ModelQuota{
		// Override a specific model's limits.
		"gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200},
		// Add a custom model not in any profile.
		"llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0},
	},
})
```
### SQLite backend (multi-process safe)
```go
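// Go's file APIs do not expand "~", so resolve the home directory explicitly.
home, err := os.UserHomeDir()
if err != nil {
	log.Fatal(err)
}
rl, err := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))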
if err != nil {
log.Fatal(err)
}
defer rl.Close()
// Load persisted state.
if err := rl.Load(); err != nil {
log.Fatal(err)
}
// Use exactly as the YAML backend -- CanSend, RecordUsage, Persist, etc.
```
### Blocking until capacity is available
```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
log.Printf("timed out waiting for capacity: %v", err)
return
}
// Capacity is available; proceed with the API call.
```
## Package Layout
The module is a single package with no sub-packages.
| File | Purpose |
|------|---------|
| `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper |
| `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations |
| `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks |
| `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery |
| `error_test.go` | Tests for SQLite and YAML error paths |
| `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases |
## Dependencies
| Dependency | Purpose | Category |
|------------|---------|----------|
| `gopkg.in/yaml.v3` | YAML serialisation for the default persistence backend | Direct |
| `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct |
| `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only |
All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain
or system SQLite library is needed.
## Further Reading
- [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model
- [Development](development.md) -- build commands, test patterns, coding standards, commit conventions
- [History](history.md) -- completed phases with commit hashes, known limitations