docs: add human-friendly documentation
All checks were successful
Security Scan / security (push) Successful in 6s
Test / test (push) Successful in 45s

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Snider 2026-03-11 13:02:40 +00:00
parent ae2cb96d38
commit 9572425e89
3 changed files with 475 additions and 204 deletions


@ -1,81 +1,72 @@
---
title: Architecture
description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model.
---
# Architecture
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
three independent quota dimensions per model — requests per minute (RPM), tokens
per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
three independent quota dimensions per model -- requests per minute (RPM), tokens
per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window
that can be persisted across process restarts via YAML or SQLite.
Module path: `forge.lthn.ai/core/go-ratelimit`
---
## Sliding Window Algorithm
## Key Types
The limiter maintains per-model `UsageStats` structs in memory:
### RateLimiter
The central struct. Holds the quota definitions, current usage state, a mutex for
thread safety, and an optional SQLite backend reference.
```go
type RateLimiter struct {
mu sync.RWMutex
Quotas map[string]ModelQuota // per-model quota definitions
State map[string]*UsageStats // per-model sliding window state
filePath string // YAML file path (ignored when SQLite is active)
sqlite *sqliteStore // non-nil when using SQLite backend
}
```
### ModelQuota
Defines the rate limits for a single model. A zero value in any field means that
dimension is unlimited.
```go
type ModelQuota struct {
MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
}
```
### UsageStats
Tracks the sliding window state for a single model.
```go
type UsageStats struct {
Requests []time.Time // timestamps of recent requests (1-minute window)
Tokens []TokenEntry // token counts with timestamps (1-minute window)
DayStart time.Time // when the current daily window started
DayCount int // total requests recorded since DayStart
DayStart time.Time // when the current 24-hour window started
DayCount int // total requests since DayStart
}
type TokenEntry struct {
Time time.Time
Count int // prompt + output tokens for this request
}
```
Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
slices and discards entries older than `now - 1 minute`. Pruning is done
in-place to avoid allocation on the hot path:
### Config
Controls `RateLimiter` initialisation.
```go
validReqs := 0
for _, t := range stats.Requests {
if t.After(window) {
stats.Requests[validReqs] = t
validReqs++
}
}
stats.Requests = stats.Requests[:validReqs]
```
The same loop runs for token entries. After pruning, `CanSend()` checks each
quota dimension in priority order: RPD first (cheapest check), then RPM, then
TPM. A zero value for any dimension means that dimension is unlimited. If all
three are zero the model is treated as fully unlimited and the check short-circuits
before touching any state.
### Daily Reset
The daily counter resets automatically inside `prune()`. When
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
to the current time. This means the daily window is a rolling 24-hour period
anchored to the first request of the day, not a calendar boundary.
### Concurrency
All reads and writes are protected by a single `sync.RWMutex`. Methods that
write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
where possible. The `CanSend()` method acquires a write lock because it calls
`prune()`, which mutates the state slices.
`go test -race ./...` passes clean with 20 goroutines performing concurrent
`CanSend()`, `RecordUsage()`, and `Stats()` calls.
---
## Provider and Quota Configuration
### Types
```go
type Provider string // "gemini", "openai", "anthropic", "local"
type ModelQuota struct {
MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
MaxTPM int `yaml:"max_tpm"`
MaxRPD int `yaml:"max_rpd"`
}
type Config struct {
FilePath string // default: ~/.core/ratelimits.yaml
Backend string // "yaml" (default) or "sqlite"
@ -84,28 +75,106 @@ type Config struct {
}
```
### Quota Resolution
### Provider
1. Provider profiles are loaded first (from `DefaultProfiles()`).
2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
A string type identifying an LLM provider. Four constants are defined:
```go
type Provider string
const (
ProviderGemini Provider = "gemini"
ProviderOpenAI Provider = "openai"
ProviderAnthropic Provider = "anthropic"
ProviderLocal Provider = "local"
)
```
---
## Sliding Window Algorithm
Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes
entries older than one minute from the `Requests` and `Tokens` slices. Pruning
is done in-place using `slices.DeleteFunc` to minimise allocations:
```go
window := now.Add(-1 * time.Minute)
stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool {
return t.Before(window)
})
stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool {
return t.Time.Before(window)
})
```
After pruning, `CanSend()` checks each quota dimension. If all three limits
(RPM, TPM, RPD) are zero, the model is treated as fully unlimited and the
check short-circuits before touching any state.
The check order is: RPD, then RPM, then TPM. RPD is checked first because it
is the cheapest comparison (a single integer). TPM is checked last because it
requires summing the token counts in the sliding window.
### Daily Reset
The daily counter resets automatically inside `prune()`. When
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is
updated to the current time. The daily window is a rolling 24-hour period
anchored to the first request of the day, not a calendar boundary.
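To make the rolling-window behaviour concrete, here is a small sketch of the reset rule. The `dayWindow` type and the `resetIfExpired` and `elapsedDemo` helpers are hypothetical names for illustration; only the `DayStart`/`DayCount` fields and the 24-hour threshold come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// dayWindow holds just the daily fields of a usage record.
type dayWindow struct {
	DayStart time.Time
	DayCount int
}

// resetIfExpired zeroes the counter once 24 hours have elapsed since
// DayStart and re-anchors the window at now. It reports whether it reset.
func (d *dayWindow) resetIfExpired(now time.Time) bool {
	if now.Sub(d.DayStart) >= 24*time.Hour {
		d.DayCount = 0
		d.DayStart = now
		return true
	}
	return false
}

// elapsedDemo starts a window with 42 recorded requests and runs one reset
// check after the given number of hours.
func elapsedDemo(hours int) (bool, int) {
	start := time.Date(2026, 3, 11, 13, 0, 0, 0, time.UTC)
	d := dayWindow{DayStart: start, DayCount: 42}
	reset := d.resetIfExpired(start.Add(time.Duration(hours) * time.Hour))
	return reset, d.DayCount
}

func main() {
	fmt.Println(elapsedDemo(23)) // still inside the window
	fmt.Println(elapsedDemo(24)) // window expired, counter cleared
}
```

Because the window is re-anchored at the moment of the reset rather than at midnight, two processes started at different times of day will have different reset points.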
### Background Pruning
`BackgroundPrune(interval)` starts a goroutine that periodically prunes all
model states on a configurable interval. It returns a cancel function to stop
the pruner:
```go
stop := rl.BackgroundPrune(30 * time.Second)
defer stop()
```
This prevents memory growth in long-running processes where some models may
accumulate stale entries between calls to `CanSend()`.
### Memory Cleanup
When `prune()` empties both the `Requests` and `Tokens` slices for a model,
and `DayCount` is also zero, the entire `UsageStats` entry is deleted from
the `State` map. This prevents memory leaks from models that were used once
and never again.
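The cleanup rule can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `stats` type tracks only request timestamps (the token slice is omitted for brevity), and `pruneAll` and `remainingAfterPrune` are hypothetical names rather than the library's functions.

```go
package main

import (
	"fmt"
	"maps"
	"slices"
	"time"
)

// stats is a cut-down stand-in for UsageStats.
type stats struct {
	Requests []time.Time
	DayCount int
}

// pruneAll drops request timestamps older than one minute and deletes any
// entry that no longer holds data.
func pruneAll(state map[string]*stats, now time.Time) {
	window := now.Add(-time.Minute)
	for model, s := range state {
		s.Requests = slices.DeleteFunc(s.Requests, func(t time.Time) bool {
			return t.Before(window)
		})
		if len(s.Requests) == 0 && s.DayCount == 0 {
			delete(state, model) // deleting while ranging over a map is safe in Go
		}
	}
}

// remainingAfterPrune builds a small example state, prunes it, and returns
// the surviving model names in sorted order.
func remainingAfterPrune() []string {
	now := time.Now()
	state := map[string]*stats{
		"active":   {Requests: []time.Time{now.Add(-10 * time.Second)}, DayCount: 1},
		"day-only": {Requests: []time.Time{now.Add(-2 * time.Minute)}, DayCount: 5},
		"stale":    {Requests: []time.Time{now.Add(-2 * time.Minute)}, DayCount: 0},
	}
	pruneAll(state, now)
	return slices.Sorted(maps.Keys(state))
}

func main() {
	fmt.Println(remainingAfterPrune())
}
```

In this sketch the `day-only` entry survives even though its minute window is empty, because its daily counter is still non-zero; only `stale` is removed.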
---
## Provider and Quota Configuration
### Quota Resolution Order
1. Provider profiles are loaded first from `DefaultProfiles()`.
2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any
matching model.
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
`SetQuota()` and `AddProvider()` allow runtime modification; both are
mutex-protected. `AddProvider()` is additive — it does not remove existing
quotas for models outside the new provider's profile.
`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the
write lock. `AddProvider()` is additive -- it does not remove existing quotas for
models outside the new provider's profile.
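The layering in the resolution order can be illustrated with `maps.Copy` from the standard library. The sketch below is an assumption-laden stand-in: `quota` mirrors `ModelQuota`, and `resolve` is a hypothetical helper, not a function exported by the package.

```go
package main

import (
	"fmt"
	"maps"
)

// quota is a stand-in for ModelQuota.
type quota struct{ MaxRPM, MaxTPM, MaxRPD int }

// resolve copies each provider profile into a fresh map, then layers the
// explicit overrides on top so they win for any matching model.
func resolve(profiles []map[string]quota, overrides map[string]quota) map[string]quota {
	out := make(map[string]quota)
	for _, p := range profiles {
		maps.Copy(out, p)
	}
	maps.Copy(out, overrides)
	return out
}

func main() {
	gemini := map[string]quota{
		"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1_000_000, MaxRPD: 1_000},
	}
	overrides := map[string]quota{
		"gemini-2.5-pro": {MaxRPM: 50, MaxTPM: 500_000, MaxRPD: 200},
		"llama-3.3-70b":  {MaxRPM: 5, MaxTPM: 50_000},
	}
	merged := resolve([]map[string]quota{gemini}, overrides)
	fmt.Println(merged["gemini-2.5-pro"].MaxRPM, len(merged))
}
```

Copying into a new map leaves the profile maps untouched, which matches the additive behaviour described for `AddProvider()`: models that are not mentioned in the overrides keep their profile values.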
### Default Quotas (as of February 2026)
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|-----------|------------------------|-----------|-----------|-----------|
|-----------|------------------------|-----------|-------------|-----------|
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
| OpenAI | gpt-4o, gpt-4-turbo | 500 | 30,000 | unlimited |
| OpenAI | gpt-4o-mini, o1-mini | 500 | 200,000 | unlimited |
| OpenAI | o1, o3-mini | 500 | varies | unlimited |
| OpenAI | gpt-4o | 500 | 30,000 | unlimited |
| OpenAI | gpt-4o-mini | 500 | 200,000 | unlimited |
| OpenAI | gpt-4-turbo | 500 | 30,000 | unlimited |
| OpenAI | o1 | 500 | 30,000 | unlimited |
| OpenAI | o1-mini | 500 | 200,000 | unlimited |
| OpenAI | o3-mini | 500 | 200,000 | unlimited |
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
---
## YAML Persistence (Legacy)
## Constructors
The default backend serialises the entire `RateLimiter` struct — both the
`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
| Function | Backend | Default Provider |
|----------|---------|------------------|
| `New()` | YAML | Gemini |
| `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) |
| `NewWithSQLite(dbPath)` | SQLite | Gemini |
| `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) |
`Close()` releases the database connection for SQLite-backed limiters. It is a
no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`)
when using the SQLite backend.
---
## Data Flow
A typical request lifecycle:
```
1. CanSend(model, estimatedTokens)
|-- acquires write lock
|-- looks up ModelQuota for the model
|-- if unknown model or all-zero quota: returns true (allowed)
|-- calls prune(model) to discard stale entries
|-- checks RPD, RPM, TPM against the pruned state
'-- returns true/false
2. (caller makes the API call)
3. RecordUsage(model, promptTokens, outputTokens)
|-- acquires write lock
|-- calls prune(model)
|-- appends to Requests and Tokens slices
'-- increments DayCount
4. Persist()
|-- acquires write lock, clones state, releases lock
|-- YAML: marshals to file
'-- SQLite: saves quotas and state in transactions
```
---
## YAML Persistence
The default backend serialises both the `Quotas` map and the `State` map to a
YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`).
```yaml
quotas:
@ -143,35 +256,30 @@ state:
`Load()` treats a missing file as an empty state (no error). Corrupt or
unreadable files return an error.
**Limitations of YAML backend:**
**Limitations of the YAML backend:**
- Single-process only. Concurrent writes from multiple processes corrupt the
file because the write is not atomic at the OS level.
- The entire state is serialised on every `Persist()` call, which grows linearly
with the number of tracked models and entries.
- Timestamps are serialised as RFC3339 strings; sub-second precision is
preserved by Go's time marshaller but depends on the YAML library.
- The entire state is serialised on every `Persist()` call.
- Timestamps are serialised as RFC 3339 strings.
---
## SQLite Backend
The SQLite backend was added in Phase 2 to support multi-process scenarios and
provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
Go port of SQLite that compiles without CGO.
The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`,
a pure Go port of SQLite that compiles without CGO.
### Connection Settings
```go
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers
db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout
db.Exec("PRAGMA journal_mode=WAL") // concurrent readers alongside a single writer
db.Exec("PRAGMA busy_timeout=5000") // 5-second wait on lock contention
```
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
timeout prevents immediate failure when a second process is mid-commit. A single
`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
at the file level; multiple Go connections to the same file through a single
process would not add throughput but would complicate locking.
timeout prevents immediate failure when a second process is mid-commit.
### Schema
@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
```
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
precision without relying on SQLite's text date format, and allows efficient
range queries using the composite indices.
precision and allows efficient range queries using the composite indices.
### Save Strategy
`saveState()` uses a delete-then-insert pattern inside a single transaction.
All three state tables are truncated and rewritten atomically:
```go
tx.Exec("DELETE FROM requests")
tx.Exec("DELETE FROM tokens")
tx.Exec("DELETE FROM daily")
// then INSERT for every model in state
tx.Commit()
```
`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
existing quota rows are updated in place without deleting unrelated models.
### Constructors
```go
// YAML backend (default)
rl, err := ratelimit.New()
rl, err := ratelimit.NewWithConfig(cfg)
// SQLite backend
rl, err := ratelimit.NewWithSQLite(dbPath)
rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
defer rl.Close() // releases the database connection
```
`Close()` is a no-op on YAML-backed limiters.
- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota
rows are updated in place without deleting unrelated models.
- **State**: Delete-then-insert inside a single transaction. All three state
tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically.
---
## Migration Path
`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
state file and writes all quotas and usage state to a new SQLite database. The
function is idempotent — running it again on the same YAML file overwrites the
SQLite database state.
Typical one-time migration:
`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file
and writes all quotas and usage state to a new SQLite database. The function is
idempotent -- running it again overwrites the SQLite database state.
```go
err := ratelimit.MigrateYAMLToSQLite(
@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite(
)
```
After migration, switch the constructor:
After migration, switch the constructor from `New()` to `NewWithSQLite()`. The
YAML file can be kept as a backup; the two backends do not share state.
---
## Iterators
Two iterators, built on the standard `iter` package (available since Go 1.23), are provided for inspecting the limiter state:
- `Models() iter.Seq[string]` -- returns a sorted sequence of all model names
(from both `Quotas` and `State` maps, deduplicated).
- `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired
with their current `ModelStats` snapshot.
```go
// Before
rl, _ := ratelimit.New()
// After
rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
defer rl.Close()
for model, stats := range rl.Iter() {
fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n",
model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM)
}
```
The YAML file can be kept as a backup; the two backends do not share state.
---
## CountTokens
`CountTokens(apiKey, model, text string) (int, error)` calls the Google
Generative Language API to obtain an exact token count for a prompt string. It
is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
endpoint. The URL is not configurable, which prevents unit testing of the
success path without network access.
`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API
to obtain an exact token count for a prompt string. It is Gemini-specific and
hardcodes the `generativelanguage.googleapis.com` endpoint.
For other providers, callers must supply `estimatedTokens` directly to
`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
in API response metadata after a call completes.
`CanSend()`. Accurate token counts are typically available in API response
metadata after a call completes.
---
## Concurrency Model
All reads and writes are protected by a single `sync.RWMutex` on the
`RateLimiter` struct.
| Method | Lock type | Reason |
|--------|-----------|--------|
| `CanSend()` | Write | Calls `prune()`, which mutates state slices |
| `RecordUsage()` | Write | Appends to state slices |
| `Reset()` | Write | Deletes state entries |
| `Load()` | Write | Replaces in-memory state |
| `SetQuota()` | Write | Modifies quota map |
| `AddProvider()` | Write | Modifies quota map |
| `Persist()` | Write (brief) | Clones state, then releases lock before I/O |
| `Stats()` | Write | Calls `prune()` |
| `AllStats()` | Write | Prunes inline |
| `Models()` | Read | Reads keys only |
`Persist()` minimises lock contention by cloning the state under a write lock,
then performing I/O after releasing the lock. The test suite passes clean under
`go test -race ./...` with 20 goroutines performing concurrent operations.
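The clone-then-write pattern used by `Persist()` can be sketched as follows. This is a simplified model under stated assumptions: `limiter`, `snapshot`, and `persist` are hypothetical names, the state is a plain `map[string]int` so that `maps.Clone` is a sufficient copy, and the writer is passed in as a function instead of touching the filesystem. A map of pointers, as in the real `State` field, would need a deep copy.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

type limiter struct {
	mu    sync.RWMutex
	state map[string]int // stand-in for map[string]*UsageStats
}

// snapshot copies the state while holding the lock so that slow I/O can
// run after the lock has been released.
func (l *limiter) snapshot() map[string]int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return maps.Clone(l.state)
}

// persist hands a snapshot to write. The lock is not held while write runs.
func (l *limiter) persist(write func(map[string]int) error) error {
	snap := l.snapshot()
	if err := write(snap); err != nil {
		return fmt.Errorf("ratelimit.persist: write: %w", err)
	}
	return nil
}

func main() {
	l := &limiter{state: map[string]int{"gemini-2.0-flash": 3}}
	err := l.persist(func(snap map[string]int) error {
		// Mutating the limiter here does not deadlock, because the
		// lock was released before the writer was called.
		l.mu.Lock()
		l.state["gemini-2.0-flash"]++
		l.mu.Unlock()
		fmt.Println("persisted:", snap["gemini-2.0-flash"])
		return nil
	})
	fmt.Println(err, l.state["gemini-2.0-flash"])
}
```

The trade-off is that the written snapshot can be slightly stale by the time I/O finishes, in exchange for never blocking `CanSend()` on disk or database latency.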


@ -1,9 +1,14 @@
---
title: Development Guide
description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions.
---
# Development Guide
## Prerequisites
- Go 1.25 or later (the module declares `go 1.25.5`)
- No CGO required: `modernc.org/sqlite` is a pure Go port
- **Go 1.26** or later (the module declares `go 1.26.0`)
- No CGO required -- `modernc.org/sqlite` is a pure Go port
No C toolchain, no system SQLite library, no external build tools. A plain
`go build ./...` is sufficient.
@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./...
# Check for vet issues
go vet ./...
# Lint (requires golangci-lint)
golangci-lint run ./...
# Tidy dependencies
go mod tidy
```
@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed.
---
## Test Patterns
## Test Organisation
### File Organisation
### File Layout
- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
- `sqlite_test.go` — Phase 2 (SQLite backend)
| File | Scope |
|------|-------|
| `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks |
| `sqlite_test.go` | SQLite backend, migration, concurrent persistence |
| `error_test.go` | SQLite and YAML error paths |
| `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases |
Both files are in `package ratelimit` (white-box tests) so they can access
All test files are in `package ratelimit` (white-box tests), giving access to
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
### Naming Convention
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
- `_Good`: happy path
- `_Bad`: expected error conditions (invalid paths, corrupt input)
- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
- `_Good` -- happy path
- `_Bad` -- expected error conditions (invalid paths, corrupt input)
- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files)
Core logic tests use plain descriptive names without suffixes, grouped by
method with table-driven subtests.
Core logic tests use plain descriptive names without suffixes, grouped by method
with table-driven subtests.
### Test Helpers
### Test Helper
`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
redirects the YAML file path into `t.TempDir()`:
`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects
the YAML file path into `t.TempDir()`:
```go
func newTestLimiter(t *testing.T) *RateLimiter {
@ -86,43 +98,64 @@ after each test completes.
Tests use `github.com/stretchr/testify` exclusively:
- `require.NoError(t, err)`: fail immediately on setup errors
- `assert.NoError(t, err)`: record failure but continue
- `assert.Equal(t, expected, actual, "message")`: prefer over raw comparisons
- `assert.True / assert.False` — for boolean checks
- `assert.Empty / assert.Len` — for slice length checks
- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
- `require.NoError(t, err)` -- fail immediately on setup errors
- `assert.NoError(t, err)` -- record failure but continue
- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons
- `assert.True` / `assert.False` -- for boolean checks
- `assert.Empty` / `assert.Len` -- for slice length checks
- `assert.ErrorIs(t, err, target)` -- for sentinel errors
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
### Race Tests
### Concurrency Tests
Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
assert anything beyond absence of data races (the race detector does the work):
Race tests spin up goroutines and use `sync.WaitGroup`. Some assert specific
outcomes (e.g., correct RPD count after concurrent recordings), while others
rely solely on the race detector to catch data races:
```go
var wg sync.WaitGroup
for i := range 20 {
wg.Add(1)
go func() {
defer wg.Done()
// concurrent operations
}()
for range 20 {
wg.Go(func() {
for range 50 {
rl.CanSend(model, 10)
rl.RecordUsage(model, 5, 5)
rl.Stats(model)
}
})
}
wg.Wait()
```
Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
clean.
Always run concurrency tests with `-race`.
### Benchmarks
The following benchmarks are included:
| Benchmark | What it measures |
|-----------|------------------|
| `BenchmarkCanSend` | `CanSend()` with a 1,000-entry sliding window |
| `BenchmarkRecordUsage` | Recording usage on a single model |
| `BenchmarkCanSendConcurrent` | Parallel `CanSend()` across goroutines |
| `BenchmarkCanSendWithPrune` | `CanSend()` with 500 old + 500 new entries |
| `BenchmarkStats` | `Stats()` retrieval with a 1,000-entry window |
| `BenchmarkAllStats` | `AllStats()` across 5 models x 200 entries each |
| `BenchmarkPersist` | YAML persistence I/O |
| `BenchmarkSQLitePersist` | SQLite persistence I/O |
| `BenchmarkSQLiteLoad` | SQLite state loading |
### Coverage
Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
be covered in unit tests without modifying the production code:
Current coverage: 95.1%. The remaining paths cannot be covered in unit tests
without modifying production code:
1. `CountTokens` success path — hardcoded Google API URL requires network access
2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests
cannot intercept the HTTP call without URL injection support.
2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on
valid Go structs.
3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when
`$HOME` is unset, which test infrastructure prevents.
Do not lower coverage below 95% without a documented reason.
@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation.
### Go Style
- All exported types, functions, and fields must have doc comments
- Error strings must be lowercase and not end with punctuation (Go convention)
- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
prefix `ratelimit.` is included so errors identify their origin clearly
- No `init()` functions
- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
map on each call)
- All exported types, functions, and fields must have doc comments.
- Error strings must be lowercase and not end with punctuation (Go convention).
- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so
errors identify their origin clearly.
- No `init()` functions.
- No global mutable state. `DefaultProfiles()` returns a fresh map on each call.
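As a short illustration of the error conventions above, the sketch below wraps a sentinel error with the `ratelimit.Function: what: %w` format. The `validateModel` function and the `errEmptyModel` sentinel are hypothetical names made up for this example; they are not part of the package.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyModel is a hypothetical sentinel: lowercase, no trailing punctuation.
var errEmptyModel = errors.New("model name is empty")

// validateModel is a hypothetical function used only to show the wrapping
// format; the prefix names the package and function that produced the error.
func validateModel(model string) error {
	if model == "" {
		return fmt.Errorf("ratelimit.validateModel: check name: %w", errEmptyModel)
	}
	return nil
}

func main() {
	err := validateModel("")
	fmt.Println(err)
	fmt.Println(errors.Is(err, errEmptyModel)) // %w keeps the sentinel matchable
}
```

Using `%w` rather than `%v` is what lets callers and tests match the underlying cause with `errors.Is` or `assert.ErrorIs`.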
### Mutex Discipline
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
even if they appear read-only, because `prune()` mutates slices
- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
snapshot of state
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even
if they appear read-only, because `prune()` mutates state slices.
- `Persist()` acquires the write lock briefly to clone state, then releases it
before performing I/O.
- Lock acquisition always happens at the top of the public method, never inside
a helper — helpers document "Caller must hold the lock"
- Never call a public method from inside another public method while holding
the lock (deadlock risk)
a helper. Helpers document "Caller must hold the lock".
- Never call a public method from inside another public method while holding the
lock (deadlock risk).
### Dependencies
@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library.
## Licence
EUPL-1.2. Every new source file must carry the standard header if the project
adopts per-file headers in future. Confirm with the project lead before adding
files under a different licence.
EUPL-1.2. Confirm with the project lead before adding files under a different
licence.
---
@ -205,3 +236,17 @@ Co-Authored-By: Virgil <virgil@lethean.io>
Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
pass. `go mod tidy` must produce no changes.
---
## Linting
The project uses `golangci-lint` with the following enabled linters (see
`.golangci.yml`):
- `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple`
- `ineffassign`, `typecheck`, `gocritic`, `gofmt`
Disabled linters: `exhaustive`, `wrapcheck`.
Run `golangci-lint run ./...` to check before committing.

docs/index.md Normal file

@ -0,0 +1,117 @@
---
title: go-ratelimit
description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends.
---
# go-ratelimit
**Module**: `forge.lthn.ai/core/go-ratelimit`
**Licence**: EUPL-1.2
**Go version**: 1.26+
go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and
requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding
window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and
a local inference provider. State persists across process restarts via YAML
(single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite
migration helper is included.
## Quick Start
```go
import ratelimit "forge.lthn.ai/core/go-ratelimit" // package name differs from the last path element
// Create a limiter with Gemini defaults (YAML backend).
rl, err := ratelimit.New()
if err != nil {
log.Fatal(err)
}
// Check capacity before sending.
if rl.CanSend("gemini-2.0-flash", 1500) {
// Make the API call...
rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens
}
// Persist state to disk for recovery across restarts.
if err := rl.Persist(); err != nil {
log.Printf("persist failed: %v", err)
}
```
### Multi-provider configuration
```go
rl, err := ratelimit.NewWithConfig(ratelimit.Config{
Providers: []ratelimit.Provider{
ratelimit.ProviderGemini,
ratelimit.ProviderAnthropic,
},
Quotas: map[string]ratelimit.ModelQuota{
// Override a specific model's limits.
"gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200},
// Add a custom model not in any profile.
"llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0},
},
})
```
### SQLite backend (multi-process safe)
```go
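// Go does not expand "~" in paths, so resolve the home directory explicitly.
home, err := os.UserHomeDir()
if err != nil {
	log.Fatal(err)
}
rl, err := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
if err != nil {
	log.Fatal(err)
}
defer rl.Close()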
// Load persisted state.
if err := rl.Load(); err != nil {
log.Fatal(err)
}
// Use exactly as the YAML backend -- CanSend, RecordUsage, Persist, etc.
```
### Blocking until capacity is available
```go
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
log.Printf("timed out waiting for capacity: %v", err)
return
}
// Capacity is available; proceed with the API call.
```
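How `WaitForCapacity` works internally is not described here. One plausible shape for such a helper is a polling loop that re-checks capacity until the context expires. The sketch below shows that general pattern only; `waitFor`, `pollDemo`, and the 5 ms interval are all assumptions for illustration and should not be read as the library's implementation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitFor polls check at the given interval until it returns true or ctx
// is done, in which case the context's error is returned.
func waitFor(ctx context.Context, interval time.Duration, check func() bool) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if check() {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// pollDemo waits for a check that succeeds on call number succeedOn and
// gives up after timeoutMillis. It returns the number of calls made.
func pollDemo(succeedOn, timeoutMillis int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel()

	calls := 0
	err := waitFor(ctx, 5*time.Millisecond, func() bool {
		calls++
		return calls >= succeedOn
	})
	return calls, err
}

func main() {
	fmt.Println(pollDemo(3, 1000))   // capacity appears on the third poll
	fmt.Println(pollDemo(1<<30, 30)) // capacity never appears; the context expires
}
```

In a real limiter the `check` closure would wrap `CanSend(model, estimatedTokens)`. Returning `ctx.Err()` lets callers distinguish a timeout from cancellation with `errors.Is`.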
## Package Layout
The module is a single package with no sub-packages.
| File | Purpose |
|------|---------|
| `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper |
| `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations |
| `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks |
| `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery |
| `error_test.go` | Tests for SQLite and YAML error paths |
| `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases |
## Dependencies
| Dependency | Purpose | Category |
|------------|---------|----------|
| `gopkg.in/yaml.v3` | YAML serialisation for the default persistence backend | Direct |
| `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct |
| `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only |
All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain
or system SQLite library is needed.
## Further Reading
- [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model
- [Development](development.md) -- build commands, test patterns, coding standards, commit conventions
- [History](history.md) -- completed phases with commit hashes, known limitations