docs: add human-friendly documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
parent ae2cb96d38
commit 9572425e89
3 changed files with 475 additions and 204 deletions
|
|
@ -1,81 +1,72 @@
|
||||||
|
---
|
||||||
|
title: Architecture
|
||||||
|
description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model.
|
||||||
|
---
|
||||||
|
|
||||||
# Architecture
|
# Architecture
|
||||||
|
|
||||||
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
|
go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
|
||||||
three independent quota dimensions per model — requests per minute (RPM), tokens
|
three independent quota dimensions per model -- requests per minute (RPM), tokens
|
||||||
per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
|
per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window
|
||||||
that can be persisted across process restarts via YAML or SQLite.
|
that can be persisted across process restarts via YAML or SQLite.
|
||||||
|
|
||||||
Module path: `forge.lthn.ai/core/go-ratelimit`
|
Module path: `forge.lthn.ai/core/go-ratelimit`
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Sliding Window Algorithm
|
## Key Types
|
||||||
|
|
||||||
The limiter maintains per-model `UsageStats` structs in memory:
|
### RateLimiter
|
||||||
|
|
||||||
|
The central struct. Holds the quota definitions, current usage state, a mutex for
|
||||||
|
thread safety, and an optional SQLite backend reference.
|
||||||
|
|
||||||
|
```go
|
||||||
|
type RateLimiter struct {
|
||||||
|
mu sync.RWMutex
|
||||||
|
Quotas map[string]ModelQuota // per-model quota definitions
|
||||||
|
State map[string]*UsageStats // per-model sliding window state
|
||||||
|
filePath string // YAML file path (ignored when SQLite is active)
|
||||||
|
sqlite *sqliteStore // non-nil when using SQLite backend
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### ModelQuota
|
||||||
|
|
||||||
|
Defines the rate limits for a single model. A zero value in any field means that
|
||||||
|
dimension is unlimited.
|
||||||
|
|
||||||
|
```go
|
||||||
|
type ModelQuota struct {
|
||||||
|
MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
|
||||||
|
MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
|
||||||
|
MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### UsageStats
|
||||||
|
|
||||||
|
Tracks the sliding window state for a single model.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
type UsageStats struct {
|
type UsageStats struct {
|
||||||
Requests []time.Time // timestamps of recent requests (1-minute window)
|
Requests []time.Time // timestamps of recent requests (1-minute window)
|
||||||
Tokens []TokenEntry // token counts with timestamps (1-minute window)
|
Tokens []TokenEntry // token counts with timestamps (1-minute window)
|
||||||
DayStart time.Time // when the current daily window started
|
DayStart time.Time // when the current 24-hour window started
|
||||||
DayCount int // total requests recorded since DayStart
|
DayCount int // total requests since DayStart
|
||||||
|
}
|
||||||
|
|
||||||
|
type TokenEntry struct {
|
||||||
|
Time time.Time
|
||||||
|
Count int // prompt + output tokens for this request
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
|
### Config
|
||||||
slices and discards entries older than `now - 1 minute`. Pruning is done
|
|
||||||
in-place to avoid allocation on the hot path:
|
Controls `RateLimiter` initialisation.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
validReqs := 0
|
|
||||||
for _, t := range stats.Requests {
|
|
||||||
if t.After(window) {
|
|
||||||
stats.Requests[validReqs] = t
|
|
||||||
validReqs++
|
|
||||||
}
|
|
||||||
}
|
|
||||||
stats.Requests = stats.Requests[:validReqs]
|
|
||||||
```
|
|
||||||
|
|
||||||
The same loop runs for token entries. After pruning, `CanSend()` checks each
|
|
||||||
quota dimension in priority order: RPD first (cheapest check), then RPM, then
|
|
||||||
TPM. A zero value for any dimension means that dimension is unlimited. If all
|
|
||||||
three are zero the model is treated as fully unlimited and the check short-circuits
|
|
||||||
before touching any state.
|
|
||||||
|
|
||||||
### Daily Reset
|
|
||||||
|
|
||||||
The daily counter resets automatically inside `prune()`. When
|
|
||||||
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
|
|
||||||
to the current time. This means the daily window is a rolling 24-hour period
|
|
||||||
anchored to the first request of the day, not a calendar boundary.
|
|
||||||
|
|
||||||
### Concurrency
|
|
||||||
|
|
||||||
All reads and writes are protected by a single `sync.RWMutex`. Methods that
|
|
||||||
write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
|
|
||||||
full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
|
|
||||||
where possible. The `CanSend()` method acquires a write lock because it calls
|
|
||||||
`prune()`, which mutates the state slices.
|
|
||||||
|
|
||||||
`go test -race ./...` passes clean with 20 goroutines performing concurrent
|
|
||||||
`CanSend()`, `RecordUsage()`, and `Stats()` calls.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Provider and Quota Configuration
|
|
||||||
|
|
||||||
### Types
|
|
||||||
|
|
||||||
```go
|
|
||||||
type Provider string // "gemini", "openai", "anthropic", "local"
|
|
||||||
|
|
||||||
type ModelQuota struct {
|
|
||||||
MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
|
|
||||||
MaxTPM int `yaml:"max_tpm"`
|
|
||||||
MaxRPD int `yaml:"max_rpd"`
|
|
||||||
}
|
|
||||||
|
|
||||||
type Config struct {
|
type Config struct {
|
||||||
FilePath string // default: ~/.core/ratelimits.yaml
|
FilePath string // default: ~/.core/ratelimits.yaml
|
||||||
Backend string // "yaml" (default) or "sqlite"
|
Backend string // "yaml" (default) or "sqlite"
|
||||||
|
|
@ -84,32 +75,110 @@ type Config struct {
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Quota Resolution
|
### Provider
|
||||||
|
|
||||||
1. Provider profiles are loaded first (from `DefaultProfiles()`).
|
A string type identifying an LLM provider. Four constants are defined:
|
||||||
2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
|
|
||||||
|
```go
|
||||||
|
type Provider string
|
||||||
|
|
||||||
|
const (
|
||||||
|
ProviderGemini Provider = "gemini"
|
||||||
|
ProviderOpenAI Provider = "openai"
|
||||||
|
ProviderAnthropic Provider = "anthropic"
|
||||||
|
ProviderLocal Provider = "local"
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sliding Window Algorithm
|
||||||
|
|
||||||
|
Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes
|
||||||
|
entries older than one minute from the `Requests` and `Tokens` slices. Pruning
|
||||||
|
is done in-place using `slices.DeleteFunc` to minimise allocations:
|
||||||
|
|
||||||
|
```go
|
||||||
|
window := now.Add(-1 * time.Minute)
|
||||||
|
|
||||||
|
stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool {
|
||||||
|
return t.Before(window)
|
||||||
|
})
|
||||||
|
stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool {
|
||||||
|
return t.Time.Before(window)
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
After pruning, `CanSend()` checks each quota dimension. If all three limits
|
||||||
|
(RPM, TPM, RPD) are zero, the model is treated as fully unlimited and the
|
||||||
|
check short-circuits before touching any state.
|
||||||
|
|
||||||
|
The check order is: RPD, then RPM, then TPM. RPD is checked first because it
|
||||||
|
is the cheapest comparison (a single integer). TPM is checked last because it
|
||||||
|
requires summing the token counts in the sliding window.
|
||||||
|
|
||||||
|
### Daily Reset
|
||||||
|
|
||||||
|
The daily counter resets automatically inside `prune()`. When
|
||||||
|
`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is
|
||||||
|
updated to the current time. The daily window is a rolling 24-hour period
|
||||||
|
anchored to the first request of the day, not a calendar boundary.
|
||||||
|
|
||||||
|
### Background Pruning
|
||||||
|
|
||||||
|
`BackgroundPrune(interval)` starts a goroutine that periodically prunes all
|
||||||
|
model states on a configurable interval. It returns a cancel function to stop
|
||||||
|
the pruner:
|
||||||
|
|
||||||
|
```go
|
||||||
|
stop := rl.BackgroundPrune(30 * time.Second)
|
||||||
|
defer stop()
|
||||||
|
```
|
||||||
|
|
||||||
|
This prevents memory growth in long-running processes where some models may
|
||||||
|
accumulate stale entries between calls to `CanSend()`.
|
||||||
|
|
||||||
|
### Memory Cleanup
|
||||||
|
|
||||||
|
When `prune()` empties both the `Requests` and `Tokens` slices for a model,
|
||||||
|
and `DayCount` is also zero, the entire `UsageStats` entry is deleted from
|
||||||
|
the `State` map. This prevents memory leaks from models that were used once
|
||||||
|
and never again.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Provider and Quota Configuration
|
||||||
|
|
||||||
|
### Quota Resolution Order
|
||||||
|
|
||||||
|
1. Provider profiles are loaded first from `DefaultProfiles()`.
|
||||||
|
2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any
|
||||||
|
matching model.
|
||||||
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
|
3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
|
||||||
|
|
||||||
`SetQuota()` and `AddProvider()` allow runtime modification; both are
|
`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the
|
||||||
mutex-protected. `AddProvider()` is additive — it does not remove existing
|
write lock. `AddProvider()` is additive -- it does not remove existing quotas for
|
||||||
quotas for models outside the new provider's profile.
|
models outside the new provider's profile.
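
The resolution order above might look roughly like this. It is a sketch under assumptions: `resolveQuotas` is a hypothetical function, and the shape of the profile map (provider name to model name to quota) is a guess at what `DefaultProfiles()` returns.

```go
package main

import (
	"fmt"
	"maps"
)

type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

// resolveQuotas is a hypothetical helper. profiles is assumed to map a
// provider name to its per-model quotas.
func resolveQuotas(
	profiles map[string]map[string]ModelQuota,
	providers []string,
	overrides map[string]ModelQuota,
) map[string]ModelQuota {
	// Step 3: nothing specified, so fall back to the Gemini defaults.
	if len(providers) == 0 && len(overrides) == 0 {
		providers = []string{"gemini"}
	}
	out := make(map[string]ModelQuota)
	// Step 1: provider profiles first.
	for _, p := range providers {
		maps.Copy(out, profiles[p])
	}
	// Step 2: explicit quotas win over any matching model.
	maps.Copy(out, overrides)
	return out
}

func main() {
	profiles := map[string]map[string]ModelQuota{
		"gemini": {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}},
		"openai": {"gpt-4o": {MaxRPM: 500, MaxTPM: 30000}},
	}
	overrides := map[string]ModelQuota{"gpt-4o": {MaxRPM: 100}}

	q := resolveQuotas(profiles, []string{"openai"}, overrides)
	fmt.Println(q["gpt-4o"].MaxRPM)                     // 100
	fmt.Println(len(resolveQuotas(profiles, nil, nil))) // 1
}
```

Note that `maps.Copy` replaces the whole `ModelQuota` value, so in this sketch the override leaves `MaxTPM` at zero (unlimited) instead of inheriting 30,000 from the profile.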
|
||||||
|
|
||||||
### Default Quotas (as of February 2026)
|
### Default Quotas (as of February 2026)
|
||||||
|
|
||||||
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|
| Provider | Model | MaxRPM | MaxTPM | MaxRPD |
|
||||||
|-----------|------------------------|-----------|-----------|-----------|
|
|-----------|------------------------|-----------|-------------|-----------|
|
||||||
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
|
| Gemini | gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
|
||||||
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
|
| Gemini | gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
|
||||||
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
|
| Gemini | gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
|
||||||
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
|
| Gemini | gemini-2.0-flash | 150 | 1,000,000 | unlimited |
|
||||||
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
|
| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
|
||||||
| OpenAI | gpt-4o, gpt-4-turbo | 500 | 30,000 | unlimited |
|
| OpenAI | gpt-4o | 500 | 30,000 | unlimited |
|
||||||
| OpenAI | gpt-4o-mini, o1-mini | 500 | 200,000 | unlimited |
|
| OpenAI | gpt-4o-mini | 500 | 200,000 | unlimited |
|
||||||
| OpenAI | o1, o3-mini | 500 | varies | unlimited |
|
| OpenAI | gpt-4-turbo | 500 | 30,000 | unlimited |
|
||||||
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
|
| OpenAI | o1 | 500 | 30,000 | unlimited |
|
||||||
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
|
| OpenAI | o1-mini | 500 | 200,000 | unlimited |
|
||||||
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
|
| OpenAI | o3-mini | 500 | 200,000 | unlimited |
|
||||||
| Local | (none by default) | user-defined | user-defined | user-defined |
|
| Anthropic | claude-opus-4 | 50 | 40,000 | unlimited |
|
||||||
|
| Anthropic | claude-sonnet-4 | 50 | 40,000 | unlimited |
|
||||||
|
| Anthropic | claude-haiku-3.5 | 50 | 50,000 | unlimited |
|
||||||
|
| Local | (none by default) | user-defined | user-defined | user-defined |
|
||||||
|
|
||||||
The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
|
The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
|
||||||
where the throttle limit is hardware rather than an API quota. No defaults are
|
where the throttle limit is hardware rather than an API quota. No defaults are
|
||||||
|
|
@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## YAML Persistence (Legacy)
|
## Constructors
|
||||||
|
|
||||||
The default backend serialises the entire `RateLimiter` struct — both the
|
| Function | Backend | Default Provider |
|
||||||
`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
|
|----------|---------|------------------|
|
||||||
|
| `New()` | YAML | Gemini |
|
||||||
|
| `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) |
|
||||||
|
| `NewWithSQLite(dbPath)` | SQLite | Gemini |
|
||||||
|
| `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) |
|
||||||
|
|
||||||
|
`Close()` releases the database connection for SQLite-backed limiters. It is a
|
||||||
|
no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`)
|
||||||
|
when using the SQLite backend.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Data Flow
|
||||||
|
|
||||||
|
A typical request lifecycle:
|
||||||
|
|
||||||
|
```
|
||||||
|
1. CanSend(model, estimatedTokens)
|
||||||
|
|-- acquires write lock
|
||||||
|
|-- looks up ModelQuota for the model
|
||||||
|
|-- if unknown model or all-zero quota: returns true (allowed)
|
||||||
|
|-- calls prune(model) to discard stale entries
|
||||||
|
|-- checks RPD, RPM, TPM against the pruned state
|
||||||
|
'-- returns true/false
|
||||||
|
|
||||||
|
2. (caller makes the API call)
|
||||||
|
|
||||||
|
3. RecordUsage(model, promptTokens, outputTokens)
|
||||||
|
|-- acquires write lock
|
||||||
|
|-- calls prune(model)
|
||||||
|
|-- appends to Requests and Tokens slices
|
||||||
|
'-- increments DayCount
|
||||||
|
|
||||||
|
4. Persist()
|
||||||
|
|-- acquires write lock, clones state, releases lock
|
||||||
|
|-- YAML: marshals to file
|
||||||
|
'-- SQLite: saves quotas and state in transactions
|
||||||
|
```
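
The lifecycle above can be wrapped in a single helper on the caller's side. The sketch below is illustrative only: the `limiter` interface lists method signatures inferred from this document (the `error` return of `Persist` in particular is an assumption), and `send`, `errRateLimited`, and `fakeLimiter` are hypothetical names that exist purely to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// limiter lists the three calls from the lifecycle above. The signatures are
// assumptions inferred from this document.
type limiter interface {
	CanSend(model string, estimatedTokens int) bool
	RecordUsage(model string, promptTokens, outputTokens int)
	Persist() error
}

var errRateLimited = errors.New("rate limited")

// send is a hypothetical wrapper. call stands in for the real API request and
// returns the prompt and output token counts reported by the provider.
func send(rl limiter, model string, estimatedTokens int,
	call func() (prompt, output int, err error)) error {
	if !rl.CanSend(model, estimatedTokens) { // step 1
		return errRateLimited
	}
	prompt, output, err := call() // step 2
	if err != nil {
		return err
	}
	rl.RecordUsage(model, prompt, output) // step 3
	return rl.Persist()                   // step 4
}

// fakeLimiter is a stand-in with a single token budget.
type fakeLimiter struct{ budget, used int }

func (f *fakeLimiter) CanSend(_ string, est int) bool { return f.used+est <= f.budget }
func (f *fakeLimiter) RecordUsage(_ string, p, o int) { f.used += p + o }
func (f *fakeLimiter) Persist() error                 { return nil }

func main() {
	rl := &fakeLimiter{budget: 100}
	call := func() (int, int, error) { return 40, 30, nil }
	fmt.Println(send(rl, "gemini-2.5-pro", 60, call)) // <nil>
	fmt.Println(send(rl, "gemini-2.5-pro", 60, call)) // rate limited
}
```

Whether to persist after every request or on a timer is a design choice for the caller; persisting on every call is the simplest option but adds I/O to each request.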
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## YAML Persistence
|
||||||
|
|
||||||
|
The default backend serialises both the `Quotas` map and the `State` map to a
|
||||||
|
YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`).
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
quotas:
|
quotas:
|
||||||
|
|
@ -143,35 +256,30 @@ state:
|
||||||
`Load()` treats a missing file as an empty state (no error). Corrupt or
|
`Load()` treats a missing file as an empty state (no error). Corrupt or
|
||||||
unreadable files return an error.
|
unreadable files return an error.
|
||||||
|
|
||||||
**Limitations of YAML backend:**
|
**Limitations of the YAML backend:**
|
||||||
|
|
||||||
- Single-process only. Concurrent writes from multiple processes corrupt the
|
- Single-process only. Concurrent writes from multiple processes corrupt the
|
||||||
file because the write is not atomic at the OS level.
|
file because the write is not atomic at the OS level.
|
||||||
- The entire state is serialised on every `Persist()` call, which grows linearly
|
- The entire state is serialised on every `Persist()` call.
|
||||||
with the number of tracked models and entries.
|
- Timestamps are serialised as RFC 3339 strings.
|
||||||
- Timestamps are serialised as RFC 3339 strings; sub-second precision is
|
|
||||||
preserved by Go's time marshaller but depends on the YAML library.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## SQLite Backend
|
## SQLite Backend
|
||||||
|
|
||||||
The SQLite backend was added in Phase 2 to support multi-process scenarios and
|
The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`,
|
||||||
provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
|
a pure Go port of SQLite that compiles without CGO.
|
||||||
Go port of SQLite that compiles without CGO.
|
|
||||||
|
|
||||||
### Connection Settings
|
### Connection Settings
|
||||||
|
|
||||||
```go
|
```go
|
||||||
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
|
db.SetMaxOpenConns(1) // single connection for PRAGMA consistency
|
||||||
db.Exec("PRAGMA journal_mode=WAL") // WAL mode for concurrent readers
|
db.Exec("PRAGMA journal_mode=WAL") // concurrent readers alongside a single writer
|
||||||
db.Exec("PRAGMA busy_timeout=5000") // 5-second busy timeout
|
db.Exec("PRAGMA busy_timeout=5000") // 5-second wait on lock contention
|
||||||
```
|
```
|
||||||
|
|
||||||
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
|
WAL mode allows one writer and multiple concurrent readers. The 5-second busy
|
||||||
timeout prevents immediate failure when a second process is mid-commit. A single
|
timeout prevents immediate failure when a second process is mid-commit.
|
||||||
`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
|
|
||||||
at the file level; multiple Go connections to the same file through a single
|
|
||||||
process would not add throughput but would complicate locking.
|
|
||||||
|
|
||||||
### Schema
|
### Schema
|
||||||
|
|
||||||
|
|
@ -196,7 +304,7 @@ CREATE TABLE IF NOT EXISTS tokens (
|
||||||
|
|
||||||
CREATE TABLE IF NOT EXISTS daily (
|
CREATE TABLE IF NOT EXISTS daily (
|
||||||
model TEXT PRIMARY KEY,
|
model TEXT PRIMARY KEY,
|
||||||
day_start INTEGER NOT NULL, -- UnixNano
|
day_start INTEGER NOT NULL, -- UnixNano
|
||||||
day_count INTEGER NOT NULL DEFAULT 0
|
day_count INTEGER NOT NULL DEFAULT 0
|
||||||
);
|
);
|
||||||
|
|
||||||
|
|
@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
|
||||||
```
|
```
|
||||||
|
|
||||||
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
|
Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
|
||||||
precision without relying on SQLite's text date format, and allows efficient
|
precision and allows efficient range queries using the composite indices.
|
||||||
range queries using the composite indices.
|
|
||||||
|
|
||||||
### Save Strategy
|
### Save Strategy
|
||||||
|
|
||||||
`saveState()` uses a delete-then-insert pattern inside a single transaction.
|
- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota
|
||||||
All three state tables are truncated and rewritten atomically:
|
rows are updated in place without deleting unrelated models.
|
||||||
|
- **State**: Delete-then-insert inside a single transaction. All three state
|
||||||
```go
|
tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically.
|
||||||
tx.Exec("DELETE FROM requests")
|
|
||||||
tx.Exec("DELETE FROM tokens")
|
|
||||||
tx.Exec("DELETE FROM daily")
|
|
||||||
// then INSERT for every model in state
|
|
||||||
tx.Commit()
|
|
||||||
```
|
|
||||||
|
|
||||||
`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
|
|
||||||
existing quota rows are updated in place without deleting unrelated models.
|
|
||||||
|
|
||||||
### Constructors
|
|
||||||
|
|
||||||
```go
|
|
||||||
// YAML backend (default)
|
|
||||||
rl, err := ratelimit.New()
|
|
||||||
rl, err := ratelimit.NewWithConfig(cfg)
|
|
||||||
|
|
||||||
// SQLite backend
|
|
||||||
rl, err := ratelimit.NewWithSQLite(dbPath)
|
|
||||||
rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
|
|
||||||
|
|
||||||
defer rl.Close() // releases the database connection
|
|
||||||
```
|
|
||||||
|
|
||||||
`Close()` is a no-op on YAML-backed limiters.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Migration Path
|
## Migration Path
|
||||||
|
|
||||||
`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
|
`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file
|
||||||
state file and writes all quotas and usage state to a new SQLite database. The
|
and writes all quotas and usage state to a new SQLite database. The function is
|
||||||
function is idempotent — running it again on the same YAML file overwrites the
|
idempotent -- running it again overwrites the SQLite database state.
|
||||||
SQLite database state.
|
|
||||||
|
|
||||||
Typical one-time migration:
|
|
||||||
|
|
||||||
```go
|
```go
|
||||||
err := ratelimit.MigrateYAMLToSQLite(
|
err := ratelimit.MigrateYAMLToSQLite(
|
||||||
|
|
@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite(
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
After migration, switch the constructor:
|
After migration, switch the constructor from `New()` to `NewWithSQLite()`. The
|
||||||
|
YAML file can be kept as a backup; the two backends do not share state.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Iterators
|
||||||
|
|
||||||
|
Two iterators, built on the standard `iter` package (range-over-func, available since Go 1.23), are provided for inspecting the limiter state:
|
||||||
|
|
||||||
|
- `Models() iter.Seq[string]` -- returns a sorted sequence of all model names
|
||||||
|
(from both `Quotas` and `State` maps, deduplicated).
|
||||||
|
- `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired
|
||||||
|
with their current `ModelStats` snapshot.
|
||||||
|
|
||||||
```go
|
```go
|
||||||
// Before
|
for model, stats := range rl.Iter() {
|
||||||
rl, _ := ratelimit.New()
|
fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n",
|
||||||
|
model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM)
|
||||||
// After
|
}
|
||||||
rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
|
|
||||||
defer rl.Close()
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The YAML file can be kept as a backup; the two backends do not share state.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## CountTokens
|
## CountTokens
|
||||||
|
|
||||||
`CountTokens(apiKey, model, text string) (int, error)` calls the Google
|
`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API
|
||||||
Generative Language API to obtain an exact token count for a prompt string. It
|
to obtain an exact token count for a prompt string. It is Gemini-specific and
|
||||||
is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
|
hardcodes the `generativelanguage.googleapis.com` endpoint.
|
||||||
endpoint. The URL is not configurable, which prevents unit testing of the
|
|
||||||
success path without network access.
|
|
||||||
|
|
||||||
For other providers, callers must supply `estimatedTokens` directly to
|
For other providers, callers must supply `estimatedTokens` directly to
|
||||||
`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
|
`CanSend()`. Accurate token counts are typically available in API response
|
||||||
in API response metadata after a call completes.
|
metadata after a call completes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Concurrency Model
|
||||||
|
|
||||||
|
All reads and writes are protected by a single `sync.RWMutex` on the
|
||||||
|
`RateLimiter` struct.
|
||||||
|
|
||||||
|
| Method | Lock type | Reason |
|
||||||
|
|--------|-----------|--------|
|
||||||
|
| `CanSend()` | Write | Calls `prune()`, which mutates state slices |
|
||||||
|
| `RecordUsage()` | Write | Appends to state slices |
|
||||||
|
| `Reset()` | Write | Deletes state entries |
|
||||||
|
| `Load()` | Write | Replaces in-memory state |
|
||||||
|
| `SetQuota()` | Write | Modifies quota map |
|
||||||
|
| `AddProvider()` | Write | Modifies quota map |
|
||||||
|
| `Persist()` | Write (brief) | Clones state, then releases lock before I/O |
|
||||||
|
| `Stats()` | Write | Calls `prune()` |
|
||||||
|
| `AllStats()` | Write | Prunes inline |
|
||||||
|
| `Models()` | Read | Reads keys only |
|
||||||
|
|
||||||
|
`Persist()` minimises lock contention by cloning the state under a write lock,
|
||||||
|
then performing I/O after releasing the lock. The test suite passes clean under
|
||||||
|
`go test -race ./...` with 20 goroutines performing concurrent operations.
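
The clone-then-release pattern used by `Persist()` can be sketched in miniature. The `store` type and its methods are hypothetical; the real state holds slices per model, which would need a deep copy rather than the shallow `maps.Clone` shown here.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// store is a hypothetical miniature of RateLimiter with only a counter map.
type store struct {
	mu     sync.RWMutex
	counts map[string]int
}

// snapshot copies the state while holding the lock and releases it on return.
func (s *store) snapshot() map[string]int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return maps.Clone(s.counts)
}

// persist hands a snapshot to write. The lock is not held during the write,
// so slow I/O does not block other callers.
func (s *store) persist(write func(map[string]int) error) error {
	return write(s.snapshot())
}

func (s *store) record(model string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[model]++
}

func main() {
	s := &store{counts: map[string]int{"gpt-4o": 1}}
	err := s.persist(func(snap map[string]int) error {
		s.record("gpt-4o") // would deadlock if persist still held the lock
		fmt.Println(snap["gpt-4o"], "in snapshot")
		return nil
	})
	fmt.Println(err, s.snapshot()["gpt-4o"])
}
```

The trade-off is that the persisted file can lag slightly behind memory: a request recorded during the write appears only in the next `Persist()`.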
|
||||||
|
|
|
||||||
|
|
@ -1,9 +1,14 @@
|
||||||
|
---
|
||||||
|
title: Development Guide
|
||||||
|
description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions.
|
||||||
|
---
|
||||||
|
|
||||||
# Development Guide
|
# Development Guide
|
||||||
|
|
||||||
## Prerequisites
|
## Prerequisites
|
||||||
|
|
||||||
- Go 1.25 or later (the module declares `go 1.25.5`)
|
- **Go 1.26** or later (the module declares `go 1.26.0`)
|
||||||
- No CGO required — `modernc.org/sqlite` is a pure Go port
|
- No CGO required -- `modernc.org/sqlite` is a pure Go port
|
||||||
|
|
||||||
No C toolchain, no system SQLite library, no external build tools. A plain
|
No C toolchain, no system SQLite library, no external build tools. A plain
|
||||||
`go build ./...` is sufficient.
|
`go build ./...` is sufficient.
|
||||||
|
|
@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./...
|
||||||
# Check for vet issues
|
# Check for vet issues
|
||||||
go vet ./...
|
go vet ./...
|
||||||
|
|
||||||
|
# Lint (requires golangci-lint)
|
||||||
|
golangci-lint run ./...
|
||||||
|
|
||||||
# Tidy dependencies
|
# Tidy dependencies
|
||||||
go mod tidy
|
go mod tidy
|
||||||
```
|
```
|
||||||
|
|
@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Test Patterns
|
## Test Organisation
|
||||||
|
|
||||||
### File Organisation
|
### File Layout
|
||||||
|
|
||||||
- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
|
| File | Scope |
|
||||||
- `sqlite_test.go` — Phase 2 (SQLite backend)
|
|------|-------|
|
||||||
|
| `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks |
|
||||||
|
| `sqlite_test.go` | SQLite backend, migration, concurrent persistence |
|
||||||
|
| `error_test.go` | SQLite and YAML error paths |
|
||||||
|
| `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases |
|
||||||
|
|
||||||
Both files are in `package ratelimit` (white-box tests) so they can access
|
All test files are in `package ratelimit` (white-box tests), giving access to
|
||||||
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
|
unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
|
||||||
|
|
||||||
### Naming Convention
|
### Naming Convention
|
||||||
|
|
||||||
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
|
SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
|
||||||
|
|
||||||
- `_Good` — happy path
|
- `_Good` -- happy path
|
||||||
- `_Bad` — expected error conditions (invalid paths, corrupt input)
|
- `_Bad` -- expected error conditions (invalid paths, corrupt input)
|
||||||
- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
|
- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files)
|
||||||
|
|
||||||
Core logic tests use plain descriptive names without suffixes, grouped by
|
Core logic tests use plain descriptive names without suffixes, grouped by method
|
||||||
method with table-driven subtests.
|
with table-driven subtests.
|
||||||
|
|
||||||
### Test Helpers
|
### Test Helper
|
||||||
|
|
||||||
`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
|
`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects
|
||||||
redirects the YAML file path into `t.TempDir()`:
|
the YAML file path into `t.TempDir()`:
|
||||||
|
|
||||||
```go
|
```go
|
||||||
func newTestLimiter(t *testing.T) *RateLimiter {
|
func newTestLimiter(t *testing.T) *RateLimiter {
|
||||||
|
|
@ -86,43 +98,64 @@ after each test completes.
|
||||||
|
|
||||||
Tests use `github.com/stretchr/testify` exclusively:
|
Tests use `github.com/stretchr/testify` exclusively:
|
||||||
|
|
||||||
- `require.NoError(t, err)` — fail immediately on setup errors
|
- `require.NoError(t, err)` -- fail immediately on setup errors
|
||||||
- `assert.NoError(t, err)` — record failure but continue
|
- `assert.NoError(t, err)` -- record failure but continue
|
||||||
- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
|
- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons
|
||||||
- `assert.True / assert.False` — for boolean checks
|
- `assert.True` / `assert.False` -- for boolean checks
|
||||||
- `assert.Empty / assert.Len` — for slice length checks
|
- `assert.Empty` / `assert.Len` -- for slice length checks
|
||||||
- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
|
- `assert.ErrorIs(t, err, target)` -- for sentinel errors
|
||||||
|
|
||||||
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
|
Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
|
||||||
|
|
||||||
### Race Tests
|
### Concurrency Tests
|
||||||
|
|
||||||
Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
|
Race tests spin up goroutines and use `sync.WaitGroup`. Some assert specific
|
||||||
assert anything beyond absence of data races (the race detector does the work):
|
outcomes (e.g., correct RPD count after concurrent recordings), while others
|
||||||
|
rely solely on the race detector to catch data races:
|
||||||
|
|
||||||
```go
|
```go
|
||||||
var wg sync.WaitGroup
|
var wg sync.WaitGroup
|
||||||
for i := range 20 {
|
for range 20 {
|
||||||
wg.Add(1)
|
wg.Go(func() {
|
||||||
go func() {
|
for range 50 {
|
||||||
defer wg.Done()
|
rl.CanSend(model, 10)
|
||||||
// concurrent operations
|
rl.RecordUsage(model, 5, 5)
|
||||||
}()
|
rl.Stats(model)
|
||||||
|
}
|
||||||
|
})
|
||||||
}
|
}
|
||||||
wg.Wait()
|
wg.Wait()
|
||||||
```
|
```
|
||||||
|
|
||||||
Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
|
Always run concurrency tests with `-race`.
|
||||||
clean.
|
|
||||||
|
### Benchmarks
|
||||||
|
|
||||||
|
The following benchmarks are included:
|
||||||
|
|
||||||
|
| Benchmark | What it measures |
|
||||||
|
|-----------|------------------|
|
||||||
|
| `BenchmarkCanSend` | CanSend with a 1,000-entry sliding window |
|
||||||
|
| `BenchmarkRecordUsage` | Recording usage on a single model |
|
||||||
|
| `BenchmarkCanSendConcurrent` | Parallel CanSend across goroutines |
|
||||||
|
| `BenchmarkCanSendWithPrune` | CanSend with 500 old + 500 new entries |
|
||||||
|
| `BenchmarkStats` | Stats retrieval with a 1,000-entry window |
|
||||||
|
| `BenchmarkAllStats` | AllStats across 5 models x 200 entries each |
|
||||||
|
| `BenchmarkPersist` | YAML persistence I/O |
|
||||||
|
| `BenchmarkSQLitePersist` | SQLite persistence I/O |
|
||||||
|
| `BenchmarkSQLiteLoad` | SQLite state loading |
|
||||||
|
|
||||||
### Coverage
|
### Coverage
|
||||||
|
|
||||||
Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
|
Current coverage: 95.1%. The remaining paths cannot be covered in unit tests
|
||||||
be covered in unit tests without modifying the production code:
|
without modifying production code:
|
||||||
|
|
||||||
1. `CountTokens` success path — hardcoded Google API URL requires network access
|
1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests
|
||||||
2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
|
cannot intercept the HTTP call without URL injection support.
|
||||||
3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
|
2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on
|
||||||
|
valid Go structs.
|
||||||
|
3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when
|
||||||
|
`$HOME` is unset, which test infrastructure prevents.
|
||||||
|
|
||||||
Do not lower coverage below 95% without a documented reason.
|
Do not lower coverage below 95% without a documented reason.
|
||||||
|
|
||||||
|
|
@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation.
|
||||||
|
|
||||||
### Go Style
|
### Go Style
|
||||||
|
|
||||||
- All exported types, functions, and fields must have doc comments
|
- All exported types, functions, and fields must have doc comments.
|
||||||
- Error strings must be lowercase and not end with punctuation (Go convention)
|
- Error strings must be lowercase and not end with punctuation (Go convention).
|
||||||
- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
|
- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so
|
||||||
prefix `ratelimit.` is included so errors identify their origin clearly
|
errors identify their origin clearly.
|
||||||
- No `init()` functions
|
- No `init()` functions.
|
||||||
- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
|
- No global mutable state. `DefaultProfiles()` returns a fresh map on each call.
|
||||||
map on each call)
|
|
||||||
|
|
||||||
### Mutex Discipline
|
### Mutex Discipline
|
||||||
|
|
||||||
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
|
The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
|
||||||
|
|
||||||
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
|
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even
|
||||||
even if they appear read-only, because `prune()` mutates slices
|
if they appear read-only, because `prune()` mutates state slices.
|
||||||
- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
|
- `Persist()` acquires the write lock briefly to clone state, then releases it
|
||||||
snapshot of state
|
before performing I/O.
|
||||||
- Lock acquisition always happens at the top of the public method, never inside
|
- Lock acquisition always happens at the top of the public method, never inside
|
||||||
a helper — helpers document "Caller must hold the lock"
|
a helper. Helpers document "Caller must hold the lock".
|
||||||
- Never call a public method from inside another public method while holding
|
- Never call a public method from inside another public method while holding the
|
||||||
the lock (deadlock risk)
|
lock (deadlock risk).
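
The rules above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the `limiter` type, `snapshotLocked`, and the `write` callback are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// limiter is a hypothetical, simplified stand-in for RateLimiter.
type limiter struct {
	mu    sync.RWMutex
	state map[string]int
}

// snapshotLocked returns a copy of the state.
// Caller must hold the lock.
func (l *limiter) snapshotLocked() map[string]int {
	return maps.Clone(l.state)
}

// Persist takes the lock at the top of the public method, clones the
// state, and releases the lock before performing any I/O.
func (l *limiter) Persist(write func(map[string]int) error) error {
	l.mu.Lock()
	snap := l.snapshotLocked()
	l.mu.Unlock()

	if err := write(snap); err != nil {
		return fmt.Errorf("ratelimit.Persist: write state: %w", err)
	}
	return nil
}

func main() {
	l := &limiter{state: map[string]int{"model-a": 3}}
	err := l.Persist(func(snap map[string]int) error {
		fmt.Println("persisting", len(snap), "model(s)")
		return nil
	})
	fmt.Println("error:", err)
}
```

Releasing the lock before the write keeps slow disk or database operations from blocking other callers. Note that `maps.Clone` is a shallow copy, so state holding pointers or slices would need a deeper clone.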
|
||||||
|
|
||||||
### Dependencies
|
### Dependencies
|
||||||
|
|
||||||
|
|
@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library.
|
||||||
|
|
||||||
## Licence
|
## Licence
|
||||||
|
|
||||||
EUPL-1.2. Every new source file must carry the standard header if the project
|
EUPL-1.2. Confirm with the project lead before adding files under a different
|
||||||
adopts per-file headers in future. Confirm with the project lead before adding
|
licence.
|
||||||
files under a different licence.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -205,3 +236,17 @@ Co-Authored-By: Virgil <virgil@lethean.io>
|
||||||
|
|
||||||
Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
|
Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
|
||||||
pass. `go mod tidy` must produce no changes.
|
pass. `go mod tidy` must produce no changes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Linting
|
||||||
|
|
||||||
|
The project uses `golangci-lint` with the following enabled linters (see
|
||||||
|
`.golangci.yml`):
|
||||||
|
|
||||||
|
- `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple`
|
||||||
|
- `ineffassign`, `typecheck`, `gocritic`, `gofmt`
|
||||||
|
|
||||||
|
Disabled linters: `exhaustive`, `wrapcheck`.
|
||||||
|
|
||||||
|
Run `golangci-lint run ./...` to check before committing.
|
||||||
|
|
|
||||||
117
docs/index.md
Normal file
|
|
@ -0,0 +1,117 @@
|
||||||
|
---
|
||||||
|
title: go-ratelimit
|
||||||
|
description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends.
|
||||||
|
---
|
||||||
|
|
||||||
|
# go-ratelimit
|
||||||
|
|
||||||
|
**Module**: `forge.lthn.ai/core/go-ratelimit`
|
||||||
|
**Licence**: EUPL-1.2
|
||||||
|
**Go version**: 1.26+
|
||||||
|
|
||||||
|
go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and
|
||||||
|
requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding
|
||||||
|
window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and
|
||||||
|
a local inference provider. State persists across process restarts via YAML
|
||||||
|
(single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite
|
||||||
|
migration helper is included.
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
```go
|
||||||
|
import "forge.lthn.ai/core/go-ratelimit"
|
||||||
|
|
||||||
|
// Create a limiter with Gemini defaults (YAML backend).
|
||||||
|
rl, err := ratelimit.New()
|
||||||
|
if err != nil {
|
||||||
|
log.Fatal(err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check capacity before sending.
|
||||||
|
if rl.CanSend("gemini-2.0-flash", 1500) {
|
||||||
|
// Make the API call...
|
||||||
|
rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens
|
||||||
|
}
|
||||||
|
|
||||||
|
// Persist state to disk for recovery across restarts.
|
||||||
|
if err := rl.Persist(); err != nil {
|
||||||
|
log.Printf("persist failed: %v", err)
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Multi-provider configuration
|
||||||
|
|
||||||
|
```go
|
||||||
|
rl, err := ratelimit.NewWithConfig(ratelimit.Config{
|
||||||
|
Providers: []ratelimit.Provider{
|
||||||
|
ratelimit.ProviderGemini,
|
||||||
|
ratelimit.ProviderAnthropic,
|
||||||
|
},
|
||||||
|
Quotas: map[string]ratelimit.ModelQuota{
|
||||||
|
// Override a specific model's limits.
|
||||||
|
"gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200},
|
||||||
|
// Add a custom model not in any profile.
|
||||||
|
"llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0},
|
||||||
|
},
|
||||||
|
})
|
||||||
|
```
|
||||||
|
|
||||||
|
### SQLite backend (multi-process safe)
|
||||||
|
|
||||||
|
```go
|
||||||
|
rl, err := ratelimit.NewWithSQLite("~/.core/ratelimits.db")
|
||||||
|
if err != nil {
|
||||||
|
log.Fatal(err)
|
||||||
|
}
|
||||||
|
defer rl.Close()
|
||||||
|
|
||||||
|
// Load persisted state.
|
||||||
|
if err := rl.Load(); err != nil {
|
||||||
|
log.Fatal(err)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Use exactly as you would the YAML backend -- CanSend, RecordUsage, Persist, etc.
|
||||||
|
```
|
||||||
|
|
||||||
|
### Blocking until capacity is available
|
||||||
|
|
||||||
|
```go
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
|
||||||
|
log.Printf("timed out waiting for capacity: %v", err)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
// Capacity is available; proceed with the API call.
|
||||||
|
```
|
||||||
|
|
||||||
|
## Package Layout
|
||||||
|
|
||||||
|
The module is a single package with no sub-packages.
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper |
|
||||||
|
| `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations |
|
||||||
|
| `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks |
|
||||||
|
| `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery |
|
||||||
|
| `error_test.go` | Tests for SQLite and YAML error paths |
|
||||||
|
| `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases |
|
||||||
|
|
||||||
|
## Dependencies
|
||||||
|
|
||||||
|
| Dependency | Purpose | Category |
|
||||||
|
|------------|---------|----------|
|
||||||
|
| `gopkg.in/yaml.v3` | YAML serialisation for the legacy persistence backend | Direct |
|
||||||
|
| `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct |
|
||||||
|
| `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only |
|
||||||
|
|
||||||
|
All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain
|
||||||
|
or system SQLite library is needed.
|
||||||
|
|
||||||
|
## Further Reading
|
||||||
|
|
||||||
|
- [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model
|
||||||
|
- [Development](development.md) -- build commands, test patterns, coding standards, commit conventions
|
||||||
|
- [History](history.md) -- completed phases with commit hashes, known limitations
|
||||||