docs: add human-friendly documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 13:02:40 +00:00 · 2026-03-11 13:02:40 +00:00 · 9572425e89
commit 9572425e89
parent ae2cb96d38
3 changed files with 475 additions and 204 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,81 +1,72 @@
 ---
 title: Architecture
 description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model.
 ---
 # Architecture
 go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
-three independent quota dimensions per model — requests per minute (RPM), tokens
+three independent quota dimensions per model -- requests per minute (RPM), tokens
-per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
+per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window
 that can be persisted across process restarts via YAML or SQLite.
 Module path: `forge.lthn.ai/core/go-ratelimit`
 ---
-## Sliding Window Algorithm
+## Key Types
-The limiter maintains per-model `UsageStats` structs in memory:
+### RateLimiter
 The central struct. Holds the quota definitions, current usage state, a mutex for
 thread safety, and an optional SQLite backend reference.
 ```go
 type RateLimiter struct {
    mu       sync.RWMutex
    Quotas   map[string]ModelQuota  // per-model quota definitions
    State    map[string]*UsageStats // per-model sliding window state
    filePath string                 // YAML file path (ignored when SQLite is active)
    sqlite   *sqliteStore           // non-nil when using SQLite backend
 }
 ```
 ### ModelQuota
 Defines the rate limits for a single model. A zero value in any field means that
 dimension is unlimited.
 ```go
 type ModelQuota struct {
    MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
    MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
    MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
 }
 ```
 ### UsageStats
 Tracks the sliding window state for a single model.
 ```go
 type UsageStats struct {
    Requests []time.Time  // timestamps of recent requests (1-minute window)
    Tokens   []TokenEntry // token counts with timestamps (1-minute window)
-    DayStart time.Time    // when the current daily window started
+    DayStart time.Time    // when the current 24-hour window started
-    DayCount int          // total requests recorded since DayStart
+    DayCount int          // total requests since DayStart
 }
 type TokenEntry struct {
    Time  time.Time
    Count int       // prompt + output tokens for this request
 }
 ```
-Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
+### Config
-slices and discards entries older than `now - 1 minute`. Pruning is done
+
-in-place to avoid allocation on the hot path:
+Controls `RateLimiter` initialisation.
 ```go
 validReqs := 0
 for _, t := range stats.Requests {
    if t.After(window) {
        stats.Requests[validReqs] = t
        validReqs++
    }
 }
 stats.Requests = stats.Requests[:validReqs]
 ```
 The same loop runs for token entries. After pruning, `CanSend()` checks each
 quota dimension in priority order: RPD first (cheapest check), then RPM, then
 TPM. A zero value for any dimension means that dimension is unlimited. If all
 three are zero the model is treated as fully unlimited and the check short-circuits
 before touching any state.
 ### Daily Reset
 The daily counter resets automatically inside `prune()`. When
 `now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
 to the current time. This means the daily window is a rolling 24-hour period
 anchored to the first request of the day, not a calendar boundary.
 ### Concurrency
 All reads and writes are protected by a single `sync.RWMutex`. Methods that
 write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
 full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
 where possible. The `CanSend()` method acquires a write lock because it calls
 `prune()`, which mutates the state slices.
 `go test -race ./...` passes clean with 20 goroutines performing concurrent
 `CanSend()`, `RecordUsage()`, and `Stats()` calls.
 ---
 ## Provider and Quota Configuration
 ### Types
 ```go
 type Provider string          // "gemini", "openai", "anthropic", "local"
 type ModelQuota struct {
    MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
    MaxTPM int `yaml:"max_tpm"`
    MaxRPD int `yaml:"max_rpd"`
 }
 type Config struct {
    FilePath  string                 // default: ~/.core/ratelimits.yaml
    Backend   string                 // "yaml" (default) or "sqlite"
@@ -84,32 +75,110 @@ type Config struct {
 }
 ```
-### Quota Resolution
+### Provider
-1. Provider profiles are loaded first (from `DefaultProfiles()`).
+A string type identifying an LLM provider. Four constants are defined:
-2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
+
 ```go
 type Provider string
 const (
    ProviderGemini    Provider = "gemini"
    ProviderOpenAI    Provider = "openai"
    ProviderAnthropic Provider = "anthropic"
    ProviderLocal     Provider = "local"
 )
 ```
 ---
 ## Sliding Window Algorithm
 Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes
 entries older than one minute from the `Requests` and `Tokens` slices. Pruning
 is done in-place using `slices.DeleteFunc` to minimise allocations:
 ```go
 window := now.Add(-1 * time.Minute)
 stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool {
    return t.Before(window)
 })
 stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool {
    return t.Time.Before(window)
 })
 ```
 After pruning, `CanSend()` checks each quota dimension. If all three limits
 (RPM, TPM, RPD) are zero, the model is treated as fully unlimited and the
 check short-circuits before touching any state.
 The check order is: RPD, then RPM, then TPM. RPD is checked first because it
 is the cheapest comparison (a single integer). TPM is checked last because it
 requires summing the token counts in the sliding window.
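The check order can be sketched as a stand-alone function. This is a minimal illustration, not the library's code: `canSend` and `scenarios` are hypothetical names, locking and unknown-model handling are omitted, and the exact comparison operators (`>=` for request counts, `>` for tokens) are assumptions.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// ModelQuota, TokenEntry and UsageStats mirror the types in Key Types.
type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

type TokenEntry struct {
	Time  time.Time
	Count int
}

type UsageStats struct {
	Requests []time.Time
	Tokens   []TokenEntry
	DayStart time.Time
	DayCount int
}

// canSend is a hypothetical, lock-free stand-in for RateLimiter.CanSend.
// The comparison operators are assumptions, not taken from the source.
func canSend(q ModelQuota, s *UsageStats, estimatedTokens int, now time.Time) bool {
	if q.MaxRPM == 0 && q.MaxTPM == 0 && q.MaxRPD == 0 {
		return true // fully unlimited: return before touching any state
	}

	// Drop entries that have left the one-minute window.
	window := now.Add(-time.Minute)
	s.Requests = slices.DeleteFunc(s.Requests, func(t time.Time) bool {
		return t.Before(window)
	})
	s.Tokens = slices.DeleteFunc(s.Tokens, func(e TokenEntry) bool {
		return e.Time.Before(window)
	})

	// RPD first: a single integer comparison.
	if q.MaxRPD > 0 && s.DayCount >= q.MaxRPD {
		return false
	}
	// RPM next: the length of the pruned slice.
	if q.MaxRPM > 0 && len(s.Requests) >= q.MaxRPM {
		return false
	}
	// TPM last: requires summing the token counts in the window.
	if q.MaxTPM > 0 {
		used := estimatedTokens
		for _, e := range s.Tokens {
			used += e.Count
		}
		if used > q.MaxTPM {
			return false
		}
	}
	return true
}

// scenarios runs three checks against state holding two requests of
// 40 tokens each, both recorded ten seconds before "now".
func scenarios() (atRPMLimit, afterWindow, overTPM bool) {
	now := time.Date(2026, 3, 11, 13, 0, 0, 0, time.UTC)
	recent := now.Add(-10 * time.Second)
	newStats := func() *UsageStats {
		return &UsageStats{
			Requests: []time.Time{recent, recent},
			Tokens:   []TokenEntry{{recent, 40}, {recent, 40}},
			DayStart: recent,
			DayCount: 2,
		}
	}

	rpm := ModelQuota{MaxRPM: 2, MaxTPM: 100}
	atRPMLimit = canSend(rpm, newStats(), 10, now)                  // two requests already in the window
	afterWindow = canSend(rpm, newStats(), 10, now.Add(time.Minute)) // both entries pruned
	overTPM = canSend(ModelQuota{MaxTPM: 100}, newStats(), 30, now) // 80 used + 30 estimated
	return
}

func main() {
	fmt.Println(scenarios())
}
```

Ordering the checks from cheapest to most expensive means a model that has exhausted its daily quota is rejected without iterating over the token slice.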
 ### Daily Reset
 The daily counter resets automatically inside `prune()`. When
 `now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is
 updated to the current time. The daily window is a rolling 24-hour period
 anchored to the first request of the day, not a calendar boundary.
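The difference between a rolling window and a calendar boundary is easy to see in a small sketch. The `dailyWindow` type and both functions below are hypothetical; only the reset condition is taken from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// dailyWindow holds the two daily fields of UsageStats.
type dailyWindow struct {
	DayStart time.Time
	DayCount int
}

// resetIfExpired is a hypothetical helper applying the reset rule:
// once 24 hours have elapsed since DayStart, start a new window at now.
func (d *dailyWindow) resetIfExpired(now time.Time) {
	if now.Sub(d.DayStart) >= 24*time.Hour {
		d.DayCount = 0
		d.DayStart = now
	}
}

// dayCountAfter reports the counter seen elapsedHours after a window that
// started at 15:00 UTC with count requests already recorded.
func dayCountAfter(count, elapsedHours int) int {
	start := time.Date(2026, 3, 11, 15, 0, 0, 0, time.UTC)
	d := dailyWindow{DayStart: start, DayCount: count}
	d.resetIfExpired(start.Add(time.Duration(elapsedHours) * time.Hour))
	return d.DayCount
}

func main() {
	fmt.Println(dayCountAfter(42, 9))  // 42: midnight has passed, the window has not
	fmt.Println(dayCountAfter(42, 24)) // 0: a full 24 hours have elapsed
}
```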
 ### Background Pruning
 `BackgroundPrune(interval)` starts a goroutine that periodically prunes all
 model states on a configurable interval. It returns a cancel function to stop
 the pruner:
 ```go
 stop := rl.BackgroundPrune(30 * time.Second)
 defer stop()
 ```
 This prevents memory growth in long-running processes where some models may
 accumulate stale entries between calls to `CanSend()`.
 ### Memory Cleanup
 When `prune()` empties both the `Requests` and `Tokens` slices for a model,
 and `DayCount` is also zero, the entire `UsageStats` entry is deleted from
 the `State` map. This prevents memory leaks from models that were used once
 and never again.
 ---
 ## Provider and Quota Configuration
 ### Quota Resolution Order
 1. Provider profiles are loaded first from `DefaultProfiles()`.
 2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any
   matching model.
 3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
-`SetQuota()` and `AddProvider()` allow runtime modification; both are
+`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the
-mutex-protected. `AddProvider()` is additive — it does not remove existing
+write lock. `AddProvider()` is additive -- it does not remove existing quotas for
-quotas for models outside the new provider's profile.
+models outside the new provider's profile.
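The resolution order can be sketched with `maps.Copy`. Here `profiles` is a hypothetical, trimmed stand-in for `DefaultProfiles()` and `resolveQuotas` is an illustrative name; the treatment of an unknown provider (silently skipped) is an assumption.

```go
package main

import (
	"fmt"
	"maps"
)

type Provider string

type ModelQuota struct{ MaxRPM, MaxTPM, MaxRPD int }

// profiles is a hypothetical stand-in for DefaultProfiles(), reduced to one
// model per provider. It returns a fresh map on each call.
func profiles() map[Provider]map[string]ModelQuota {
	return map[Provider]map[string]ModelQuota{
		"gemini":    {"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1_000_000, MaxRPD: 1_000}},
		"anthropic": {"claude-opus-4": {MaxRPM: 50, MaxTPM: 40_000}},
	}
}

// resolveQuotas applies the three rules in order: provider profiles first,
// explicit quotas on top, Gemini defaults when nothing was requested.
func resolveQuotas(providers []Provider, overrides map[string]ModelQuota) map[string]ModelQuota {
	if len(providers) == 0 && len(overrides) == 0 {
		providers = []Provider{"gemini"}
	}
	out := make(map[string]ModelQuota)
	all := profiles()
	for _, p := range providers {
		maps.Copy(out, all[p]) // copying from a nil map is a no-op
	}
	maps.Copy(out, overrides) // explicit quotas replace matching models
	return out
}

func main() {
	fmt.Println(resolveQuotas(nil, nil))
	fmt.Println(resolveQuotas(
		[]Provider{"anthropic"},
		map[string]ModelQuota{"claude-opus-4": {MaxRPM: 10}},
	))
}
```

Because `maps.Copy` replaces whole values, an override such as `{MaxRPM: 10}` also resets the other two dimensions of that model to zero (unlimited) in this sketch; callers who want to keep a profile's TPM limit would need to restate it.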
 ### Default Quotas (as of February 2026)
-| Provider  | Model                  | MaxRPM    | MaxTPM    | MaxRPD    |
+| Provider  | Model                  | MaxRPM    | MaxTPM      | MaxRPD    |
-|-----------|------------------------|-----------|-----------|-----------|
+|-----------|------------------------|-----------|-------------|-----------|
-| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000   | 1,000     |
-| Gemini    | gemini-3-flash-preview | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-3-flash-preview | 150       | 1,000,000   | 1,000     |
-| Gemini    | gemini-2.5-pro         | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-2.5-pro         | 150       | 1,000,000   | 1,000     |
-| Gemini    | gemini-2.0-flash       | 150       | 1,000,000 | unlimited |
+| Gemini    | gemini-2.0-flash       | 150       | 1,000,000   | unlimited |
-| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited | unlimited |
+| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited   | unlimited |
-| OpenAI    | gpt-4o, gpt-4-turbo    | 500       | 30,000    | unlimited |
+| OpenAI    | gpt-4o                 | 500       | 30,000      | unlimited |
-| OpenAI    | gpt-4o-mini, o1-mini   | 500       | 200,000   | unlimited |
+| OpenAI    | gpt-4o-mini            | 500       | 200,000     | unlimited |
-| OpenAI    | o1, o3-mini            | 500       | varies    | unlimited |
+| OpenAI    | gpt-4-turbo            | 500       | 30,000      | unlimited |
-| Anthropic | claude-opus-4          | 50        | 40,000    | unlimited |
+| OpenAI    | o1                     | 500       | 30,000      | unlimited |
-| Anthropic | claude-sonnet-4        | 50        | 40,000    | unlimited |
+| OpenAI    | o1-mini                | 500       | 200,000     | unlimited |
-| Anthropic | claude-haiku-3.5       | 50        | 50,000    | unlimited |
+| OpenAI    | o3-mini                | 500       | 200,000     | unlimited |
-| Local     | (none by default)      | user-defined                          |
+| Anthropic | claude-opus-4          | 50        | 40,000      | unlimited |
 | Anthropic | claude-sonnet-4        | 50        | 40,000      | unlimited |
 | Anthropic | claude-haiku-3.5       | 50        | 50,000      | unlimited |
 | Local     | (none by default)      | user-defined                         |
 The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
 where throughput is limited by hardware rather than by an API quota. No defaults are
@@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
 ---
-## YAML Persistence (Legacy)
+## Constructors
-The default backend serialises the entire `RateLimiter` struct — both the
+| Function | Backend | Default Provider |
-`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
+|----------|---------|------------------|
 | `New()` | YAML | Gemini |
 | `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) |
 | `NewWithSQLite(dbPath)` | SQLite | Gemini |
 | `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) |
 `Close()` releases the database connection for SQLite-backed limiters. It is a
 no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`)
 when using the SQLite backend.
 ---
 ## Data Flow
 A typical request lifecycle:
 ```
 1. CanSend(model, estimatedTokens)
   |-- acquires write lock
   |-- looks up ModelQuota for the model
   |-- if unknown model or all-zero quota: returns true (allowed)
   |-- calls prune(model) to discard stale entries
   |-- checks RPD, RPM, TPM against the pruned state
   '-- returns true/false
 2. (caller makes the API call)
 3. RecordUsage(model, promptTokens, outputTokens)
   |-- acquires write lock
   |-- calls prune(model)
   |-- appends to Requests and Tokens slices
   '-- increments DayCount
 4. Persist()
   |-- acquires write lock, clones state, releases lock
   |-- YAML: marshals to file
   '-- SQLite: saves quotas and state in transactions
 ```
 ---
 ## YAML Persistence
 The default backend serialises both the `Quotas` map and the `State` map to a
 YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`).
 ```yaml
 quotas:
@@ -143,35 +256,30 @@ state:
 `Load()` treats a missing file as an empty state (no error). Corrupt or
 unreadable files return an error.
-**Limitations of YAML backend:**
+**Limitations of the YAML backend:**
 - Single-process only. Concurrent writes from multiple processes corrupt the
  file because the write is not atomic at the OS level.
- The entire state is serialised on every `Persist()` call, which grows linearly
+- The entire state is serialised on every `Persist()` call.
-  with the number of tracked models and entries.
+- Timestamps are serialised as RFC 3339 strings.
 - Timestamps are serialised as RFC3339 strings; nanosecond precision is
   preserved by Go's time marshaller but depends on the YAML library.
 ---
 ## SQLite Backend
-The SQLite backend was added in Phase 2 to support multi-process scenarios and
+The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`,
-provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
+a pure Go port of SQLite that compiles without CGO.
 Go port of SQLite that compiles without CGO.
 ### Connection Settings
 ```go
-db.SetMaxOpenConns(1)                      // single connection for PRAGMA consistency
+db.SetMaxOpenConns(1)                  // single connection for PRAGMA consistency
-db.Exec("PRAGMA journal_mode=WAL")         // WAL mode for concurrent readers
+db.Exec("PRAGMA journal_mode=WAL")     // concurrent readers alongside a single writer
-db.Exec("PRAGMA busy_timeout=5000")        // 5-second busy timeout
+db.Exec("PRAGMA busy_timeout=5000")    // 5-second wait on lock contention
 ```
 WAL mode allows one writer and multiple concurrent readers. The 5-second busy
-timeout prevents immediate failure when a second process is mid-commit. A single
+timeout prevents immediate failure when a second process is mid-commit.
 `sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
 at the file level; multiple Go connections to the same file from within a single
 process would not add throughput but would complicate locking.
 ### Schema
@@ -196,7 +304,7 @@ CREATE TABLE IF NOT EXISTS tokens (
 CREATE TABLE IF NOT EXISTS daily (
    model     TEXT PRIMARY KEY,
-    day_start INTEGER NOT NULL,   -- UnixNano
+    day_start INTEGER NOT NULL,    -- UnixNano
    day_count INTEGER NOT NULL DEFAULT 0
 );
@@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts   ON tokens(model, ts);
 ```
 Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
-precision without relying on SQLite's text date format, and allows efficient
+precision and allows efficient range queries using the composite indices.
 range queries using the composite indices.
 ### Save Strategy
-`saveState()` uses a delete-then-insert pattern inside a single transaction.
+- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota
-All three state tables are truncated and rewritten atomically:
+  rows are updated in place without deleting unrelated models.
-
+- **State**: Delete-then-insert inside a single transaction. All three state
-```go
+  tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically.
 tx.Exec("DELETE FROM requests")
 tx.Exec("DELETE FROM tokens")
 tx.Exec("DELETE FROM daily")
 // then INSERT for every model in state
 tx.Commit()
 ```
 `saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
 existing quota rows are updated in place without deleting unrelated models.
 ### Constructors
 ```go
 // YAML backend (default)
 rl, err := ratelimit.New()
 rl, err := ratelimit.NewWithConfig(cfg)
 // SQLite backend
 rl, err := ratelimit.NewWithSQLite(dbPath)
 rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
 defer rl.Close()  // releases the database connection
 ```
 `Close()` is a no-op on YAML-backed limiters.
 ---
 ## Migration Path
-`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
+`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file
-state file and writes all quotas and usage state to a new SQLite database. The
+and writes all quotas and usage state to a new SQLite database. The function is
-function is idempotent — running it again on the same YAML file overwrites the
+idempotent -- running it again overwrites the SQLite database state.
 SQLite database state.
 Typical one-time migration:
 ```go
 err := ratelimit.MigrateYAMLToSQLite(
@@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite(
 )
 ```
-After migration, switch the constructor:
+After migration, switch the constructor from `New()` to `NewWithSQLite()`. The
 YAML file can be kept as a backup; the two backends do not share state.
 ---
 ## Iterators
 Two iterators are provided for inspecting the limiter state. They use the
 standard `iter` package (available since Go 1.23; the module itself requires Go 1.26):
 - `Models() iter.Seq[string]` -- returns a sorted sequence of all model names
  (from both `Quotas` and `State` maps, deduplicated).
 - `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired
  with their current `ModelStats` snapshot.
 ```go
-// Before
+for model, stats := range rl.Iter() {
-rl, _ := ratelimit.New()
+    fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n",
-
+        model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM)
-// After
+}
 rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
 defer rl.Close()
 ```
 The YAML file can be kept as a backup; the two backends do not share state.
 ---
 ## CountTokens
-`CountTokens(apiKey, model, text string) (int, error)` calls the Google
+`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API
-Generative Language API to obtain an exact token count for a prompt string. It
+to obtain an exact token count for a prompt string. It is Gemini-specific and
-is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
+hardcodes the `generativelanguage.googleapis.com` endpoint.
 endpoint. The URL is not configurable, which prevents unit testing of the
 success path without network access.
 For other providers, callers must supply `estimatedTokens` directly to
-`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
+`CanSend()`. Accurate token counts are typically available in API response
-in API response metadata after a call completes.
+metadata after a call completes.
 ---
 ## Concurrency Model
 All reads and writes are protected by a single `sync.RWMutex` on the
 `RateLimiter` struct.
 | Method | Lock type | Reason |
 |--------|-----------|--------|
 | `CanSend()` | Write | Calls `prune()`, which mutates state slices |
 | `RecordUsage()` | Write | Appends to state slices |
 | `Reset()` | Write | Deletes state entries |
 | `Load()` | Write | Replaces in-memory state |
 | `SetQuota()` | Write | Modifies quota map |
 | `AddProvider()` | Write | Modifies quota map |
 | `Persist()` | Write (brief) | Clones state, then releases lock before I/O |
 | `Stats()` | Write | Calls `prune()` |
 | `AllStats()` | Write | Prunes inline |
 | `Models()` | Read | Reads keys only |
 `Persist()` minimises lock contention by cloning the state under a write lock,
 then performing I/O after releasing the lock. The test suite passes clean under
 `go test -race ./...` with 20 goroutines performing concurrent operations.
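The clone-then-release pattern used by `Persist()` can be sketched as follows. The `limiter` type, its `int` state and the `write` callback are hypothetical simplifications; the real state holds pointers and slices, so it would need a deep copy rather than `maps.Clone`.

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// limiter is a simplified stand-in for RateLimiter: the per-model state is
// reduced to a request counter.
type limiter struct {
	mu    sync.RWMutex
	state map[string]int
}

// record takes the write lock, as RecordUsage does.
func (l *limiter) record(model string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.state[model]++
}

// snapshot holds the lock only for as long as it takes to clone the map.
func (l *limiter) snapshot() map[string]int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return maps.Clone(l.state)
}

// persist clones under the lock, then hands the copy to write with the lock
// already released, so slow I/O never blocks other callers.
func (l *limiter) persist(write func(map[string]int) error) error {
	snap := l.snapshot()
	return write(snap)
}

func main() {
	l := &limiter{state: map[string]int{"gemini-2.5-pro": 1}}
	err := l.persist(func(snap map[string]int) error {
		// A concurrent-style update during the write: this would deadlock
		// if persist still held the lock at this point.
		l.record("gemini-2.5-pro")
		fmt.Println("written:", snap)
		return nil
	})
	fmt.Println("error:", err, "live:", l.state)
}
```

The snapshot written to disk reflects the moment of cloning; updates that arrive during the write are picked up by the next `Persist()` call.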
--- a/docs/development.md
+++ b/docs/development.md
@@ -1,9 +1,14 @@
 ---
 title: Development Guide
 description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions.
 ---
 # Development Guide
 ## Prerequisites
- Go 1.25 or later (the module declares `go 1.25.5`)
+- **Go 1.26** or later (the module declares `go 1.26.0`)
- No CGO required — `modernc.org/sqlite` is a pure Go port
+- No CGO required -- `modernc.org/sqlite` is a pure Go port
 No C toolchain, no system SQLite library, no external build tools. A plain
 `go build ./...` is sufficient.
@@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./...
 # Check for vet issues
 go vet ./...
 # Lint (requires golangci-lint)
 golangci-lint run ./...
 # Tidy dependencies
 go mod tidy
 ```
@@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed.
 ---
-## Test Patterns
+## Test Organisation
-### File Organisation
+### File Layout
- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
+| File | Scope |
- `sqlite_test.go` — Phase 2 (SQLite backend)
+|------|-------|
 | `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks |
 | `sqlite_test.go` | SQLite backend, migration, concurrent persistence |
 | `error_test.go` | SQLite and YAML error paths |
 | `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases |
-Both files are in `package ratelimit` (white-box tests) so they can access
+All test files are in `package ratelimit` (white-box tests), giving access to
 unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
 ### Naming Convention
 SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
- `_Good` — happy path
+- `_Good` -- happy path
- `_Bad` — expected error conditions (invalid paths, corrupt input)
+- `_Bad` -- expected error conditions (invalid paths, corrupt input)
- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
+- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files)
-Core logic tests use plain descriptive names without suffixes, grouped by
+Core logic tests use plain descriptive names without suffixes, grouped by method
-method with table-driven subtests.
+with table-driven subtests.
-### Test Helpers
+### Test Helper
-`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
+`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects
-redirects the YAML file path into `t.TempDir()`:
+the YAML file path into `t.TempDir()`:
 ```go
 func newTestLimiter(t *testing.T) *RateLimiter {
@@ -86,43 +98,64 @@ after each test completes.
 Tests use `github.com/stretchr/testify` exclusively:
- `require.NoError(t, err)` — fail immediately on setup errors
+- `require.NoError(t, err)` -- fail immediately on setup errors
- `assert.NoError(t, err)` — record failure but continue
+- `assert.NoError(t, err)` -- record failure but continue
- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
+- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons
- `assert.True / assert.False` — for boolean checks
+- `assert.True` / `assert.False` -- for boolean checks
- `assert.Empty / assert.Len` — for slice length checks
+- `assert.Empty` / `assert.Len` -- for slice length checks
- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
+- `assert.ErrorIs(t, err, target)` -- for sentinel errors
 Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
-### Race Tests
+### Concurrency Tests
-Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
+Race tests spin up goroutines and use `sync.WaitGroup`. Some assert specific
-assert anything beyond absence of data races (the race detector does the work):
+outcomes (e.g., correct RPD count after concurrent recordings), while others
 rely solely on the race detector to catch data races:
 ```go
 var wg sync.WaitGroup
-for i := range 20 {
+for range 20 {
-    wg.Add(1)
+    wg.Go(func() {
-    go func() {
+        for range 50 {
-        defer wg.Done()
+            rl.CanSend(model, 10)
-        // concurrent operations
+            rl.RecordUsage(model, 5, 5)
-    }()
+            rl.Stats(model)
        }
    })
 }
 wg.Wait()
 ```
-Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
+Always run concurrency tests with `-race`.
-clean.
+
 ### Benchmarks
 The following benchmarks are included:
 | Benchmark | What it measures |
 |-----------|------------------|
 | `BenchmarkCanSend` | CanSend with a 1,000-entry sliding window |
 | `BenchmarkRecordUsage` | Recording usage on a single model |
 | `BenchmarkCanSendConcurrent` | Parallel CanSend across goroutines |
 | `BenchmarkCanSendWithPrune` | CanSend with 500 old + 500 new entries |
 | `BenchmarkStats` | Stats retrieval with a 1,000-entry window |
 | `BenchmarkAllStats` | AllStats across 5 models x 200 entries each |
 | `BenchmarkPersist` | YAML persistence I/O |
 | `BenchmarkSQLitePersist` | SQLite persistence I/O |
 | `BenchmarkSQLiteLoad` | SQLite state loading |
 ### Coverage
-Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
+Current coverage: 95.1%. The remaining paths cannot be covered in unit tests
-be covered in unit tests without modifying the production code:
+without modifying production code:
-1. `CountTokens` success path — hardcoded Google API URL requires network access
+1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests
-2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
+   cannot intercept the HTTP call without URL injection support.
-3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
+2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on
   valid Go structs.
 3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when
   `$HOME` is unset, which test infrastructure prevents.
 Do not lower coverage below 95% without a documented reason.
@@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation.
 ### Go Style
- All exported types, functions, and fields must have doc comments
+- All exported types, functions, and fields must have doc comments.
- Error strings must be lowercase and not end with punctuation (Go convention)
+- Error strings must be lowercase and not end with punctuation (Go convention).
- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
+- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so
-  prefix `ratelimit.` is included so errors identify their origin clearly
+  errors identify their origin clearly.
- No `init()` functions
+- No `init()` functions.
- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
+- No global mutable state. `DefaultProfiles()` returns a fresh map on each call.
  map on each call)
 ### Mutex Discipline
 The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
+- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even
-  even if they appear read-only, because `prune()` mutates slices
+  if they appear read-only, because `prune()` mutates state slices.
- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
+- `Persist()` acquires the write lock briefly to clone state, then releases it
-  snapshot of state
+  before performing I/O.
 - Lock acquisition always happens at the top of the public method, never inside
-  a helper — helpers document "Caller must hold the lock"
+  a helper. Helpers document "Caller must hold the lock".
- Never call a public method from inside another public method while holding
+- Never call a public method from inside another public method while holding the
-  the lock (deadlock risk)
+  lock (deadlock risk).
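A minimal sketch of these rules, using a hypothetical `counter` type rather than `RateLimiter` itself: public methods lock at the top and delegate to unexported helpers that assume the lock is held.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a hypothetical type used only to illustrate the locking rules.
type counter struct {
	mu   sync.Mutex
	hits map[string]int
}

// Record is public: it takes the lock at the top, then uses helpers only.
func (c *counter) Record(model string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bump(model)
	// Calling c.Total() here would deadlock: sync.Mutex is not reentrant.
	return c.total()
}

// Total is public and follows the same pattern.
func (c *counter) Total() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.total()
}

// bump increments the count for model. Caller must hold the lock.
func (c *counter) bump(model string) {
	c.hits[model]++
}

// total sums all counts. Caller must hold the lock.
func (c *counter) total() int {
	sum := 0
	for _, n := range c.hits {
		sum += n
	}
	return sum
}

func main() {
	c := &counter{hits: make(map[string]int)}
	c.Record("gemini-2.5-pro")
	fmt.Println(c.Record("claude-opus-4"), c.Total())
}
```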
 ### Dependencies
@@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library.
 ## Licence
-EUPL-1.2. Every new source file must carry the standard header if the project
+EUPL-1.2. Confirm with the project lead before adding files under a different
-adopts per-file headers in future. Confirm with the project lead before adding
+licence.
 files under a different licence.
 ---
@@ -205,3 +236,17 @@ Co-Authored-By: Virgil <virgil@lethean.io>
 Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
 pass. `go mod tidy` must produce no changes.
 ---
 ## Linting
 The project uses `golangci-lint` with the following enabled linters (see
 `.golangci.yml`):
 - `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple`
 - `ineffassign`, `typecheck`, `gocritic`, `gofmt`
 Disabled linters: `exhaustive`, `wrapcheck`.
 Run `golangci-lint run ./...` to check before committing.
--- a/docs/index.md
+++ b/docs/index.md
@@ -0,0 +1,117 @@
 ---
 title: go-ratelimit
 description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends.
 ---
 # go-ratelimit
 **Module**: `forge.lthn.ai/core/go-ratelimit`
 **Licence**: EUPL-1.2
 **Go version**: 1.26+
 go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and
 requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding
 window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and
 a local inference provider. State persists across process restarts via YAML
 (single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite
 migration helper is included.
 ## Quick Start
 ```go
 import "forge.lthn.ai/core/go-ratelimit"
 // Create a limiter with Gemini defaults (YAML backend).
 rl, err := ratelimit.New()
 if err != nil {
    log.Fatal(err)
 }
 // Check capacity before sending.
 if rl.CanSend("gemini-2.0-flash", 1500) {
    // Make the API call...
    rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens
 }
 // Persist state to disk for recovery across restarts.
 if err := rl.Persist(); err != nil {
    log.Printf("persist failed: %v", err)
 }
 ```
 ### Multi-provider configuration
 ```go
 rl, err := ratelimit.NewWithConfig(ratelimit.Config{
    Providers: []ratelimit.Provider{
        ratelimit.ProviderGemini,
        ratelimit.ProviderAnthropic,
    },
    Quotas: map[string]ratelimit.ModelQuota{
        // Override a specific model's limits.
        "gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200},
        // Add a custom model not in any profile.
        "llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0},
    },
 })
 ```
 ### SQLite backend (multi-process safe)
 ```go
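 // Go does not expand "~", so build the path from the home directory.
 home, err := os.UserHomeDir()
 if err != nil {
    log.Fatal(err)
 }
 rl, err := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))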
 if err != nil {
    log.Fatal(err)
 }
 defer rl.Close()
 // Load persisted state.
 if err := rl.Load(); err != nil {
    log.Fatal(err)
 }
 // Use exactly as the YAML backend -- CanSend, RecordUsage, Persist, etc.
 ```
 ### Blocking until capacity is available
 ```go
 ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 defer cancel()
 if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
    log.Printf("timed out waiting for capacity: %v", err)
    return
 }
 // Capacity is available; proceed with the API call.
 ```
 ## Package Layout
 The module is a single package with no sub-packages.
 | File | Purpose |
 |------|---------|
 | `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper |
 | `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations |
 | `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks |
 | `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery |
 | `error_test.go` | Tests for SQLite and YAML error paths |
 | `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases |
 ## Dependencies
 | Dependency | Purpose | Category |
 |------------|---------|----------|
 | `gopkg.in/yaml.v3` | YAML serialisation for the default persistence backend | Direct |
 | `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct |
 | `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only |
 All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain
 or system SQLite library is needed.
 ## Further Reading
 - [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model
 - [Development](development.md) -- build commands, test patterns, coding standards, commit conventions
 - [History](history.md) -- completed phases with commit hashes, known limitations