diff --git a/docs/architecture.md b/docs/architecture.md
index 4cefb7a..e3d9f42 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,81 +1,72 @@
+---
+title: Architecture
+description: Internals of go-ratelimit -- sliding window algorithm, provider quota system, persistence backends, and concurrency model.
+---
+
 # Architecture
 
 go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
-three independent quota dimensions per model — requests per minute (RPM), tokens
-per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
+three independent quota dimensions per model -- requests per minute (RPM), tokens
+per minute (TPM), and requests per day (RPD) -- using an in-memory sliding window
 that can be persisted across process restarts via YAML or SQLite.
 
 Module path: `forge.lthn.ai/core/go-ratelimit`
 
 ---
 
-## Sliding Window Algorithm
+## Key Types
 
-The limiter maintains per-model `UsageStats` structs in memory:
+### RateLimiter
+
+The central struct. Holds the quota definitions, current usage state, a mutex for
+thread safety, and an optional SQLite backend reference.
+
+```go
+type RateLimiter struct {
+    mu       sync.RWMutex
+    Quotas   map[string]ModelQuota  // per-model quota definitions
+    State    map[string]*UsageStats // per-model sliding window state
+    filePath string                 // YAML file path (ignored when SQLite is active)
+    sqlite   *sqliteStore           // non-nil when using SQLite backend
+}
+```
+
+### ModelQuota
+
+Defines the rate limits for a single model. A zero value in any field means that
+dimension is unlimited.
+
+```go
+type ModelQuota struct {
+    MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
+    MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
+    MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
+}
+```
+
+### UsageStats
+
+Tracks the sliding window state for a single model.
 
 ```go
 type UsageStats struct {
     Requests []time.Time  // timestamps of recent requests (1-minute window)
     Tokens   []TokenEntry // token counts with timestamps (1-minute window)
-    DayStart time.Time    // when the current daily window started
-    DayCount int          // total requests recorded since DayStart
+    DayStart time.Time    // when the current 24-hour window started
+    DayCount int          // total requests since DayStart
+}
+
+type TokenEntry struct {
+    Time  time.Time
+    Count int       // prompt + output tokens for this request
 }
 ```
 
-Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
-slices and discards entries older than `now - 1 minute`. Pruning is done
-in-place to avoid allocation on the hot path:
+### Config
+
+Controls `RateLimiter` initialisation.
 
 ```go
-validReqs := 0
-for _, t := range stats.Requests {
-    if t.After(window) {
-        stats.Requests[validReqs] = t
-        validReqs++
-    }
-}
-stats.Requests = stats.Requests[:validReqs]
-```
-
-The same loop runs for token entries. After pruning, `CanSend()` checks each
-quota dimension in priority order: RPD first (cheapest check), then RPM, then
-TPM. A zero value for any dimension means that dimension is unlimited. If all
-three are zero the model is treated as fully unlimited and the check short-circuits
-before touching any state.
-
-### Daily Reset
-
-The daily counter resets automatically inside `prune()`. When
-`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
-to the current time. This means the daily window is a rolling 24-hour period
-anchored to the first request of the day, not a calendar boundary.
-
-### Concurrency
-
-All reads and writes are protected by a single `sync.RWMutex`. Methods that
-write state — `CanSend()`, `RecordUsage()`, `Reset()`, `Load()` — acquire a
-full write lock. `Persist()`, `Stats()`, and `AllStats()` acquire a read lock
-where possible. The `CanSend()` method acquires a write lock because it calls
-`prune()`, which mutates the state slices.
-
-`go test -race ./...` passes clean with 20 goroutines performing concurrent
-`CanSend()`, `RecordUsage()`, and `Stats()` calls.
-
----
-
-## Provider and Quota Configuration
-
-### Types
-
-```go
-type Provider string          // "gemini", "openai", "anthropic", "local"
-
-type ModelQuota struct {
-    MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
-    MaxTPM int `yaml:"max_tpm"`
-    MaxRPD int `yaml:"max_rpd"`
-}
-
 type Config struct {
     FilePath  string                 // default: ~/.core/ratelimits.yaml
     Backend   string                 // "yaml" (default) or "sqlite"
@@ -84,32 +75,110 @@ type Config struct {
 }
 ```
 
-### Quota Resolution
+### Provider
 
-1. Provider profiles are loaded first (from `DefaultProfiles()`).
-2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
+A string type identifying an LLM provider. Four constants are defined:
+
+```go
+type Provider string
+
+const (
+    ProviderGemini    Provider = "gemini"
+    ProviderOpenAI    Provider = "openai"
+    ProviderAnthropic Provider = "anthropic"
+    ProviderLocal     Provider = "local"
+)
+```
+
+---
+
+## Sliding Window Algorithm
+
+Every call to `CanSend()` or `Stats()` first calls `prune()`, which removes
+entries older than one minute from the `Requests` and `Tokens` slices. Pruning
+is done in-place using `slices.DeleteFunc` to minimise allocations:
+
+```go
+window := now.Add(-1 * time.Minute)
+
+stats.Requests = slices.DeleteFunc(stats.Requests, func(t time.Time) bool {
+    return t.Before(window)
+})
+stats.Tokens = slices.DeleteFunc(stats.Tokens, func(t TokenEntry) bool {
+    return t.Time.Before(window)
+})
+```
+
+`CanSend()` then checks each quota dimension against the pruned state. A model
+whose three limits (RPM, TPM, RPD) are all zero is treated as fully unlimited;
+that case short-circuits before any state is touched, so nothing is pruned.
+
+The check order is: RPD, then RPM, then TPM. RPD is checked first because it
+is the cheapest comparison (a single integer). TPM is checked last because it
+requires summing the token counts in the sliding window.
+
+### Daily Reset
+
+The daily counter resets automatically inside `prune()`. When
+`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is
+updated to the current time. The daily window is a rolling 24-hour period
+anchored to the first request of the day, not a calendar boundary.
+
+### Background Pruning
+
+`BackgroundPrune(interval)` starts a goroutine that prunes every model's state
+once per `interval`. It returns a cancel function that stops the goroutine when
+called:
+
+```go
+stop := rl.BackgroundPrune(30 * time.Second)
+defer stop()
+```
+
+This prevents memory growth in long-running processes where some models may
+accumulate stale entries between calls to `CanSend()`.
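One way such a background pruner could be structured is sketched here. The helper `startPruner` is hypothetical; the use of `time.Ticker` and `sync.Once` is an assumption about a reasonable implementation, not a description of `BackgroundPrune` itself.

```go
package main

import (
    "sync"
    "time"
)

// startPruner (hypothetical) calls prune once per interval until the
// returned stop function is invoked. stop is safe to call more than once.
func startPruner(interval time.Duration, prune func()) (stop func()) {
    done := make(chan struct{})
    var once sync.Once

    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            select {
            case <-ticker.C:
                prune()
            case <-done:
                return
            }
        }
    }()

    return func() { once.Do(func() { close(done) }) }
}
```

In a real limiter the `prune` callback would take the write lock before touching state, in line with the mutex rules for `prune()`.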
+
+### Memory Cleanup
+
+When `prune()` empties both the `Requests` and `Tokens` slices for a model,
+and `DayCount` is also zero, the entire `UsageStats` entry is deleted from
+the `State` map. This prevents memory leaks from models that were used once
+and never again.
+
+---
+
+## Provider and Quota Configuration
+
+### Quota Resolution Order
+
+1. Provider profiles are loaded first from `DefaultProfiles()`.
+2. Explicit `Config.Quotas` are merged on top using `maps.Copy`, overriding any
+   matching model.
 3. If neither `Providers` nor `Quotas` are specified, Gemini defaults are used.
 
-`SetQuota()` and `AddProvider()` allow runtime modification; both are
-mutex-protected. `AddProvider()` is additive — it does not remove existing
-quotas for models outside the new provider's profile.
+`SetQuota()` and `AddProvider()` allow runtime modification. Both acquire the
+write lock. `AddProvider()` is additive -- it does not remove existing quotas for
+models outside the new provider's profile.
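The resolution order can be sketched as a small merge function. `resolveQuotas` is a hypothetical helper, and the `profiles` parameter (provider, then model, then quota) is an assumed shape for what `DefaultProfiles()` returns.

```go
package main

import "maps"

type Provider string

type ModelQuota struct {
    MaxRPM, MaxTPM, MaxRPD int // 0 = unlimited
}

// resolveQuotas (hypothetical) applies the three documented steps.
func resolveQuotas(
    profiles map[Provider]map[string]ModelQuota,
    providers []Provider,
    explicit map[string]ModelQuota,
) map[string]ModelQuota {
    out := make(map[string]ModelQuota)

    // Step 3: nothing requested at all, so fall back to Gemini.
    if len(providers) == 0 && len(explicit) == 0 {
        providers = []Provider{"gemini"}
    }
    // Step 1: load the provider profiles.
    for _, p := range providers {
        maps.Copy(out, profiles[p])
    }
    // Step 2: explicit quotas override any matching model.
    maps.Copy(out, explicit)
    return out
}
```

Because `maps.Copy` overwrites existing keys, the later copy wins, which is what gives explicit quotas precedence over profile defaults.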
 
 ### Default Quotas (as of February 2026)
 
-| Provider  | Model                  | MaxRPM    | MaxTPM    | MaxRPD    |
-|-----------|------------------------|-----------|-----------|-----------|
-| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000 | 1,000     |
-| Gemini    | gemini-3-flash-preview | 150       | 1,000,000 | 1,000     |
-| Gemini    | gemini-2.5-pro         | 150       | 1,000,000 | 1,000     |
-| Gemini    | gemini-2.0-flash       | 150       | 1,000,000 | unlimited |
-| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited | unlimited |
-| OpenAI    | gpt-4o, gpt-4-turbo    | 500       | 30,000    | unlimited |
-| OpenAI    | gpt-4o-mini, o1-mini   | 500       | 200,000   | unlimited |
-| OpenAI    | o1, o3-mini            | 500       | varies    | unlimited |
-| Anthropic | claude-opus-4          | 50        | 40,000    | unlimited |
-| Anthropic | claude-sonnet-4        | 50        | 40,000    | unlimited |
-| Anthropic | claude-haiku-3.5       | 50        | 50,000    | unlimited |
-| Local     | (none by default)      | user-defined                          |
+| Provider  | Model                  | MaxRPM    | MaxTPM      | MaxRPD    |
+|-----------|------------------------|-----------|-------------|-----------|
+| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000   | 1,000     |
+| Gemini    | gemini-3-flash-preview | 150       | 1,000,000   | 1,000     |
+| Gemini    | gemini-2.5-pro         | 150       | 1,000,000   | 1,000     |
+| Gemini    | gemini-2.0-flash       | 150       | 1,000,000   | unlimited |
+| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited   | unlimited |
+| OpenAI    | gpt-4o                 | 500       | 30,000      | unlimited |
+| OpenAI    | gpt-4o-mini            | 500       | 200,000     | unlimited |
+| OpenAI    | gpt-4-turbo            | 500       | 30,000      | unlimited |
+| OpenAI    | o1                     | 500       | 30,000      | unlimited |
+| OpenAI    | o1-mini                | 500       | 200,000     | unlimited |
+| OpenAI    | o3-mini                | 500       | 200,000     | unlimited |
+| Anthropic | claude-opus-4          | 50        | 40,000      | unlimited |
+| Anthropic | claude-sonnet-4        | 50        | 40,000      | unlimited |
+| Anthropic | claude-haiku-3.5       | 50        | 50,000      | unlimited |
+| Local     | (none by default)      | user-defined                         |
 
 The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
 where the throttle limit is hardware rather than an API quota. No defaults are
@@ -117,10 +186,54 @@ provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
 
 ---
 
-## YAML Persistence (Legacy)
+## Constructors
 
-The default backend serialises the entire `RateLimiter` struct — both the
-`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
+| Function | Backend | Default Provider |
+|----------|---------|------------------|
+| `New()` | YAML | Gemini |
+| `NewWithConfig(cfg)` | YAML | Configurable (Gemini if empty) |
+| `NewWithSQLite(dbPath)` | SQLite | Gemini |
+| `NewWithSQLiteConfig(dbPath, cfg)` | SQLite | Configurable (Gemini if empty) |
+
+`Close()` releases the database connection for SQLite-backed limiters. It is a
+no-op on YAML-backed limiters. Always call `Close()` (or `defer rl.Close()`)
+when using the SQLite backend.
+
+---
+
+## Data Flow
+
+A typical request lifecycle:
+
+```
+1. CanSend(model, estimatedTokens)
+   |-- acquires write lock
+   |-- looks up ModelQuota for the model
+   |-- if unknown model or all-zero quota: returns true (allowed)
+   |-- calls prune(model) to discard stale entries
+   |-- checks RPD, RPM, TPM against the pruned state
+   '-- returns true/false
+
+2. (caller makes the API call)
+
+3. RecordUsage(model, promptTokens, outputTokens)
+   |-- acquires write lock
+   |-- calls prune(model)
+   |-- appends to Requests and Tokens slices
+   '-- increments DayCount
+
+4. Persist()
+   |-- acquires write lock, clones state, releases lock
+   |-- YAML: marshals to file
+   '-- SQLite: saves quotas and state in transactions
+```
+
+---
+
+## YAML Persistence
+
+The default backend serialises both the `Quotas` map and the `State` map to a
+YAML file at `~/.core/ratelimits.yaml` (configurable via `Config.FilePath`).
 
 ```yaml
 quotas:
@@ -143,35 +256,30 @@ state:
 `Load()` treats a missing file as an empty state (no error). Corrupt or
 unreadable files return an error.
 
-**Limitations of YAML backend:**
+**Limitations of the YAML backend:**
+
 - Single-process only. Concurrent writes from multiple processes corrupt the
   file because the write is not atomic at the OS level.
-- The entire state is serialised on every `Persist()` call, which grows linearly
-  with the number of tracked models and entries.
-- Timestamps are serialised as RFC3339 strings; sub-nanosecond precision is
-  preserved by Go's time marshaller but depends on the YAML library.
+- The entire state is serialised on every `Persist()` call.
+- Timestamps are serialised as RFC 3339 strings.
 
 ---
 
 ## SQLite Backend
 
-The SQLite backend was added in Phase 2 to support multi-process scenarios and
-provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
-Go port of SQLite that compiles without CGO.
+The SQLite backend supports multi-process scenarios. It uses `modernc.org/sqlite`,
+a pure Go port of SQLite that compiles without CGO.
 
 ### Connection Settings
 
 ```go
-db.SetMaxOpenConns(1)                      // single connection for PRAGMA consistency
-db.Exec("PRAGMA journal_mode=WAL")         // WAL mode for concurrent readers
-db.Exec("PRAGMA busy_timeout=5000")        // 5-second busy timeout
+db.SetMaxOpenConns(1)                  // single connection for PRAGMA consistency
+db.Exec("PRAGMA journal_mode=WAL")     // concurrent readers alongside a single writer
+db.Exec("PRAGMA busy_timeout=5000")    // 5-second wait on lock contention
 ```
 
 WAL mode allows one writer and multiple concurrent readers. The 5-second busy
-timeout prevents immediate failure when a second process is mid-commit. A single
-`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
-at the file level; multiple Go connections to the same file through a single
-process would not add throughput but would complicate locking.
+timeout prevents immediate failure when a second process is mid-commit.
 
 ### Schema
 
@@ -196,7 +304,7 @@ CREATE TABLE IF NOT EXISTS tokens (
 
 CREATE TABLE IF NOT EXISTS daily (
     model     TEXT PRIMARY KEY,
-    day_start INTEGER NOT NULL,   -- UnixNano
+    day_start INTEGER NOT NULL,    -- UnixNano
     day_count INTEGER NOT NULL DEFAULT 0
 );
 
@@ -205,51 +313,22 @@ CREATE INDEX IF NOT EXISTS idx_tokens_model_ts   ON tokens(model, ts);
 ```
 
 Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
-precision without relying on SQLite's text date format, and allows efficient
-range queries using the composite indices.
+precision and allows efficient range queries using the composite indices.
 
 ### Save Strategy
 
-`saveState()` uses a delete-then-insert pattern inside a single transaction.
-All three state tables are truncated and rewritten atomically:
-
-```go
-tx.Exec("DELETE FROM requests")
-tx.Exec("DELETE FROM tokens")
-tx.Exec("DELETE FROM daily")
-// then INSERT for every model in state
-tx.Commit()
-```
-
-`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
-existing quota rows are updated in place without deleting unrelated models.
-
-### Constructors
-
-```go
-// YAML backend (default)
-rl, err := ratelimit.New()
-rl, err := ratelimit.NewWithConfig(cfg)
-
-// SQLite backend
-rl, err := ratelimit.NewWithSQLite(dbPath)
-rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
-
-defer rl.Close()  // releases the database connection
-```
-
-`Close()` is a no-op on YAML-backed limiters.
+- **Quotas**: `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert). Existing quota
+  rows are updated in place without deleting unrelated models.
+- **State**: Delete-then-insert inside a single transaction. All three state
+  tables (`requests`, `tokens`, `daily`) are truncated and rewritten atomically.
 
 ---
 
 ## Migration Path
 
-`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
-state file and writes all quotas and usage state to a new SQLite database. The
-function is idempotent — running it again on the same YAML file overwrites the
-SQLite database state.
-
-Typical one-time migration:
+`MigrateYAMLToSQLite(yamlPath, sqlitePath)` reads an existing YAML state file
+and writes all quotas and usage state to a new SQLite database. The function is
+idempotent -- running it again overwrites the SQLite database state.
 
 ```go
 err := ratelimit.MigrateYAMLToSQLite(
@@ -258,29 +337,59 @@ err := ratelimit.MigrateYAMLToSQLite(
 )
 ```
 
-After migration, switch the constructor:
+After migration, switch the constructor from `New()` to `NewWithSQLite()`. The
+YAML file can be kept as a backup; the two backends do not share state.
+
+---
+
+## Iterators
+
+Two range-over-func iterators (Go 1.23+) expose the limiter state:
+
+- `Models() iter.Seq[string]` -- returns a sorted sequence of all model names
+  (from both `Quotas` and `State` maps, deduplicated).
+- `Iter() iter.Seq2[string, ModelStats]` -- returns sorted model names paired
+  with their current `ModelStats` snapshot.
 
 ```go
-// Before
-rl, _ := ratelimit.New()
-
-// After
-rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
-defer rl.Close()
+for model, stats := range rl.Iter() {
+    fmt.Printf("%s: %d/%d RPM, %d/%d TPM\n",
+        model, stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM)
+}
 ```
 
-The YAML file can be kept as a backup; the two backends do not share state.
-
 ---
 
 ## CountTokens
 
-`CountTokens(apiKey, model, text string) (int, error)` calls the Google
-Generative Language API to obtain an exact token count for a prompt string. It
-is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
-endpoint. The URL is not configurable, which prevents unit testing of the
-success path without network access.
+`CountTokens(ctx, apiKey, model, text)` calls the Google Generative Language API
+to obtain an exact token count for a prompt string. It is Gemini-specific and
+hardcodes the `generativelanguage.googleapis.com` endpoint.
 
 For other providers, callers must supply `estimatedTokens` directly to
-`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
-in API response metadata after a call completes.
+`CanSend()`. Accurate token counts are typically available in API response
+metadata after a call completes.
+
+---
+
+## Concurrency Model
+
+All reads and writes are protected by a single `sync.RWMutex` on the
+`RateLimiter` struct.
+
+| Method | Lock type | Reason |
+|--------|-----------|--------|
+| `CanSend()` | Write | Calls `prune()`, which mutates state slices |
+| `RecordUsage()` | Write | Appends to state slices |
+| `Reset()` | Write | Deletes state entries |
+| `Load()` | Write | Replaces in-memory state |
+| `SetQuota()` | Write | Modifies quota map |
+| `AddProvider()` | Write | Modifies quota map |
+| `Persist()` | Write (brief) | Clones state, then releases lock before I/O |
+| `Stats()` | Write | Calls `prune()` |
+| `AllStats()` | Write | Prunes inline |
+| `Models()` | Read | Reads keys only |
+
+`Persist()` minimises lock contention by cloning the state under a write lock,
+then performing I/O after releasing the lock. The test suite passes clean under
+`go test -race ./...` with 20 goroutines performing concurrent operations.
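The clone-then-release pattern attributed to `Persist()` can be illustrated with a simplified limiter. The type `limiter`, its `[]int64` state, and the `write` callback are all hypothetical simplifications; only the locking shape is taken from the description above.

```go
package main

import (
    "slices"
    "sync"
)

type limiter struct {
    mu    sync.RWMutex
    state map[string][]int64 // simplified stand-in for map[string]*UsageStats
}

// snapshot deep-copies the state while holding the lock.
func (l *limiter) snapshot() map[string][]int64 {
    l.mu.Lock()
    defer l.mu.Unlock()
    out := make(map[string][]int64, len(l.state))
    for model, entries := range l.state {
        out[model] = slices.Clone(entries)
    }
    return out
}

// persist (hypothetical) holds the lock only for the copy; the slow write
// runs afterwards, so other goroutines are not blocked on I/O.
func (l *limiter) persist(write func(map[string][]int64) error) error {
    snap := l.snapshot()
    return write(snap)
}
```

The copy has to be deep: sharing the underlying slices with the live state would reintroduce the data race that the lock is there to prevent.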
diff --git a/docs/development.md b/docs/development.md
index 8d656a2..1f5ceff 100644
--- a/docs/development.md
+++ b/docs/development.md
@@ -1,9 +1,14 @@
+---
+title: Development Guide
+description: How to build, test, and contribute to go-ratelimit -- prerequisites, test patterns, coding standards, and commit conventions.
+---
+
 # Development Guide
 
 ## Prerequisites
 
-- Go 1.25 or later (the module declares `go 1.25.5`)
-- No CGO required — `modernc.org/sqlite` is a pure Go port
+- **Go 1.26** or later (the module declares `go 1.26.0`)
+- No CGO required -- `modernc.org/sqlite` is a pure Go port
 
 No C toolchain, no system SQLite library, no external build tools. A plain
 `go build ./...` is sufficient.
@@ -34,6 +39,9 @@ go test -bench=BenchmarkCanSend -benchmem ./...
 # Check for vet issues
 go vet ./...
 
+# Lint (requires golangci-lint)
+golangci-lint run ./...
+
 # Tidy dependencies
 go mod tidy
 ```
@@ -43,31 +51,35 @@ must produce no errors or warnings before a commit is pushed.
 
 ---
 
-## Test Patterns
+## Test Organisation
 
-### File Organisation
+### File Layout
 
-- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
-- `sqlite_test.go` — Phase 2 (SQLite backend)
+| File | Scope |
+|------|-------|
+| `ratelimit_test.go` | Core sliding window logic, provider profiles, concurrency, benchmarks |
+| `sqlite_test.go` | SQLite backend, migration, concurrent persistence |
+| `error_test.go` | SQLite and YAML error paths |
+| `iter_test.go` | `Models()` and `Iter()` iterators, `CountTokens` edge cases |
 
-Both files are in `package ratelimit` (white-box tests) so they can access
+All test files are in `package ratelimit` (white-box tests), giving access to
 unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
 
 ### Naming Convention
 
 SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
 
-- `_Good` — happy path
-- `_Bad` — expected error conditions (invalid paths, corrupt input)
-- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
+- `_Good` -- happy path
+- `_Bad` -- expected error conditions (invalid paths, corrupt input)
+- `_Ugly` -- panic-adjacent edge cases (corrupt database files, truncated files)
 
-Core logic tests use plain descriptive names without suffixes, grouped by
-method with table-driven subtests.
+Core logic tests use plain descriptive names without suffixes, grouped by method
+with table-driven subtests.
 
-### Test Helpers
+### Test Helper
 
-`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
-redirects the YAML file path into `t.TempDir()`:
+`newTestLimiter(t)` creates a `RateLimiter` with Gemini defaults and redirects
+the YAML file path into `t.TempDir()`:
 
 ```go
 func newTestLimiter(t *testing.T) *RateLimiter {
@@ -86,43 +98,64 @@ after each test completes.
 
 Tests use `github.com/stretchr/testify` exclusively:
 
-- `require.NoError(t, err)` — fail immediately on setup errors
-- `assert.NoError(t, err)` — record failure but continue
-- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
-- `assert.True / assert.False` — for boolean checks
-- `assert.Empty / assert.Len` — for slice length checks
-- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
+- `require.NoError(t, err)` -- fail immediately on setup errors
+- `assert.NoError(t, err)` -- record failure but continue
+- `assert.Equal(t, expected, actual, "message")` -- prefer over raw comparisons
+- `assert.True` / `assert.False` -- for boolean checks
+- `assert.Empty` / `assert.Len` -- for slice length checks
+- `assert.ErrorIs(t, err, target)` -- for sentinel errors
 
 Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
 
-### Race Tests
+### Concurrency Tests
 
-Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
-assert anything beyond absence of data races (the race detector does the work):
+Concurrency tests spin up goroutines via `sync.WaitGroup.Go` (Go 1.25+). Some
+assert specific outcomes (e.g., the correct RPD count after concurrent
+recordings), while others rely solely on the race detector to catch data races:
 
 ```go
 var wg sync.WaitGroup
-for i := range 20 {
-    wg.Add(1)
-    go func() {
-        defer wg.Done()
-        // concurrent operations
-    }()
+for range 20 {
+    wg.Go(func() {
+        for range 50 {
+            rl.CanSend(model, 10)
+            rl.RecordUsage(model, 5, 5)
+            rl.Stats(model)
+        }
+    })
 }
 wg.Wait()
 ```
 
-Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
-clean.
+Always run concurrency tests with `-race`.
+
+### Benchmarks
+
+The following benchmarks are included:
+
+| Benchmark | What it measures |
+|-----------|------------------|
+| `BenchmarkCanSend` | CanSend with a 1,000-entry sliding window |
+| `BenchmarkRecordUsage` | Recording usage on a single model |
+| `BenchmarkCanSendConcurrent` | Parallel CanSend across goroutines |
+| `BenchmarkCanSendWithPrune` | CanSend with 500 old + 500 new entries |
+| `BenchmarkStats` | Stats retrieval with a 1,000-entry window |
+| `BenchmarkAllStats` | AllStats across 5 models x 200 entries each |
+| `BenchmarkPersist` | YAML persistence I/O |
+| `BenchmarkSQLitePersist` | SQLite persistence I/O |
+| `BenchmarkSQLiteLoad` | SQLite state loading |
 
 ### Coverage
 
-Current coverage: 95.1%. The remaining 5% consists of three paths that cannot
-be covered in unit tests without modifying the production code:
+Current coverage: 95.1%. The uncovered paths cannot be exercised in unit tests
+without modifying production code:
 
-1. `CountTokens` success path — hardcoded Google API URL requires network access
-2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
-3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
+1. `CountTokens` success path -- the Google API URL is hardcoded; unit tests
+   cannot intercept the HTTP call without URL injection support.
+2. `yaml.Marshal` error path in `Persist()` -- `yaml.Marshal` does not fail on
+   valid Go structs.
+3. `os.UserHomeDir()` error path in `NewWithConfig()` -- triggered only when
+   `$HOME` is unset, which test infrastructure prevents.
 
 Do not lower coverage below 95% without a documented reason.
 
@@ -137,26 +170,25 @@ Do not use American spellings in identifiers, comments, or documentation.
 
 ### Go Style
 
-- All exported types, functions, and fields must have doc comments
-- Error strings must be lowercase and not end with punctuation (Go convention)
-- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
-  prefix `ratelimit.` is included so errors identify their origin clearly
-- No `init()` functions
-- No global mutable state outside of `DefaultProfiles()` (which returns a fresh
-  map on each call)
+- All exported types, functions, and fields must have doc comments.
+- Error strings must be lowercase and not end with punctuation (Go convention).
+- Contextual errors use `fmt.Errorf("ratelimit.Function: what: %w", err)` so
+  errors identify their origin clearly.
+- No `init()` functions.
+- No global mutable state. `DefaultProfiles()` returns a fresh map on each call.
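The contextual-error rule can be illustrated with a short sketch. `readState` and `isMissing` are hypothetical helpers written for this example; the point is only that a `%w`-wrapped error keeps its origin prefix while remaining matchable with `errors.Is`.

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
)

// readState (hypothetical) wraps the underlying error using the
// "ratelimit.Function: what: %w" convention.
func readState(path string) ([]byte, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return nil, fmt.Errorf("ratelimit.readState: read %s: %w", path, err)
    }
    return data, nil
}

// isMissing reports whether err wraps fs.ErrNotExist anywhere in its chain.
func isMissing(err error) bool {
    return errors.Is(err, fs.ErrNotExist)
}
```

The message stays lowercase and unpunctuated, as required by the rule on error strings.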
 
 ### Mutex Discipline
 
 The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
 
-- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
-  even if they appear read-only, because `prune()` mutates slices
-- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
-  snapshot of state
+- Methods that call `prune()` always acquire the write lock (`mu.Lock()`), even
+  if they appear read-only, because `prune()` mutates state slices.
+- `Persist()` acquires the write lock briefly to clone state, then releases it
+  before performing I/O.
 - Lock acquisition always happens at the top of the public method, never inside
-  a helper — helpers document "Caller must hold the lock"
-- Never call a public method from inside another public method while holding
-  the lock (deadlock risk)
+  a helper. Helpers document "Caller must hold the lock".
+- Never call a public method from inside another public method while holding the
+  lock (deadlock risk).
 
 ### Dependencies
 
@@ -175,9 +207,8 @@ client libraries; the existing `CountTokens` function uses the standard library.
 
 ## Licence
 
-EUPL-1.2. Every new source file must carry the standard header if the project
-adopts per-file headers in future. Confirm with the project lead before adding
-files under a different licence.
+EUPL-1.2. Confirm with the project lead before adding files under a different
+licence.
 
 ---
 
@@ -205,3 +236,17 @@ Co-Authored-By: Virgil <virgil@lethean.io>
 
 Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
 pass. `go mod tidy` must produce no changes.
+
+---
+
+## Linting
+
+The project uses `golangci-lint` with the following enabled linters (see
+`.golangci.yml`):
+
+- `govet`, `errcheck`, `staticcheck`, `unused`, `gosimple`
+- `ineffassign`, `typecheck`, `gocritic`, `gofmt`
+
+Disabled linters: `exhaustive`, `wrapcheck`.
+
+Run `golangci-lint run ./...` to check before committing.
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..c73cc13
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,117 @@
+---
+title: go-ratelimit
+description: Provider-agnostic sliding window rate limiter for LLM API calls, with YAML and SQLite persistence backends.
+---
+
+# go-ratelimit
+
+**Module**: `forge.lthn.ai/core/go-ratelimit`
+**Licence**: EUPL-1.2
+**Go version**: 1.26+
+
+go-ratelimit enforces requests-per-minute (RPM), tokens-per-minute (TPM), and
+requests-per-day (RPD) quotas on a per-model basis using an in-memory sliding
+window. It ships with default quota profiles for Gemini, OpenAI, Anthropic, and
+a local inference provider. State persists across process restarts via YAML
+(single-process) or SQLite with WAL mode (multi-process). A YAML-to-SQLite
+migration helper is included.
+
+## Quick Start
+
+```go
+import ratelimit "forge.lthn.ai/core/go-ratelimit"
+
+// Create a limiter with Gemini defaults (YAML backend).
+rl, err := ratelimit.New()
+if err != nil {
+    log.Fatal(err)
+}
+
+// Check capacity before sending.
+if rl.CanSend("gemini-2.0-flash", 1500) {
+    // Make the API call...
+    rl.RecordUsage("gemini-2.0-flash", 1000, 500) // promptTokens, outputTokens
+}
+
+// Persist state to disk for recovery across restarts.
+if err := rl.Persist(); err != nil {
+    log.Printf("persist failed: %v", err)
+}
+```
+
+### Multi-provider configuration
+
+```go
+rl, err := ratelimit.NewWithConfig(ratelimit.Config{
+    Providers: []ratelimit.Provider{
+        ratelimit.ProviderGemini,
+        ratelimit.ProviderAnthropic,
+    },
+    Quotas: map[string]ratelimit.ModelQuota{
+        // Override a specific model's limits.
+        "gemini-3-pro-preview": {MaxRPM: 50, MaxTPM: 500000, MaxRPD: 200},
+        // Add a custom model not in any profile.
+        "llama-3.3-70b": {MaxRPM: 5, MaxTPM: 50000, MaxRPD: 0},
+    },
+})
+```
+
+### SQLite backend (multi-process safe)
+
+```go
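+home, err := os.UserHomeDir() // Go does not expand "~" in file paths
+if err != nil {
+    log.Fatal(err)
+}
+rl, err := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))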
+if err != nil {
+    log.Fatal(err)
+}
+defer rl.Close()
+
+// Load persisted state.
+if err := rl.Load(); err != nil {
+    log.Fatal(err)
+}
+
+// Use exactly as with the YAML backend -- CanSend, RecordUsage, Persist, etc.
+```
+
+### Blocking until capacity is available
+
+```go
+ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+defer cancel()
+
+if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
+    log.Printf("timed out waiting for capacity: %v", err)
+    return
+}
+// Capacity is available; proceed with the API call.
+```
+
+## Package Layout
+
+The module is a single package with no sub-packages.
+
+| File | Purpose |
+|------|---------|
+| `ratelimit.go` | Core types (`RateLimiter`, `ModelQuota`, `Config`, `Provider`), sliding window logic, provider profiles, YAML persistence, `CountTokens` helper |
+| `sqlite.go` | SQLite persistence backend (`sqliteStore`), schema creation, load/save operations |
+| `ratelimit_test.go` | Tests for core logic, provider profiles, concurrency, and benchmarks |
+| `sqlite_test.go` | Tests for SQLite backend, migration, and error recovery |
+| `error_test.go` | Tests for SQLite and YAML error paths |
+| `iter_test.go` | Tests for `Models()` and `Iter()` iterators, plus `CountTokens` edge cases |
+
+## Dependencies
+
+| Dependency | Purpose | Category |
+|------------|---------|----------|
+| `gopkg.in/yaml.v3` | YAML serialisation for the default persistence backend | Direct |
+| `modernc.org/sqlite` | Pure Go SQLite driver (no CGO required) | Direct |
+| `github.com/stretchr/testify` | Test assertions (`assert`, `require`) | Test only |
+
+All indirect dependencies are pulled in by `modernc.org/sqlite`. No C toolchain
+or system SQLite library is needed.
+
+## Further Reading
+
+- [Architecture](architecture.md) -- sliding window algorithm, provider quotas, YAML and SQLite backends, concurrency model
+- [Development](development.md) -- build commands, test patterns, coding standards, commit conventions
+- [History](history.md) -- completed phases with commit hashes, known limitations