diff --git a/CLAUDE.md b/CLAUDE.md
index a3bff6a..4c5a6dd 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,19 +1,28 @@
 # CLAUDE.md
 
-## What This Is
+Token counting, model quotas, and sliding window rate limiter.
 
-Token counting, model quotas, and sliding window rate limiter. Module: `forge.lthn.ai/core/go-ratelimit`
+Module: `forge.lthn.ai/core/go-ratelimit`
 
 ## Commands
 
 ```bash
-go test ./...          # Run all tests
-go test -v -run Name   # Run single test
+go test ./...               # run all tests
+go test -race ./...         # race detector (required before commit)
+go test -v -run Name ./...  # single test
+go vet ./...                # vet check
 ```
 
-## Coding Standards
+## Standards
 
 - UK English
-- `go test ./...` must pass before commit
+- `go test -race ./...` and `go vet ./...` must pass before commit
 - Conventional commits: `type(scope): description`
 - Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
+- Coverage must not drop below 95%
+
+## Docs
+
+- `docs/architecture.md` — sliding window algorithm, provider quotas, YAML/SQLite backends
+- `docs/development.md` — prerequisites, test patterns, coding standards
+- `docs/history.md` — completed phases with commit hashes, known limitations
diff --git a/FINDINGS.md b/FINDINGS.md
deleted file mode 100644
index 0c5d194..0000000
--- a/FINDINGS.md
+++ /dev/null
@@ -1,106 +0,0 @@
-# FINDINGS.md -- go-ratelimit
-
-## 2026-02-19: Split from core/go (Virgil)
-
-### Origin
-
-Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
-
-### Architecture
-
-- Sliding window rate limiter (1-minute window)
-- Daily request caps per model
-- Token counting via Google `CountTokens` API
-- Model-specific quota configuration
-
-### Gemini-Specific Defaults
-
-- `gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD
-- Quotas are currently hardcoded -- needs generalisation (see TODO Phase 1)
-
-### Tests
-
-- 1 test file covering sliding window and quota enforcement
-
----
-
-## 2026-02-20: Phase 0 -- Hardening (Charon)
-
-### Coverage: 77.1% -> 95.1%
-
-Rewrote test suite with testify assert/require. Table-driven subtests throughout.
-
-#### Tests added
-
-- **CanSend boundaries**: exact RPM/TPM/RPD limits, RPM-only, TPM-only, zero-token estimates, unknown models, unlimited models
-- **Prune**: keeps recent entries, prunes old ones, daily reset at 24h, boundary-exact timestamps, noop on non-existent model
-- **RecordUsage**: fresh state, accumulation, existing state
-- **Reset**: single model, all models (empty string), non-existent model
-- **WaitForCapacity**: immediate capacity, context cancellation, pre-cancelled context, unknown model
-- **Stats/AllStats**: known/unknown/quota-only models, pruning in AllStats, daily reset in AllStats
-- **Persist/Load**: round-trip, non-existent file, corrupt YAML, unreadable file, nested directory creation, unwritable directory
-- **Concurrency**: 20 goroutines x 50 ops (CanSend + RecordUsage + Stats), concurrent Reset + RecordUsage + AllStats
-- **Benchmarks**: BenchmarkCanSend (1000-entry window), BenchmarkRecordUsage, BenchmarkCanSendConcurrent
-
-#### Remaining uncovered (5%)
-
-- `CountTokens` success path: hardcoded Google URL prevents unit testing without URL injection. Only the connection-error path is covered.
-- `yaml.Marshal` error in `Persist()`: virtually impossible to trigger with valid structs.
-- `os.UserHomeDir` error in `NewWithConfig()`: only fails when `$HOME` is unset.
-
-### Race detector
-
-`go test -race ./...` passes clean. The `sync.RWMutex` correctly guards all shared state.
-
-### go vet
-
-No warnings.
-
----
-
-## 2026-02-20: Phase 1 -- Generalisation (Charon)
-
-### Problem
-
-Hardcoded Gemini-specific quotas in `New()`. No way to configure for other providers.
-
-### Solution
-
-Introduced provider-agnostic configuration without breaking existing API.
-
-#### New types
-
-- `Provider` -- string type with constants: `ProviderGemini`, `ProviderOpenAI`, `ProviderAnthropic`, `ProviderLocal`
-- `ProviderProfile` -- bundles provider identity with model quotas map
-- `Config` -- construction config with `FilePath`, `Providers` list, `Quotas` map
-
-#### New functions
-
-- `DefaultProfiles()` -- returns pre-configured profiles for all four providers
-- `NewWithConfig(Config)` -- creates limiter from explicit configuration
-- `SetQuota(model, quota)` -- runtime quota modification
-- `AddProvider(provider)` -- loads all default quotas for a provider at runtime
-
-#### Provider defaults (Feb 2026)
-
-| Provider | Models | RPM | TPM | RPD |
-|----------|--------|-----|-----|-----|
-| Gemini | gemini-3-pro-preview, gemini-3-flash-preview, gemini-2.5-pro | 150 | 1M | 1000 |
-| Gemini | gemini-2.0-flash | 150 | 1M | unlimited |
-| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
-| OpenAI | gpt-4o, gpt-4-turbo, o1 | 500 | 30K | unlimited |
-| OpenAI | gpt-4o-mini, o1-mini, o3-mini | 500 | 200K | unlimited |
-| Anthropic | claude-opus-4, claude-sonnet-4 | 50 | 40K | unlimited |
-| Anthropic | claude-haiku-3.5 | 50 | 50K | unlimited |
-| Local | (none by default) | -- | -- | -- |
-
-#### Backward compatibility
-
-`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Verified by `TestNewBackwardCompatibility` which asserts exact parity with the original hardcoded values.
-
-#### Design notes
-
-- Explicit quotas in `Config.Quotas` override provider defaults (merge-on-top pattern)
-- Local provider has no default quotas -- users add per-model limits for hardware throttling
-- `AddProvider()` is additive -- calling it does not remove existing quotas
-- All new methods are mutex-protected and safe for concurrent use
diff --git a/TODO.md b/TODO.md
deleted file mode 100644
index bb8b103..0000000
--- a/TODO.md
+++ /dev/null
@@ -1,91 +0,0 @@
-# TODO.md -- go-ratelimit
-
-Dispatched from core/go orchestration. Pick up tasks in order.
-
----
-
-## Phase 0: Hardening & Test Coverage
-
-- [x] **Expand test coverage** -- `ratelimit_test.go` rewritten with testify. Tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines, `WaitForCapacity()` timeout and immediate-capacity paths, `prune()` sliding window edge cases, daily reset logic (24h boundary), YAML persistence (save + reload), corrupt/unreadable state file recovery, `Reset()` single/all/nonexistent, `Stats()` known/unknown/quota-only models, `AllStats()` with pruning and daily reset.
-- [x] **Race condition test** -- `go test -race ./...` with 20 goroutines calling `CanSend()` + `RecordUsage()` + `Stats()` concurrently. Additional tests: concurrent `Reset()` + `RecordUsage()` + `AllStats()`, concurrent multi-model access (5 models), concurrent `Persist()` + `Load()` filesystem race, concurrent `AllStats()` + `RecordUsage()`, concurrent `WaitForCapacity()` + `RecordUsage()`. All pass clean.
-- [x] **Benchmark** -- 7 benchmarks: `BenchmarkCanSend` (1000-entry window), `BenchmarkRecordUsage`, `BenchmarkCanSendConcurrent` (parallel), `BenchmarkCanSendWithPrune` (500 old + 500 new), `BenchmarkStats` (1000 entries), `BenchmarkAllStats` (5 models x 200 entries), `BenchmarkPersist` (YAML I/O). Zero allocs on hot paths.
-- [x] **`go vet ./...` clean** -- No warnings.
-- **Coverage: 95.1%** (up from 77.1%). Remaining uncovered: `CountTokens` success path (hardcoded Google URL), `yaml.Marshal` error path in `Persist()`, `os.UserHomeDir` error path in `NewWithConfig`.
-
-## Phase 1: Generalise Beyond Gemini
-
-- [x] **Provider-agnostic config** -- Added `Provider` type, `ProviderProfile`, `Config` struct, `NewWithConfig()` constructor. Quotas are no longer hardcoded in `New()`.
-- [x] **Quota profiles** -- `DefaultProfiles()` returns pre-configured profiles for Gemini, OpenAI (gpt-4o, o1, o3-mini), Anthropic (claude-opus-4, claude-sonnet-4, claude-haiku-3.5), and Local (empty, user-configurable).
-- [x] **Configurable defaults** -- `Config` struct accepts `FilePath`, `Providers` list, and explicit `Quotas` map. Explicit quotas override provider defaults. YAML-serialisable.
-- [x] **Backward compatibility** -- `New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Existing API unchanged. Test `TestNewBackwardCompatibility` verifies exact parity.
-- [x] **Runtime configuration** -- `SetQuota()` and `AddProvider()` allow modifying quotas after construction. Both are mutex-protected.
-
-## Phase 2: SQLite Persistent State
-
-Current YAML persistence is single-process only. Phase 2 adds multi-process safe SQLite storage following the go-store pattern (`modernc.org/sqlite`, pure Go, no CGO).
-
-### 2.1 SQLite Backend
-
-- [x] **Add `modernc.org/sqlite` dependency** — `go get modernc.org/sqlite`. Pure Go, compiles everywhere.
-- [x] **Create `sqlite.go`** — Internal SQLite persistence layer:
-  - `type sqliteStore struct { db *sql.DB }` — wraps database/sql connection
-  - `func newSQLiteStore(dbPath string) (*sqliteStore, error)` — Open DB, set `PRAGMA journal_mode=WAL`, `PRAGMA busy_timeout=5000`, `db.SetMaxOpenConns(1)`. Create schema:
-    ```sql
-    CREATE TABLE IF NOT EXISTS quotas (
-        model TEXT PRIMARY KEY,
-        max_rpm INTEGER NOT NULL DEFAULT 0,
-        max_tpm INTEGER NOT NULL DEFAULT 0,
-        max_rpd INTEGER NOT NULL DEFAULT 0
-    );
-    CREATE TABLE IF NOT EXISTS requests (
-        model TEXT NOT NULL,
-        ts INTEGER NOT NULL  -- UnixNano
-    );
-    CREATE TABLE IF NOT EXISTS tokens (
-        model TEXT NOT NULL,
-        ts INTEGER NOT NULL,  -- UnixNano
-        count INTEGER NOT NULL
-    );
-    CREATE TABLE IF NOT EXISTS daily (
-        model TEXT PRIMARY KEY,
-        day_start INTEGER NOT NULL,
-        day_count INTEGER NOT NULL DEFAULT 0
-    );
-    CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
-    CREATE INDEX IF NOT EXISTS idx_tokens_model_ts ON tokens(model, ts);
-    ```
-  - `func (s *sqliteStore) saveQuotas(quotas map[string]ModelQuota) error` — UPSERT all quotas
-  - `func (s *sqliteStore) loadQuotas() (map[string]ModelQuota, error)` — SELECT all quotas
-  - `func (s *sqliteStore) saveState(state map[string]*UsageStats) error` — Transaction: DELETE old + INSERT requests/tokens/daily for each model
-  - `func (s *sqliteStore) loadState() (map[string]*UsageStats, error)` — SELECT and reconstruct UsageStats map
-  - `func (s *sqliteStore) close() error` — Close DB connection
-
-### 2.2 Wire Into RateLimiter
-
-- [x] **Add `Backend` field to Config** — `Backend string` with values `"yaml"` (default), `"sqlite"`. Default `""` maps to `"yaml"` for backward compat.
-- [x] **Update `Persist()` and `Load()`** — Check internal backend type. If SQLite, use `sqliteStore`; otherwise use existing YAML. Keep both paths working.
-- [x] **Add `NewWithSQLite(dbPath string) (*RateLimiter, error)`** — Convenience constructor that creates a SQLite-backed limiter. Sets backend type, initialises DB.
-- [x] **Graceful close** — Add `Close() error` method that closes SQLite DB if open. No-op for YAML backend.
-
-### 2.3 Tests
-
-- [x] **SQLite basic tests** — newSQLiteStore, saveQuotas/loadQuotas round-trip, saveState/loadState round-trip, close.
-- [x] **SQLite integration** — NewWithSQLite, RecordUsage → Persist → Load → verify state preserved. Same test matrix as existing YAML tests but with SQLite backend.
-- [x] **Concurrent SQLite** — 10 goroutines x 20 ops (RecordUsage + CanSend + Persist). Race-clean.
-- [x] **YAML backward compat** — Existing tests pass unchanged (still default to YAML).
-- [x] **Migration helper** — `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — reads YAML state, writes to SQLite. Test with sample YAML.
-- [x] **Corrupt DB recovery** — Truncated DB file → graceful error, fresh start.
-
-## Phase 3: Integration
-
-- [ ] Wire into go-ml backends for automatic rate limiting on inference calls
-- [ ] Wire into go-ai facade so all providers share a unified rate limit layer
-- [ ] Add metrics export (requests/minute, tokens/minute, rejections) for monitoring
-
----
-
-## Workflow
-
-1. Virgil in core/go writes tasks here after research
-2. This repo's dedicated session picks up tasks in phase order
-3. Mark `[x]` when done, note commit hash
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..4cefb7a
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,286 @@
+# Architecture
+
+go-ratelimit is a provider-agnostic rate limiter for LLM API calls. It enforces
+three independent quota dimensions per model — requests per minute (RPM), tokens
+per minute (TPM), and requests per day (RPD) — using an in-memory sliding window
+that can be persisted across process restarts via YAML or SQLite.
+
+Module path: `forge.lthn.ai/core/go-ratelimit`
+
+---
+
+## Sliding Window Algorithm
+
+The limiter maintains per-model `UsageStats` structs in memory:
+
+```go
+type UsageStats struct {
+    Requests []time.Time  // timestamps of recent requests (1-minute window)
+    Tokens   []TokenEntry // token counts with timestamps (1-minute window)
+    DayStart time.Time    // when the current daily window started
+    DayCount int          // total requests recorded since DayStart
+}
+```
+
+Every call to `CanSend()` or `Stats()` first calls `prune()`, which scans both
+slices and keeps only entries newer than the cutoff `window` (`now - 1 minute`).
+Pruning is done in-place to avoid allocation on the hot path:
+
+```go
+validReqs := 0
+for _, t := range stats.Requests {
+    if t.After(window) {
+        stats.Requests[validReqs] = t
+        validReqs++
+    }
+}
+stats.Requests = stats.Requests[:validReqs]
+```
+
+The same loop runs for token entries. After pruning, `CanSend()` checks each
+quota dimension in priority order: RPD first (cheapest check), then RPM, then
+TPM. A zero value for any dimension means that dimension is unlimited. If all
+three are zero the model is treated as fully unlimited and the check short-circuits
+before touching any state.
+
+### Daily Reset
+
+The daily counter resets automatically inside `prune()`. When
+`now - stats.DayStart >= 24h`, `DayCount` is set to zero and `DayStart` is set
+to the current time. The daily window is therefore a fixed 24-hour period that
+restarts at the first `prune()` call after expiry, not a calendar boundary.
+
+### Concurrency
+
+All reads and writes are protected by a single `sync.RWMutex`. Methods that
+write state (`CanSend()`, `RecordUsage()`, `Reset()`, `Load()`) acquire a
+full write lock. `CanSend()` is included because it calls `prune()`, which
+mutates the state slices; the same applies to any method that prunes, such as
+`Stats()` and `AllStats()`. `Persist()` only reads and acquires a read lock.
+
+`go test -race ./...` passes clean with 20 goroutines performing concurrent
+`CanSend()`, `RecordUsage()`, and `Stats()` calls.
+
+---
+
+## Provider and Quota Configuration
+
+### Types
+
+```go
+type Provider string          // "gemini", "openai", "anthropic", "local"
+
+type ModelQuota struct {
+    MaxRPM int `yaml:"max_rpm"` // 0 = unlimited
+    MaxTPM int `yaml:"max_tpm"`
+    MaxRPD int `yaml:"max_rpd"`
+}
+
+type Config struct {
+    FilePath  string                 // default: ~/.core/ratelimits.yaml
+    Backend   string                 // "yaml" (default) or "sqlite"
+    Quotas    map[string]ModelQuota  // explicit per-model overrides
+    Providers []Provider             // provider profiles to load
+}
+```
+
+### Quota Resolution
+
+1. Provider profiles are loaded first (from `DefaultProfiles()`).
+2. Explicit `Config.Quotas` are merged on top, overriding any matching model.
+3. If neither `Providers` nor `Quotas` is specified, Gemini defaults are used.
+
+`SetQuota()` and `AddProvider()` allow runtime modification; both are
+mutex-protected. `AddProvider()` is additive — it does not remove existing
+quotas for models outside the new provider's profile.
+
+### Default Quotas (as of February 2026)
+
+| Provider  | Model                  | MaxRPM    | MaxTPM    | MaxRPD    |
+|-----------|------------------------|-----------|-----------|-----------|
+| Gemini    | gemini-3-pro-preview   | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-3-flash-preview | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-2.5-pro         | 150       | 1,000,000 | 1,000     |
+| Gemini    | gemini-2.0-flash       | 150       | 1,000,000 | unlimited |
+| Gemini    | gemini-2.0-flash-lite  | unlimited | unlimited | unlimited |
+| OpenAI    | gpt-4o, gpt-4-turbo, o1 | 500       | 30,000    | unlimited |
+| OpenAI    | gpt-4o-mini, o1-mini   | 500       | 200,000   | unlimited |
+| OpenAI    | o3-mini                | 500       | 200,000   | unlimited |
+| Anthropic | claude-opus-4          | 50        | 40,000    | unlimited |
+| Anthropic | claude-sonnet-4        | 50        | 40,000    | unlimited |
+| Anthropic | claude-haiku-3.5       | 50        | 50,000    | unlimited |
+| Local     | (none by default)      | user-defined | user-defined | user-defined |
+
+The Local provider exists for local inference backends (Ollama, MLX, llama.cpp)
+where the throttle limit is hardware rather than an API quota. No defaults are
+provided; callers add per-model limits via `Config.Quotas` or `SetQuota()`.
+
+---
+
+## YAML Persistence (Legacy)
+
+The default backend serialises the entire `RateLimiter` struct — both the
+`Quotas` map and the `State` map — to a YAML file at `~/.core/ratelimits.yaml`.
+
+```yaml
+quotas:
+  gemini-3-pro-preview:
+    max_rpm: 150
+    max_tpm: 1000000
+    max_rpd: 1000
+state:
+  gemini-3-pro-preview:
+    requests:
+      - 2026-02-20T14:32:01.123456789Z
+    tokens:
+      - time: 2026-02-20T14:32:01.123456789Z
+        count: 1500
+    day_start: 2026-02-20T00:00:00Z
+    day_count: 42
+```
+
+`Persist()` creates parent directories with `os.MkdirAll` before writing.
+`Load()` treats a missing file as an empty state (no error). Corrupt or
+unreadable files return an error.
+
+**Limitations of YAML backend:**
+- Single-process only. Concurrent writes from multiple processes can corrupt
+  the file because the write is not atomic at the OS level.
+- The entire state is serialised on every `Persist()` call, which grows linearly
+  with the number of tracked models and entries.
+- Timestamps are serialised as RFC3339 strings; nanosecond precision is
+  preserved by Go's time marshaller but depends on the YAML library.
+
+---
+
+## SQLite Backend
+
+The SQLite backend was added in Phase 2 to support multi-process scenarios and
+provide a more robust persistence layer. It uses `modernc.org/sqlite` — a pure
+Go port of SQLite that compiles without CGO.
+
+### Connection Settings
+
+```go
+db.SetMaxOpenConns(1)                      // single connection for PRAGMA consistency
+db.Exec("PRAGMA journal_mode=WAL")         // WAL mode for concurrent readers
+db.Exec("PRAGMA busy_timeout=5000")        // 5-second busy timeout
+```
+
+WAL mode allows one writer and multiple concurrent readers. The 5-second busy
+timeout prevents immediate failure when a second process is mid-commit. A single
+`sql.DB` connection is used because SQLite's WAL mode handles reader concurrency
+at the file level; multiple Go connections to the same file through a single
+process would not add throughput but would complicate locking.
+
+### Schema
+
+```sql
+CREATE TABLE IF NOT EXISTS quotas (
+    model   TEXT PRIMARY KEY,
+    max_rpm INTEGER NOT NULL DEFAULT 0,
+    max_tpm INTEGER NOT NULL DEFAULT 0,
+    max_rpd INTEGER NOT NULL DEFAULT 0
+);
+
+CREATE TABLE IF NOT EXISTS requests (
+    model TEXT NOT NULL,
+    ts    INTEGER NOT NULL         -- UnixNano
+);
+
+CREATE TABLE IF NOT EXISTS tokens (
+    model TEXT NOT NULL,
+    ts    INTEGER NOT NULL,        -- UnixNano
+    count INTEGER NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS daily (
+    model     TEXT PRIMARY KEY,
+    day_start INTEGER NOT NULL,   -- UnixNano
+    day_count INTEGER NOT NULL DEFAULT 0
+);
+
+CREATE INDEX IF NOT EXISTS idx_requests_model_ts ON requests(model, ts);
+CREATE INDEX IF NOT EXISTS idx_tokens_model_ts   ON tokens(model, ts);
+```
+
+Timestamps are stored as `INTEGER` UnixNano values. This preserves nanosecond
+precision without relying on SQLite's text date format, and allows efficient
+range queries using the composite indices.
+
+### Save Strategy
+
+`saveState()` uses a delete-then-insert pattern inside a single transaction.
+All three state tables are truncated and rewritten atomically:
+
+```go
+tx.Exec("DELETE FROM requests")
+tx.Exec("DELETE FROM tokens")
+tx.Exec("DELETE FROM daily")
+// then INSERT for every model in state
+tx.Commit()
+```
+
+`saveQuotas()` uses `INSERT ... ON CONFLICT(model) DO UPDATE` (upsert) so
+existing quota rows are updated in place without deleting unrelated models.
+
+### Constructors
+
+```go
+// YAML backend (default)
+rl, err := ratelimit.New()
+rl, err := ratelimit.NewWithConfig(cfg)
+
+// SQLite backend
+rl, err := ratelimit.NewWithSQLite(dbPath)
+rl, err := ratelimit.NewWithSQLiteConfig(dbPath, cfg)
+
+defer rl.Close()  // releases the database connection
+```
+
+`Close()` is a no-op on YAML-backed limiters.
+
+---
+
+## Migration Path
+
+`MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` reads an existing YAML
+state file and writes all quotas and usage state to a new SQLite database. The
+function is idempotent — running it again on the same YAML file overwrites the
+SQLite database state.
+
+Typical one-time migration:
+
+```go
+err := ratelimit.MigrateYAMLToSQLite(
+    filepath.Join(home, ".core", "ratelimits.yaml"),
+    filepath.Join(home, ".core", "ratelimits.db"),
+)
+```
+
+After migration, switch the constructor:
+
+```go
+// Before
+rl, _ := ratelimit.New()
+
+// After
+rl, _ := ratelimit.NewWithSQLite(filepath.Join(home, ".core", "ratelimits.db"))
+defer rl.Close()
+```
+
+The YAML file can be kept as a backup; the two backends do not share state.
+
+---
+
+## CountTokens
+
+`CountTokens(apiKey, model, text string) (int, error)` calls the Google
+Generative Language API to obtain an exact token count for a prompt string. It
+is Gemini-specific and hardcodes the `generativelanguage.googleapis.com`
+endpoint. The URL is not configurable, which prevents unit testing of the
+success path without network access.
+
+For other providers, callers must supply `estimatedTokens` directly to
+`CanSend()` and `RecordUsage()`. Accurate token counts are typically available
+in API response metadata after a call completes.
diff --git a/docs/development.md b/docs/development.md
new file mode 100644
index 0000000..471d1dc
--- /dev/null
+++ b/docs/development.md
@@ -0,0 +1,207 @@
+# Development Guide
+
+## Prerequisites
+
+- Go 1.25.5 or later (the module declares `go 1.25.5`)
+- No CGO required — `modernc.org/sqlite` is a pure Go port
+
+No C toolchain, no system SQLite library, no external build tools. A plain
+`go build ./...` is sufficient.
+
+---
+
+## Build and Test
+
+```bash
+# Run all tests
+go test ./...
+
+# Run all tests with the race detector (required before every commit)
+go test -race ./...
+
+# Run a single test by name
+go test -v -run TestCanSend ./...
+
+# Run a single subtest
+go test -v -run "TestCanSend/RPM_at_exact_limit_is_rejected" ./...
+
+# Run benchmarks
+go test -bench=. -benchmem ./...
+
+# Run a specific benchmark
+go test -bench=BenchmarkCanSend -benchmem ./...
+
+# Check for vet issues
+go vet ./...
+
+# Tidy dependencies
+go mod tidy
+```
+
+`go test -race ./...` and `go vet ./...` must produce no errors or warnings, and
+`go mod tidy` must leave `go.mod` and `go.sum` unchanged, before a commit is pushed.
+
+---
+
+## Test Patterns
+
+### File Organisation
+
+- `ratelimit_test.go` — Phase 0 (core logic) and Phase 1 (provider profiles)
+- `sqlite_test.go` — Phase 2 (SQLite backend)
+
+Both files are in `package ratelimit` (white-box tests) so they can access
+unexported fields and methods such as `prune()`, `filePath`, and `sqlite`.
+
+### Naming Convention
+
+SQLite tests follow the `_Good`, `_Bad`, `_Ugly` suffix pattern:
+
+- `_Good` — happy path
+- `_Bad` — expected error conditions (invalid paths, corrupt input)
+- `_Ugly` — panic-adjacent edge cases (corrupt DB files, truncated files)
+
+Core logic tests use plain descriptive names without suffixes, grouped by
+method with table-driven subtests.
+
+### Test Helpers
+
+`newTestLimiter(t *testing.T)` creates a `RateLimiter` with Gemini defaults and
+redirects the YAML file path into `t.TempDir()`:
+
+```go
+func newTestLimiter(t *testing.T) *RateLimiter {
+    t.Helper()
+    rl, err := New()
+    require.NoError(t, err)
+    rl.filePath = filepath.Join(t.TempDir(), "ratelimits.yaml")
+    return rl
+}
+```
+
+Use `t.TempDir()` for all file paths in tests. Go cleans these up automatically
+after each test completes.
+
+### Testify Usage
+
+Tests use `github.com/stretchr/testify` exclusively:
+
+- `require.NoError(t, err)` — fail immediately on setup errors
+- `assert.NoError(t, err)` — record failure but continue
+- `assert.Equal(t, expected, actual, "message")` — prefer over raw comparisons
+- `assert.True / assert.False` — for boolean checks
+- `assert.Empty / assert.Len` — for slice length checks
+- `assert.ErrorIs(t, err, context.DeadlineExceeded)` — for sentinel errors
+
+Do not use `t.Error`, `t.Fatal`, or `t.Log` directly.
+
+### Race Tests
+
+Concurrency tests spin up goroutines and use `sync.WaitGroup`. They do not
+assert anything beyond absence of data races (the race detector does the work):
+
+```go
+var wg sync.WaitGroup
+for i := 0; i < 20; i++ {
+    wg.Add(1)
+    go func() {
+        defer wg.Done()
+        // concurrent operations
+    }()
+}
+wg.Wait()
+```
+
+Run every concurrency test with `-race`. The CI baseline is `go test -race ./...`
+clean.
+
+### Coverage
+
+Current coverage: 95.1%. The remaining 4.9% consists of three paths that cannot
+be covered in unit tests without modifying the production code:
+
+1. `CountTokens` success path — hardcoded Google API URL requires network access
+2. `yaml.Marshal` error path in `Persist()` — cannot be triggered with valid Go structs
+3. `os.UserHomeDir()` error path in `NewWithConfig()` — requires unsetting `$HOME`
+
+Do not lower coverage below 95% without a documented reason.
+
+---
+
+## Coding Standards
+
+### Language
+
+UK English throughout: colour, organisation, serialise, initialise, behaviour.
+Do not use American spellings in identifiers, comments, or documentation.
+
+### Go Style
+
+- All exported types, functions, and fields must have doc comments
+- Error strings must be lowercase and not end with punctuation (Go convention)
+- Contextual errors use `fmt.Errorf("package.Function: what: %w", err)` — the
+  prefix `ratelimit.` is included so errors identify their origin clearly
+- No `init()` functions
+- No global mutable state (`DefaultProfiles()` returns a fresh map on each
+  call, so callers cannot mutate shared defaults)
+
+### Mutex Discipline
+
+The `RateLimiter.mu` mutex is the only synchronisation primitive. Rules:
+
+- Methods that call `prune()` always acquire the write lock (`mu.Lock()`),
+  even if they appear read-only, because `prune()` mutates slices
+- `Persist()` acquires only the read lock (`mu.RLock()`) because it reads a
+  snapshot of state
+- Lock acquisition always happens at the top of the public method, never inside
+  a helper — helpers document "Caller must hold the lock"
+- Never call a public method from inside another public method while holding
+  the lock (deadlock risk)
+
+### Dependencies
+
+Direct dependencies are intentionally minimal:
+
+| Dependency | Purpose |
+|------------|---------|
+| `gopkg.in/yaml.v3` | YAML serialisation for legacy backend |
+| `modernc.org/sqlite` | Pure Go SQLite for persistent backend |
+| `github.com/stretchr/testify` | Test assertions (test-only) |
+
+Do not add `database/sql` drivers beyond `modernc.org/sqlite`. Do not add HTTP
+client libraries; the existing `CountTokens` function uses the standard library.
+
+---
+
+## Licence
+
+EUPL-1.2. Every new source file must carry the standard header if the project
+adopts per-file headers in future. Confirm with the project lead before adding
+files under a different licence.
+
+---
+
+## Commit Convention
+
+Format: `type(scope): description`
+
+Common types: `feat`, `fix`, `test`, `refactor`, `docs`, `perf`, `chore`
+
+Common scopes: `ratelimit`, `sqlite`, `persist`, `config`
+
+Every commit must include:
+
+```
+Co-Authored-By: Virgil <virgil@lethean.io>
+```
+
+Example:
+
+```
+feat(sqlite): add WAL-mode SQLite backend with migration helper
+
+Co-Authored-By: Virgil <virgil@lethean.io>
+```
+
+Commits must not be pushed unless `go test -race ./...` and `go vet ./...` both
+pass. `go mod tidy` must produce no changes.
diff --git a/docs/history.md b/docs/history.md
new file mode 100644
index 0000000..78de23e
--- /dev/null
+++ b/docs/history.md
@@ -0,0 +1,197 @@
+# Project History
+
+## Origin
+
+go-ratelimit was extracted from the `pkg/ratelimit` package inside
+`forge.lthn.ai/core/go` on 19 February 2026. The extraction gave the package
+its own module path, repository, and independent development cadence.
+
+Initial commit: `fa1a6fc` — `feat: extract go-ratelimit from core/go pkg/ratelimit`
+
+At extraction the package implemented:
+
+- Sliding window rate limiter with 1-minute window
+- Daily request caps per model
+- Token counting via Google `CountTokens` API
+- Hardcoded Gemini quota defaults (`gemini-3-pro-preview`: 150 RPM / 1M TPM / 1000 RPD)
+- YAML persistence to `~/.core/ratelimits.yaml`
+- Single test file with basic sliding window and quota enforcement tests
+
+---
+
+## Phase 0 — Hardening and Test Coverage
+
+Commit: `3c63b10` — `feat(ratelimit): generalise beyond Gemini with provider profiles and push coverage to 95%`
+
+Supplementary commit: `db958f2` — `test: expand race coverage and benchmarks`
+
+Coverage increased from 77.1% to 95.1%. The test suite was rewritten using
+testify with table-driven subtests throughout.
+
+### Tests added
+
+- `TestCanSend` — boundary conditions at exact RPM, TPM, and RPD limits;
+  RPM-only and TPM-only quotas; zero-token estimates; unknown and unlimited models
+- `TestPrune` — pruning of old entries, retention of recent entries, daily reset
+  at 24-hour boundary, no-op on non-existent model, boundary-exact timestamps
+- `TestRecordUsage` — fresh state, accumulation, insertion into existing state
+- `TestReset` — single model, all models (empty string argument), non-existent model
+- `TestWaitForCapacity` — context cancellation, pre-cancelled context,
+  immediate capacity, unknown model
+- `TestStats` / `TestAllStats` — known, unknown, and quota-only models; pruning
+  and daily reset inside `AllStats()`
+- `TestPersistAndLoad` — round-trip, missing file, corrupt YAML, unreadable file,
+  nested directory creation, unwritable directory
+- `TestConcurrentAccess` — 20 goroutines x 50 ops each (CanSend + RecordUsage + Stats)
+- `TestConcurrentResetAndRecord` — concurrent Reset + RecordUsage + AllStats
+- `TestConcurrentMultipleModels` — 5 models, concurrent access
+- `TestConcurrentPersistAndLoad` — filesystem race between Persist and Load
+- `TestConcurrentWaitForCapacityAndRecordUsage` — WaitForCapacity racing RecordUsage
+
+### Benchmarks added
+
+- `BenchmarkCanSend` — 1,000-entry sliding window
+- `BenchmarkRecordUsage`
+- `BenchmarkCanSendConcurrent` — parallel goroutines
+- `BenchmarkCanSendWithPrune` — 500 old + 500 new entries
+- `BenchmarkStats` — 1,000-entry window
+- `BenchmarkAllStats` — 5 models x 200 entries
+- `BenchmarkPersist` — YAML I/O
+
+### Remaining uncovered paths (5%)
+
+These three paths are structurally impossible to cover in unit tests without
+modifying production code:
+
+1. `CountTokens` success path — the Google API URL is hardcoded; unit tests
+   cannot intercept the HTTP call without URL injection support
+2. `yaml.Marshal` error path in `Persist()` — `yaml.Marshal` does not fail on
+   valid Go structs; the error branch exists for correctness only
+3. `os.UserHomeDir()` error path in `NewWithConfig()` — triggered only when
+   `$HOME` is unset, which test infrastructure prevents
+
+`go test -race ./...` passed cleanly. `go vet ./...` produced no warnings.
+
+---
+
+## Phase 1 — Generalisation Beyond Gemini
+
+Commit: `3c63b10` (shared with Phase 0)
+
+The hardcoded Gemini quotas in `New()` were replaced with a provider-agnostic
+configuration system without breaking the existing API.
+
+### New types and functions
+
+- `Provider` string type with constants: `ProviderGemini`, `ProviderOpenAI`,
+  `ProviderAnthropic`, `ProviderLocal`
+- `ProviderProfile` — bundles a provider identifier with its model quota map
+- `Config` — construction configuration accepting `FilePath`, `Backend`,
+  `Providers`, and `Quotas` fields
+- `DefaultProfiles()` — returns fresh pre-configured profiles for all four providers
+- `NewWithConfig(Config)` — creates a limiter from explicit configuration
+- `SetQuota(model, quota)` — runtime quota modification, mutex-protected
+- `AddProvider(provider)` — loads all default quotas for a provider at runtime,
+  additive, mutex-protected
+
+### Backward compatibility
+
+`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`.
+`TestNewBackwardCompatibility` asserts exact parity with the original hardcoded
+values. No existing call sites required modification.
+
+### Design decision: merge-on-top
+
+Explicit `Config.Quotas` entries are merged on top of the provider profile
+defaults, so an explicit quota wins over the profile value for the same model.
+This allows callers to use a provider profile for most models while customising
+specific model limits without forking the entire profile.
+
+---
+
+## Phase 2 — SQLite Persistent State
+
+Commit: `1afb1d6` — `feat(persist): Phase 2 — SQLite backend with WAL mode`
+
+The YAML backend serialises the full state on every `Persist()` call and is
+not safe for concurrent multi-process access. Phase 2 added a SQLite backend
+using `modernc.org/sqlite` (pure Go, no CGO) following the go-store pattern
+established elsewhere in the ecosystem.
+
+### New constructors and methods
+
+- `NewWithSQLite(dbPath string)` — SQLite-backed limiter with Gemini defaults
+- `NewWithSQLiteConfig(dbPath string, cfg Config)` — SQLite-backed with custom config
+- `Close() error` — releases the database connection; no-op on YAML-backed limiters
+
+### Migration
+
+- `MigrateYAMLToSQLite(yamlPath, sqlitePath string) error` — one-shot migration
+  helper that reads an existing YAML state file and writes all quotas and usage
+  state to a new SQLite database
+
+### SQLite connection settings
+
+- `PRAGMA journal_mode=WAL` — enables concurrent reads alongside a single writer
+- `PRAGMA busy_timeout=5000` — 5-second wait on lock contention before returning an error
+- `db.SetMaxOpenConns(1)` — single connection, so per-connection PRAGMAs such as
+  `busy_timeout` apply to every query
+
+### Tests added (sqlite_test.go)
+
+- `TestNewSQLiteStore_Good / _Bad` — creation and invalid path handling
+- `TestSQLiteQuotasRoundTrip_Good` — save/load round-trip
+- `TestSQLiteQuotasUpsert_Good` — upsert replaces existing rows
+- `TestSQLiteStateRoundTrip_Good` — multi-model state with nanosecond precision
+- `TestSQLiteStateOverwrite_Good` — delete-then-insert atomicity
+- `TestSQLiteEmptyState_Good` — fresh database returns empty maps
+- `TestNewWithSQLite_Good / TestNewWithSQLiteConfig_Good` — constructor tests
+- `TestSQLitePersistAndLoad_Good` — full persist + reload cycle
+- `TestSQLitePersistMultipleModels_Good` — multi-provider persistence
+- `TestSQLiteConcurrent_Good` — 10 goroutines x 20 ops, race-clean
+- `TestYAMLBackwardCompat_Good` — existing YAML tests pass unchanged
+- `TestMigrateYAMLToSQLite_Good / _Bad` — migration round-trip and error paths
+- `TestSQLiteCorruptDB_Ugly / TestSQLiteTruncatedDB_Ugly` — graceful recovery
+  from corrupt and truncated databases
+- `TestSQLiteEndToEnd_Good` — full two-session scenario
+
+---
+
+## Phase 3 — Integration (Planned)
+
+Not yet implemented. Intended downstream integrations:
+
+- Wire into `go-ml` backends so rate limiting is enforced automatically on
+  inference calls without caller involvement
+- Wire into the `go-ai` facade so all providers share a single rate limit layer
+- Export metrics (requests/minute, tokens/minute, rejection counts) for
+  monitoring dashboards
+
+---
+
+## Known Limitations
+
+**CountTokens URL is hardcoded.** The `CountTokens` helper calls
+`generativelanguage.googleapis.com` directly. There is no way to override the
+base URL, which prevents testing the success path in unit tests and prevents
+use with Gemini-compatible proxies. A future refactor would accept a base URL
+parameter or an `http.Client`.
+
+**saveState is a full table replace.** On every `Persist()` call, all rows in
+the `requests`, `tokens`, and `daily` tables are deleted and rewritten. For a
+limiter tracking many models with high RPM, this means writing hundreds of rows
+on every persist call. A future optimisation would use incremental writes
+(insert-only, with periodic pruning of expired rows).
+
+**No TTL on SQLite rows.** Entries older than one minute are pruned from the
+in-memory `UsageStats` on every operation, but SQLite rows carry no expiry of
+their own: the in-memory state is written wholesale on `Persist()`. The
+database does not grow unboundedly between persist cycles because `saveState`
+replaces all rows, but if `Persist()` is called frequently the WAL file can
+grow transiently.
+
+**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate
+for RPM-scale limits but is coarse for sub-second limits. If a caller needs
+finer-grained waiting (e.g., smoothing requests within a minute), they must
+implement their own loop.
+
+**No automatic persistence.** `Persist()` must be called explicitly. If a
+process exits without calling `Persist()`, any usage recorded since the last
+persist is lost. Callers are responsible for calling `Persist()` at appropriate
+intervals (e.g., after each `RecordUsage()` call, or on a ticker).