feat(ratelimit): generalise beyond Gemini with provider profiles and push coverage to 95%

Phase 0: Rewrite test suite with testify (77.1% -> 95.1% coverage). Add boundary tests, concurrent access tests, benchmarks, error path coverage for Load/Persist, Reset, Stats, and CountTokens. Phase 1: Extract hardcoded Gemini quotas into provider-agnostic config. Add Provider type, DefaultProfiles(), NewWithConfig(), SetQuota(), and AddProvider(). Pre-configured profiles for Gemini, OpenAI, Anthropic, and Local. New() retains exact backward compatibility via delegation. Co-Authored-By: Charon <developers@lethean.io>
2026-02-20 01:07:57 +00:00 · 2026-02-20 01:07:57 +00:00 · 3c63b1022a
commit 3c63b1022a
parent 666deed718
6 changed files with 1339 additions and 152 deletions
--- a/FINDINGS.md
+++ b/FINDINGS.md
@ -21,3 +21,86 @@ Extracted from `forge.lthn.ai/core/go` on 19 Feb 2026.
 ### Tests

 - 1 test file covering sliding window and quota enforcement
+
+---
+
+## 2026-02-20: Phase 0 -- Hardening (Charon)
+
+### Coverage: 77.1% -> 95.1%
+
+Rewrote test suite with testify assert/require. Table-driven subtests throughout.
+
+#### Tests added
+
+- **CanSend boundaries**: exact RPM/TPM/RPD limits, RPM-only, TPM-only, zero-token estimates, unknown models, unlimited models
+- **Prune**: keeps recent entries, prunes old ones, daily reset at 24h, boundary-exact timestamps, noop on non-existent model
+- **RecordUsage**: fresh state, accumulation, existing state
+- **Reset**: single model, all models (empty string), non-existent model
+- **WaitForCapacity**: immediate capacity, context cancellation, pre-cancelled context, unknown model
+- **Stats/AllStats**: known/unknown/quota-only models, pruning in AllStats, daily reset in AllStats
+- **Persist/Load**: round-trip, non-existent file, corrupt YAML, unreadable file, nested directory creation, unwritable directory
+- **Concurrency**: 20 goroutines x 50 ops (CanSend + RecordUsage + Stats), concurrent Reset + RecordUsage + AllStats
+- **Benchmarks**: BenchmarkCanSend (1000-entry window), BenchmarkRecordUsage, BenchmarkCanSendConcurrent
+
+#### Remaining uncovered (5%)
+
+- `CountTokens` success path: hardcoded Google URL prevents unit testing without URL injection. Only the connection-error path is covered.
+- `yaml.Marshal` error in `Persist()`: virtually impossible to trigger with valid structs.
+- `os.UserHomeDir` error in `NewWithConfig()`: only fails when `$HOME` is unset.
+
+### Race detector
+
+`go test -race ./...` passes clean. The `sync.RWMutex` correctly guards all shared state.
+
+### go vet
+
+No warnings.
+
+---
+
+## 2026-02-20: Phase 1 -- Generalisation (Charon)
+
+### Problem
+
+Hardcoded Gemini-specific quotas in `New()`. No way to configure for other providers.
+
+### Solution
+
+Introduced provider-agnostic configuration without breaking existing API.
+
+#### New types
+
+- `Provider` -- string type with constants: `ProviderGemini`, `ProviderOpenAI`, `ProviderAnthropic`, `ProviderLocal`
+- `ProviderProfile` -- bundles provider identity with model quotas map
+- `Config` -- construction config with `FilePath`, `Providers` list, `Quotas` map
+
+#### New functions
+
+- `DefaultProfiles()` -- returns pre-configured profiles for all four providers
+- `NewWithConfig(Config)` -- creates limiter from explicit configuration
+- `SetQuota(model, quota)` -- runtime quota modification
+- `AddProvider(provider)` -- loads all default quotas for a provider at runtime
+
+#### Provider defaults (Feb 2026)
+
+| Provider | Models | RPM | TPM | RPD |
+|----------|--------|-----|-----|-----|
+| Gemini | gemini-3-pro-preview, gemini-3-flash-preview, gemini-2.5-pro | 150 | 1M | 1000 |
+| Gemini | gemini-2.0-flash | 150 | 1M | unlimited |
+| Gemini | gemini-2.0-flash-lite | unlimited | unlimited | unlimited |
+| OpenAI | gpt-4o, gpt-4-turbo, o1 | 500 | 30K | unlimited |
+| OpenAI | gpt-4o-mini, o1-mini, o3-mini | 500 | 200K | unlimited |
+| Anthropic | claude-opus-4, claude-sonnet-4 | 50 | 40K | unlimited |
+| Anthropic | claude-haiku-3.5 | 50 | 50K | unlimited |
+| Local | (none by default) | -- | -- | -- |
+
+#### Backward compatibility
+
+`New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Verified by `TestNewBackwardCompatibility` which asserts exact parity with the original hardcoded values.
+
+#### Design notes
+
+- Explicit quotas in `Config.Quotas` override provider defaults (merge-on-top pattern)
+- Local provider has no default quotas -- users add per-model limits for hardware throttling
+- `AddProvider()` is additive -- calling it does not remove existing quotas
+- All new methods are mutex-protected and safe for concurrent use
--- a/TODO.md
+++ b/TODO.md
@ -1,4 +1,4 @@
-# TODO.md — go-ratelimit
+# TODO.md -- go-ratelimit

 Dispatched from core/go orchestration. Pick up tasks in order.

@ -6,20 +6,23 @@ Dispatched from core/go orchestration. Pick up tasks in order.

 ## Phase 0: Hardening & Test Coverage

- [ ] **Expand test coverage** — `ratelimit_test.go` exists. Add tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines (race test), `WaitForCapacity()` timeout behaviour, `prune()` sliding window edge cases, daily reset logic (cross-midnight), YAML persistence (save + reload state), empty/corrupt state file recovery.
- [ ] **Race condition test** — `go test -race ./...` with 10 goroutines calling `CanSend()` + `RecordUsage()` concurrently. The `sync.RWMutex` should handle it but verify.
- [ ] **Benchmark** — Add `BenchmarkCanSend` and `BenchmarkRecordUsage` with 1000 entries in sliding window. Measure prune() overhead.
- [ ] **`go vet ./...` clean** — Fix any warnings.
+- [x] **Expand test coverage** -- `ratelimit_test.go` rewritten with testify. Tests for: `CanSend()` at exact limits (RPM, TPM, RPD boundaries), `RecordUsage()` with concurrent goroutines, `WaitForCapacity()` timeout and immediate-capacity paths, `prune()` sliding window edge cases, daily reset logic (24h boundary), YAML persistence (save + reload), corrupt/unreadable state file recovery, `Reset()` single/all/nonexistent, `Stats()` known/unknown/quota-only models, `AllStats()` with pruning and daily reset.
+- [x] **Race condition test** -- `go test -race ./...` with 20 goroutines calling `CanSend()` + `RecordUsage()` + `Stats()` concurrently. Additional test with concurrent `Reset()` + `RecordUsage()` + `AllStats()`. All pass clean.
+- [x] **Benchmark** -- `BenchmarkCanSend` (1000-entry window), `BenchmarkRecordUsage`, `BenchmarkCanSendConcurrent` (parallel). Measures prune() overhead.
+- [x] **`go vet ./...` clean** -- No warnings.
+- **Coverage: 95.1%** (up from 77.1%). Remaining uncovered: `CountTokens` success path (hardcoded Google URL), `yaml.Marshal` error path in `Persist()`, `os.UserHomeDir` error path in `NewWithConfig`.

 ## Phase 1: Generalise Beyond Gemini

- [ ] Hardcoded model quotas are Gemini-specific — abstract to provider-agnostic config
- [ ] Add quota profiles for OpenAI, Anthropic, and local (Ollama/MLX) backends
- [ ] Make default quotas configurable via YAML or environment variables
+- [x] **Provider-agnostic config** -- Added `Provider` type, `ProviderProfile`, `Config` struct, `NewWithConfig()` constructor. Quotas are no longer hardcoded in `New()`.
+- [x] **Quota profiles** -- `DefaultProfiles()` returns pre-configured profiles for Gemini, OpenAI (gpt-4o, o1, o3-mini), Anthropic (claude-opus-4, claude-sonnet-4, claude-haiku-3.5), and Local (empty, user-configurable).
+- [x] **Configurable defaults** -- `Config` struct accepts `FilePath`, `Providers` list, and explicit `Quotas` map. Explicit quotas override provider defaults. YAML-serialisable.
+- [x] **Backward compatibility** -- `New()` delegates to `NewWithConfig(Config{Providers: []Provider{ProviderGemini}})`. Existing API unchanged. Test `TestNewBackwardCompatibility` verifies exact parity.
+- [x] **Runtime configuration** -- `SetQuota()` and `AddProvider()` allow modifying quotas after construction. Both are mutex-protected.

 ## Phase 2: Persistent State

- [ ] Currently stores state in YAML file — not safe for multi-process access
+- [ ] Currently stores state in YAML file -- not safe for multi-process access
 - [ ] Consider SQLite for concurrent read/write safety (WAL mode)
 - [ ] Add state recovery on restart (reload sliding window from persisted data)

--- a/go.mod
+++ b/go.mod
@ -3,3 +3,9 @@ module forge.lthn.ai/core/go-ratelimit
 go 1.25.5

 require gopkg.in/yaml.v3 v3.0.1
+
+require (
+	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/stretchr/testify v1.11.1
+)
--- a/go.sum
+++ b/go.sum
@ -1,3 +1,9 @@
+github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
+github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
+github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
+github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
--- a/ratelimit.go
+++ b/ratelimit.go
@ -15,13 +15,48 @@ import (
 	"gopkg.in/yaml.v3"
 )

+// Provider identifies an LLM provider for quota profiles.
+type Provider string
+
+const (
+	// ProviderGemini is Google's Gemini family (default).
+	ProviderGemini Provider = "gemini"
+	// ProviderOpenAI is OpenAI's GPT/o-series family.
+	ProviderOpenAI Provider = "openai"
+	// ProviderAnthropic is Anthropic's Claude family.
+	ProviderAnthropic Provider = "anthropic"
+	// ProviderLocal is for local inference (Ollama, MLX, llama.cpp).
+	ProviderLocal Provider = "local"
+)
+
 // ModelQuota defines the rate limits for a specific model.
 type ModelQuota struct {
-	MaxRPM int `yaml:"max_rpm"` // Requests per minute
-	MaxTPM int `yaml:"max_tpm"` // Tokens per minute
+	MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
+	MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
 	MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
 }

+// ProviderProfile bundles model quotas for a provider.
+type ProviderProfile struct {
+	Provider Provider                `yaml:"provider"`
+	Models   map[string]ModelQuota   `yaml:"models"`
+}
+
+// Config controls RateLimiter initialisation.
+type Config struct {
+	// FilePath overrides the default state file location.
+	// If empty, defaults to ~/.core/ratelimits.yaml.
+	FilePath string `yaml:"file_path,omitempty"`
+
+	// Quotas sets per-model rate limits directly.
+	// These are merged on top of any provider profile defaults.
+	Quotas map[string]ModelQuota `yaml:"quotas,omitempty"`
+
+	// Providers lists provider profiles to load.
+	// If empty and Quotas is also empty, Gemini defaults are used.
+	Providers []Provider `yaml:"providers,omitempty"`
+}
+
 // TokenEntry records a token usage event.
 type TokenEntry struct {
 	Time  time.Time `yaml:"time"`
@ -44,29 +79,121 @@ type RateLimiter struct {
 	filePath string
 }

-// New creates a new RateLimiter with default quotas.
+// DefaultProfiles returns pre-configured quota profiles for each provider.
+// Values are based on published rate limits as of Feb 2026.
+func DefaultProfiles() map[Provider]ProviderProfile {
+	return map[Provider]ProviderProfile{
+		ProviderGemini: {
+			Provider: ProviderGemini,
+			Models: map[string]ModelQuota{
+				"gemini-3-pro-preview":   {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000},
+				"gemini-3-flash-preview": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000},
+				"gemini-2.5-pro":         {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000},
+				"gemini-2.0-flash":       {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 0}, // Unlimited RPD
+				"gemini-2.0-flash-lite":  {MaxRPM: 0, MaxTPM: 0, MaxRPD: 0},         // Unlimited
+			},
+		},
+		ProviderOpenAI: {
+			Provider: ProviderOpenAI,
+			Models: map[string]ModelQuota{
+				"gpt-4o":       {MaxRPM: 500, MaxTPM: 30000, MaxRPD: 0},
+				"gpt-4o-mini":  {MaxRPM: 500, MaxTPM: 200000, MaxRPD: 0},
+				"gpt-4-turbo":  {MaxRPM: 500, MaxTPM: 30000, MaxRPD: 0},
+				"o1":           {MaxRPM: 500, MaxTPM: 30000, MaxRPD: 0},
+				"o1-mini":      {MaxRPM: 500, MaxTPM: 200000, MaxRPD: 0},
+				"o3-mini":      {MaxRPM: 500, MaxTPM: 200000, MaxRPD: 0},
+			},
+		},
+		ProviderAnthropic: {
+			Provider: ProviderAnthropic,
+			Models: map[string]ModelQuota{
+				"claude-opus-4":    {MaxRPM: 50, MaxTPM: 40000, MaxRPD: 0},
+				"claude-sonnet-4":  {MaxRPM: 50, MaxTPM: 40000, MaxRPD: 0},
+				"claude-haiku-3.5": {MaxRPM: 50, MaxTPM: 50000, MaxRPD: 0},
+			},
+		},
+		ProviderLocal: {
+			Provider: ProviderLocal,
+			Models: map[string]ModelQuota{
+				// Local inference has no external rate limits by default.
+				// Users can override per-model if their hardware requires throttling.
+			},
+		},
+	}
+}
+
+// New creates a new RateLimiter with Gemini defaults.
+// This preserves backward compatibility -- existing callers are unaffected.
 func New() (*RateLimiter, error) {
-	home, err := os.UserHomeDir()
-	if err != nil {
-		return nil, err
+	return NewWithConfig(Config{
+		Providers: []Provider{ProviderGemini},
+	})
+}
+
+// NewWithConfig creates a RateLimiter from explicit configuration.
+// If no providers or quotas are specified, Gemini defaults are used.
+func NewWithConfig(cfg Config) (*RateLimiter, error) {
+	filePath := cfg.FilePath
+	if filePath == "" {
+		home, err := os.UserHomeDir()
+		if err != nil {
+			return nil, err
+		}
+		filePath = filepath.Join(home, ".core", "ratelimits.yaml")
 	}

 	rl := &RateLimiter{
 		Quotas:   make(map[string]ModelQuota),
 		State:    make(map[string]*UsageStats),
-		filePath: filepath.Join(home, ".core", "ratelimits.yaml"),
+		filePath: filePath,
 	}

-	// Default quotas based on Tier 1 observations (Feb 2026)
-	rl.Quotas["gemini-3-pro-preview"] = ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}
-	rl.Quotas["gemini-3-flash-preview"] = ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}
-	rl.Quotas["gemini-2.5-pro"] = ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000}
-	rl.Quotas["gemini-2.0-flash"] = ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 0} // Unlimited RPD
-	rl.Quotas["gemini-2.0-flash-lite"] = ModelQuota{MaxRPM: 0, MaxTPM: 0, MaxRPD: 0}   // Unlimited
+	// Load provider profiles
+	profiles := DefaultProfiles()
+	providers := cfg.Providers
+
+	// If nothing specified at all, default to Gemini
+	if len(providers) == 0 && len(cfg.Quotas) == 0 {
+		providers = []Provider{ProviderGemini}
+	}
+
+	for _, p := range providers {
+		if profile, ok := profiles[p]; ok {
+			for model, quota := range profile.Models {
+				rl.Quotas[model] = quota
+			}
+		}
+	}
+
+	// Merge explicit quotas on top (allows overrides)
+	for model, quota := range cfg.Quotas {
+		rl.Quotas[model] = quota
+	}

 	return rl, nil
 }

+// SetQuota sets or updates the quota for a specific model at runtime.
+func (rl *RateLimiter) SetQuota(model string, quota ModelQuota) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+	rl.Quotas[model] = quota
+}
+
+// AddProvider loads all default quotas for a provider.
+// Existing quotas for models in the profile are overwritten.
+func (rl *RateLimiter) AddProvider(provider Provider) {
+	rl.mu.Lock()
+	defer rl.mu.Unlock()
+
+	profiles := DefaultProfiles()
+	if profile, ok := profiles[provider]; ok {
+		for model, quota := range profile.Models {
+			rl.Quotas[model] = quota
+		}
+	}
+}
+
 // Load reads the state from disk.
 func (rl *RateLimiter) Load() error {
 	rl.mu.Lock()
--- a/ratelimit_test.go
+++ b/ratelimit_test.go