feat(ratelimit): add agent decision guidance
Co-Authored-By: Virgil <virgil@lethean.io>
parent 61ccc226b2
commit ed5949ec3a
9 changed files with 330 additions and 87 deletions
@@ -39,6 +39,15 @@ if err := rl.Persist(); err != nil {
}
```

For agent workflows, `Decide` returns a structured verdict with retry guidance:

```go
decision := rl.Decide("gemini-2.0-flash", 1500)
if !decision.Allowed {
	log.Printf("throttled (%s); retry after %s", decision.Code, decision.RetryAfter)
}
```

## Documentation

- [Architecture](docs/architecture.md) — sliding window algorithm, provider quotas, YAML and SQLite backends
@@ -14,6 +14,8 @@ Test coverage is marked `yes` when the symbol is exercised by the existing test
| Type | `UsageStats` | `type UsageStats struct { Requests []time.Time; Tokens []TokenEntry; DayStart time.Time; DayCount int }` | Stores per-model sliding-window request and token history plus rolling daily usage state. | yes |
| Type | `RateLimiter` | `type RateLimiter struct { Quotas map[string]ModelQuota; State map[string]*UsageStats }` | Manages quotas, usage state, persistence, and concurrency across models. | yes |
| Type | `ModelStats` | `type ModelStats struct { RPM int; MaxRPM int; TPM int; MaxTPM int; RPD int; MaxRPD int; DayStart time.Time }` | Represents a snapshot of current usage and configured limits for a model. | yes |
| Type | `DecisionCode` | `type DecisionCode string` | Machine-readable allow/deny codes returned by `Decide` (e.g., `ok`, `rpm_exceeded`). | yes |
| Type | `Decision` | `type Decision struct { Allowed bool; Code DecisionCode; Reason string; RetryAfter time.Duration; Stats ModelStats }` | Structured decision result with a code, human-readable reason, optional retry guidance, and a stats snapshot. | yes |
| Function | `DefaultProfiles` | `func DefaultProfiles() map[Provider]ProviderProfile` | Returns the built-in quota profiles for the supported providers. | yes |
| Function | `New` | `func New() (*RateLimiter, error)` | Creates a new limiter with Gemini defaults for backward-compatible YAML-backed usage. | yes |
| Function | `NewWithConfig` | `func NewWithConfig(cfg Config) (*RateLimiter, error)` | Creates a YAML-backed limiter from explicit configuration, defaulting to Gemini when config is empty. | yes |

@@ -27,8 +29,9 @@ Test coverage is marked `yes` when the symbol is exercised by the existing test
| Method | `Persist` | `func (rl *RateLimiter) Persist() error` | Persists a snapshot of quotas and usage state to YAML or SQLite. | yes |
| Method | `BackgroundPrune` | `func (rl *RateLimiter) BackgroundPrune(interval time.Duration) func()` | Starts periodic pruning of expired usage state and returns a stop function. | yes |
| Method | `CanSend` | `func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool` | Reports whether a request with the estimated token count fits within current limits. | yes |
| Method | `Decide` | `func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision` | Returns structured allow/deny information including code, reason, retry guidance, and stats snapshot without recording usage. | yes |
| Method | `RecordUsage` | `func (rl *RateLimiter) RecordUsage(model string, promptTokens, outputTokens int)` | Records a successful request into the sliding-window and daily counters. | yes |
| Method | `WaitForCapacity` | `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error` | Blocks until `CanSend` succeeds or the context is cancelled. | yes |
| Method | `WaitForCapacity` | `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error` | Blocks until `Decide` allows the request, sleeping according to `RetryAfter` hints or one-second polls. | yes |
| Method | `Reset` | `func (rl *RateLimiter) Reset(model string)` | Clears usage state for one model or for all models when `model` is empty. | yes |
| Method | `Models` | `func (rl *RateLimiter) Models() iter.Seq[string]` | Returns a sorted iterator of all model names known from quotas or state. | yes |
| Method | `Iter` | `func (rl *RateLimiter) Iter() iter.Seq2[string, ModelStats]` | Returns a sorted iterator of model names paired with current stats snapshots. | yes |
@@ -119,6 +119,12 @@ The check order is: RPD, then RPM, then TPM. RPD is checked first because it
is the cheapest comparison (a single integer). TPM is checked last because it
requires summing the token counts in the sliding window.

`Decide()` follows the same path as `CanSend()` but returns a structured
`Decision` containing a machine-readable code, reason, `RetryAfter` guidance,
and a `ModelStats` snapshot. It is agent-facing and does not record usage;
`WaitForCapacity()` consumes its `RetryAfter` hint to avoid unnecessary
one-second polling when limits are saturated.

### Daily Reset

The daily counter resets automatically inside `prune()`. When
@@ -176,10 +176,10 @@ SQLite on `Persist()`. The database does not grow unboundedly between persist
cycles because `saveState` replaces all rows, but if `Persist()` is called
frequently the WAL file can grow transiently.

**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate
for RPM-scale limits but is coarse for sub-second limits. If a caller needs
finer-grained waiting (e.g., smoothing requests within a minute), they must
implement their own loop.
**WaitForCapacity now sleeps using `Decide`'s `RetryAfter` hint** (with a
one-second fallback when no hint exists). This reduces busy looping on long
windows but remains coarse for sub-second smoothing; callers that need
sub-second pacing should implement their own loop.

**No automatic persistence.** `Persist()` must be called explicitly. If a
process exits without calling `Persist()`, any usage recorded since the last
@@ -86,6 +86,8 @@ if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
	return
}
// Capacity is available; proceed with the API call.

// WaitForCapacity uses Decide's RetryAfter hint to avoid tight polling.
```

## Package Layout
@@ -16,7 +16,7 @@ Note: `CODEX.md` was not present anywhere under `/workspace` during this scan, s
| `(*RateLimiter).BackgroundPrune(interval time.Duration)` | `ratelimit.go:328` | Caller-controlled `interval` | Passed to `time.NewTicker(interval)` and drives a background goroutine that repeatedly locks and prunes state | None | `interval <= 0` causes a panic; very small intervals can create CPU and lock-contention DoS; repeated calls without using the returned cancel function leak goroutines |
| `(*RateLimiter).CanSend(model string, estimatedTokens int)` | `ratelimit.go:350` | Caller-controlled `model` and `estimatedTokens` | `model` indexes `rl.Quotas` / `rl.State`; `estimatedTokens` is added to the current token total before the TPM comparison | Unknown models are allowed immediately; no non-negative or range checks on `estimatedTokens` | Passing an unconfigured model name bypasses throttling entirely; negative or overflowed token values can undercount the TPM check and permit oversend |
| `(*RateLimiter).RecordUsage(model string, promptTokens, outputTokens int)` | `ratelimit.go:396` | Caller-controlled `model`, `promptTokens`, `outputTokens` | Creates or updates `rl.State[model]`; stores `promptTokens + outputTokens` in the token window and increments `DayCount` | None | Arbitrary model names create unbounded state that will later persist to YAML/SQLite; negative or overflowed token totals poison accounting and can reduce future TPM totals below the real usage |
| `(*RateLimiter).WaitForCapacity(ctx context.Context, model string, tokens int)` | `ratelimit.go:414` | Caller-controlled `ctx`, `model`, `tokens` | Calls `CanSend(model, tokens)` once per second until capacity is available or `ctx.Done()` fires | No direct validation; relies on downstream `CanSend()` and caller-supplied context cancellation | Inherits the unknown-model and negative-token bypasses from `CanSend()`; repeated calls with long-lived contexts can accumulate goroutines and lock pressure |
| `(*RateLimiter).WaitForCapacity(ctx context.Context, model string, tokens int)` | `ratelimit.go:429` | Caller-controlled `ctx`, `model`, `tokens` | Calls `Decide(model, tokens)` in a loop and sleeps for the returned `RetryAfter` (or 1s fallback) until allowed or `ctx.Done()` fires | No direct validation beyond negative-token guard; relies on downstream `Decide()` and caller-supplied context cancellation | Long `RetryAfter` values can delay rechecks; repeated calls with long-lived contexts can still accumulate goroutines and lock pressure |
| `(*RateLimiter).Reset(model string)` | `ratelimit.go:433` | Caller-controlled `model` | `model == ""` replaces the entire `rl.State` map; otherwise `delete(rl.State, model)` | Empty string is treated as a wildcard reset | If reachable by an untrusted actor, an empty string clears all rate-limit history and targeted resets erase throttling state for chosen models |
| `(*RateLimiter).Stats(model string)` | `ratelimit.go:484` | Caller-controlled `model` | Prunes `rl.State[model]`, reads `rl.Quotas[model]`, and returns a usage snapshot | None | If exposed through a service boundary, it discloses per-model quota ceilings and live usage counts that can help an attacker tune evasion or timing |
| `NewWithSQLite(dbPath string)` | `ratelimit.go:567` | Caller-controlled `dbPath` | Thin wrapper that forwards `dbPath` into `NewWithSQLiteConfig()` and then `newSQLiteStore()` | No additional validation in the wrapper | Untrusted `dbPath` can steer database creation/opening to unintended local filesystem locations, including companion `-wal` and `-shm` files |
251 ratelimit.go
@@ -399,49 +399,7 @@ func (rl *RateLimiter) BackgroundPrune(interval time.Duration) func() {
//
//	ok := rl.CanSend("gemini-3-pro-preview", 1200)
func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool {
	if estimatedTokens < 0 {
		return false
	}

	rl.mu.Lock()
	defer rl.mu.Unlock()

	quota, ok := rl.Quotas[model]
	if !ok {
		return true // Unknown models are allowed
	}

	// Unlimited check
	if quota.MaxRPM == 0 && quota.MaxTPM == 0 && quota.MaxRPD == 0 {
		return true
	}

	rl.prune(model)
	stats, ok := rl.State[model]
	if !ok {
		stats = &UsageStats{DayStart: time.Now()}
		rl.State[model] = stats
	}

	// Check RPD
	if quota.MaxRPD > 0 && stats.DayCount >= quota.MaxRPD {
		return false
	}

	// Check RPM
	if quota.MaxRPM > 0 && len(stats.Requests) >= quota.MaxRPM {
		return false
	}

	// Check TPM
	if quota.MaxTPM > 0 {
		currentTokens := totalTokenCount(stats.Tokens)
		if estimatedTokens > quota.MaxTPM || currentTokens > quota.MaxTPM-estimatedTokens {
			return false
		}
	}

	return true
	return rl.Decide(model, estimatedTokens).Allowed
}

// RecordUsage records a successful API call.
@@ -473,19 +431,24 @@ func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens
		return core.E("ratelimit.WaitForCapacity", "negative tokens", nil)
	}

	ticker := time.NewTicker(1 * time.Second)
	defer ticker.Stop()

	for {
		if rl.CanSend(model, tokens) {
		decision := rl.Decide(model, tokens)
		if decision.Allowed {
			return nil
		}

		sleep := decision.RetryAfter
		if sleep <= 0 {
			sleep = time.Second
		}

		timer := time.NewTimer(sleep)
		select {
		case <-ctx.Done():
			timer.Stop()
			return ctx.Err()
		case <-ticker.C:
			// check again
		case <-timer.C:
			timer.Stop()
		}
	}
}
@@ -524,6 +487,36 @@ type ModelStats struct {
	DayStart time.Time
}

// DecisionCode identifies the reason for an allow or deny outcome from Decide.
type DecisionCode string

const (
	// DecisionAllowed means the request fits within all configured limits.
	DecisionAllowed DecisionCode = "ok"
	// DecisionUnknownModel means the model has no configured quotas and is therefore allowed.
	DecisionUnknownModel DecisionCode = "unknown_model"
	// DecisionUnlimited means the model is configured with no limits.
	DecisionUnlimited DecisionCode = "unlimited"
	// DecisionInvalidTokens means a negative token estimate was provided.
	DecisionInvalidTokens DecisionCode = "invalid_tokens"
	// DecisionRPDLimit means the rolling 24-hour request limit has been reached.
	DecisionRPDLimit DecisionCode = "rpd_exceeded"
	// DecisionRPMLimit means the per-minute request limit has been reached.
	DecisionRPMLimit DecisionCode = "rpm_exceeded"
	// DecisionTPMLimit means the per-minute token limit would be exceeded.
	DecisionTPMLimit DecisionCode = "tpm_exceeded"
)

// Decision captures an allow/deny decision with context for agents.
// RetryAfter is zero when the request is allowed or when no meaningful wait time exists.
type Decision struct {
	Allowed    bool
	Code       DecisionCode
	Reason     string
	RetryAfter time.Duration
	Stats      ModelStats
}

// Models returns a sorted iterator over all model names tracked by the limiter.
//
//	for model := range rl.Models() { println(model) }
@@ -565,24 +558,7 @@ func (rl *RateLimiter) Stats(model string) ModelStats {

	rl.prune(model)

	stats := ModelStats{}
	quota, ok := rl.Quotas[model]
	if ok {
		stats.MaxRPM = quota.MaxRPM
		stats.MaxTPM = quota.MaxTPM
		stats.MaxRPD = quota.MaxRPD
	}

	if s, ok := rl.State[model]; ok {
		stats.RPM = len(s.Requests)
		stats.RPD = s.DayCount
		stats.DayStart = s.DayStart
		for _, t := range s.Tokens {
			stats.TPM += t.Count
		}
	}

	return stats
	return rl.snapshotLocked(model)
}

// AllStats returns stats for all tracked models.
@@ -605,24 +581,108 @@ func (rl *RateLimiter) AllStats() map[string]ModelStats {
	for m := range result {
		rl.prune(m)

		ms := ModelStats{}
		if q, ok := rl.Quotas[m]; ok {
			ms.MaxRPM = q.MaxRPM
			ms.MaxTPM = q.MaxTPM
			ms.MaxRPD = q.MaxRPD
		}
		if s, ok := rl.State[m]; ok && s != nil {
			ms.RPM = len(s.Requests)
			ms.RPD = s.DayCount
			ms.DayStart = s.DayStart
			ms.TPM = totalTokenCount(s.Tokens)
		}
		result[m] = ms
		result[m] = rl.snapshotLocked(m)
	}

	return result
}

// Decide returns structured allow/deny information for an estimated request.
// It never records usage; call RecordUsage after a successful decision.
func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision {
	if estimatedTokens < 0 {
		return Decision{
			Allowed: false,
			Code:    DecisionInvalidTokens,
			Reason:  "estimated tokens must be non-negative",
			Stats:   rl.Stats(model),
		}
	}

	rl.mu.Lock()
	defer rl.mu.Unlock()

	now := time.Now()
	decision := Decision{}

	quota, ok := rl.Quotas[model]
	if !ok {
		decision.Allowed = true
		decision.Code = DecisionUnknownModel
		decision.Reason = "model has no configured quota"
		decision.Stats = rl.snapshotLocked(model)
		return decision
	}

	if quota.MaxRPM == 0 && quota.MaxTPM == 0 && quota.MaxRPD == 0 {
		decision.Allowed = true
		decision.Code = DecisionUnlimited
		decision.Reason = "all limits are unlimited"
		decision.Stats = rl.snapshotLocked(model)
		return decision
	}

	rl.prune(model)
	stats, ok := rl.State[model]
	if !ok || stats == nil {
		stats = &UsageStats{DayStart: now}
		rl.State[model] = stats
	}

	decision.Stats = rl.snapshotLocked(model)

	if quota.MaxRPD > 0 && stats.DayCount >= quota.MaxRPD {
		decision.Code = DecisionRPDLimit
		decision.Reason = "daily request limit reached"
		decision.RetryAfter = nonNegativeDuration(stats.DayStart.Add(24 * time.Hour).Sub(now))
		return decision
	}

	if quota.MaxRPM > 0 && len(stats.Requests) >= quota.MaxRPM {
		decision.Code = DecisionRPMLimit
		decision.Reason = "per-minute request limit reached"
		if len(stats.Requests) > 0 {
			decision.RetryAfter = nonNegativeDuration(stats.Requests[0].Add(time.Minute).Sub(now))
		}
		return decision
	}

	if quota.MaxTPM > 0 {
		currentTokens := totalTokenCount(stats.Tokens)
		if estimatedTokens > quota.MaxTPM || currentTokens > quota.MaxTPM-estimatedTokens {
			decision.Code = DecisionTPMLimit
			decision.Reason = "per-minute token limit reached"
			decision.RetryAfter = retryAfterForTokens(now, stats.Tokens, quota.MaxTPM, estimatedTokens)
			return decision
		}
	}

	decision.Allowed = true
	decision.Code = DecisionAllowed
	decision.Reason = "within quota"
	return decision
}

// snapshotLocked builds ModelStats for the provided model.
// Caller must hold rl.mu.
func (rl *RateLimiter) snapshotLocked(model string) ModelStats {
	stats := ModelStats{}

	if q, ok := rl.Quotas[model]; ok {
		stats.MaxRPM = q.MaxRPM
		stats.MaxTPM = q.MaxTPM
		stats.MaxRPD = q.MaxRPD
	}

	if s, ok := rl.State[model]; ok && s != nil {
		stats.RPM = len(s.Requests)
		stats.RPD = s.DayCount
		stats.DayStart = s.DayStart
		stats.TPM = totalTokenCount(s.Tokens)
	}
	return stats
}

// NewWithSQLite creates a SQLite-backed RateLimiter with Gemini defaults.
// The database is created at dbPath if it does not exist. Use Close() to
// release the database connection when finished.
@@ -846,6 +906,37 @@ func safeTokenTotal(tokens []TokenEntry) int {
	return total
}

func retryAfterForTokens(now time.Time, tokens []TokenEntry, maxTPM, estimatedTokens int) time.Duration {
	if maxTPM <= 0 {
		return 0
	}

	deficit := totalTokenCount(tokens) + estimatedTokens - maxTPM
	if deficit <= 0 {
		return 0
	}

	remaining := deficit
	for _, entry := range tokens {
		if entry.Count < 0 {
			continue
		}
		remaining -= entry.Count
		if remaining <= 0 {
			return nonNegativeDuration(entry.Time.Add(time.Minute).Sub(now))
		}
	}

	return 0
}

func nonNegativeDuration(value time.Duration) time.Duration {
	if value < 0 {
		return 0
	}
	return value
}

func countTokensURL(baseURL, model string) (string, error) {
	if core.Trim(model) == "" {
		return "", core.E("ratelimit.countTokensURL", "empty model", nil)
@@ -253,6 +253,125 @@ func TestRatelimit_CanSend_Good(t *testing.T) {
	})
}

// --- Phase 0: Decide surface area ---

func TestRatelimit_Decide_Good(t *testing.T) {
	t.Run("unknown model remains allowed with unknown code", func(t *testing.T) {
		rl := newTestLimiter(t)

		decision := rl.Decide("unknown-model", 50)

		assert.True(t, decision.Allowed)
		assert.Equal(t, DecisionUnknownModel, decision.Code)
		assert.Zero(t, decision.RetryAfter)
	})

	t.Run("unlimited quota reports unlimited decision", func(t *testing.T) {
		rl := newTestLimiter(t)
		model := "unlimited"
		rl.Quotas[model] = ModelQuota{}

		decision := rl.Decide(model, 100)

		assert.True(t, decision.Allowed)
		assert.Equal(t, DecisionUnlimited, decision.Code)
		assert.Equal(t, 0, decision.Stats.MaxRPM)
		assert.Equal(t, 0, decision.Stats.MaxTPM)
		assert.Equal(t, 0, decision.Stats.MaxRPD)
	})

	t.Run("rpd limit returns retry window", func(t *testing.T) {
		rl := newTestLimiter(t)
		model := "rpd-limit"
		now := time.Now()
		rl.Quotas[model] = ModelQuota{MaxRPM: 10, MaxTPM: 1000, MaxRPD: 2}
		rl.State[model] = &UsageStats{DayStart: now.Add(-23 * time.Hour), DayCount: 2}

		decision := rl.Decide(model, 10)

		assert.False(t, decision.Allowed)
		assert.Equal(t, DecisionRPDLimit, decision.Code)
		assert.InDelta(t, time.Hour.Seconds(), decision.RetryAfter.Seconds(), 2)
		assert.Equal(t, 2, decision.Stats.MaxRPD)
		assert.Equal(t, 2, decision.Stats.RPD)
	})

	t.Run("rpm limit includes retry-after estimate", func(t *testing.T) {
		rl := newTestLimiter(t)
		model := "rpm-limit"
		now := time.Now()
		rl.Quotas[model] = ModelQuota{MaxRPM: 1, MaxTPM: 1000, MaxRPD: 5}
		rl.State[model] = &UsageStats{
			Requests: []time.Time{now.Add(-10 * time.Second)},
			Tokens:   []TokenEntry{{Time: now.Add(-10 * time.Second), Count: 10}},
			DayStart: now,
			DayCount: 1,
		}

		decision := rl.Decide(model, 5)

		assert.False(t, decision.Allowed)
		assert.Equal(t, DecisionRPMLimit, decision.Code)
		assert.InDelta(t, 50, decision.RetryAfter.Seconds(), 1)
	})

	t.Run("tpm limit surfaces earliest expiry", func(t *testing.T) {
		rl := newTestLimiter(t)
		model := "tpm-limit"
		now := time.Now()
		rl.Quotas[model] = ModelQuota{MaxRPM: 10, MaxTPM: 100, MaxRPD: 10}
		rl.State[model] = &UsageStats{
			Requests: []time.Time{now.Add(-30 * time.Second)},
			Tokens: []TokenEntry{
				{Time: now.Add(-50 * time.Second), Count: 70},
				{Time: now.Add(-10 * time.Second), Count: 20},
			},
			DayStart: now,
			DayCount: 2,
		}

		decision := rl.Decide(model, 20)

		assert.False(t, decision.Allowed)
		assert.Equal(t, DecisionTPMLimit, decision.Code)
		assert.InDelta(t, 10, decision.RetryAfter.Seconds(), 1)
	})

	t.Run("allowed decision carries stats snapshot", func(t *testing.T) {
		rl := newTestLimiter(t)
		model := "decide-allowed"
		rl.Quotas[model] = ModelQuota{MaxRPM: 5, MaxTPM: 200, MaxRPD: 3}
		now := time.Now()
		rl.State[model] = &UsageStats{
			Requests: []time.Time{now.Add(-5 * time.Second)},
			Tokens:   []TokenEntry{{Time: now.Add(-5 * time.Second), Count: 30}},
			DayStart: now,
			DayCount: 1,
		}

		decision := rl.Decide(model, 20)

		assert.True(t, decision.Allowed)
		assert.Equal(t, DecisionAllowed, decision.Code)
		assert.Equal(t, 1, decision.Stats.RPM)
		assert.Equal(t, 30, decision.Stats.TPM)
		assert.Equal(t, 1, decision.Stats.RPD)
		assert.Equal(t, 5, decision.Stats.MaxRPM)
		assert.Equal(t, 200, decision.Stats.MaxTPM)
		assert.Equal(t, 3, decision.Stats.MaxRPD)
	})

	t.Run("negative estimate returns invalid decision", func(t *testing.T) {
		rl := newTestLimiter(t)

		decision := rl.Decide("neg", -5)

		assert.False(t, decision.Allowed)
		assert.Equal(t, DecisionInvalidTokens, decision.Code)
		assert.Zero(t, decision.RetryAfter)
	})
}

// --- Phase 0: Sliding window / prune tests ---

func TestRatelimit_Prune_Good(t *testing.T) {
15 specs/RFC.md
@@ -75,6 +75,16 @@
- `MaxRPD int`: configured requests-per-day limit.
- `DayStart time.Time`: start of the current rolling 24-hour window. This is zero if the model has no recorded state.

### `DecisionCode`
`type DecisionCode string`

`DecisionCode` enumerates machine-readable allow/deny codes returned by `Decide`. Defined values: `ok`, `unknown_model`, `unlimited`, `invalid_tokens`, `rpd_exceeded`, `rpm_exceeded`, and `tpm_exceeded`.

### `Decision`
`type Decision struct`

`Decision` bundles the outcome from `Decide`, including whether the request is allowed, a `DecisionCode`, a human-readable `Reason`, an optional `RetryAfter` duration when throttled, and a `ModelStats` snapshot at the time of evaluation.

## Functions

### `DefaultProfiles() map[Provider]ProviderProfile`
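As a rough illustration of how an agent might act on these codes, a standalone sketch (the `retryable` helper is hypothetical, not part of the package; the code values mirror the constants listed above):

```go
package main

import "fmt"

// DecisionCode values as enumerated in the RFC, inlined so this compiles standalone.
type DecisionCode string

const (
	DecisionAllowed       DecisionCode = "ok"
	DecisionUnknownModel  DecisionCode = "unknown_model"
	DecisionUnlimited     DecisionCode = "unlimited"
	DecisionInvalidTokens DecisionCode = "invalid_tokens"
	DecisionRPDLimit      DecisionCode = "rpd_exceeded"
	DecisionRPMLimit      DecisionCode = "rpm_exceeded"
	DecisionTPMLimit      DecisionCode = "tpm_exceeded"
)

// retryable reports whether a denial is worth waiting out: rate and token
// limits clear as their windows roll over, while an invalid estimate never will.
func retryable(code DecisionCode) bool {
	switch code {
	case DecisionRPMLimit, DecisionTPMLimit, DecisionRPDLimit:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(retryable(DecisionTPMLimit))      // true
	fmt.Println(retryable(DecisionInvalidTokens)) // false
}
```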
@ -104,11 +114,14 @@ Starts a background goroutine that prunes expired entries from every tracked mod
|
|||
### `func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool`
|
||||
Reports whether a request for `model` can be sent without violating the configured limits. Negative token estimates are rejected. Models with no configured quota are allowed. If all three limits for a known model are `0`, the model is treated as unlimited. Before evaluating the request, the limiter prunes entries older than one minute and resets the rolling daily counter when its 24-hour window has elapsed. The method then checks requests-per-day, requests-per-minute, and tokens-per-minute against the estimated token count.
|
||||
|
||||
### `func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision`
|
||||
Returns a structured allow/deny decision for the estimated request. The result includes a `DecisionCode`, a human-readable `Reason`, optional `RetryAfter` guidance when throttled, and a `ModelStats` snapshot. It prunes expired state, initialises empty state for configured models, but does not record usage.
|
||||

### `func (rl *RateLimiter) RecordUsage(model string, promptTokens, outputTokens int)`
Records a successful request for `model`. The limiter prunes stale entries first, creates state for the model if needed, appends the current timestamp to the request window, appends a token entry containing the combined prompt and output token count, and increments the rolling daily counter. Negative token values are ignored by the internal token summation logic rather than reducing the recorded total.

### `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error`
Blocks until `CanSend(model, tokens)` succeeds or `ctx` is cancelled. The method polls once per second. If `tokens` is negative, it returns an error immediately.
Blocks until `Decide(model, tokens)` allows the request or `ctx` is cancelled. The method uses the `RetryAfter` hint from `Decide` to sleep between checks, falling back to one-second polling when no hint is available. If `tokens` is negative, it returns an error immediately.

### `func (rl *RateLimiter) Reset(model string)`
Clears usage state without changing quotas. If `model` is empty, it drops all tracked state. Otherwise it removes state only for the named model.