feat(ratelimit): add agent decision guidance
All checks were successful
Security Scan / security (push) Successful in 9s
Test / test (push) Successful in 2m19s

Co-Authored-By: Virgil <virgil@lethean.io>
Virgil 2026-03-30 08:16:44 +00:00
parent 61ccc226b2
commit ed5949ec3a
9 changed files with 330 additions and 87 deletions


@@ -39,6 +39,15 @@ if err := rl.Persist(); err != nil {
}
```
For agent workflows, `Decide` returns a structured verdict with retry guidance:
```go
decision := rl.Decide("gemini-2.0-flash", 1500)
if !decision.Allowed {
log.Printf("throttled (%s); retry after %s", decision.Code, decision.RetryAfter)
}
```
## Documentation
- [Architecture](docs/architecture.md) — sliding window algorithm, provider quotas, YAML and SQLite backends


@@ -14,6 +14,8 @@ Test coverage is marked `yes` when the symbol is exercised by the existing test
| Type | `UsageStats` | `type UsageStats struct { Requests []time.Time; Tokens []TokenEntry; DayStart time.Time; DayCount int }` | Stores per-model sliding-window request and token history plus rolling daily usage state. | yes |
| Type | `RateLimiter` | `type RateLimiter struct { Quotas map[string]ModelQuota; State map[string]*UsageStats }` | Manages quotas, usage state, persistence, and concurrency across models. | yes |
| Type | `ModelStats` | `type ModelStats struct { RPM int; MaxRPM int; TPM int; MaxTPM int; RPD int; MaxRPD int; DayStart time.Time }` | Represents a snapshot of current usage and configured limits for a model. | yes |
| Type | `DecisionCode` | `type DecisionCode string` | Machine-readable allow/deny codes returned by `Decide` (e.g., `ok`, `rpm_exceeded`). | yes |
| Type | `Decision` | `type Decision struct { Allowed bool; Code DecisionCode; Reason string; RetryAfter time.Duration; Stats ModelStats }` | Structured decision result with a code, human-readable reason, optional retry guidance, and a stats snapshot. | yes |
| Function | `DefaultProfiles` | `func DefaultProfiles() map[Provider]ProviderProfile` | Returns the built-in quota profiles for the supported providers. | yes |
| Function | `New` | `func New() (*RateLimiter, error)` | Creates a new limiter with Gemini defaults for backward-compatible YAML-backed usage. | yes |
| Function | `NewWithConfig` | `func NewWithConfig(cfg Config) (*RateLimiter, error)` | Creates a YAML-backed limiter from explicit configuration, defaulting to Gemini when config is empty. | yes |
@@ -27,8 +29,9 @@ Test coverage is marked `yes` when the symbol is exercised by the existing test
| Method | `Persist` | `func (rl *RateLimiter) Persist() error` | Persists a snapshot of quotas and usage state to YAML or SQLite. | yes |
| Method | `BackgroundPrune` | `func (rl *RateLimiter) BackgroundPrune(interval time.Duration) func()` | Starts periodic pruning of expired usage state and returns a stop function. | yes |
| Method | `CanSend` | `func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool` | Reports whether a request with the estimated token count fits within current limits. | yes |
| Method | `Decide` | `func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision` | Returns structured allow/deny information including code, reason, retry guidance, and stats snapshot without recording usage. | yes |
| Method | `RecordUsage` | `func (rl *RateLimiter) RecordUsage(model string, promptTokens, outputTokens int)` | Records a successful request into the sliding-window and daily counters. | yes |
| Method | `WaitForCapacity` | `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error` | Blocks until `CanSend` succeeds or the context is cancelled. | yes |
| Method | `WaitForCapacity` | `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error` | Blocks until `Decide` allows the request, sleeping according to `RetryAfter` hints or one-second polls. | yes |
| Method | `Reset` | `func (rl *RateLimiter) Reset(model string)` | Clears usage state for one model or for all models when `model` is empty. | yes |
| Method | `Models` | `func (rl *RateLimiter) Models() iter.Seq[string]` | Returns a sorted iterator of all model names known from quotas or state. | yes |
| Method | `Iter` | `func (rl *RateLimiter) Iter() iter.Seq2[string, ModelStats]` | Returns a sorted iterator of model names paired with current stats snapshots. | yes |


@@ -119,6 +119,12 @@ The check order is: RPD, then RPM, then TPM. RPD is checked first because it
is the cheapest comparison (a single integer). TPM is checked last because it
requires summing the token counts in the sliding window.
`Decide()` follows the same path as `CanSend()` but returns a structured
`Decision` containing a machine-readable code, reason, `RetryAfter` guidance,
and a `ModelStats` snapshot. It is agent-facing and does not record usage;
`WaitForCapacity()` consumes its `RetryAfter` hint to avoid unnecessary
one-second polling when limits are saturated.
### Daily Reset
The daily counter resets automatically inside `prune()`. When


@@ -176,10 +176,10 @@ SQLite on `Persist()`. The database does not grow unboundedly between persist
cycles because `saveState` replaces all rows, but if `Persist()` is called
frequently the WAL file can grow transiently.
**WaitForCapacity polling interval is fixed at 1 second.** This is appropriate
for RPM-scale limits but is coarse for sub-second limits. If a caller needs
finer-grained waiting (e.g., smoothing requests within a minute), they must
implement their own loop.
**WaitForCapacity now sleeps using `Decide`'s `RetryAfter` hint** (with a
one-second fallback when no hint exists). This reduces busy looping on long
windows but remains coarse for sub-second smoothing; callers that need
sub-second pacing should implement their own loop.
**No automatic persistence.** `Persist()` must be called explicitly. If a
process exits without calling `Persist()`, any usage recorded since the last


@@ -86,6 +86,8 @@ if err := rl.WaitForCapacity(ctx, "claude-opus-4", 2000); err != nil {
return
}
// Capacity is available; proceed with the API call.
// WaitForCapacity uses Decide's RetryAfter hint to avoid tight polling.
```
## Package Layout


@@ -16,7 +16,7 @@ Note: `CODEX.md` was not present anywhere under `/workspace` during this scan, s
| `(*RateLimiter).BackgroundPrune(interval time.Duration)` | `ratelimit.go:328` | Caller-controlled `interval` | Passed to `time.NewTicker(interval)` and drives a background goroutine that repeatedly locks and prunes state | None | `interval <= 0` causes a panic; very small intervals can create CPU and lock-contention DoS; repeated calls without using the returned cancel function leak goroutines |
| `(*RateLimiter).CanSend(model string, estimatedTokens int)` | `ratelimit.go:350` | Caller-controlled `model` and `estimatedTokens` | `model` indexes `rl.Quotas` / `rl.State`; `estimatedTokens` is added to the current token total before the TPM comparison | Unknown models are allowed immediately; no non-negative or range checks on `estimatedTokens` | Passing an unconfigured model name bypasses throttling entirely; negative or overflowed token values can undercount the TPM check and permit oversend |
| `(*RateLimiter).RecordUsage(model string, promptTokens, outputTokens int)` | `ratelimit.go:396` | Caller-controlled `model`, `promptTokens`, `outputTokens` | Creates or updates `rl.State[model]`; stores `promptTokens + outputTokens` in the token window and increments `DayCount` | None | Arbitrary model names create unbounded state that will later persist to YAML/SQLite; negative or overflowed token totals poison accounting and can reduce future TPM totals below the real usage |
| `(*RateLimiter).WaitForCapacity(ctx context.Context, model string, tokens int)` | `ratelimit.go:414` | Caller-controlled `ctx`, `model`, `tokens` | Calls `CanSend(model, tokens)` once per second until capacity is available or `ctx.Done()` fires | No direct validation; relies on downstream `CanSend()` and caller-supplied context cancellation | Inherits the unknown-model and negative-token bypasses from `CanSend()`; repeated calls with long-lived contexts can accumulate goroutines and lock pressure |
| `(*RateLimiter).WaitForCapacity(ctx context.Context, model string, tokens int)` | `ratelimit.go:429` | Caller-controlled `ctx`, `model`, `tokens` | Calls `Decide(model, tokens)` in a loop and sleeps for the returned `RetryAfter` (or 1s fallback) until allowed or `ctx.Done()` fires | No direct validation beyond negative-token guard; relies on downstream `Decide()` and caller-supplied context cancellation | Long `RetryAfter` values can delay rechecks; repeated calls with long-lived contexts can still accumulate goroutines and lock pressure |
| `(*RateLimiter).Reset(model string)` | `ratelimit.go:433` | Caller-controlled `model` | `model == ""` replaces the entire `rl.State` map; otherwise `delete(rl.State, model)` | Empty string is treated as a wildcard reset | If reachable by an untrusted actor, an empty string clears all rate-limit history and targeted resets erase throttling state for chosen models |
| `(*RateLimiter).Stats(model string)` | `ratelimit.go:484` | Caller-controlled `model` | Prunes `rl.State[model]`, reads `rl.Quotas[model]`, and returns a usage snapshot | None | If exposed through a service boundary, it discloses per-model quota ceilings and live usage counts that can help an attacker tune evasion or timing |
| `NewWithSQLite(dbPath string)` | `ratelimit.go:567` | Caller-controlled `dbPath` | Thin wrapper that forwards `dbPath` into `NewWithSQLiteConfig()` and then `newSQLiteStore()` | No additional validation in the wrapper | Untrusted `dbPath` can steer database creation/opening to unintended local filesystem locations, including companion `-wal` and `-shm` files |


@@ -399,49 +399,7 @@ func (rl *RateLimiter) BackgroundPrune(interval time.Duration) func() {
//
// ok := rl.CanSend("gemini-3-pro-preview", 1200)
func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool {
if estimatedTokens < 0 {
return false
}
rl.mu.Lock()
defer rl.mu.Unlock()
quota, ok := rl.Quotas[model]
if !ok {
return true // Unknown models are allowed
}
// Unlimited check
if quota.MaxRPM == 0 && quota.MaxTPM == 0 && quota.MaxRPD == 0 {
return true
}
rl.prune(model)
stats, ok := rl.State[model]
if !ok {
stats = &UsageStats{DayStart: time.Now()}
rl.State[model] = stats
}
// Check RPD
if quota.MaxRPD > 0 && stats.DayCount >= quota.MaxRPD {
return false
}
// Check RPM
if quota.MaxRPM > 0 && len(stats.Requests) >= quota.MaxRPM {
return false
}
// Check TPM
if quota.MaxTPM > 0 {
currentTokens := totalTokenCount(stats.Tokens)
if estimatedTokens > quota.MaxTPM || currentTokens > quota.MaxTPM-estimatedTokens {
return false
}
}
return true
return rl.Decide(model, estimatedTokens).Allowed
}
// RecordUsage records a successful API call.
@@ -473,19 +431,24 @@ func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens
return core.E("ratelimit.WaitForCapacity", "negative tokens", nil)
}
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
for {
if rl.CanSend(model, tokens) {
decision := rl.Decide(model, tokens)
if decision.Allowed {
return nil
}
sleep := decision.RetryAfter
if sleep <= 0 {
sleep = time.Second
}
timer := time.NewTimer(sleep)
select {
case <-ctx.Done():
timer.Stop()
return ctx.Err()
case <-ticker.C:
// check again
case <-timer.C:
timer.Stop()
}
}
}
@@ -524,6 +487,36 @@ type ModelStats struct {
DayStart time.Time
}
// DecisionCode identifies the reason for an allow or deny outcome from Decide.
type DecisionCode string
const (
// DecisionAllowed means the request fits within all configured limits.
DecisionAllowed DecisionCode = "ok"
// DecisionUnknownModel means the model has no configured quotas and is therefore allowed.
DecisionUnknownModel DecisionCode = "unknown_model"
// DecisionUnlimited means the model is configured with no limits.
DecisionUnlimited DecisionCode = "unlimited"
// DecisionInvalidTokens means a negative token estimate was provided.
DecisionInvalidTokens DecisionCode = "invalid_tokens"
// DecisionRPDLimit means the rolling 24-hour request limit has been reached.
DecisionRPDLimit DecisionCode = "rpd_exceeded"
// DecisionRPMLimit means the per-minute request limit has been reached.
DecisionRPMLimit DecisionCode = "rpm_exceeded"
// DecisionTPMLimit means the per-minute token limit would be exceeded.
DecisionTPMLimit DecisionCode = "tpm_exceeded"
)
// Decision captures an allow/deny decision with context for agents.
// RetryAfter is zero when the request is allowed or when no meaningful wait time exists.
type Decision struct {
Allowed bool
Code DecisionCode
Reason string
RetryAfter time.Duration
Stats ModelStats
}
// Models returns a sorted iterator over all model names tracked by the limiter.
//
// for model := range rl.Models() { println(model) }
@@ -565,24 +558,7 @@ func (rl *RateLimiter) Stats(model string) ModelStats {
rl.prune(model)
stats := ModelStats{}
quota, ok := rl.Quotas[model]
if ok {
stats.MaxRPM = quota.MaxRPM
stats.MaxTPM = quota.MaxTPM
stats.MaxRPD = quota.MaxRPD
}
if s, ok := rl.State[model]; ok {
stats.RPM = len(s.Requests)
stats.RPD = s.DayCount
stats.DayStart = s.DayStart
for _, t := range s.Tokens {
stats.TPM += t.Count
}
}
return stats
return rl.snapshotLocked(model)
}
// AllStats returns stats for all tracked models.
@@ -605,24 +581,108 @@ func (rl *RateLimiter) AllStats() map[string]ModelStats {
for m := range result {
rl.prune(m)
ms := ModelStats{}
if q, ok := rl.Quotas[m]; ok {
ms.MaxRPM = q.MaxRPM
ms.MaxTPM = q.MaxTPM
ms.MaxRPD = q.MaxRPD
}
if s, ok := rl.State[m]; ok && s != nil {
ms.RPM = len(s.Requests)
ms.RPD = s.DayCount
ms.DayStart = s.DayStart
ms.TPM = totalTokenCount(s.Tokens)
}
result[m] = ms
result[m] = rl.snapshotLocked(m)
}
return result
}
// Decide returns structured allow/deny information for an estimated request.
// It never records usage; call RecordUsage after a successful decision.
func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision {
if estimatedTokens < 0 {
return Decision{
Allowed: false,
Code: DecisionInvalidTokens,
Reason: "estimated tokens must be non-negative",
Stats: rl.Stats(model),
}
}
rl.mu.Lock()
defer rl.mu.Unlock()
now := time.Now()
decision := Decision{}
quota, ok := rl.Quotas[model]
if !ok {
decision.Allowed = true
decision.Code = DecisionUnknownModel
decision.Reason = "model has no configured quota"
decision.Stats = rl.snapshotLocked(model)
return decision
}
if quota.MaxRPM == 0 && quota.MaxTPM == 0 && quota.MaxRPD == 0 {
decision.Allowed = true
decision.Code = DecisionUnlimited
decision.Reason = "all limits are unlimited"
decision.Stats = rl.snapshotLocked(model)
return decision
}
rl.prune(model)
stats, ok := rl.State[model]
if !ok || stats == nil {
stats = &UsageStats{DayStart: now}
rl.State[model] = stats
}
decision.Stats = rl.snapshotLocked(model)
if quota.MaxRPD > 0 && stats.DayCount >= quota.MaxRPD {
decision.Code = DecisionRPDLimit
decision.Reason = "daily request limit reached"
decision.RetryAfter = nonNegativeDuration(stats.DayStart.Add(24 * time.Hour).Sub(now))
return decision
}
if quota.MaxRPM > 0 && len(stats.Requests) >= quota.MaxRPM {
decision.Code = DecisionRPMLimit
decision.Reason = "per-minute request limit reached"
if len(stats.Requests) > 0 {
decision.RetryAfter = nonNegativeDuration(stats.Requests[0].Add(time.Minute).Sub(now))
}
return decision
}
if quota.MaxTPM > 0 {
currentTokens := totalTokenCount(stats.Tokens)
if estimatedTokens > quota.MaxTPM || currentTokens > quota.MaxTPM-estimatedTokens {
decision.Code = DecisionTPMLimit
decision.Reason = "per-minute token limit reached"
decision.RetryAfter = retryAfterForTokens(now, stats.Tokens, quota.MaxTPM, estimatedTokens)
return decision
}
}
decision.Allowed = true
decision.Code = DecisionAllowed
decision.Reason = "within quota"
return decision
}
// snapshotLocked builds ModelStats for the provided model.
// Caller must hold rl.mu.
func (rl *RateLimiter) snapshotLocked(model string) ModelStats {
stats := ModelStats{}
if q, ok := rl.Quotas[model]; ok {
stats.MaxRPM = q.MaxRPM
stats.MaxTPM = q.MaxTPM
stats.MaxRPD = q.MaxRPD
}
if s, ok := rl.State[model]; ok && s != nil {
stats.RPM = len(s.Requests)
stats.RPD = s.DayCount
stats.DayStart = s.DayStart
stats.TPM = totalTokenCount(s.Tokens)
}
return stats
}
// NewWithSQLite creates a SQLite-backed RateLimiter with Gemini defaults.
// The database is created at dbPath if it does not exist. Use Close() to
// release the database connection when finished.
@@ -846,6 +906,37 @@ func safeTokenTotal(tokens []TokenEntry) int {
return total
}
func retryAfterForTokens(now time.Time, tokens []TokenEntry, maxTPM, estimatedTokens int) time.Duration {
if maxTPM <= 0 {
return 0
}
deficit := totalTokenCount(tokens) + estimatedTokens - maxTPM
if deficit <= 0 {
return 0
}
remaining := deficit
for _, entry := range tokens {
if entry.Count < 0 {
continue
}
remaining -= entry.Count
if remaining <= 0 {
return nonNegativeDuration(entry.Time.Add(time.Minute).Sub(now))
}
}
return 0
}
func nonNegativeDuration(value time.Duration) time.Duration {
if value < 0 {
return 0
}
return value
}
func countTokensURL(baseURL, model string) (string, error) {
if core.Trim(model) == "" {
return "", core.E("ratelimit.countTokensURL", "empty model", nil)


@@ -253,6 +253,125 @@ func TestRatelimit_CanSend_Good(t *testing.T) {
})
}
// --- Phase 0: Decide surface area ---
func TestRatelimit_Decide_Good(t *testing.T) {
t.Run("unknown model remains allowed with unknown code", func(t *testing.T) {
rl := newTestLimiter(t)
decision := rl.Decide("unknown-model", 50)
assert.True(t, decision.Allowed)
assert.Equal(t, DecisionUnknownModel, decision.Code)
assert.Zero(t, decision.RetryAfter)
})
t.Run("unlimited quota reports unlimited decision", func(t *testing.T) {
rl := newTestLimiter(t)
model := "unlimited"
rl.Quotas[model] = ModelQuota{}
decision := rl.Decide(model, 100)
assert.True(t, decision.Allowed)
assert.Equal(t, DecisionUnlimited, decision.Code)
assert.Equal(t, 0, decision.Stats.MaxRPM)
assert.Equal(t, 0, decision.Stats.MaxTPM)
assert.Equal(t, 0, decision.Stats.MaxRPD)
})
t.Run("rpd limit returns retry window", func(t *testing.T) {
rl := newTestLimiter(t)
model := "rpd-limit"
now := time.Now()
rl.Quotas[model] = ModelQuota{MaxRPM: 10, MaxTPM: 1000, MaxRPD: 2}
rl.State[model] = &UsageStats{DayStart: now.Add(-23 * time.Hour), DayCount: 2}
decision := rl.Decide(model, 10)
assert.False(t, decision.Allowed)
assert.Equal(t, DecisionRPDLimit, decision.Code)
assert.InDelta(t, time.Hour.Seconds(), decision.RetryAfter.Seconds(), 2)
assert.Equal(t, 2, decision.Stats.MaxRPD)
assert.Equal(t, 2, decision.Stats.RPD)
})
t.Run("rpm limit includes retry-after estimate", func(t *testing.T) {
rl := newTestLimiter(t)
model := "rpm-limit"
now := time.Now()
rl.Quotas[model] = ModelQuota{MaxRPM: 1, MaxTPM: 1000, MaxRPD: 5}
rl.State[model] = &UsageStats{
Requests: []time.Time{now.Add(-10 * time.Second)},
Tokens: []TokenEntry{{Time: now.Add(-10 * time.Second), Count: 10}},
DayStart: now,
DayCount: 1,
}
decision := rl.Decide(model, 5)
assert.False(t, decision.Allowed)
assert.Equal(t, DecisionRPMLimit, decision.Code)
assert.InDelta(t, 50, decision.RetryAfter.Seconds(), 1)
})
t.Run("tpm limit surfaces earliest expiry", func(t *testing.T) {
rl := newTestLimiter(t)
model := "tpm-limit"
now := time.Now()
rl.Quotas[model] = ModelQuota{MaxRPM: 10, MaxTPM: 100, MaxRPD: 10}
rl.State[model] = &UsageStats{
Requests: []time.Time{now.Add(-30 * time.Second)},
Tokens: []TokenEntry{
{Time: now.Add(-50 * time.Second), Count: 70},
{Time: now.Add(-10 * time.Second), Count: 20},
},
DayStart: now,
DayCount: 2,
}
decision := rl.Decide(model, 20)
assert.False(t, decision.Allowed)
assert.Equal(t, DecisionTPMLimit, decision.Code)
assert.InDelta(t, 10, decision.RetryAfter.Seconds(), 1)
})
t.Run("allowed decision carries stats snapshot", func(t *testing.T) {
rl := newTestLimiter(t)
model := "decide-allowed"
rl.Quotas[model] = ModelQuota{MaxRPM: 5, MaxTPM: 200, MaxRPD: 3}
now := time.Now()
rl.State[model] = &UsageStats{
Requests: []time.Time{now.Add(-5 * time.Second)},
Tokens: []TokenEntry{{Time: now.Add(-5 * time.Second), Count: 30}},
DayStart: now,
DayCount: 1,
}
decision := rl.Decide(model, 20)
assert.True(t, decision.Allowed)
assert.Equal(t, DecisionAllowed, decision.Code)
assert.Equal(t, 1, decision.Stats.RPM)
assert.Equal(t, 30, decision.Stats.TPM)
assert.Equal(t, 1, decision.Stats.RPD)
assert.Equal(t, 5, decision.Stats.MaxRPM)
assert.Equal(t, 200, decision.Stats.MaxTPM)
assert.Equal(t, 3, decision.Stats.MaxRPD)
})
t.Run("negative estimate returns invalid decision", func(t *testing.T) {
rl := newTestLimiter(t)
decision := rl.Decide("neg", -5)
assert.False(t, decision.Allowed)
assert.Equal(t, DecisionInvalidTokens, decision.Code)
assert.Zero(t, decision.RetryAfter)
})
}
// --- Phase 0: Sliding window / prune tests ---
func TestRatelimit_Prune_Good(t *testing.T) {


@@ -75,6 +75,16 @@
- `MaxRPD int`: configured requests-per-day limit.
- `DayStart time.Time`: start of the current rolling 24-hour window. This is zero if the model has no recorded state.
### `DecisionCode`
`type DecisionCode string`
`DecisionCode` enumerates machine-readable allow/deny codes returned by `Decide`. Defined values: `ok`, `unknown_model`, `unlimited`, `invalid_tokens`, `rpd_exceeded`, `rpm_exceeded`, and `tpm_exceeded`.
### `Decision`
`type Decision struct`
`Decision` bundles the outcome from `Decide`, including whether the request is allowed, a `DecisionCode`, a human-readable `Reason`, an optional `RetryAfter` duration when throttled, and a `ModelStats` snapshot at the time of evaluation.
## Functions
### `DefaultProfiles() map[Provider]ProviderProfile`
@@ -104,11 +114,14 @@ Starts a background goroutine that prunes expired entries from every tracked mod
### `func (rl *RateLimiter) CanSend(model string, estimatedTokens int) bool`
Reports whether a request for `model` can be sent without violating the configured limits. Negative token estimates are rejected. Models with no configured quota are allowed. If all three limits for a known model are `0`, the model is treated as unlimited. Before evaluating the request, the limiter prunes entries older than one minute and resets the rolling daily counter when its 24-hour window has elapsed. The method then checks requests-per-day, requests-per-minute, and tokens-per-minute against the estimated token count.
### `func (rl *RateLimiter) Decide(model string, estimatedTokens int) Decision`
Returns a structured allow/deny decision for the estimated request. The result includes a `DecisionCode`, a human-readable `Reason`, optional `RetryAfter` guidance when throttled, and a `ModelStats` snapshot. It prunes expired state, initialises empty state for configured models, but does not record usage.
### `func (rl *RateLimiter) RecordUsage(model string, promptTokens, outputTokens int)`
Records a successful request for `model`. The limiter prunes stale entries first, creates state for the model if needed, appends the current timestamp to the request window, appends a token entry containing the combined prompt and output token count, and increments the rolling daily counter. Negative token values are ignored by the internal token summation logic rather than reducing the recorded total.
### `func (rl *RateLimiter) WaitForCapacity(ctx context.Context, model string, tokens int) error`
Blocks until `CanSend(model, tokens)` succeeds or `ctx` is cancelled. The method polls once per second. If `tokens` is negative, it returns an error immediately.
Blocks until `Decide(model, tokens)` allows the request or `ctx` is cancelled. The method uses the `RetryAfter` hint from `Decide` to sleep between checks, falling back to one-second polling when no hint is available. If `tokens` is negative, it returns an error immediately.
### `func (rl *RateLimiter) Reset(model string)`
Clears usage state without changing quotas. If `model` is empty, it drops all tracked state. Otherwise it removes state only for the named model.