# Usage Tracking

`go-ratelimit` uses a sliding window algorithm to track API usage and provides blocking waits, stats snapshots, and YAML persistence.
## Sliding Window

All rate checks operate on a 1-minute sliding window. Before every `CanSend` or `Stats` call, stale entries are pruned:
```text
now        = current time
window     = now - 1 minute
requests[] = keep only entries where timestamp > window
tokens[]   = keep only entries where timestamp > window
```
Daily counters (`DayCount`) reset when 24 hours have elapsed since `DayStart`.
## Usage State

Per-model usage is tracked in `UsageStats`:
```go
type UsageStats struct {
    Requests []time.Time  // Timestamps of requests in the sliding window
    Tokens   []TokenEntry // Token counts in the sliding window
    DayStart time.Time    // When the current day window started
    DayCount int          // Total requests today
}

type TokenEntry struct {
    Time  time.Time
    Count int // prompt + output tokens combined
}
```
## Checking Capacity

`CanSend` performs three checks in order:
- RPD -- Has the daily request count reached `MaxRPD`?
- RPM -- Are there `MaxRPM` or more requests in the current minute?
- TPM -- Would adding `estimatedTokens` exceed `MaxTPM` for the current minute?
```go
if rl.CanSend("gemini-2.5-pro", 5000) {
    // Safe to make the API call
}
```
If the model has no configured quota, `CanSend` returns true. If all quota dimensions are zero (unlimited), it also returns true.
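The check order can be sketched as follows. `Quota` and the count parameters here are hypothetical stand-ins for the library's internals; the real `CanSend` derives the counts from the pruned sliding window.

```go
package main

import "fmt"

// Quota mirrors the per-model limits; a zero value means unlimited.
type Quota struct {
	MaxRPM int
	MaxTPM int
	MaxRPD int
}

// canSend applies the RPD, RPM, and TPM checks in order, given the
// current request/token counts for the window and day.
func canSend(q Quota, rpm, tpm, rpd, estimatedTokens int) bool {
	if q.MaxRPD > 0 && rpd >= q.MaxRPD {
		return false // daily request budget exhausted
	}
	if q.MaxRPM > 0 && rpm >= q.MaxRPM {
		return false // too many requests this minute
	}
	if q.MaxTPM > 0 && tpm+estimatedTokens > q.MaxTPM {
		return false // adding these tokens would blow the TPM budget
	}
	return true // all checks pass, or all dimensions unlimited
}

func main() {
	q := Quota{MaxRPM: 5, MaxTPM: 250000, MaxRPD: 100}
	fmt.Println(canSend(q, 4, 240000, 99, 5000)) // true: all three checks pass
	fmt.Println(canSend(q, 5, 0, 0, 0))          // false: RPM already at limit
}
```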
## Recording Usage

After a successful API call, record both prompt and output tokens:
```go
rl.RecordUsage("gemini-2.5-pro", promptTokens, outputTokens)
```
This appends the current timestamp to the request log, records a `TokenEntry` with the combined token count, and increments the daily counter.
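Those three bookkeeping steps amount to something like the sketch below (illustrative only; `recordUsage` is a hypothetical standalone helper, not the library's method).

```go
package main

import (
	"fmt"
	"time"
)

type TokenEntry struct {
	Time  time.Time
	Count int
}

type UsageStats struct {
	Requests []time.Time
	Tokens   []TokenEntry
	DayCount int
}

// recordUsage appends a request timestamp, logs the combined token
// count, and increments the daily counter.
func recordUsage(u *UsageStats, promptTokens, outputTokens int) {
	now := time.Now()
	u.Requests = append(u.Requests, now)
	u.Tokens = append(u.Tokens, TokenEntry{Time: now, Count: promptTokens + outputTokens})
	u.DayCount++
}

func main() {
	u := &UsageStats{}
	recordUsage(u, 1200, 800)
	fmt.Println(len(u.Requests), u.Tokens[0].Count, u.DayCount) // 1 2000 1
}
```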
## Blocking Waits

`WaitForCapacity` polls `CanSend` every second until capacity is available or the context is cancelled:
```go
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()

if err := rl.WaitForCapacity(ctx, "gemini-2.5-pro", 5000); err != nil {
    // Context cancelled or deadline exceeded
    log.Printf("timed out waiting for capacity: %v", err)
    return
}
// Capacity is available, proceed with the API call
```
This is useful in agent loops where you want to respect rate limits without dropping requests.
## Stats Snapshots

Get a point-in-time view of usage for one or all models:
```go
// Single model
stats := rl.Stats("gemini-2.5-pro")
fmt.Printf("RPM: %d/%d TPM: %d/%d RPD: %d/%d\n",
    stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM, stats.RPD, stats.MaxRPD)

// All models
all := rl.AllStats()
for model, ms := range all {
    fmt.Printf("%-30s RPM %3d/%-3d TPM %7d/%-7d RPD %4d/%-4d\n",
        model, ms.RPM, ms.MaxRPM, ms.TPM, ms.MaxTPM, ms.RPD, ms.MaxRPD)
}
```
Each snapshot is a `ModelStats`:

```go
type ModelStats struct {
    RPM      int
    MaxRPM   int
    TPM      int
    MaxTPM   int
    RPD      int
    MaxRPD   int
    DayStart time.Time
}
```
## Persistence

State is persisted to `~/.core/ratelimits.yaml` as YAML. Call `Load()` on startup and `Persist()` after recording usage:
```go
rl, _ := ratelimit.New()
_ = rl.Load() // Restore previous state (no-op if file missing)

// ... use rl ...

_ = rl.Persist() // Save to disk
```
The file contains both quotas and current state, so custom quotas survive restarts.
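The on-disk file might look roughly like this. This is an illustrative shape only; the actual key names and layout are determined by the library's YAML tags, and the quota numbers are made up for the example:

```yaml
# ~/.core/ratelimits.yaml (hypothetical shape)
quotas:
  gemini-2.5-pro:
    maxrpm: 5
    maxtpm: 250000
    maxrpd: 100
usage:
  gemini-2.5-pro:
    daystart: 2025-01-15T00:04:12Z
    daycount: 42
```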
## Resetting

Clear usage counters without affecting quota configuration:
```go
rl.Reset("gemini-2.5-pro") // Clear one model
rl.Reset("")               // Clear all models
```
## Thread Safety

All methods on `RateLimiter` are safe for concurrent use. A `sync.RWMutex` protects the shared state, with write locks for mutations (`CanSend`, `RecordUsage`, `Reset`) and read locks for persistence.
See also: Home | Model-Quotas