Virgil edited this page 2026-02-19 16:58:16 +00:00

Usage Tracking

go-ratelimit uses a sliding window algorithm to track API usage and provides blocking waits, stats snapshots, and YAML persistence.

Sliding Window

All rate checks operate on a 1-minute sliding window. Before every CanSend or Stats call, stale entries are pruned:

now = current time
window = now - 1 minute

requests[] = [ keep only entries where timestamp > window ]
tokens[]   = [ keep only entries where timestamp > window ]

Daily counters (DayCount) reset when 24 hours have elapsed since DayStart.

Usage State

Per-model usage is tracked in UsageStats:

type UsageStats struct {
    Requests []time.Time  // Timestamps of requests in the sliding window
    Tokens   []TokenEntry // Token counts in the sliding window
    DayStart time.Time    // When the current day window started
    DayCount int          // Total requests today
}

type TokenEntry struct {
    Time  time.Time
    Count int       // prompt + output tokens combined
}

Checking Capacity

CanSend performs three checks in order:

  1. RPD -- Has the daily request count reached MaxRPD?
  2. RPM -- Are there MaxRPM or more requests in the current minute?
  3. TPM -- Would adding estimatedTokens exceed MaxTPM for the current minute?

if rl.CanSend("gemini-2.5-pro", 5000) {
    // Safe to make the API call
}

If the model has no configured quota, CanSend returns true. If all quota dimensions are zero (unlimited), it also returns true.

Recording Usage

After a successful API call, record both prompt and output tokens:

rl.RecordUsage("gemini-2.5-pro", promptTokens, outputTokens)

This appends the current timestamp to the request log, records a TokenEntry with the combined token count, and increments the daily counter.
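Those three effects can be sketched against the UsageStats type shown earlier. The method name `record` is illustrative; only the struct fields and the described behaviour come from this page.

```go
package main

import (
	"fmt"
	"time"
)

type TokenEntry struct {
	Time  time.Time
	Count int // prompt + output tokens combined
}

type UsageStats struct {
	Requests []time.Time
	Tokens   []TokenEntry
	DayStart time.Time
	DayCount int
}

// record sketches what recording usage does: log the request timestamp,
// store the combined token count, and bump the daily counter.
func (u *UsageStats) record(promptTokens, outputTokens int) {
	now := time.Now()
	u.Requests = append(u.Requests, now)
	u.Tokens = append(u.Tokens, TokenEntry{Time: now, Count: promptTokens + outputTokens})
	u.DayCount++
}

func main() {
	var u UsageStats
	u.record(1200, 800)
	fmt.Println(len(u.Requests), u.Tokens[0].Count, u.DayCount) // 1 2000 1
}
```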

Blocking Waits

WaitForCapacity polls CanSend every second until capacity is available or the context is cancelled:

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
defer cancel()

if err := rl.WaitForCapacity(ctx, "gemini-2.5-pro", 5000); err != nil {
    // Context cancelled or deadline exceeded
    log.Printf("timed out waiting for capacity: %v", err)
    return
}

// Capacity is available, proceed with the API call

This is useful in agent loops where you want to respect rate limits without dropping requests.

Stats Snapshots

Get a point-in-time view of usage for one or all models:

// Single model
stats := rl.Stats("gemini-2.5-pro")
fmt.Printf("RPM: %d/%d  TPM: %d/%d  RPD: %d/%d\n",
    stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM, stats.RPD, stats.MaxRPD)

// All models
all := rl.AllStats()
for model, ms := range all {
    fmt.Printf("%-30s  RPM %3d/%-3d  TPM %7d/%-7d  RPD %4d/%-4d\n",
        model, ms.RPM, ms.MaxRPM, ms.TPM, ms.MaxTPM, ms.RPD, ms.MaxRPD)
}

type ModelStats struct {
    RPM      int
    MaxRPM   int
    TPM      int
    MaxTPM   int
    RPD      int
    MaxRPD   int
    DayStart time.Time
}

Persistence

State is persisted to ~/.core/ratelimits.yaml as YAML. Call Load() on startup and Persist() after recording usage:

rl, _ := ratelimit.New()
_ = rl.Load()  // Restore previous state (no-op if file missing)

// ... use rl ...

_ = rl.Persist()  // Save to disk

The file contains both quotas and current state, so custom quotas survive restarts.
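For orientation, the persisted file might look roughly like this. The key names and layout below are illustrative guesses, not a documented schema; inspect your own ~/.core/ratelimits.yaml for the authoritative shape.

```yaml
# Illustrative shape only -- actual keys may differ.
quotas:
  gemini-2.5-pro:
    max_rpm: 5
    max_tpm: 250000
    max_rpd: 100
state:
  gemini-2.5-pro:
    day_start: 2026-02-19T00:00:00Z
    day_count: 42
```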

Resetting

Clear usage counters without affecting quota configuration:

rl.Reset("gemini-2.5-pro")  // Clear one model
rl.Reset("")                // Clear all models

Thread Safety

All methods on RateLimiter are safe for concurrent use. A sync.RWMutex protects the shared state: mutations (CanSend, RecordUsage, Reset) take the write lock (CanSend counts as a mutation because it prunes stale window entries first), while persistence takes the read lock.
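The locking pattern can be sketched as follows. The `limiter` type here is a minimal stand-in, not the library's RateLimiter; it only illustrates write locks on mutating paths and read locks on read-only paths.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is a toy type demonstrating the RWMutex pattern described above.
type limiter struct {
	mu    sync.RWMutex
	count int
}

// recordUsage mutates shared state, so it takes the write lock.
func (l *limiter) recordUsage() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.count++
}

// snapshot only reads, so concurrent readers can share the read lock.
func (l *limiter) snapshot() int {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.count
}

func main() {
	var l limiter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); l.recordUsage() }()
	}
	wg.Wait()
	fmt.Println(l.snapshot()) // 100
}
```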

See also: Home | Model-Quotas