Model Quotas

go-ratelimit ships with default quotas for Google Generative AI models and supports server-side token counting via the Google API.

Default Quotas

New() pre-configures quotas based on limits observed on Tier 1 Google AI accounts:

Model	RPM	TPM	RPD
`gemini-3-pro-preview`	150	1,000,000	1,000
`gemini-3-flash-preview`	150	1,000,000	1,000
`gemini-2.5-pro`	150	1,000,000	1,000
`gemini-2.0-flash`	150	1,000,000	Unlimited
`gemini-2.0-flash-lite`	Unlimited	Unlimited	Unlimited

In a ModelQuota, a value of 0 means unlimited for that dimension; the table above shows such values as Unlimited.

Custom Quotas

Quotas are stored in the Quotas map and can be modified directly:

rl, err := ratelimit.New()
if err != nil {
    log.Fatalf("ratelimit init failed: %v", err)
}

// Add a custom model
rl.Quotas["my-fine-tuned-model"] = ratelimit.ModelQuota{
    MaxRPM: 60,
    MaxTPM: 500000,
    MaxRPD: 500,
}

// Remove a quota (model becomes unlimited)
delete(rl.Quotas, "gemini-2.0-flash-lite")

Models that are not in the Quotas map are always allowed through: CanSend returns true for any model without a configured quota.
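The two rules above (0 means unlimited, and models without a quota always pass) can be sketched as follows. This is an illustrative model, not go-ratelimit's actual implementation: the usage type, the under and allows functions, and the choice to treat hitting a limit exactly as still allowed are all assumptions. Only the ModelQuota field names come from the library.

```go
package main

import "fmt"

// ModelQuota mirrors the field names documented for go-ratelimit.
type ModelQuota struct {
	MaxRPM int // requests per minute, 0 = unlimited
	MaxTPM int // tokens per minute, 0 = unlimited
	MaxRPD int // requests per day, 0 = unlimited
}

// usage is a hypothetical snapshot of what has already been consumed.
type usage struct {
	RequestsThisMinute int
	TokensThisMinute   int
	RequestsToday      int
}

// under reports whether used+extra fits within limit.
// A limit of 0 never blocks.
func under(limit, used, extra int) bool {
	return limit == 0 || used+extra <= limit
}

// allows is a hypothetical stand-in for the decision CanSend makes.
func allows(quotas map[string]ModelQuota, u usage, model string, tokens int) bool {
	q, ok := quotas[model]
	if !ok {
		return true // no configured quota: always allowed
	}
	return under(q.MaxRPM, u.RequestsThisMinute, 1) &&
		under(q.MaxTPM, u.TokensThisMinute, tokens) &&
		under(q.MaxRPD, u.RequestsToday, 1)
}

func main() {
	quotas := map[string]ModelQuota{
		"my-fine-tuned-model": {MaxRPM: 60, MaxTPM: 500000, MaxRPD: 500},
	}
	u := usage{RequestsThisMinute: 59, TokensThisMinute: 499000, RequestsToday: 10}

	fmt.Println(allows(quotas, u, "my-fine-tuned-model", 1000))   // true: exactly at the TPM limit
	fmt.Println(allows(quotas, u, "my-fine-tuned-model", 1001))   // false: one token over TPM
	fmt.Println(allows(quotas, u, "some-unknown-model", 9999999)) // true: no quota configured
}
```

One consequence worth noting: deleting an entry from the map and setting all three of its fields to 0 have the same effect in this sketch.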

Token Counting

CountTokens calls the Google Generative AI countTokens endpoint to get an exact server-side token count for a prompt:

count, err := ratelimit.CountTokens(apiKey, "gemini-2.5-pro", promptText)
if err != nil {
    log.Printf("token count failed: %v", err)
    // Fall back to an estimate
    count = len(promptText) / 4
}

if rl.CanSend("gemini-2.5-pro", count) {
    // Safe to send
}

API Details

The function sends a POST request to:

https://generativelanguage.googleapis.com/v1beta/models/{model}:countTokens

Request body:

{
  "contents": [
    {
      "parts": [
        {"text": "your prompt here"}
      ]
    }
  ]
}

Authentication is via the x-goog-api-key header. The response contains a totalTokens integer.
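A request of the shape described above could be built with the standard library roughly as follows. This is a sketch, not the library's source: newCountRequest, parseTotalTokens, countTokens, the struct type names, and the baseURL parameter (added so the host can be swapped out in tests) are all hypothetical. The URL path, the x-goog-api-key header, the body layout, and the totalTokens field are taken from the description above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Request body types matching the documented JSON layout.
type part struct {
	Text string `json:"text"`
}
type content struct {
	Parts []part `json:"parts"`
}
type countRequest struct {
	Contents []content `json:"contents"`
}

// newCountRequest builds the POST request without sending it.
func newCountRequest(baseURL, apiKey, model, text string) (*http.Request, error) {
	body, err := json.Marshal(countRequest{
		Contents: []content{{Parts: []part{{Text: text}}}},
	})
	if err != nil {
		return nil, fmt.Errorf("encode request: %w", err)
	}
	url := fmt.Sprintf("%s/v1beta/models/%s:countTokens", baseURL, model)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-goog-api-key", apiKey)
	return req, nil
}

// parseTotalTokens extracts the totalTokens integer from a response body.
func parseTotalTokens(data []byte) (int, error) {
	var out struct {
		TotalTokens int `json:"totalTokens"`
	}
	if err := json.Unmarshal(data, &out); err != nil {
		return 0, fmt.Errorf("decode response: %w", err)
	}
	return out.TotalTokens, nil
}

// countTokens sends the request and returns the server-side count.
func countTokens(client *http.Client, baseURL, apiKey, model, text string) (int, error) {
	req, err := newCountRequest(baseURL, apiKey, model, text)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, fmt.Errorf("read response: %w", err)
	}
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %s: %s", resp.Status, data)
	}
	return parseTotalTokens(data)
}

func main() {
	// "YOUR_API_KEY" is a placeholder; with it the call is expected to fail.
	n, err := countTokens(http.DefaultClient,
		"https://generativelanguage.googleapis.com",
		"YOUR_API_KEY", "gemini-2.5-pro", "your prompt here")
	if err != nil {
		fmt.Println("token count failed:", err)
		return
	}
	fmt.Println("tokens:", n)
}
```

Splitting request construction and response parsing into their own functions keeps both testable without network access.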

Error Handling

CountTokens returns an error if:

The HTTP request fails (network error)
The API returns a non-200 status (invalid key, model not found, quota exceeded)
The response JSON cannot be decoded

In production code, always have a fallback estimate (e.g. len(text) / 4) when the API is unavailable.
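The fallback advice above might be wrapped in a small helper like the one below. The names estimateTokens, countOrEstimate, and errUnavailable are hypothetical, and the exact counter is passed in as a function so the sketch does not depend on the library. Note that len on a Go string counts bytes rather than characters, so non-ASCII text yields a larger estimate than a character count would.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable simulates a failed API call in the demo below.
var errUnavailable = errors.New("api unavailable")

// estimateTokens applies the rough len(text)/4 heuristic.
// len counts bytes, not characters.
func estimateTokens(text string) int {
	return len(text) / 4
}

// countOrEstimate tries the exact counter first and falls back to the
// estimate on any error. The second result reports whether the count is exact.
func countOrEstimate(count func(text string) (int, error), text string) (int, bool) {
	n, err := count(text)
	if err != nil {
		return estimateTokens(text), false
	}
	return n, true
}

func main() {
	failing := func(string) (int, error) { return 0, errUnavailable }

	// The prompt is 32 bytes long, so the estimate is 32/4.
	n, exact := countOrEstimate(failing, "a prompt of thirty-two bytes....")
	fmt.Println(n, exact) // 8 false
}
```

With the real library, the counter argument could be a closure around ratelimit.CountTokens(apiKey, model, text), using the signature shown in the Token Counting example.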

Quota Structure

type ModelQuota struct {
    MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
    MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
    MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
}

Quotas are persisted alongside usage state in ~/.core/ratelimits.yaml; the YAML tags above determine the field names used in that file.