Model Quotas
go-ratelimit ships with default quotas for Google Generative AI models and supports server-side token counting via the Google API.
Default Quotas
New() pre-configures quotas based on Tier 1 Google AI observations:
| Model | RPM | TPM | RPD |
|---|---|---|---|
| gemini-3-pro-preview | 150 | 1,000,000 | 1,000 |
| gemini-3-flash-preview | 150 | 1,000,000 | 1,000 |
| gemini-2.5-pro | 150 | 1,000,000 | 1,000 |
| gemini-2.0-flash | 150 | 1,000,000 | Unlimited |
| gemini-2.0-flash-lite | Unlimited | Unlimited | Unlimited |
A value of 0 means unlimited for that dimension.
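The zero-means-unlimited rule can be illustrated with a small standalone sketch. The `ModelQuota` type is redeclared locally so the example compiles on its own, and `withinLimit` is a hypothetical helper, not part of go-ratelimit; how the library compares usage against a limit internally is an assumption here.

```go
package main

import "fmt"

// ModelQuota is redeclared locally for this sketch; it mirrors the
// three quota dimensions described on this page.
type ModelQuota struct {
	MaxRPM int
	MaxTPM int
	MaxRPD int
}

// withinLimit is a hypothetical helper: a limit of 0 disables the check,
// otherwise usage must not exceed the limit.
func withinLimit(used, limit int) bool {
	return limit == 0 || used <= limit
}

func main() {
	// Values taken from the gemini-2.0-flash row; Unlimited is written as 0.
	flash := ModelQuota{MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 0}

	fmt.Println(withinLimit(151, flash.MaxRPM))  // false: over the RPM limit
	fmt.Println(withinLimit(5000, flash.MaxRPD)) // true: RPD is unlimited
}
```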
Custom Quotas
Quotas are stored in the Quotas map and can be modified directly:
rl, err := ratelimit.New()
if err != nil {
	log.Fatal(err)
}
// Add a custom model
rl.Quotas["my-fine-tuned-model"] = ratelimit.ModelQuota{
	MaxRPM: 60,
	MaxTPM: 500000,
	MaxRPD: 500,
}
// Remove a quota (model becomes unlimited)
delete(rl.Quotas, "gemini-2.0-flash-lite")
Unknown models (those not in the Quotas map) are always allowed through: CanSend returns true for any model without a configured quota.
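The pass-through behaviour for unknown models can be sketched with a plain map lookup. This is an illustration only: `canSendSketch` is a hypothetical stand-in for CanSend, it checks a single request against the TPM limit, and it ignores the usage tracking that the real limiter performs.

```go
package main

import "fmt"

// ModelQuota is redeclared locally so the sketch is self-contained.
type ModelQuota struct {
	MaxRPM, MaxTPM, MaxRPD int
}

// canSendSketch is a hypothetical stand-in for CanSend. A model with no
// entry in the map is always allowed; otherwise only the TPM dimension is
// checked for this one request.
func canSendSketch(quotas map[string]ModelQuota, model string, tokens int) bool {
	q, ok := quotas[model]
	if !ok {
		return true // no configured quota: allowed through
	}
	return q.MaxTPM == 0 || tokens <= q.MaxTPM
}

func main() {
	quotas := map[string]ModelQuota{
		"gemini-2.5-pro": {MaxRPM: 150, MaxTPM: 1000000, MaxRPD: 1000},
	}

	fmt.Println(canSendSketch(quotas, "gemini-2.5-pro", 2000000))      // false
	fmt.Println(canSendSketch(quotas, "some-unlisted-model", 2000000)) // true

	// Deleting the entry turns a known model into an unknown one.
	delete(quotas, "gemini-2.5-pro")
	fmt.Println(canSendSketch(quotas, "gemini-2.5-pro", 2000000)) // true
}
```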
Token Counting
CountTokens calls the Google Generative AI countTokens endpoint to get an exact server-side token count for a prompt:
count, err := ratelimit.CountTokens(apiKey, "gemini-2.5-pro", promptText)
if err != nil {
	log.Printf("token count failed: %v", err)
	// Fall back to an estimate
	count = len(promptText) / 4
}
if rl.CanSend("gemini-2.5-pro", count) {
	// Safe to send
}
API Details
The function sends a POST request to:
https://generativelanguage.googleapis.com/v1beta/models/{model}:countTokens
Request body:
{
  "contents": [
    {
      "parts": [
        {"text": "your prompt here"}
      ]
    }
  ]
}
Authentication is via the x-goog-api-key header. The response contains a totalTokens integer.
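For readers who want to see the wire format in code, the request described above could be assembled with the standard library roughly as follows. This is a sketch, not the library's implementation: `buildCountTokensRequest`, `parseTotalTokens`, and the struct names are hypothetical, the `Content-Type` header is an assumption, and the sample response body exists only to exercise the decoder. The request is built but never sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Request and response shapes taken from the JSON shown on this page.
type part struct {
	Text string `json:"text"`
}
type content struct {
	Parts []part `json:"parts"`
}
type countTokensRequest struct {
	Contents []content `json:"contents"`
}
type countTokensResponse struct {
	TotalTokens int `json:"totalTokens"`
}

// buildCountTokensRequest (hypothetical) prepares the POST without sending it.
func buildCountTokensRequest(apiKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(countTokensRequest{
		Contents: []content{{Parts: []part{{Text: prompt}}}},
	})
	if err != nil {
		return nil, err
	}
	endpoint := "https://generativelanguage.googleapis.com/v1beta/models/" +
		url.PathEscape(model) + ":countTokens"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json") // assumed, not stated above
	req.Header.Set("x-goog-api-key", apiKey)
	return req, nil
}

// parseTotalTokens (hypothetical) extracts totalTokens from a response body.
func parseTotalTokens(raw []byte) (int, error) {
	var resp countTokensResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return 0, err
	}
	return resp.TotalTokens, nil
}

func main() {
	req, err := buildCountTokensRequest("YOUR_API_KEY", "gemini-2.5-pro", "your prompt here")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)

	// Made-up response body, used only to show the decoding step.
	n, err := parseTotalTokens([]byte(`{"totalTokens": 42}`))
	fmt.Println(n, err)
}
```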
Error Handling
CountTokens returns an error if:
- The HTTP request fails (network error)
- The API returns a non-200 status (invalid key, model not found, quota exceeded)
- The response JSON cannot be decoded
In production code, always have a fallback estimate (e.g. len(text) / 4) when the API is unavailable.
Quota Structure
type ModelQuota struct {
	MaxRPM int `yaml:"max_rpm"` // Requests per minute (0 = unlimited)
	MaxTPM int `yaml:"max_tpm"` // Tokens per minute (0 = unlimited)
	MaxRPD int `yaml:"max_rpd"` // Requests per day (0 = unlimited)
}
Quotas are persisted alongside usage state in ~/.core/ratelimits.yaml via the YAML tags.
See also: Home | Usage-Tracking