Virgil edited this page 2026-02-19 16:54:29 +00:00

go-ratelimit

forge.lthn.ai/core/go-ratelimit -- Token counting and rate limiting for LLM API calls.

Manages per-model rate limits with sliding-window tracking for requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD). Includes Google Generative AI token counting and YAML-based state persistence.

Installation

go get forge.lthn.ai/core/go-ratelimit@latest

Dependencies: gopkg.in/yaml.v3

Core Types

// ModelQuota defines the rate limits for a specific model.
type ModelQuota struct {
    MaxRPM int  // Requests per minute (0 = unlimited)
    MaxTPM int  // Tokens per minute (0 = unlimited)
    MaxRPD int  // Requests per day (0 = unlimited)
}
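Because a zero field means unlimited, a capacity check against a ModelQuota only compares non-zero limits. The `withinQuota` helper below is invented for illustration (the library's real check is `CanSend`), assuming `rpm`, `tpm`, and `rpd` are the current window counts:

```go
package main

import "fmt"

// ModelQuota mirrors the struct above: 0 in any field means unlimited.
type ModelQuota struct {
	MaxRPM int
	MaxTPM int
	MaxRPD int
}

// withinQuota reports whether current usage plus one more request of
// `tokens` tokens would stay inside every non-zero limit.
func withinQuota(q ModelQuota, rpm, tpm, rpd, tokens int) bool {
	if q.MaxRPM > 0 && rpm+1 > q.MaxRPM {
		return false
	}
	if q.MaxTPM > 0 && tpm+tokens > q.MaxTPM {
		return false
	}
	if q.MaxRPD > 0 && rpd+1 > q.MaxRPD {
		return false
	}
	return true
}

func main() {
	q := ModelQuota{MaxRPM: 5, MaxTPM: 250000} // MaxRPD: 0 = no daily cap
	fmt.Println(withinQuota(q, 4, 100000, 9999, 5000)) // true: daily count is ignored
	fmt.Println(withinQuota(q, 5, 100000, 0, 5000))    // false: RPM exhausted
}
```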

// RateLimiter manages rate limits across multiple models.
type RateLimiter struct {
    Quotas map[string]ModelQuota   // Model name -> quota
    State  map[string]*UsageStats  // Model name -> current usage
}

Quick Start

package main

import (
    "context"
    "fmt"
    "log"

    "forge.lthn.ai/core/go-ratelimit"
)

func main() {
    rl, err := ratelimit.New()
    if err != nil {
        log.Fatal(err)
    }
    _ = rl.Load() // Load persisted state

    model := "gemini-2.5-pro"

    // Check before sending
    if rl.CanSend(model, 5000) {
        // ... make API call ...
        rl.RecordUsage(model, 3000, 2000)
        _ = rl.Persist()
    }

    // Or block until capacity is available
    ctx := context.Background()
    if err := rl.WaitForCapacity(ctx, model, 5000); err != nil {
        log.Fatal(err)
    }

    // View current stats
    stats := rl.Stats(model)
    fmt.Printf("RPM: %d/%d  TPM: %d/%d  RPD: %d/%d\n",
        stats.RPM, stats.MaxRPM, stats.TPM, stats.MaxTPM, stats.RPD, stats.MaxRPD)
}

API Summary

Function / Method                     Description
New()                                 Create a limiter with default Gemini quotas
Load()                                Read persisted state from ~/.core/ratelimits.yaml
Persist()                             Write current state to disk
CanSend(model, tokens)                Check if a request would violate limits
RecordUsage(model, prompt, output)    Record a completed API call
WaitForCapacity(ctx, model, tokens)   Block until capacity is available
Stats(model)                          Get current usage snapshot for one model
AllStats()                            Get usage snapshots for all tracked models
Reset(model)                          Clear stats for one model (or all if empty)
CountTokens(apiKey, model, text)      Count tokens via the Google Generative AI API

Pages

  • Model-Quotas -- Token counting and per-model quota configuration
  • Usage-Tracking -- Sliding window algorithm, persistence, and blocking waits

Licence

EUPL-1.2