This commit introduces a configurable rate-limiting system for all HTTP requests made by the application.
Key features include:
- A token bucket algorithm for rate limiting.
- Per-domain limits configured in a YAML file passed with `--rate-config`.
- Wildcard domain matching (e.g., `*.archive.org`).
- Dynamic adjustment of request rates based on `429 Too Many Requests` responses and `Retry-After` headers.
- New CLI flags (`--rate-limit`, `--burst`) for setting limits directly on the command line.
I began by creating a new `http` package to centralize the rate-limiting logic. I then integrated this package into the `website` and `github` collectors, ensuring that all outgoing HTTP requests are subject to the new rate-limiting rules.
Alongside the implementation, I added unit and integration tests for the new functionality. Writing them uncovered several pre-existing issues in the test suite, which I have also fixed:
- Correcting mock implementations for `http.Client` and `vcs.GitCloner`.
- Updating outdated function signatures in tests and examples.
- Resolving missing dependencies and syntax errors in test files.
- Stabilizing flaky tests.
Co-authored-by: Snider <631881+Snider@users.noreply.github.com>