Implement a circuit breaker for the website collector to prevent hammering domains that are consistently failing.
The circuit breaker has three states: CLOSED, OPEN, and HALF-OPEN. It tracks failures per domain and opens the circuit for a domain after a configurable number of consecutive failures. After a cooldown period, the circuit transitions to HALF-OPEN and allows a limited number of test requests to check whether the domain has recovered.
The following command-line flags have been added to the `collect website` command:
- `--no-circuit-breaker`: Disable the circuit breaker
- `--circuit-failures`: Number of consecutive failures that trips the circuit breaker
- `--circuit-cooldown`: Cooldown period before an open circuit transitions to half-open
- `--circuit-success-threshold`: Number of successes needed to close the circuit breaker
- `--circuit-half-open-requests`: Number of test requests allowed in the half-open state
The implementation also includes:
- A new `circuitbreaker` package with the core logic
- Integration into the `website` package with per-domain tracking
- Improved logging to include the domain name and state changes
- Integration tests to verify the circuit breaker's behavior
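
The per-domain tracking mentioned above could look something like the following sketch, which keeps one breaker per lowercased hostname. The `Registry` type, `DomainOf`, and `For` are hypothetical names, and keying by hostname without the port is an assumption; the `website` package may key domains differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"sync"
)

// Registry lazily creates and caches one value per domain. T stands in
// for whatever breaker type the real package uses (hypothetical).
type Registry[T any] struct {
	mu    sync.Mutex
	newFn func(domain string) *T
	items map[string]*T
}

func NewRegistry[T any](newFn func(domain string) *T) *Registry[T] {
	return &Registry[T]{newFn: newFn, items: make(map[string]*T)}
}

// DomainOf extracts the lowercased hostname (no port) from a URL.
func DomainOf(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", fmt.Errorf("no host in URL %q", rawURL)
	}
	return host, nil
}

// For returns the value tracking rawURL's domain, creating it on first use.
func (r *Registry[T]) For(rawURL string) (*T, error) {
	domain, err := DomainOf(rawURL)
	if err != nil {
		return nil, err
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	item, ok := r.items[domain]
	if !ok {
		item = r.newFn(domain)
		r.items[domain] = item
	}
	return item, nil
}
```

With this shape, failures on one domain only affect the breaker for that domain, so a single failing site does not block collection from others.
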
Co-authored-by: Snider <631881+Snider@users.noreply.github.com>