This commit introduces parallel collection capabilities to the `borg` CLI, significantly improving the performance of large-scale data collection.
Key features and changes include:
- **Parallel Downloads:** A `--parallel` flag has been added to the `collect github repos` and `collect website` commands, allowing users to specify the number of concurrent workers for downloading and processing.
- **Rate Limiting:** A `--rate-limit` flag has been added to the `collect website` command to control the maximum number of requests per second to a single domain, preventing the crawler from overwhelming servers.
- **Graceful Shutdown:** The worker pools now respect context cancellation, allowing for a graceful shutdown on interrupt (e.g., Ctrl+C). This improves the user experience for long-running collection tasks.
- **Refactored Downloaders:** The `github` and `website` downloaders have been refactored to use a robust worker pool pattern, with proper synchronization primitives to ensure thread safety.
Co-authored-by: Snider <631881+Snider@users.noreply.github.com>