Borg/pkg
google-labs-jules[bot] e3efb59d98 feat: Add deduplication cache for collections
This commit introduces a deduplication cache to avoid re-downloading files across multiple collection jobs.

Key changes include:
- A new `pkg/cache` package that provides content-addressable storage using SHA256 hashes of the file content.
- Integration of the cache into the `collect website` command. Downloads are now skipped if the content already exists in the cache.
- The addition of `--no-cache` and `--cache-dir` flags to give users control over the caching behavior.
- New `borg cache stats` and `borg cache clear` commands to allow users to manage the cache.
- A performance improvement to the cache implementation: the URL-to-hash index file is now written once at the end of the collection process rather than on every file download.
- Centralized logic for determining the default cache directory, removing code duplication.
- Improved error handling and refactored duplicated cache-checking logic in the website collector.
- Added comprehensive unit tests for the new cache package and an integration test to verify that the website collector correctly uses the cache.

The implementation of cache size limiting and LRU eviction is still pending and will be addressed in a future commit.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
2026-02-02 00:46:07 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| cache | feat: Add deduplication cache for collections | 2026-02-02 00:46:07 +00:00 |
| compress | feat: Add _Good, _Bad, and _Ugly tests | 2025-11-14 10:36:35 +00:00 |
| console | feat: Add Borg Console and release workflow | 2025-12-27 02:32:31 +00:00 |
| datanode | Improve test coverage for datanode and tim packages, and fix cmd tests | 2025-11-23 18:58:32 +00:00 |
| github | feat: Add _Good, _Bad, and _Ugly tests | 2025-11-14 10:36:35 +00:00 |
| logger | feat: Improve test coverage and refactor for testability | 2025-11-03 18:25:04 +00:00 |
| mocks | feat: Improve test coverage and refactor for testability | 2025-11-03 16:31:26 +00:00 |
| player | feat: v3 streaming with LTHN rolling keys and configurable cadence | 2026-01-12 16:01:59 +00:00 |
| pwa | feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support | 2025-12-26 01:25:03 +00:00 |
| smsg | feat: adaptive bitrate streaming (ABR) for HLS-style encrypted video | 2026-01-13 15:40:15 +00:00 |
| stmf | feat: Add STMF form encryption and SMSG secure message packages | 2025-12-27 00:49:07 +00:00 |
| tarfs | feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support | 2025-12-26 01:25:03 +00:00 |
| tim | feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support | 2025-12-26 01:25:03 +00:00 |
| trix | feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support | 2025-12-26 01:25:03 +00:00 |
| ui | feat: Improve test coverage and refactor for testability | 2025-11-03 18:25:04 +00:00 |
| vcs | feat: Add _Good, _Bad, and _Ugly tests | 2025-11-14 10:36:35 +00:00 |
| wasm/stmf | feat: adaptive bitrate streaming (ABR) for HLS-style encrypted video | 2026-01-13 15:40:15 +00:00 |
| website | feat: Add deduplication cache for collections | 2026-02-02 00:46:07 +00:00 |