Borg/examples
google-labs-jules[bot] e3efb59d98 feat: Add deduplication cache for collections
This commit introduces a deduplication cache to avoid re-downloading files across multiple collection jobs.

Key changes include:
- A new `pkg/cache` package that provides content-addressable storage using SHA256 hashes of the file content.
- Integration of the cache into the `collect website` command. Downloads are now skipped if the content already exists in the cache.
- The addition of `--no-cache` and `--cache-dir` flags to give users control over the caching behavior.
- New `borg cache stats` and `borg cache clear` commands to allow users to manage the cache.
- A performance improvement to the cache implementation, which now writes the URL-to-hash index file only once, at the end of the collection process, rather than after every file download.
- Centralized logic for determining the default cache directory, removing code duplication.
- Improved error handling in the website collector, with its duplicated cache-checking logic refactored into one place.
- Unit tests for the new cache package and an integration test verifying that the website collector uses the cache.

The implementation of cache size limiting and LRU eviction is still pending and will be addressed in a future commit.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
2026-02-02 00:46:07 +00:00
all feat: Implement Go examples and refactor matrix execution 2025-11-14 11:12:15 +00:00
collect_github_release feat: Implement Go examples and refactor matrix execution 2025-11-14 11:12:15 +00:00
collect_github_repo feat: Implement Go examples and refactor matrix execution 2025-11-14 11:12:15 +00:00
collect_github_repos feat: Add placeholder examples for all features 2025-11-13 19:38:23 +00:00
collect_pwa feat: Implement Go examples and refactor matrix execution 2025-11-14 11:12:15 +00:00
collect_website feat: Add deduplication cache for collections 2026-02-02 00:46:07 +00:00
create_tim_programmatically feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
encrypt_media feat: Add dapp.fm native desktop player (Wails) 2026-01-06 18:42:30 +00:00
failures feat: SMSG v2 binary format with zstd compression + RFC-001 spec 2026-01-10 19:57:33 +00:00
formats feat: SMSG v2 binary format with zstd compression + RFC-001 spec 2026-01-10 19:57:33 +00:00
inspect_datanode feat: Add programmatic examples for matrix creation and execution 2025-11-13 19:27:12 +00:00
run_tim_programmatically feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
serve feat: Implement Go examples and refactor matrix execution 2025-11-14 11:12:15 +00:00
smsg-reply feat: Add STMF form encryption and SMSG secure message packages 2025-12-27 00:49:07 +00:00
collect_github_repo.sh feat: Add documentation and examples 2025-11-02 12:23:25 +00:00
collect_pwa.sh feat: Add documentation and examples 2025-11-02 12:23:25 +00:00
collect_website.sh feat: Add documentation and examples 2025-11-02 12:23:25 +00:00
compress_datanode.sh feat: Add optional compression to collect commands 2025-11-02 13:27:04 +00:00
create_tim.sh feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
demo-sample.smsg fix: mobile scrolling + clean up mkdemo hardcoded values 2026-01-12 15:35:13 +00:00
serve_tim.sh feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00