Borg/examples
google-labs-jules[bot] 1d8ff02f5c feat: add robots.txt support to website collector
Adds support for parsing and respecting robots.txt during website collection.

This change introduces the following features:
- Fetches and parses /robots.txt before crawling a website.
- Respects `Disallow` patterns to avoid crawling restricted areas.
- Honors the `Crawl-delay` directive to prevent hammering sites.
- Adds command-line flags to configure the behavior:
  - `--ignore-robots`: Ignores robots.txt rules.
  - `--user-agent`: Sets a custom user-agent string.
  - `--min-delay`: Overrides the crawl-delay with a minimum value.

The implementation includes a new `robots` package for parsing robots.txt files and integrates it into the existing website downloader. Tests have been added to verify the new functionality.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
2026-02-02 00:42:20 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| all | feat: Implement Go examples and refactor matrix execution | 2025-11-14 11:12:15 +00:00 |
| collect_github_release | feat: Implement Go examples and refactor matrix execution | 2025-11-14 11:12:15 +00:00 |
| collect_github_repo | feat: Implement Go examples and refactor matrix execution | 2025-11-14 11:12:15 +00:00 |
| collect_github_repos | feat: Add placeholder examples for all features | 2025-11-13 19:38:23 +00:00 |
| collect_pwa | feat: Implement Go examples and refactor matrix execution | 2025-11-14 11:12:15 +00:00 |
| collect_website | feat: add robots.txt support to website collector | 2026-02-02 00:42:20 +00:00 |
| create_tim_programmatically | feat: Add trix encryption and format | 2025-11-14 13:47:27 +00:00 |
| encrypt_media | feat: Add dapp.fm native desktop player (Wails) | 2026-01-06 18:42:30 +00:00 |
| failures | feat: SMSG v2 binary format with zstd compression + RFC-001 spec | 2026-01-10 19:57:33 +00:00 |
| formats | feat: SMSG v2 binary format with zstd compression + RFC-001 spec | 2026-01-10 19:57:33 +00:00 |
| inspect_datanode | feat: Add programmatic examples for matrix creation and execution | 2025-11-13 19:27:12 +00:00 |
| run_tim_programmatically | feat: Add trix encryption and format | 2025-11-14 13:47:27 +00:00 |
| serve | feat: Implement Go examples and refactor matrix execution | 2025-11-14 11:12:15 +00:00 |
| smsg-reply | feat: Add STMF form encryption and SMSG secure message packages | 2025-12-27 00:49:07 +00:00 |
| collect_github_repo.sh | feat: Add documentation and examples | 2025-11-02 12:23:25 +00:00 |
| collect_pwa.sh | feat: Add documentation and examples | 2025-11-02 12:23:25 +00:00 |
| collect_website.sh | feat: Add documentation and examples | 2025-11-02 12:23:25 +00:00 |
| compress_datanode.sh | feat: Add optional compression to collect commands | 2025-11-02 13:27:04 +00:00 |
| create_tim.sh | feat: Add trix encryption and format | 2025-11-14 13:47:27 +00:00 |
| demo-sample.smsg | fix: mobile scrolling + clean up mkdemo hardcoded values | 2026-01-12 15:35:13 +00:00 |
| serve_tim.sh | feat: Add trix encryption and format | 2025-11-14 13:47:27 +00:00 |