Borg/cmd
google-labs-jules[bot] c7e3ba297f feat: PDF metadata extraction
This commit introduces a new feature to extract and index metadata from collected PDF files.

The following changes have been made:
- Added a new `pdf` command with a `metadata` subcommand to extract metadata from a single PDF file.
- Added a new `extract-metadata` command to extract metadata from all PDF files within a given archive and create an `INDEX.json` file.
- Added a `--extract-pdf-metadata` flag to the `collect website` command to extract metadata from downloaded PDF files.
- Created a new `pdf` package to encapsulate the PDF metadata extraction logic, which uses the `pdfinfo` command from the `poppler-utils` package.
- Added unit tests for the new `pdf` package, including mocking the `pdfinfo` command.
- Modified `Taskfile.yml` to install `poppler-utils` as a dependency.
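
The `pdfinfo`-based extraction described above could look roughly like the sketch below. It is not the code from this commit: the function names `parsePdfinfo` and `extractMetadata` are hypothetical, and it assumes only that `pdfinfo` prints one `Key: value` pair per line and is available on `PATH`.

```go
package main

import (
	"bufio"
	"os/exec"
	"strings"
)

// parsePdfinfo (hypothetical name) turns the "Key: value" lines printed by
// pdfinfo into a map. Lines without a colon are skipped. Only the first colon
// splits the line, so values that contain colons (such as dates) stay intact.
func parsePdfinfo(output string) map[string]string {
	meta := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		key, value, found := strings.Cut(scanner.Text(), ":")
		if !found {
			continue
		}
		meta[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return meta
}

// extractMetadata (hypothetical name) runs pdfinfo on a single file and
// parses its standard output. It returns an error if pdfinfo is missing
// or exits with a non-zero status.
func extractMetadata(path string) (map[string]string, error) {
	out, err := exec.Command("pdfinfo", path).Output()
	if err != nil {
		return nil, err
	}
	return parsePdfinfo(string(out)), nil
}
```

Keeping the parsing separate from the process call means the parser can be unit-tested with canned output, without needing `poppler-utils` installed on the test machine.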

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>
2026-02-02 00:46:59 +00:00
dapp-fm feat: Add dapp.fm native desktop player (Wails) 2026-01-06 18:42:30 +00:00
dapp-fm-app feat: SMSG v2 binary format with zstd compression + RFC-001 spec 2026-01-10 19:57:33 +00:00
extract-demo feat: lazy loading profile page + v3 streaming polish 2026-01-12 17:48:32 +00:00
mkdemo fix: mobile scrolling + clean up mkdemo hardcoded values 2026-01-12 15:35:13 +00:00
mkdemo-abr feat: adaptive bitrate streaming (ABR) for HLS-style encrypted video 2026-01-13 15:40:15 +00:00
mkdemo-v3 feat: lazy loading profile page + v3 streaming polish 2026-01-12 17:48:32 +00:00
all.go feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
all_test.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00
collect.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00
collect_github.go feat: Improve test coverage and refactor for testability 2025-11-03 19:34:36 +00:00
collect_github_release_subcommand.go feat: Bug fixes and refactoring 2025-11-03 20:14:47 +00:00
collect_github_repo.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
collect_github_repo_test.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00
collect_github_repos.go feat: Improve test coverage and refactor for testability 2025-11-03 18:25:04 +00:00
collect_pwa.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
collect_website.go feat: PDF metadata extraction 2026-02-02 00:46:59 +00:00
collect_website_test.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00
compile.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
compile_test.go feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
console.go feat: Add Borg Console and release workflow 2025-12-27 02:32:31 +00:00
decode.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
decode_test.go feat: Add trix encryption and format 2025-11-14 13:47:27 +00:00
exec.go feat: Add compile and run commands for RUNC matrices 2025-11-13 19:16:12 +00:00
extract_metadata.go feat: PDF metadata extraction 2026-02-02 00:46:59 +00:00
inspect.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
pdf.go feat: PDF metadata extraction 2026-02-02 00:46:59 +00:00
pdf_metadata.go feat: PDF metadata extraction 2026-02-02 00:46:59 +00:00
root.go feat: Improve test coverage and refactor for testability 2025-11-03 18:25:04 +00:00
root_test.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00
run.go feat: Add ChaCha20-Poly1305 encryption and decryption for TIM files (.stim), enhance CLI for encryption format handling (stim), and include metadata inspection support 2025-12-26 01:25:03 +00:00
run_test.go Improve test coverage for datanode and tim packages, and fix cmd tests 2025-11-23 18:58:32 +00:00
serve.go feat: Add _Good, _Bad, and _Ugly tests 2025-11-14 10:36:35 +00:00