Borg/pkg/pdf at c7e3ba297f3a8d5d8facf8ecb802db34362e673d - Snider/Borg

google-labs-jules[bot] c7e3ba297f feat: PDF metadata extraction	2026-02-02 00:46:59 +00:00

This commit introduces a new feature to extract and index metadata from collected PDF files.

The following changes have been made:
- Added a new `pdf` command with a `metadata` subcommand to extract metadata from a single PDF file.
- Added a new `extract-metadata` command to extract metadata from all PDF files within a given archive and create an `INDEX.json` file.
- Added a `--extract-pdf-metadata` flag to the `collect website` command to extract metadata from downloaded PDF files.
- Created a new `pdf` package to encapsulate the PDF metadata extraction logic, which uses the `pdfinfo` command from the `poppler-utils` package.
- Added unit tests for the new `pdf` package, including mocking the `pdfinfo` command.
- Modified `Taskfile.yml` to install `poppler-utils` as a dependency.

Co-authored-by: Snider <631881+Snider@users.noreply.github.com>

metadata.go	feat: PDF metadata extraction	2026-02-02 00:46:59 +00:00
metadata_test.go	feat: PDF metadata extraction	2026-02-02 00:46:59 +00:00
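
The commit message says the `pdf` package shells out to `pdfinfo` and that its tests mock that command. The following is a minimal sketch of how such a wrapper could look, not the actual contents of `metadata.go`: the names `runPdfinfo`, `parsePdfinfo`, and `extractMetadata` are hypothetical, and the parsing assumes the usual `Key: value` line layout that `pdfinfo` prints.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
	"strings"
)

// runPdfinfo is a package-level variable (hypothetical name) so that a test
// can swap in a stub instead of invoking the real pdfinfo binary.
var runPdfinfo = func(path string) ([]byte, error) {
	return exec.Command("pdfinfo", path).Output()
}

// parsePdfinfo turns "Key: value" lines into a map. Only the first colon
// splits the line, so values such as timestamps keep their own colons.
func parsePdfinfo(out string) map[string]string {
	meta := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip lines without a key/value separator
		}
		meta[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return meta
}

// extractMetadata runs pdfinfo (or its stub) on path and parses the output.
func extractMetadata(path string) (map[string]string, error) {
	out, err := runPdfinfo(path)
	if err != nil {
		return nil, fmt.Errorf("pdfinfo %s: %w", path, err)
	}
	return parsePdfinfo(string(out)), nil
}

func main() {
	// Stub the command, as a unit test might, so no PDF file or
	// poppler-utils installation is needed to run this sketch.
	runPdfinfo = func(string) ([]byte, error) {
		return []byte("Title:          Example Report\nPages:          12\n"), nil
	}
	meta, err := extractMetadata("example.pdf")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(meta["Title"], meta["Pages"])
}
```

Keeping the command invocation behind a replaceable function value is one common way to test code that depends on an external binary. The repository's tests may use a different mocking approach.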
