Implements Phase 7.7.1: /Threads array discovery + /I thread info
metadata extraction.
Changes:
- Add threads_ref field to Catalog struct and parse /Threads in catalog
- Create threads module with ThreadHeader struct
- Implement discover() function to extract thread metadata
- Handle PDFDocEncoding and UTF-16BE string decoding
- Empty strings return Some("") to distinguish from None
Acceptance criteria:
- Thread with no /I info dict -> title/author/subject/keywords null
- 3 threads with various info configurations
- Thread with no /Title (but /I present)
- Thread missing /F skipped with diagnostic
- UTF-16BE title decoding
Closes: pdftract-1c4j2
|
||
|---|---|---|
| .. | ||
| pdftract-cer-diff | ||
| pdftract-cli | ||
| pdftract-core | ||
| pdftract-libpdftract | ||
| pdftract-py | ||