pdftract/crates
jedarden fb774af74e feat(pdftract-2r11u): implement TH-04 JavaScript detection
Add JavascriptActionJson schema field and detection logic for embedded
JavaScript in PDFs. Per TH-04 security requirement, JavaScript is
detected but NEVER executed. Presence is flagged via JAVASCRIPT_PRESENT
diagnostic and surfaced in metadata.javascript_actions[].

Schema changes:
- Add JavascriptActionJson struct with location and code_excerpt fields
- Add javascript_actions array to DocumentMetadata and ExtractionResult
- Update Output::new() to initialize empty javascript_actions array

JavaScript detection:
- Create javascript module with detect_javascript() function
- Scan /OpenAction, /AA, page /AA, and annotation /A entries
- Emit SecurityJavascriptPresent diagnostic at INFO level when JS found
- Return actions with truncated code excerpts (200 char max)

Integration:
- Call detect_javascript() in extract_pdf() after thread extraction
- Include javascript_actions in result_to_json() output

Tests:
- Create TH-04-js-presence.rs with 4 test cases
- Verify 3 JS actions detected, diagnostic emitted, JSON output correct
- Include negative test for PDFs without JavaScript
- Tests skip gracefully when fixture not yet created

Closes: pdftract-2r11u
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-25 14:04:29 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-3h9xo): implement threads JSON output + schema integration 2026-05-25 13:40:15 -04:00
pdftract-core feat(pdftract-2r11u): implement TH-04 JavaScript detection 2026-05-25 14:04:29 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py feat(pdftract-3h9xo): implement threads JSON output + schema integration 2026-05-25 13:40:15 -04:00