Add JavascriptActionJson schema field and detection logic for embedded
JavaScript in PDFs. Per TH-04 security requirement, JavaScript is
detected but NEVER executed. Presence is flagged via JAVASCRIPT_PRESENT
diagnostic and surfaced in metadata.javascript_actions[].
Schema changes:
- Add JavascriptActionJson struct with location and code_excerpt fields
- Add javascript_actions array to DocumentMetadata and ExtractionResult
- Update Output::new() to initialize empty javascript_actions array
JavaScript detection:
- Create javascript module with detect_javascript() function
- Scan /OpenAction, /AA, page /AA, and annotation /A entries
- Emit SecurityJavascriptPresent diagnostic at INFO level when JS found
- Return actions with truncated code excerpts (200 char max)
Integration:
- Call detect_javascript() in extract_pdf() after thread extraction
- Include javascript_actions in result_to_json() output
Tests:
- Create TH-04-js-presence.rs with 4 test cases
- Verify 3 JS actions detected, diagnostic emitted, JSON output correct
- Include negative test for PDFs without JavaScript
- Tests skip gracefully when fixture not yet created
Closes: pdftract-2r11u
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>