Commit graph

4 commits

Author SHA1 Message Date
jedarden
39ca6a3552 feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator
Add image_coverage_fraction signal evaluator that computes the union
image coverage fraction from individual image XObject areas.

- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85

Implementation uses sum for simplicity (overestimates coverage when
images overlap), which is acceptable for the 0.85 threshold as it's
a conservative signal. Can be revisited with Klee's algorithm for
greater accuracy if needed.

Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
2026-05-31 23:42:38 -04:00
jedarden
1ff8c2fcdc docs(pdftract-145s8): fix broken MCP cross-references in Python SDK docs
- Fix broken links from ../integrations/mcp-clients.md to ../cli/mcp.md
- Update link text from 'MCP Client Configuration Guide' to 'MCP Server Documentation'
- Ensures all cross-references work in mdBook build
2026-05-31 23:34:41 -04:00
jedarden
b93bb53ac2 docs(pdftract-46tdo): add comprehensive troubleshooting guide with diagnostic code mappings
- Created troubleshooting.md mapping 22+ user-visible diagnostic codes
- Added symptom-to-diagnostic lookup table for quick navigation
- Each diagnostic code includes: what it means, cause, fix, severity
- Cross-references the Diagnostics Reference for full catalog
- Updated SUMMARY.md to include new troubleshooting guide
- Verified mdBook builds successfully

Acceptance criteria:
- Covers 15+ diagnostic codes (actual: 22+)
- Top-level TOC for navigation
- Cross-links to Diagnostic Code Catalog
- mdBook renders cleanly

Diagnostic codes covered:
XREF_REPAIRED, STREAM_BOMB, ENCRYPTION_UNSUPPORTED,
OCR_JBIG2_UNSUPPORTED, OCR_JPX_UNSUPPORTED, OCR_CCITT_UNSUPPORTED,
BROKENVECTOR_OCR_UNAVAILABLE, MCP_PATH_TRAVERSAL, PATH_OUTSIDE_ROOT,
URL_PRIVATE_NETWORK, CACHE_ENTRY_CORRUPT, CACHE_INTEGRITY_FAIL,
PROFILE_INVALID, PROFILE_SECRETS_FORBIDDEN, PAGE_OUT_OF_RANGE,
GLYPH_UNMAPPED, JAVASCRIPT_PRESENT, STRUCT_CIRCULAR_REF,
STRUCT_XOBJECT_CYCLE, GSTATE_STACK_OVERFLOW, REMOTE_FETCH_INTERRUPTED,
REMOTE_NO_RANGE_SUPPORT, TAGGED_PDF_STRUCT_TREE_DEFERRED
2026-05-31 23:24:42 -04:00
jedarden
a34f9c18d0 docs(pdftract-1g87): create mdBook scaffolding for user documentation
- book.toml with title, authors, build directory, edit-url-template
- src/SUMMARY.md with complete TOC for all planned sections
- src/introduction.md: what pdftract does and doesn't do (Non-Goals)
- src/installation.md: cargo, pip, Homebrew, Docker; KU-12 caveat verbatim
- src/quickstart.md: five-minute walkthrough with executable commands
- 39 draft placeholder files for CLI reference, schema, profiles, SDKs, advanced topics, troubleshooting, FAQ

mdbook build completes cleanly with zero warnings (linkcheck optional).

See notes/pdftract-1g87.md for verification details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:38:51 -04:00