Commit graph

7 commits

Author SHA1 Message Date
jedarden
39ca6a3552 feat(pdftract-2b7ff): implement image_coverage_fraction signal evaluator
Add an image_coverage_fraction signal evaluator that estimates the
fraction of the page covered by images from the individual image
XObject areas.

- Computes total image coverage as the sum of image_xobject_areas
- Divides by the page area (width * height) to get the coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if the fraction exceeds 0.85,
  otherwise None
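A minimal Rust sketch of the evaluator steps listed above. Only image_xobject_areas and Vote::scanned are named in the commit; PageContext's width/height fields, the VoteKind enum, the SCANNED_THRESHOLD constant and the coverage_fraction helper are hypothetical stand-ins, and areas are assumed to be f64 values in the same units as the page dimensions.

```rust
/// Hypothetical stand-ins for the crate's real types (assumed, not the
/// actual pdftract definitions).
#[derive(Debug, Clone, Copy, PartialEq)]
enum VoteKind {
    Scanned,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Vote {
    confidence: f64,
    kind: VoteKind,
}

impl Vote {
    fn scanned(confidence: f64) -> Self {
        Vote { confidence, kind: VoteKind::Scanned }
    }
}

struct PageContext {
    width: f64,                    // assumed: page width in user-space units
    height: f64,                   // assumed: page height in the same units
    image_xobject_areas: Vec<f64>, // area of each placed image XObject
}

const SCANNED_THRESHOLD: f64 = 0.85; // hypothetical name for the 0.85 cutoff

/// Sum of image areas divided by page area, clamped to [0.0, 1.0].
/// Overlapping images are double-counted before the clamp.
fn coverage_fraction(ctx: &PageContext) -> f64 {
    let page_area = ctx.width * ctx.height;
    if page_area <= 0.0 {
        return 0.0; // assumption: a degenerate page box counts as uncovered
    }
    let total: f64 = ctx.image_xobject_areas.iter().sum();
    (total / page_area).clamp(0.0, 1.0)
}

/// Votes "scanned" with confidence 0.85 when coverage exceeds the threshold.
fn image_coverage_fraction(ctx: &PageContext) -> Option<Vote> {
    if coverage_fraction(ctx) > SCANNED_THRESHOLD {
        Some(Vote::scanned(0.85))
    } else {
        None
    }
}
```

Splitting the clamped fraction into its own helper keeps the "clamped to 1.0" acceptance check testable independently of the vote.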

The implementation uses a plain sum for simplicity. The sum overestimates
coverage when images overlap, which is acceptable for the 0.85 threshold
because this is a conservative signal. It can be revisited with Klee's
algorithm (exact area of a union of rectangles) if greater accuracy is
needed.
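If the sum is ever replaced with an exact union, one simple option is coordinate compression over image bounding boxes, sketched below. This is a swapped-in O(n^3) technique, not Klee's sweep-line algorithm (which reaches O(n log n) with a segment tree), and it assumes a hypothetical Rect type, because the commit stores only per-image areas, not rectangles.

```rust
/// Axis-aligned image bounding box in page space (hypothetical type;
/// the commit only records per-image areas).
#[derive(Debug, Clone, Copy)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

/// Exact area of the union of rectangles via coordinate compression.
/// Cubic in the number of rectangles, acceptable for a few images per page.
fn union_area(rects: &[Rect]) -> f64 {
    // Collect every distinct x and y edge coordinate, sorted.
    let mut xs: Vec<f64> = rects.iter().flat_map(|r| [r.x0, r.x1]).collect();
    let mut ys: Vec<f64> = rects.iter().flat_map(|r| [r.y0, r.y1]).collect();
    xs.sort_by(f64::total_cmp);
    xs.dedup();
    ys.sort_by(f64::total_cmp);
    ys.dedup();

    let mut area = 0.0;
    // Each grid cell between consecutive edges is either fully inside
    // some rectangle or fully outside all of them, so test its center.
    for xw in xs.windows(2) {
        for yw in ys.windows(2) {
            let cx = (xw[0] + xw[1]) / 2.0;
            let cy = (yw[0] + yw[1]) / 2.0;
            let covered = rects
                .iter()
                .any(|r| r.x0 <= cx && cx <= r.x1 && r.y0 <= cy && cy <= r.y1);
            if covered {
                area += (xw[1] - xw[0]) * (yw[1] - yw[0]);
            }
        }
    }
    area
}
```

For two 2x2 images overlapping in a 1x1 corner, the summed areas give 8 while the union is 7, which is exactly the overestimate the sum-based evaluator accepts.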

Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images

Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator

These were necessary dependencies for the new evaluator to function.

Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
2026-05-31 23:42:38 -04:00
jedarden
1ff8c2fcdc docs(pdftract-145s8): fix broken MCP cross-references in Python SDK docs
- Repoint broken links from ../integrations/mcp-clients.md to ../cli/mcp.md
- Update link text from 'MCP Client Configuration Guide' to 'MCP Server Documentation'
- Ensure all cross-references resolve in the mdBook build
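Relative links such as ../cli/mcp.md are resolved against the directory of the file that contains them. The hypothetical helper below sketches that resolution (collapsing . and .. segments, relative to the book's src/ root) so a rewritten link can be checked against the files that actually exist; the function name and the sdk/python.md path used in the usage are assumptions, since the commit does not name the Python SDK page. It ignores #fragment anchors.

```rust
/// Resolves a relative Markdown link against the linking file's path,
/// both relative to the book's src/ root (hypothetical helper).
/// Returns None if the link climbs above the root.
fn resolve_link(from_file: &str, link: &str) -> Option<String> {
    let mut parts: Vec<&str> = from_file.split('/').collect();
    parts.pop(); // drop the file name, keep its directory
    for seg in link.split('/') {
        match seg {
            "" | "." => {} // empty or current-directory segments are no-ops
            ".." => {
                parts.pop()?; // escaping the root yields None
            }
            s => parts.push(s),
        }
    }
    Some(parts.join("/"))
}
```

Usage: resolving "../cli/mcp.md" from "sdk/python.md" gives "cli/mcp.md", which a link checker would then look up in the src/ tree.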
2026-05-31 23:34:41 -04:00
jedarden
b93bb53ac2 docs(pdftract-46tdo): add comprehensive troubleshooting guide with diagnostic code mappings
- Created troubleshooting.md mapping 23 user-visible diagnostic codes
- Added a symptom-to-diagnostic lookup table for quick navigation
- Each diagnostic code includes: what it means, cause, fix, severity
- Cross-references the Diagnostics Reference for the full catalog
- Updated SUMMARY.md to include new troubleshooting guide
- Verified mdBook builds successfully

Acceptance criteria:
- Covers 15+ diagnostic codes (actual: 23)
- Top-level TOC for navigation
- Cross-links to Diagnostic Code Catalog
- mdBook renders cleanly

Diagnostic codes covered:
XREF_REPAIRED, STREAM_BOMB, ENCRYPTION_UNSUPPORTED,
OCR_JBIG2_UNSUPPORTED, OCR_JPX_UNSUPPORTED, OCR_CCITT_UNSUPPORTED,
BROKENVECTOR_OCR_UNAVAILABLE, MCP_PATH_TRAVERSAL, PATH_OUTSIDE_ROOT,
URL_PRIVATE_NETWORK, CACHE_ENTRY_CORRUPT, CACHE_INTEGRITY_FAIL,
PROFILE_INVALID, PROFILE_SECRETS_FORBIDDEN, PAGE_OUT_OF_RANGE,
GLYPH_UNMAPPED, JAVASCRIPT_PRESENT, STRUCT_CIRCULAR_REF,
STRUCT_XOBJECT_CYCLE, GSTATE_STACK_OVERFLOW, REMOTE_FETCH_INTERRUPTED,
REMOTE_NO_RANGE_SUPPORT, TAGGED_PDF_STRUCT_TREE_DEFERRED
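A hypothetical Rust check for the "Covers 15+ diagnostic codes" criterion: it parses a comma-separated code list like the one above, rejects malformed or duplicate entries, and lets a test assert the count. The parse_codes name and the idea of keeping the covered codes in such a list are assumptions for illustration, not part of the commit.

```rust
use std::collections::BTreeSet;

/// Parses a comma-separated list of diagnostic codes into a sorted set.
/// Rejects entries that are not UPPER_SNAKE_CASE and duplicate entries.
fn parse_codes(list: &str) -> Result<BTreeSet<&str>, String> {
    let mut codes = BTreeSet::new();
    for raw in list.split(',') {
        let code = raw.trim(); // tolerate spaces and line breaks between entries
        if code.is_empty() {
            continue; // e.g. a trailing comma
        }
        let well_formed = code.starts_with(|c: char| c.is_ascii_uppercase())
            && code
                .chars()
                .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
        if !well_formed {
            return Err(format!("malformed code: {code}"));
        }
        if !codes.insert(code) {
            return Err(format!("duplicate code: {code}"));
        }
    }
    Ok(codes)
}
```

Fed the list above, this yields 23 distinct codes, comfortably over the 15-code target.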
2026-05-31 23:24:42 -04:00
jedarden
4ec9ff7470 docs(pdftract-5boam): add JSON schema reference page
- Created comprehensive json-schema-reference.md with:
  - Top-level structure documentation
  - Document metadata, page result, span, block fields
  - Table structure (row/cell) with examples
  - Form fields and signatures (Phase 7 placeholders)
  - Receipts and coordinate system docs
  - Cross-references to plan sections (INV-11, Phase 6.1, etc.)
- Added to mdBook SUMMARY.md as top-level reference page
- All examples use real JSON from the schema
- Builds successfully (46KB HTML output)

Acceptance criteria:
- PASS: docs/user-docs/src/json-schema-reference.md exists
- PASS: Covers all top-level types and enums (Document, Page, Span, Block, Table, FormField, Signature, Receipt)
- PASS: Examples for each major type
- PASS: mdBook renders cleanly (verified)
- PASS: Cross-references to plan sections included

Closes: pdftract-5boam
2026-05-25 05:18:53 -04:00
jedarden
2ccdaecda1 docs(pdftract-5nare): add comprehensive FAQ with 24 questions
Added docs/user-docs/src/faq.md with 24 FAQ entries covering:
- General questions (what is pdftract, extract vs extract_text, JS execution)
- Installation and setup (proxy, system requirements)
- Usage (broken_vector, OCR speed, page ranges, images, batch processing)
- Configuration (custom profiles, OCR accuracy, confidence scores)
- Output formats (Markdown, tables, metadata, passwords)
- Troubleshooting (errors, empty output, debugging, memory usage)

Each answer is 1-3 paragraphs with cross-links to fuller docs.
mdBook builds successfully.

Acceptance criteria:
- PASS: docs/user-docs/src/faq.md exists
- PASS: 24 questions covered (target: 15-25)
- PASS: Each answer is 1-3 paragraphs
- PASS: Cross-links work
- PASS: mdBook renders cleanly

Closes: pdftract-5nare
2026-05-25 00:22:48 -04:00
jedarden
d9d21df157 docs(pdftract-653ah): add runbook integration for pdftract doctor
- Created docs/operations/manual-platform-smoke.md with a comprehensive
  smoke-test runbook for KU-12 quarterly manual platform testing
- Added a troubleshooting table covering all 14 doctor checks
- Cross-referenced the runbook from installation.md and quickstart.md
- Added a CI gate test (doctor_runbook_coverage.rs) to verify
  troubleshooting table completeness
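The CI gate's core idea (parse the runbook's troubleshooting table and assert that every doctor check has a row) might look like the minimal sketch below. It assumes a pipe-delimited Markdown table with the check name in the first column; table_first_column, missing_checks and the sample check names in the usage are hypothetical, since the commit does not list the 14 checks.

```rust
use std::collections::HashSet;

/// Collects the first-column cell of every Markdown table row, skipping
/// separator rows such as `|---|---|` and stripping inline-code backticks.
fn table_first_column(markdown: &str) -> HashSet<String> {
    markdown
        .lines()
        .map(str::trim)
        .filter(|line| line.starts_with('|'))
        .filter_map(|line| line.trim_matches('|').split('|').next())
        .map(|cell| cell.trim().trim_matches('`').to_string())
        .filter(|cell| !cell.is_empty() && !cell.chars().all(|c| c == '-' || c == ':'))
        .collect()
}

/// Returns the expected check names that have no row in the table.
fn missing_checks<'a>(markdown: &str, expected: &[&'a str]) -> Vec<&'a str> {
    let documented = table_first_column(markdown);
    expected
        .iter()
        .copied()
        .filter(|name| !documented.contains(*name))
        .collect()
}
```

A gate test would call missing_checks on the runbook text with the doctor's real check list and fail if the result is non-empty; the header row's "Check" cell is collected too, which is harmless.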

Acceptance criteria:
✓ Step 1: pdftract doctor as first section in runbook
✓ Troubleshooting table covers all FAIL-capable checks
✓ installation.md mentions pdftract doctor with runbook link
✓ quickstart.md uses pdftract doctor as first example command
✓ CI gate parses runbook and asserts all checks are present
✓ mdBook build succeeds
✓ No broken internal links

Closes: pdftract-653ah
2026-05-24 13:26:31 -04:00
jedarden
a34f9c18d0 docs(pdftract-1g87): create mdBook scaffolding for user documentation
- book.toml with title, authors, build directory, edit-url-template
- src/SUMMARY.md with complete TOC for all planned sections
- src/introduction.md: what pdftract does and doesn't do (Non-Goals)
- src/installation.md: cargo, pip, Homebrew, Docker; KU-12 caveat verbatim
- src/quickstart.md: five-minute walkthrough with executable commands
- 39 draft placeholder files for CLI reference, schema, profiles, SDKs, advanced topics, troubleshooting, FAQ

mdbook build completes cleanly with zero warnings (linkcheck optional).

See notes/pdftract-1g87.md for verification details.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:38:51 -04:00