- Add worked example to Glyph struct showing all 11 fields
- Add worked example to Span struct showing all 10 fields
- Examples use rust,no_run for internal dependencies
- cargo doc passes with docs.rs feature set
- Verification note added at notes/pdftract-3eohy.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Update setupTooltips to display data-bbox, data-block-ref, data-mcid, and data-reading-idx
- These attributes are already emitted by spans.rs but weren't being shown in tooltip
- Tooltip now shows complete span information on hover
References pdftract-3mdb7 acceptance criteria:
- Tooltip shows the data-* attrs as formatted rows
Bead-Id: pdftract-145s8
Add image_coverage_fraction signal evaluator that computes the union
image coverage fraction from individual image XObject areas.
- Computes total image coverage as sum of image_xobject_areas
- Divides by page area (width * height) to get coverage fraction
- Clamps to [0.0, 1.0] to handle overlapping images (defensive)
- Returns Some(Vote::scanned(0.85)) if fraction > 0.85
Implementation uses sum for simplicity (overestimates coverage when
images overlap), which is acceptable for the 0.85 threshold as it's
a conservative signal. Can be revisited with Klee's algorithm for
greater accuracy if needed.
Acceptance criteria PASS:
✓ Page with one image covering 90% area → Some(Vote { 0.85, Scanned })
✓ Page with multiple small images totaling 50% → None (below threshold)
✓ Page with no images → None
✓ Coverage clamped to 1.0 on overlapping images
Also includes pre-existing infrastructure:
- tr3_op_count field in PageContext
- image_xobject_areas field in PageContext
- all_tr3_with_full_page_image function
- CharDensityRatioSignal evaluator
These were necessary dependencies for the new evaluator to function.
Refs: Plan section Phase 5.1.2, coordinator pdftract-22p
- Fix broken links from ../integrations/mcp-clients.md to ../cli/mcp.md
- Update link text from 'MCP Client Configuration Guide' to 'MCP Server Documentation'
- Ensures all cross-references work in mdBook build
- Fixed missing fields in BlockJson, SpanJson, ExtractionOptions initializations
- Added feature gates to ocr_integration tests for conditional compilation
- Fixed McpServerState::new calls to include audit writer argument
- Fixed CCITTFaxDecoder::decode calls to use instance method
- Fixed type casts for ObjRef::new calls
- Fixed serde_json::Value method calls (is_some -> !is_null)
- Fixed ProfileType test feature gates
- Worked around lifetime issues in schema roundtrip tests
These changes fix numerous compilation errors that were blocking the
codebase from building. The main library and tests now compile successfully.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Created comprehensive json-schema-reference.md with:
- Top-level structure documentation
- Document metadata, page result, span, block fields
- Table structure (row/cell) with examples
- Form fields and signatures (Phase 7 placeholders)
- Receipts and coordinate system docs
- Cross-references to plan sections (INV-11, Phase 6.1, etc.)
- Added to mdBook SUMMARY.md as top-level reference page
- All examples use real JSON from the schema
- Builds successfully (46KB HTML output)
Acceptance criteria:
- PASS: docs/user-docs/src/json-schema-reference.md exists
- PASS: Covers all top-level types and enums (Document, Page, Span, Block, Table, FormField, Signature, Receipt)
- PASS: Examples for each major type
- PASS: mdBook renders cleanly (verified)
- PASS: Cross-references to plan sections included
Closes: pdftract-5boam
- Created docs/operations/manual-platform-smoke.md with comprehensive
smoke test runbook for KU-12 quarterly manual platform testing
- Added troubleshooting table covering all 14 doctor checks
- Cross-referenced runbook from installation.md and quickstart.md
- Added CI gate test (doctor_runbook_coverage.rs) to verify
troubleshooting table completeness
Acceptance criteria:
✓ Step 1: pdftract doctor as first section in runbook
✓ Troubleshooting table covers all FAIL-capable checks
✓ installation.md mentions pdftract doctor with runbook link
✓ quickstart.md uses pdftract doctor as first example command
✓ CI gate parses runbook and asserts all checks are present
✓ mdBook build succeeds
✓ No broken internal links
Closes: pdftract-653ah
- book.toml with title, authors, build directory, edit-url-template
- src/SUMMARY.md with complete TOC for all planned sections
- src/introduction.md: what pdftract does and doesn't do (Non-Goals)
- src/installation.md: cargo, pip, Homebrew, Docker; KU-12 caveat verbatim
- src/quickstart.md: five-minute walkthrough with executable commands
- 39 draft placeholder files for CLI reference, schema, profiles, SDKs, advanced topics, troubleshooting, FAQ
mdbook build completes cleanly with zero warnings (linkcheck optional).
See notes/pdftract-1g87.md for verification details.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>