- Fixed missing fields in BlockJson, SpanJson, ExtractionOptions initializations
- Added feature gates to ocr_integration tests for conditional compilation
- Fixed McpServerState::new calls to include audit writer argument
- Fixed CCITTFaxDecoder::decode calls to use instance method
- Fixed type casts for ObjRef::new calls
- Fixed serde_json::Value method calls (is_some -> !is_null)
- Fixed ProfileType test feature gates
- Worked around lifetime issues in schema roundtrip tests
These changes fix numerous compilation errors that were blocking the
codebase from building. The main library and tests now compile successfully.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add classifier corpus test harness for 200-document labeled corpus:
- Move test from tests/ to crates/pdftract-core/tests/classifier_corpus.rs
- Implement classify_document() using pdftract_core::profiles
- Add robust path resolution for workspace and crate test directories
- Fix PdfObject number extraction in threads module (compilation error)
Corpus infrastructure is complete but PDF generation needs fix:
- Generated PDFs have non-standard trailer structure
- ReportLab embeds comment inside trailer dictionary
- Causes pdftract parser to fail with "/Root is not a dictionary"
- Test harness ready to run once PDFs are regenerated
Closes: pdftract-4exg (partial - infrastructure complete, PDF generation blocked)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>