pdftract/tests
jedarden 2f010c51fb feat(pdftract-206o6): implement scientific_paper profile with fixtures and tests
Author profiles/builtin/scientific_paper.yaml per Phase 7.10 YAML schema:
- Match predicates: text_contains (Abstract, References, doi:, arXiv:, Bibliography)
- Structural predicates: has_math, heading_depth, page_count
- Extraction tuning: xy_cut reading order for 2-column layout
- Fields: title, authors, abstract, doi, journal, publication_date, references

Add 5 fixtures covering diverse scientific paper types:
- arXiv preprint (CC-BY license)
- PLOS ONE journal article
- IEEE-style 2-column paper
- Nature-style single-column with sidebar
- ACM/IEEE conference proceedings

Add comprehensive regression tests in test_scientific_paper.rs:
- Profile schema validation
- Fixture structure verification
- Expected output consistency checks
- Match predicate validation
- Fixture diversity verification
- xy_cut reading order verification
- DOI regex format validation

Co-Authored-By: Claude Code (claude-opus-4-7) <noreply@anthropic.com>
2026-05-27 20:19:10 -04:00
..
c-client feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance feat(pdftract-5omc): implement per-language conformance test runner pattern 2026-05-18 01:32:24 -04:00
error_recovery/fixtures test(pdftract-4w0v4): implement adversarial test corpus + integration harness 2026-05-25 14:30:24 -04:00
fixtures feat(pdftract-206o6): implement scientific_paper profile with fixtures and tests 2026-05-27 20:19:10 -04:00
lexer/fixtures test(pdftract-sy8x): implement lexer proptest harness and curated corpus 2026-05-24 02:36:37 -04:00
proptest test(pdftract-sy8x): implement lexer proptest harness and curated corpus 2026-05-24 02:36:37 -04:00
proptest-regressions docs(pdftract-49f8): establish Cargo.lock policy and documentation 2026-05-20 18:13:14 -04:00
python-conformance feat(pdftract-5omc): implement SDK conformance test runner pattern 2026-05-18 01:22:23 -04:00
sdk-conformance feat(pdftract-mcp): add MCP server implementation changes 2026-05-23 03:09:56 -04:00
xref/fixtures feat(pdftract-1s2uj): add xref test fixture corpus and integration test runner 2026-05-24 08:20:04 -04:00
conformance.c feat(pdftract-1eaxm): implement libpdftract C FFI library 2026-05-23 08:55:12 -04:00
conformance_fixed feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_fixed.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test_simple feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance_test_simple.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_parse.rs feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_stream feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
debug_stream.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
doctor_runbook_coverage.rs docs(pdftract-653ah): add runbook integration for pdftract doctor 2026-05-24 13:26:31 -04:00
gen_lexer_golden.rs test(pdftract-sy8x): implement lexer proptest harness and curated corpus 2026-05-24 02:36:37 -04:00
proptest-panic-verification.rs feat(pdftract-33v): add property tests and nightly fuzz job 2026-05-20 19:18:03 -04:00
test_api_basic feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_basic.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_null feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_null.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_real feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_real.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_valid feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_api_valid.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_atomic_writer.rs feat(pdftract-68wfa): implement AtomicFileWriter for atomic file writes 2026-05-24 13:02:37 -04:00
test_debug feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_debug.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_parse_fixture.rs feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_simple.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_simple_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_stream feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_stream.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_valid.c feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
test_valid_run feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00