pdftract

History

jedarden 90d1b9a83d test(pdftract-4c8qu): add page_label tests and fix JSON schema	2026-05-25 14:43:31 -04:00

- Add test_page_json_with_page_labels_roman_numerals: verifies page_label serialization with roman numeral values (i, ii, iii, etc.)
- Add test_page_json_without_page_labels_absent: verifies page_label is absent (null) when the PDF has no /PageLabels
- Add test_page_json_page_index_and_page_number_both_present: verifies both page_index and page_number are always present and page_number = page_index + 1
- Add test_page_json_roundtrip_with_all_fields: verifies full roundtrip serde preservation of all PageJson fields
- Update docs/schema/v1.0/pdftract.schema.json PageResult definition:
  - Add page_number field (1-based, = page_index + 1)
  - Add page_label field (optional, from /PageLabels number tree)
  - Add width and height fields (page geometry in points)
  - Add rotation field (0, 90, 180, 270 degrees)
  - Add type field with enum: text, scanned, mixed, broken_vector, blank, figure_only
  - Update required fields to include all page-level fields

Acceptance criteria:
✅ Page serializes with both page_index AND page_number
✅ PDF with /PageLabels [{S: "r"}] produces page_label "i", "ii", "iii", etc.
✅ PDF without /PageLabels -> page_label absent
✅ JSON Schema enum for page_type includes all values
✅ Roundtrip serde Page test passes

Closes: pdftract-4c8qu
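
The relationship the commit message describes (page_number = page_index + 1, and roman-numeral page_label values when /PageLabels uses style "r") can be illustrated with a minimal sketch. This is not pdftract's implementation: `lower_roman`, `PageSummary`, and `summarize` are hypothetical names, and the sketch assumes a single label range that starts at 1 and covers every page.

```rust
/// Lowercase roman numeral for n >= 1 (the "r" label style in /PageLabels).
/// Returns an empty string for 0, which has no roman form.
fn lower_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, glyph) in TABLE.iter() {
        // Greedily subtract the largest value that still fits.
        while n >= value {
            out.push_str(glyph);
            n -= value;
        }
    }
    out
}

/// Hypothetical stand-in for the page-level fields named in the commit.
struct PageSummary {
    page_index: u32,            // 0-based position in the document
    page_number: u32,           // always page_index + 1
    page_label: Option<String>, // None when the PDF has no /PageLabels
}

/// Assumption: `roman_labels` is true when the PDF declares one
/// /PageLabels range with style "r" starting at label 1.
fn summarize(page_index: u32, roman_labels: bool) -> PageSummary {
    let page_number = page_index + 1;
    PageSummary {
        page_index,
        page_number,
        page_label: if roman_labels {
            Some(lower_roman(page_number))
        } else {
            None
        },
    }
}
```

Under these assumptions, `summarize(2, true)` describes the third page with label "iii", while `summarize(2, false)` leaves `page_label` as `None`, matching the "absent" case in the acceptance criteria.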
..
adr	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance	feat(pdftract-5omc): implement SDK conformance test runner pattern	2026-05-18 01:22:23 -04:00
integrations	feat(pdftract-2u6q2): implement diagnostic infrastructure	2026-05-25 13:16:38 -04:00
notes	docs(pdftract-3wrx): add release signing strategy note	2026-05-24 11:12:56 -04:00
operations	docs(manual-release): add PB-13 fallback release runbook	2026-05-25 03:23:29 -04:00
plan	feat(pdftract-3zhf): add unified TableDetector::detect entry point	2026-05-24 00:51:59 -04:00
research	docs(pdftract-1tjn): finalize OpenType MATH and formula extraction research note v1.0	2026-05-24 10:41:39 -04:00
schema/v1.0	test(pdftract-4c8qu): add page_label tests and fix JSON schema	2026-05-25 14:43:31 -04:00
security	docs(pdftract-58kz): add security policy documentation	2026-05-20 19:39:24 -04:00
user-docs	fix: resolve compilation errors across codebase	2026-05-25 08:38:04 -04:00
research-index.md	Add parallel extraction research and comprehensive research index	2026-05-16 16:30:35 -04:00