pdftract/docs
jedarden 90d1b9a83d test(pdftract-4c8qu): add page_label tests and fix JSON schema
- Add test_page_json_with_page_labels_roman_numerals: verifies page_label
  serialization with roman numeral values (i, ii, iii, etc)
- Add test_page_json_without_page_labels_absent: verifies page_label is
  absent (null) when PDF has no /PageLabels
- Add test_page_json_page_index_and_page_number_both_present: verifies
  both page_index and page_number are always present and page_number = page_index + 1
- Add test_page_json_roundtrip_with_all_fields: verifies full roundtrip
  serde preservation of all PageJson fields

- Update docs/schema/v1.0/pdftract.schema.json PageResult definition:
  - Add page_number field (1-based, = page_index + 1)
  - Add page_label field (optional, from /PageLabels number tree)
  - Add width and height fields (page geometry in points)
  - Add rotation field (0, 90, 180, 270 degrees)
  - Add type field with enum: text, scanned, mixed, broken_vector, blank, figure_only
  - Update required fields to include all page-level fields

Acceptance criteria:
- Page serializes with both page_index and page_number
- PDF with /PageLabels style /S /r (lowercase roman) produces page_label "i", "ii", "iii", etc.
- PDF without /PageLabels -> page_label absent
- JSON Schema enum for the page type field includes all values
- Roundtrip serde Page test passes

Closes: pdftract-4c8qu
2026-05-25 14:43:31 -04:00
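
The page_number and page_label rules in the commit message above can be illustrated with a small stdlib-only sketch. This is not the pdftract implementation: `PageInfo`, `page_info`, and `to_lower_roman` are hypothetical names, and the sketch assumes a single /PageLabels range that covers the whole document, starts at 1, and has no prefix. Real /PageLabels number trees can define several ranges with their own start values and prefixes.

```rust
/// Convert a positive number to lowercase roman numerals,
/// the output expected for the PDF page label style /S /r.
fn to_lower_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        // Greedily emit the largest symbol that still fits.
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Hypothetical stand-in for the page-level fields named in the commit.
#[derive(Debug, PartialEq)]
struct PageInfo {
    page_index: usize,          // 0-based
    page_number: usize,         // 1-based, always page_index + 1
    page_label: Option<String>, // None when the PDF has no /PageLabels
}

/// Build the page fields. `has_roman_labels` is an assumption standing in
/// for "the document has one /PageLabels range with style /S /r".
fn page_info(page_index: usize, has_roman_labels: bool) -> PageInfo {
    let page_number = page_index + 1;
    let page_label = if has_roman_labels {
        Some(to_lower_roman(page_number as u32))
    } else {
        None
    };
    PageInfo { page_index, page_number, page_label }
}
```

Under these assumptions, `page_info(1, true)` gives page_index 1, page_number 2, and page_label "ii", while `page_info(0, false)` leaves page_label as `None`, matching the "absent" case in the acceptance criteria.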
adr feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction 2026-05-23 12:30:26 -04:00
conformance feat(pdftract-5omc): implement SDK conformance test runner pattern 2026-05-18 01:22:23 -04:00
integrations feat(pdftract-2u6q2): implement diagnostic infrastructure 2026-05-25 13:16:38 -04:00
notes docs(pdftract-3wrx): add release signing strategy note 2026-05-24 11:12:56 -04:00
operations docs(manual-release): add PB-13 fallback release runbook 2026-05-25 03:23:29 -04:00
plan feat(pdftract-3zhf): add unified TableDetector::detect entry point 2026-05-24 00:51:59 -04:00
research docs(pdftract-1tjn): finalize OpenType MATH and formula extraction research note v1.0 2026-05-24 10:41:39 -04:00
schema/v1.0 test(pdftract-4c8qu): add page_label tests and fix JSON schema 2026-05-25 14:43:31 -04:00
security docs(pdftract-58kz): add security policy documentation 2026-05-20 19:39:24 -04:00
user-docs fix: resolve compilation errors across codebase 2026-05-25 08:38:04 -04:00
research-index.md Add parallel extraction research and comprehensive research index 2026-05-16 16:30:35 -04:00