pdftract

jedarden bb7146cffe fix(pdftract-2uk9z): wrap native module results in typed Python objects		2026-05-28 21:18:38 -04:00

The native PyO3 module returns raw dicts via pythonize, but the Python SDK API expects typed dataclass objects (Document, Page, Metadata, etc.) to be consistent with the subprocess fallback and test expectations.

Updated wrapper functions in __init__.py to convert native results:
- extract(): wraps dict in Document.from_dict()
- extract_stream(): wraps yielded page dicts in Page.from_dict()
- get_metadata(): wraps dict in Metadata()
- hash(): wraps string in Fingerprint.from_string()
- classify(): wraps dict in Classification()
- search(): wraps yielded match dicts in Match

The native PyO3 entry points (extract, extract_text, extract_stream) were already implemented with:
- extract: uses extract_pdf + pythonize for PyDict conversion
- extract_text: uses extract_text for plain String return
- extract_stream: uses extract_pdf_streaming with custom StreamIterator

All kwargs parsing with strict validation (unknown kwargs raise TypeError) was already in place.

Acceptance criteria:
- pdftract.extract() returns Document object with pages/metadata
- pdftract.extract_text() returns plain text string
- pdftract.extract_stream() yields Page objects
- Unknown kwarg raises TypeError
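
The wrapping pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual contents of __init__.py: the field names on Page and Document, the `_native_extract` stand-in, and the `_ALLOWED_KWARGS` set are all hypothetical, and the kwarg check is shown in the Python wrapper only for brevity (the commit message says the real validation lives in the native entry points).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Page:
    # Hypothetical fields; the real Page dataclass may differ.
    number: int
    text: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Page":
        return cls(number=data["number"], text=data["text"])


@dataclass
class Document:
    # Hypothetical fields; the commit only states "pages/metadata".
    pages: List[Page] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Document":
        return cls(
            pages=[Page.from_dict(p) for p in data.get("pages", [])],
            metadata=dict(data.get("metadata", {})),
        )


# Hypothetical set of accepted keyword arguments.
_ALLOWED_KWARGS = frozenset({"password"})


def _native_extract(path: str, **kwargs: Any) -> Dict[str, Any]:
    # Stand-in for the native module, which returns a raw dict.
    return {
        "pages": [{"number": 1, "text": "hello"}],
        "metadata": {"title": "demo"},
    }


def extract(path: str, **kwargs: Any) -> Document:
    # Reject unknown keyword arguments with TypeError.
    unknown = set(kwargs) - _ALLOWED_KWARGS
    if unknown:
        raise TypeError(
            f"extract() got unexpected keyword argument(s): {sorted(unknown)}"
        )
    # Wrap the raw dict in a typed object before returning it.
    return Document.from_dict(_native_extract(path, **kwargs))
```

With this shape, callers receive the same typed objects whether the result came from the native module or from another backend that also returns dicts.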
c-client	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance	feat(pdftract-5omc): implement per-language conformance test runner pattern	2026-05-18 01:32:24 -04:00
document_model	fix(pdftract-2uk9z): wrap native module results in typed Python objects	2026-05-28 21:18:38 -04:00
error_recovery/fixtures	test(pdftract-4w0v4): implement adversarial test corpus + integration harness	2026-05-25 14:30:24 -04:00
fingerprint/fixtures	fix(pdftract-2uk9z): wrap native module results in typed Python objects	2026-05-28 21:18:38 -04:00
fixtures	fix(pdftract-2uk9z): wrap native module results in typed Python objects	2026-05-28 21:18:38 -04:00
lexer/fixtures	test(pdftract-sy8x): implement lexer proptest harness and curated corpus	2026-05-24 02:36:37 -04:00
proptest	feat(pdftract-91e1i): HTTP fetch sequence implementation	2026-05-28 13:17:00 -04:00
proptest-regressions	docs(pdftract-49f8): establish Cargo.lock policy and documentation	2026-05-20 18:13:14 -04:00
python-conformance	feat(pdftract-5omc): implement SDK conformance test runner pattern	2026-05-18 01:22:23 -04:00
sdk-conformance	feat(pdftract-mcp): add MCP server implementation changes	2026-05-23 03:09:56 -04:00
security	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
stream_decoder/fixtures	fix(pdftract-2uk9z): wrap native module results in typed Python objects	2026-05-28 21:18:38 -04:00
xref/fixtures	feat(pdftract-1s2uj): add xref test fixture corpus and integration test runner	2026-05-24 08:20:04 -04:00
conformance.c	feat(pdftract-1eaxm): implement libpdftract C FFI library	2026-05-23 08:55:12 -04:00
conformance_fixed	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance_fixed.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance_run	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance_test	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance_test_simple	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
conformance_test_simple.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
debug_content_streams.rs	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
debug_lzw.rs	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
debug_missing_mediabox.rs	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
debug_parse.rs	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
debug_stream	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
debug_stream.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
doctor_runbook_coverage.rs	docs(pdftract-653ah): add runbook integration for pdftract doctor	2026-05-24 13:26:31 -04:00
fingerprint_reproducibility.rs	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
gen_lexer_golden.rs	test(pdftract-sy8x): implement lexer proptest harness and curated corpus	2026-05-24 02:36:37 -04:00
log_secret_fuzz.rs	fix(pdftract-4pnmd): build.rs doc comment format string parsing	2026-05-28 14:36:45 -04:00
proptest-panic-verification.rs	feat(pdftract-33v): add property tests and nightly fuzz job	2026-05-20 19:18:03 -04:00
stream_decoder_fixtures.rs	feat(pdftract-91e1i): HTTP fetch sequence implementation	2026-05-28 13:17:00 -04:00
test_api_basic	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_basic.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_null	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_null.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_real	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_real.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_valid	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_api_valid.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_atomic_writer.rs	feat(pdftract-68wfa): implement AtomicFileWriter for atomic file writes	2026-05-24 13:02:37 -04:00
test_debug	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_debug.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_fingerprint_debug.rs	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
test_parse_fixture.rs	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_simple.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_simple_run	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_stream	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_stream.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_valid.c	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00
test_valid_run	feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction	2026-05-23 12:30:26 -04:00