Verification Note: pdftract-2kpm0
Summary
Implemented the NDJSON frame types as a unified `NdjsonFrame` enum using serde internal tagging, plus a `write_frame` helper function.
Changes Made
Core Implementation (crates/pdftract-core/src/output/ndjson/frames.rs)
- Added `NdjsonFrame` enum with serde internal tagging (`#[serde(tag = "frame", rename_all = "lowercase")]`):
  - `NdjsonFrame::Header(HeaderFrame)`
  - `NdjsonFrame::Page(PageFrame)`
  - `NdjsonFrame::Footer(FooterFrame)`
- Updated frame structs to remove the `frame_type` field (now handled by enum tagging):
  - `HeaderFrame`: `schema_version`, `metadata`, `outline`, `total_pages`
  - `PageFrame`: `page_index`, `page_type`, `spans`, `blocks`, `tables`, `annotations`, `errors`
  - `FooterFrame`: `extraction_quality`, `errors`, `threads`, `attachments`, `signatures`, `form_fields`, `links`
- Added `write_frame<W: Write>()` helper function:
  - Serializes the frame to JSON
  - Writes a trailing newline
  - Flushes the writer for immediate delivery to streaming consumers
- Added `#[serde(default)]` to optional fields for proper deserialization:
  - `PageFrame.annotations`, `PageFrame.errors`
  - `FooterFrame.threads`, `FooterFrame.attachments`, `FooterFrame.signatures`, `FooterFrame.form_fields`, `FooterFrame.links`
Module Exports (crates/pdftract-core/src/output/ndjson/mod.rs)
- Updated exports to include `NdjsonFrame` and `write_frame`
Tests (crates/pdftract-core/src/output/ndjson/frames.rs)
- `test_ndjson_frame_header_discriminator`: Verifies `"frame":"header"` appears first
- `test_ndjson_frame_page_discriminator`: Verifies `"frame":"page"` appears first
- `test_ndjson_frame_footer_discriminator`: Verifies `"frame":"footer"` appears first
- `test_write_frame_includes_newline_and_flush`: Verifies that `write_frame` appends a newline and flushes
- `test_roundtrip_header_frame`: Header serialization → deserialization → equality
- `test_roundtrip_page_frame`: Page serialization → deserialization → equality
- `test_roundtrip_footer_frame`: Footer serialization → deserialization → equality
- `test_page_frame_with_empty_collections`: Empty arrays preserved, empty `annotations` skipped
Design Decisions
- Serde internal tagging: Used `#[serde(tag = "frame")]` on the enum instead of per-struct fields. This ensures the `"frame"` key appears first in the JSON output, and it is a standard serde pattern for discriminated unions.
- `to_json_line()` methods: Not removed. They are kept on the individual structs for backward compatibility, but the primary API is now `write_frame()` with `NdjsonFrame`.
- `#[serde(default)]` on optional fields: Required for correct roundtrip deserialization, since empty collections are skipped during serialization.
Acceptance Criteria
- [PASS] Roundtrip unit test: write HeaderFrame → parse → equal to original
- [PASS] Frame discriminator order: serialize Page frame → first key is "frame":"page"
- [PASS] Three frames emitted in expected sequence (existing tests verify)
- [PASS] Frame-by-frame writer flushes after every frame (`write_frame` calls `flush()`)
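The sequence criterion can be checked against captured output with a small helper. This is a hypothetical sketch and not part of the existing tests. `check_sequence` assumes a valid stream is exactly one header, then zero or more pages, then exactly one footer. This note does not say whether a zero-page stream is legal, so allowing one here is an assumption.

```rust
/// Hypothetical checker over the discriminators of a captured stream:
/// accepts "header", then any number of "page", then "footer".
fn check_sequence(kinds: &[&str]) -> Result<(), String> {
    match kinds {
        // Slice pattern: needs at least two elements (header and footer).
        [first, middle @ .., last] => {
            if *first != "header" {
                return Err(format!("expected header first, got {first}"));
            }
            if *last != "footer" {
                return Err(format!("expected footer last, got {last}"));
            }
            if let Some(bad) = middle.iter().find(|k| **k != "page") {
                return Err(format!("expected only page frames in between, got {bad}"));
            }
            Ok(())
        }
        _ => Err("stream needs at least a header and a footer".to_string()),
    }
}

fn main() {
    assert!(check_sequence(&["header", "page", "page", "footer"]).is_ok());
    assert!(check_sequence(&["header", "footer"]).is_ok());
    assert!(check_sequence(&["page", "header", "footer"]).is_err());
    println!("sequence checks passed");
}
```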
Files Modified
- `crates/pdftract-core/src/output/ndjson/frames.rs`: Added `NdjsonFrame` enum and `write_frame` helper, updated tests
- `crates/pdftract-core/src/output/ndjson/mod.rs`: Updated exports
Commit
`fa57ab3`: feat(pdftract-2kpm0): implement NdjsonFrame enum with internal-tag discriminator and write_frame helper