Commit graph

3 commits

Author SHA1 Message Date
jedarden
db92403bd5 chore(pdftract-36glh): remove unused JpxDecoder import and add verification note
Some checks are pending
Schema Generation Validation / Validate JSON Schema (push) Waiting to run
Schema Generation Validation / Validate JSON Syntax (push) Waiting to run
- Remove unused jpx::JpxDecoder import from stream.rs (code uses fully qualified paths)
- Add notes/pdftract-36glh.md with acceptance criteria verification

The JPXDecode passthrough implementation was already complete in commit 4ba4687.
This change is minor cleanup only.

References: pdftract-36glh
2026-05-28 05:23:13 -04:00
jedarden
d14ec92fcb feat(pdftract-3zhf): add unified TableDetector::detect entry point
Add unified detect() method to TableDetector that combines both
line-based and borderless table detection pipelines. This completes
the coordinator bead for Phase 7.2: Table Detection and Structure
Reconstruction.

All child beads (7.2.1-7.2.6) are closed:
- 7.2.1: Line-based detection (path segment clustering)
- 7.2.2: Borderless detection (x0 alignment heuristic)
- 7.2.3: Span-to-cell assignment (centroid containment)
- 7.2.4: Header row detection (bold + StructTree TH)
- 7.2.5: Merged cell detection (missing interior edges)
- 7.2.6: Table JSON output schema integration

Critical tests pass:
- 5x3 bordered table (15 cells extracted)
- Merged header cell colspan=3
- Borderless 3-column table detection
- Two-page table continuation detection

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 00:51:59 -04:00
jedarden
9b5fbc9b5e feat(pdftract-bf-2y2rp): implement lazy stream decoding for PDF extraction
- Add decode_page_content_streams() function for per-page lazy decode
- Update extract_page_from_dict() to support lazy stream decoding
- Modify extract_pdf() and extract_pdf_ndjson() to enable lazy decoding
- Fix borrow checker issue in LazyPageIter::next()

This ensures content streams are decoded lazily per page and dropped
immediately after processing, keeping peak RSS flat across page count.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 12:30:26 -04:00