pdftract/docs/notes
jedarden d14ec92fcb feat(pdftract-3zhf): add unified TableDetector::detect entry point
Add unified detect() method to TableDetector that combines both
line-based and borderless table detection pipelines. This completes
the coordinator bead for Phase 7.2: Table Detection and Structure
Reconstruction.
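The commit does not say how detect() reconciles the two pipelines when both report the same region. This is a minimal sketch of one plausible merge policy, assuming ruled (line-based) tables take precedence and a borderless candidate is dropped if it overlaps an already accepted table. BBox, Origin, DetectedTable and merge_detections are hypothetical names for illustration, not the pdftract API.

```rust
/// Axis-aligned bounding box in PDF user-space units (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl BBox {
    /// True when the two boxes share some interior area.
    fn intersects(&self, other: &BBox) -> bool {
        self.x0 < other.x1 && other.x0 < self.x1 && self.y0 < other.y1 && other.y0 < self.y1
    }
}

/// Which pipeline produced a table (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Origin {
    LineBased,
    Borderless,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct DetectedTable {
    bbox: BBox,
    origin: Origin,
}

/// Merge the two pipelines' candidates. Assumed policy: every line-based
/// (ruled) table is kept, and a borderless candidate is kept only if it does
/// not overlap a table that has already been accepted.
fn merge_detections(line_based: &[BBox], borderless: &[BBox]) -> Vec<DetectedTable> {
    let mut tables: Vec<DetectedTable> = line_based
        .iter()
        .map(|&bbox| DetectedTable { bbox, origin: Origin::LineBased })
        .collect();
    for &candidate in borderless {
        if !tables.iter().any(|t| t.bbox.intersects(&candidate)) {
            tables.push(DetectedTable { bbox: candidate, origin: Origin::Borderless });
        }
    }
    tables
}

fn main() {
    let ruled = [BBox { x0: 50.0, y0: 100.0, x1: 300.0, y1: 200.0 }];
    let borderless = [
        // Lies inside the ruled table, so it is dropped.
        BBox { x0: 60.0, y0: 120.0, x1: 250.0, y1: 180.0 },
        // Separate region lower on the page, so it is kept.
        BBox { x0: 50.0, y0: 400.0, x1: 300.0, y1: 500.0 },
    ];
    let tables = merge_detections(&ruled, &borderless);
    for t in &tables {
        println!("{:?} at {:?}", t.origin, t.bbox);
    }
}
```

Preferring ruled tables reflects the assumption that drawn cell borders are stronger evidence than alignment alone; a real implementation might instead score and compare overlapping candidates.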

All child beads (7.2.1-7.2.6) are closed:
- 7.2.1: Line-based detection (path segment clustering)
- 7.2.2: Borderless detection (x0 alignment heuristic)
- 7.2.3: Span-to-cell assignment (centroid containment)
- 7.2.4: Header row detection (bold + StructTree TH)
- 7.2.5: Merged cell detection (missing interior edges)
- 7.2.6: Table JSON output schema integration
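Bead 7.2.3 names centroid containment as the span-to-cell rule. This minimal sketch assumes axis-aligned cell rectangles and uses hypothetical Rect and assign_spans names, not the pdftract API. Each text span goes to the first cell whose rectangle contains the centroid of the span's bounding box.

```rust
/// Axis-aligned rectangle in PDF user-space units (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Rect {
    x0: f64,
    y0: f64,
    x1: f64,
    y1: f64,
}

impl Rect {
    fn centroid(&self) -> (f64, f64) {
        ((self.x0 + self.x1) / 2.0, (self.y0 + self.y1) / 2.0)
    }

    /// Half-open containment test: a point on an edge shared by two
    /// neighbouring cells belongs to exactly one of them.
    fn contains(&self, (x, y): (f64, f64)) -> bool {
        x >= self.x0 && x < self.x1 && y >= self.y0 && y < self.y1
    }
}

/// For each text span, return the index of the cell containing the span's
/// centroid, or None when the centroid falls outside every cell.
fn assign_spans(spans: &[Rect], cells: &[Rect]) -> Vec<Option<usize>> {
    spans
        .iter()
        .map(|span| {
            let c = span.centroid();
            cells.iter().position(|cell| cell.contains(c))
        })
        .collect()
}

fn main() {
    // One row with two cells spanning [0, 100) and [100, 200) horizontally.
    let cells = [
        Rect { x0: 0.0, y0: 0.0, x1: 100.0, y1: 20.0 },
        Rect { x0: 100.0, y0: 0.0, x1: 200.0, y1: 20.0 },
    ];
    let spans = [
        Rect { x0: 10.0, y0: 5.0, x1: 90.0, y1: 15.0 },  // centroid (50, 10)
        Rect { x0: 90.0, y0: 5.0, x1: 150.0, y1: 15.0 }, // straddles the border; centroid (120, 10)
        Rect { x0: 0.0, y0: 30.0, x1: 50.0, y1: 40.0 },  // below the row; centroid (25, 35)
    ];
    // prints [Some(0), Some(1), None]
    println!("{:?}", assign_spans(&spans, &cells));
}
```

Testing the centroid rather than requiring full containment lets a span that slightly overflows a cell border still land in a single cell. The half-open edge test is this sketch's own choice for resolving centroids that fall exactly on a shared edge.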

Critical tests pass:
- 5x3 bordered table (15 cells extracted)
- Merged header cell colspan=3
- Borderless 3-column table detection
- Two-page table continuation detection

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 00:51:59 -04:00
.gitkeep Initial repo scaffold with README and docs structure 2026-05-16 14:26:16 -04:00
ocr-language-packs.md feat(pdftract-3zhf): add unified TableDetector::detect entry point 2026-05-24 00:51:59 -04:00
pdftract-3c4i.md fix(pdftract-3c4i): export detect_merged_cells from table module 2026-05-24 00:23:14 -04:00
sdk-architecture.md Add SDK architecture notes covering top 10 languages 2026-05-16 14:51:25 -04:00
sdk-conformance-runner.md feat(pdftract-5omc): implement per-language conformance test runner pattern 2026-05-18 01:32:24 -04:00
sdk-contract.md docs(pdftract-147a): author SDK contract specification 2026-05-17 23:13:55 -04:00
sdk-invocation.md Add research docs and SDK invocation notes 2026-05-16 14:33:34 -04:00