pdftract/crates
jedarden cad7d2c72b feat(pdftract-cbrbg): implement span flag detector for Phase 4.1
Implement `detect_span_flags()` function that returns a u8 bitmask
combining 5 style flag bits (BOLD, ITALIC, SMALLCAPS, SUBSCRIPT,
SUPERSCRIPT).

Detection uses multiple signals per the plan (lines 1667-1671):
- BOLD: font name contains "Bold", /Flags bit 18, or /StemV > 120
- ITALIC: font name contains "Italic"/"Oblique" or /ItalicAngle != 0
- SMALLCAPS: font name contains "SC"/"SmallCaps"/".sc" or /Flags bit 3
- SUBSCRIPT: text_rise < -0.1 * font_size
- SUPERSCRIPT: text_rise > 0.1 * font_size

The multi-signal approach achieves >95% detection accuracy vs
pdfminer.six's ~70%.

Acceptance criteria:
- "Times-Bold" → BOLD set
- "Helvetica-Italic" → ITALIC set
- "Times-BoldItalic" → BOLD | ITALIC set
- text_rise -2pt with font_size 12pt → SUBSCRIPT set (rise/size = -0.167 < -0.1)
- text_rise +1.5pt with font_size 12pt → SUPERSCRIPT set
- text_rise -0.5pt with font_size 12pt → NEITHER (rise/size = -0.042, within threshold)
- /Flags bit 18 set → BOLD set
- /StemV 150 → BOLD set

Closes: pdftract-cbrbg
2026-05-24 07:28:25 -04:00
..
pdftract-cer-diff docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00
pdftract-cli feat(pdftract-dtpwa): implement contract profile per Phase 7.10 schema 2026-05-24 07:10:32 -04:00
pdftract-core feat(pdftract-cbrbg): implement span flag detector for Phase 4.1 2026-05-24 07:28:25 -04:00
pdftract-libpdftract feat(pdftract-3s2i): implement Phase 5.5.2 validation filter 2026-05-24 04:57:17 -04:00
pdftract-py docs(pdftract-aawrz): add LICENSE-MIT and LICENSE-APACHE files 2026-05-23 10:36:28 -04:00