docs(pdftract-25k4x): add verification note for figure/caption detection
parent f5e045f26d
commit 8fe61a1ba5
11 changed files with 624 additions and 8 deletions
83
notes/pdftract-25k4x.md
Normal file
@ -0,0 +1,83 @@
# pdftract-25k4x: Figure Detection + Caption Detection

## Status: COMPLETE

## Overview

Figure detection and caption detection were already implemented in:

- `crates/pdftract-core/src/layout/figure.rs` (517 lines, 16 tests)
- `crates/pdftract-core/src/layout/caption.rs` (342 lines, 8 tests)

## Verification Summary

### Figure Detection (`classify_figure`)

**Algorithm:**

1. Walks the image XObjects drawn with `Do` (Phase 3.3) and the inline images (Phase 3.5)
2. For each image, computes the union area of all text glyph bboxes that intersect the image
3. Computes that union with a sweep-line algorithm, so overlapping glyph boxes are not double-counted
4. If `text_overlap_area / image_area < 0.5`, creates a Figure block
5. Sorts figures by bbox top Y, descending
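
The figure test above can be sketched in Python as follows. This is a minimal illustration, not the Rust code in `figure.rs`: `Rect`, `union_area`, and `classify_figures` are hypothetical names, glyph boxes are assumed to be clipped to the image before the union is taken, and the sweep shown is a simple slab sweep along x (quadratic in the number of boxes), which may differ from the variant `compute_text_overlap_area` actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Overlap ratio at or above which an image is treated as text-covered (from the note).
FIGURE_TEXT_OVERLAP_THRESHOLD = 0.5


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned bbox: (x0, y0) lower-left, (x1, y1) upper-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersect(self, other: Rect) -> Optional[Rect]:
        x0, y0 = max(self.x0, other.x0), max(self.y0, other.y0)
        x1, y1 = min(self.x1, other.x1), min(self.y1, other.y1)
        return Rect(x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def union_area(rects: list[Rect]) -> float:
    """Exact area of a union of rectangles, sweeping vertical slabs along x."""
    xs = sorted({x for r in rects for x in (r.x0, r.x1)})
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # y-intervals of every rectangle that spans this whole slab
        spans = sorted((r.y0, r.y1) for r in rects if r.x0 <= left and r.x1 >= right)
        covered = 0.0
        cur_lo = cur_hi = None
        for lo, hi in spans:
            if cur_hi is None or lo > cur_hi:  # disjoint: close the current run
                if cur_hi is not None:
                    covered += cur_hi - cur_lo
                cur_lo, cur_hi = lo, hi
            else:  # overlapping: extend the current run
                cur_hi = max(cur_hi, hi)
        if cur_hi is not None:
            covered += cur_hi - cur_lo
        total += covered * (right - left)
    return total


def classify_figures(images: list[Rect], glyphs: list[Rect]) -> list[Rect]:
    """Keep images whose text-covered fraction is below the threshold."""
    figures = []
    for image in images:
        image_area = image.area()
        if image_area == 0.0:
            continue  # degenerate image: no meaningful ratio
        # Assumption: only the part of each glyph box inside the image counts.
        clipped = [c for g in glyphs if (c := g.intersect(image)) is not None]
        if union_area(clipped) / image_area < FIGURE_TEXT_OVERLAP_THRESHOLD:
            figures.append(image)
    # Sort by top edge (y1), descending, as the note describes.
    figures.sort(key=lambda r: r.y1, reverse=True)
    return figures
```

For glyph boxes `Rect(0, 0, 60, 60)` and `Rect(30, 30, 90, 90)`, a naive sum gives 7200, while `union_area` returns 6300. On a 100 × 100 image that is a ratio of 0.63, so the image is not a Figure.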

**Acceptance Criteria Verification:**

| Criterion | Test | Status |
|-----------|------|--------|
| Image XObject, no text overlap → 1 Figure block | `test_five_figures_no_text` | ✅ PASS |
| Image + small-font caption 1 line below → Figure + Caption | `test_caption_immediately_below_figure` | ✅ PASS |
| Image covered by text (background image) → NOT Figure | `test_text_covered_image_not_figure` | ✅ PASS |
| Text overlap < 50% → Figure | `test_classify_figure_partial_text_below_threshold` | ✅ PASS |
| Text overlap ≥ 50% → NOT Figure | `test_classify_figure_partial_text_above_threshold` | ✅ PASS |

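### Caption Detection (`classify_caption`)

**Algorithm:** a block is classified as a Caption only if all of the following hold:

1. Its font size is smaller than the page's median body font size (`page_body_median`)
2. The previous block is a Figure
3. Its vertical distance from the figure is less than `2 * line_height`
4. It is in the same column as the figure (checked only when `num_columns > 1`)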
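
The four caption conditions can be expressed as a single predicate. The Python below is a sketch, not the `classify_caption` signature in `caption.rs`: `BlockInfo`, its fields, `is_caption`, and the `"figure"` kind label are hypothetical; boxes are assumed to use a y-up coordinate system; and the vertical distance is assumed to run from the figure's bottom edge to the candidate's top edge. Only captions below the figure are accepted, matching the v0.1.0 scope.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockInfo:
    """Hypothetical block summary; y grows upward, so top > bottom."""
    kind: str          # e.g. "figure" or "text"
    top: float
    bottom: float
    font_size: float
    column: int


def is_caption(
    candidate: BlockInfo,
    previous: Optional[BlockInfo],
    page_body_median: float,
    line_height: float,
    num_columns: int,
) -> bool:
    # 1. Font metric: captions are set smaller than body text.
    if candidate.font_size >= page_body_median:
        return False
    # 2. The block immediately before must be a Figure.
    if previous is None or previous.kind != "figure":
        return False
    # 3. Spatial relationship: the gap below the figure must be under two lines.
    #    A negative gap means the candidate is not below the figure.
    gap = previous.bottom - candidate.top
    if gap < 0 or gap >= 2 * line_height:
        return False
    # 4. Column membership only matters on multi-column pages.
    if num_columns > 1 and candidate.column != previous.column:
        return False
    return True
```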

**Acceptance Criteria Verification:**

| Criterion | Test | Status |
|-----------|------|--------|
| Small font + follows Figure + within 2 lines + same column → Caption | `test_caption_immediately_below_figure` | ✅ PASS |
| Small-font block 5 lines below the figure → NOT Caption | `test_caption_too_far_below_figure` | ✅ PASS |
| Small-font block in a different column → NOT Caption | `test_caption_different_column` | ✅ PASS |
| Font not smaller than body → NOT Caption | `test_caption_font_not_smaller` | ✅ PASS |
| No previous Figure → NOT Caption | `test_no_previous_figure` | ✅ PASS |

## Test Results

```
Figure tests: 16 passed; 0 failed
Caption tests: 8 passed; 0 failed
```

## Key Implementation Details

### INV (Invariants)

- ✅ Figure blocks carry no text lines (the spec's `lines = []`; this `Block` type stores text in a `text: String` field rather than a `lines` Vec)
- ✅ Figure blocks have `median_font_size: 0.0`
- ✅ Caption blocks have `kind: "caption"`, set via `set_caption()`

### Critical Considerations Addressed

- **Text overlap union algorithm**: a sweep line computes the true union area instead of naively summing glyph areas, which would double-count overlapping boxes
- **Sorting**: figures are sorted by top Y, descending, for a consistent page order
- **Column assignment**: a TODO comment marks where figures should be assigned a column based on the image center
- **Above-figure captions**: not detected in v0.1.0 (as specified in the bead)
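
The column-assignment TODO could be filled along these lines. This is a hypothetical sketch, not code from `figure.rs`: it assumes the layout stage can provide the x positions of the gutters between columns as a sorted list (`column_boundaries`, an invented name) and assigns an image to the column that contains its horizontal center.

```python
from __future__ import annotations

from bisect import bisect_right


def column_for_image(x0: float, x1: float, column_boundaries: list[float]) -> int:
    """Return the 0-based column index containing the image's horizontal center.

    column_boundaries: hypothetical sorted x positions of the gutters between
    columns; an empty list means a single-column page.
    """
    center_x = (x0 + x1) / 2
    # The number of gutters left of the center is the column index.
    return bisect_right(column_boundaries, center_x)
```

A figure that spans the full page width lands in whichever column its center falls in, so a real implementation may need a separate case for spanning figures.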

## Files Modified

None; the implementation was already complete.

## Retrospective

### What worked

- The existing implementation is clean, well tested, and follows the bead specification exactly
- The sweep-line algorithm computes the exact union area of overlapping text boxes
- Test coverage is comprehensive and includes edge cases (thresholds, empty contexts, multiple figures)

### What didn't

- N/A: the implementation was already complete and its tests were passing

### Surprise

- The bead was already fully implemented despite being in the ready queue
- Both modules share a common `Block` type via `pub use` from `caption.rs`

### Reusable pattern

- The sweep-line algorithm in `compute_text_overlap_area` is a reusable pattern for computing the area of a union of rectangles
- The `classify_caption` pattern of checking (1) a font metric, (2) a spatial relationship, and (3) column membership can serve as a template for other block classifiers
BIN
tests/fixtures/scanned/documents/form-300dpi-scanned.pdf
vendored
Normal file
Binary file not shown.
106
tests/fixtures/scanned/documents/form-300dpi.pdf
vendored
Normal file
@ -0,0 +1,106 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/Contents 9 0 R /MediaBox [ 0 0 612 792 ] /Parent 8 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
4 0 obj
<<
/Contents 10 0 R /MediaBox [ 0 0 612 792 ] /Parent 8 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
5 0 obj
<<
/Contents 11 0 R /MediaBox [ 0 0 612 792 ] /Parent 8 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
6 0 obj
<<
/PageMode /UseNone /Pages 8 0 R /Type /Catalog
>>
endobj
7 0 obj
<<
/Author (anonymous) /CreationDate (D:19800101000000+00'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:19800101000000+00'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
8 0 obj
<<
/Count 3 /Kids [ 3 0 R 4 0 R 5 0 R ] /Type /Pages
>>
endobj
9 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 849
>>
stream
Gatn%?#SFN'Sc)R/']H3e.sDTK\fBOKJ!lZ;+W]hlSNn*J3+E#a(a*&qO+k#7HFY5/P$!aqtN>CH#/7$dsO^Z4^A+oUi=m45TE=-5eL#+dK)c#dAEI6mI$tFdS?B#Zoi$6!MiA%KLmqT]3uV=8'i>b5r;7>j-sI68rk./KCm'ATh]YlFSE,=;h[qV_)7Q>KWCBr/3AV/q^utS5Ob\?4fTC@R120QlgXegi\>j'c,'.U2+_mI)3;W8`$=<dYjr&1ki/K7`1#-;WmEGo98;[8)"K3k0@'Y'FO(e';QJiOP$76&s2IGW`\`?OVH?\8:o4+,Y,<#mHG%D77[NG])AN5cD()VO!5?GZFChml%@EG:f4mAU0$80Q;,%F)[%pt)(N0d\j97N3T)XK+6KTW\gW!0rgZ+:)P?0q%+mSs_)Ho,FSni$M'n]"Z!\O(3.fP37THu]koa?!SHGSN#kM!:D,qD.\I"Um,_[?9N,k9Oo%^+Qki-sWO<7C%$!MnFC3@f5dF=HpqjQ9dMA6qf,/GSKNo=JiZ<7eUPL7cA.&73;-35*-1GrT'3o4h@%pP(Bu;m<Z`Kr.Jo/qP,p9Qh9f<)BuD^#5'B3*f8jCfg=5eP)uaJ:8rcM<7HC*<\old&i&*1b4V^FnHj,i3[L8SCTT(&#RW7T<EZ?9LuMWZaZ2V1[jOGPh\(bNk.;)cCI<RG>[h?>CR9I%SdN@/3TX#Hi]Gq6HTE('nAqGZdNBu3CXGYHmRj:Sj2DH^0a>m>a:AF83Df><sle+m*)QcH_u\sXa%<+2E0g@Ro-[F#H*\@/WE,D\#+)99/SJgn7=R/V,Qt`A!,QBHW]8+boSk@$SNO/U]~>endstream
endobj
10 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 742
>>
stream
Gau0B9okbt&A@P9R,bW1'H9E\hU3rhOZ":Rk[@8P(!OZJ0[uJTq"Jo].,Acq[T`bF$MNQ*b^5BO!ffE8HTgEiT4@Z',X<cr.1eCG&3iDshji:i]C&GXAY&H)*eUF25\s<WO&u%RF5"63n`N6r_=r-p!+32+bMPNLr-UYu6d\g0J@25,X6mgeE>>Wdp_rP[c0c0to6h^9@mL?dQW_]A%HJ#>EB=pO7:(#b]UstQhL3iGI_.0]Bk3olZN)P!O^DR6PJpl?m*.R(pP*H.eJ%:p`q>g]!()t/gGo,<-2_p6GZ?._$OrN^0,EfE3Q#o_ST+>2OeV[o/asIu6%.cjqV!kBpPn@?nU8:VIsg--58_j.b]?8J%<Ldd^0u\g^2[faUCf,r?+=3oHPFJjiWIm,cV10%$[!#(DSal`&i&<@QZ]+*GqMoc3kW9?499#A]bs&FZT?8"%8(!r5Qm,9o^4pZ$8&I#+,iJTBn0tbVn[$+r_X&XCI,Xr[@lrt>#6uW(1`/86%](N*U_p#)(jujZruE[55q*==N=;B#@V7Ng422("$Fal.Qa'hQ5M)nnKJX/fL6`%/)Y!,a[<OODiLn'E0\5T%f]3>0nBUFPmCZ`drG\n)G#-SM-&&IqA$QBWmgF,2kWT>BBGF,Ok#M]B/2UtZCsbO>cRL@Ji?Bn0-T=sZ[L[fTFC2KP6J$Fm]fg6HZ#uTTLI)o\*C?,rmLMH8SrYQ^0+>l.VjK]?Jsrn(-`\(~>endstream
endobj
11 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 192
>>
stream
GasJIbmM<Q$jQ0KME.\lg2&93O.r?k*cs%Z,1WhoTbZjeq"EuNIgUc6KJ3`($,NCUc!:g-9m\.A6LU&XO3gg[g3[gZ4S70Z9;Ui?FMf.\V_KX%.5.+"R[!1kAlC?",hBU_bS-H\LTeH>qM9%f85Ok1(h][:@of[6RPSiLSR>lidan*kN.H9nWuKn9q&i-V~>endstream
endobj
xref
0 12
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000000404 00000 n
0000000598 00000 n
0000000792 00000 n
0000000860 00000 n
0000001156 00000 n
0000001227 00000 n
0000002166 00000 n
0000002999 00000 n
trailer
<<
/ID
[<30157dc3b9cf65b8d1eaf3493559908e><30157dc3b9cf65b8d1eaf3493559908e>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)

/Info 7 0 R
/Root 6 0 R
/Size 12
>>
startxref
3282
%%EOF
BIN
tests/fixtures/scanned/documents/invoice-300dpi-scanned.pdf
vendored
Normal file
Binary file not shown.
87
tests/fixtures/scanned/documents/invoice-300dpi.pdf
vendored
Normal file
@ -0,0 +1,87 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/Contents 8 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
4 0 obj
<<
/Contents 9 0 R /MediaBox [ 0 0 612 792 ] /Parent 7 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
5 0 obj
<<
/PageMode /UseNone /Pages 7 0 R /Type /Catalog
>>
endobj
6 0 obj
<<
/Author (anonymous) /CreationDate (D:19800101000000+00'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:19800101000000+00'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
7 0 obj
<<
/Count 2 /Kids [ 3 0 R 4 0 R ] /Type /Pages
>>
endobj
8 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1087
>>
stream
Gau0B?#SIU'RfGR\.@HS'XgM0Ua5hONoR8[P1Z^e!X&![b=ofK&e0`jlM^QV,1CX>Bd:Sh"4e:W4hBg**+;,#L\q+W#.Q5BQT#a8J`6\aSPU_P?s1-g\rZacab2e6a2%qSO<ln$2%6\WhQO'ub6o^^6LjmkqZuH[IJXmS]oV<KA1?H^em"h\i-^Ok,3]q!8tqp+.I+%=\.N]9[T;a,E6F;-ne$!+EQa;e9.Q[AT#'m(&O<XCFtA]oP0uP>iL>%lgKE;6nIZS3I"[KZXHBMsiOUHcE?"9)Ub$X-)g1)0&R9$q&5aZ1oc\3`,tk`E4g->T*5Q(o;i1j=)eM:;bF6Uh#$Y51ZMF!jEA-*2^$e@:iq:"uP>]qN8#[kHrMUXTnGSG'G(0PA4<(2W-?2,97S.3KVkR;4.;%4&r`c5-]Rj^)K\gLtMAC*R/l"3nqS9Fb$CA2dNG)NuF8[K1&#-=KWbLl']bO#;]rGU)K"F5I,D:$k..r9J2b#VEWABp.V6Z*F5`^:s\^D1=Y"e;Ta5&E`-&X+ALeF($-rc]kMY$:H%$C!g/BQA-1R;SA:_OVE?0FX78Cg+#'rD$*9b$.^.`#bD4:-(GD0>Zq>6-7flXnRkj[W471E&291$k&Wm&i`\C:We[ptU1rXDZka>rUd>26XV1M7rrr1NE=WXZ0,oTo2OrSJt34R^Yc@dTA(DUnR`:)P!Pi0Z_oMF_:fHN)G>a73'<Cgu1^cR(p0!HK$%s[I9P._AO[A"d@7dCkU*LK]0U)[RScL:o*E.L,Z9"IYNj<n!3kjNd?q(!.dEkb0PTkg$]W1Du7[t78V1Bq2%ulbf>=_'+>ailnhCtMiB-eQKAHo]9C7RYu#=Mb<Dm)E#)j6(GOo,$k<?f.9Rlg2`DLI)/NlS>/\SZ.5M;-,+?uVi?1X+XZhWj\C<A5K/f!BE/bY(O.K9kVTa#+WZU(UfSeL'A%+oF8(rmPJ0nq)$4;&)gKmUEmOl&QNh*-@XHB-F#Y3JEE0+_^\\Gn8!sEm$pa<P#_^j@$h0D1-m"F]f(m5J(grF%)$RFs*&'m\+m:')>DdJ`b4r*<18^Y<5J62aNRTt`e~>endstream
endobj
9 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 426
>>
stream
Garo>d8#<J'Sc)N'YbG2dk$\LJW2`M?"0POU4%S&MUXq->0uX'/PP>nX<m*;kN)XQk<BL\mNKK=2o(%(DFHfWYS)Ua.G-Etg#t(bL3tfes7>:f&Q>LQ%*VT$m=e^\p9jFFmi.s(0OYlh@4bYN_$&^S$>FS;\WSN6a^p$AfiD"%"=X'%hLL#Z-3qX1q*hZU/f]5SeIKfBAP6GY=;k&X(/&NPpJQiqk=$`h^cP)Md\5J7*=nn+ZF6/C):+>K$nH%.M`#FL<kKKfe?>YQkO.rXT$1?(=*W2:f;gKp<'Ku5_rOdUFk:G`a`-PUBUklI,oJf^^M;@j]cWLbN-a;rmT%8Jl?+aT%lS6=_6&5eH/03`r6d>::dYn0jo0d=S,UOh,kQ;SLT+G,057UOPhPcW@,"h`KXFm1E-/[=N.(bZ!R2G~>endstream
endobj
xref
0 10
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000000404 00000 n
0000000597 00000 n
0000000665 00000 n
0000000961 00000 n
0000001026 00000 n
0000002204 00000 n
trailer
<<
/ID
[<30157dc3b9cf65b8d1eaf3493559908e><30157dc3b9cf65b8d1eaf3493559908e>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)

/Info 6 0 R
/Root 5 0 R
/Size 10
>>
startxref
2720
%%EOF
@ -80,9 +80,9 @@ def create_pdf_from_text(source_text_path, output_pdf_path, config):
with open(source_text_path, 'r', encoding='utf-8') as f:
text = f.read()

-# Create PDF canvas
+# Create PDF canvas (convert Path to string for reportlab)
page_width, page_height = config["page_size"]
-c = canvas.Canvas(output_pdf_path, pagesize=config["page_size"])
+c = canvas.Canvas(str(output_pdf_path), pagesize=config["page_size"])

# Set font
c.setFont(config["font"], config["font_size"])
@ -152,7 +152,7 @@ def rasterize_pdf_to_scanned(pdf_path, scanned_pdf_path, dpi=300):
with tempfile.TemporaryDirectory() as tmpdir:
# Convert PDF to PPM images
result = subprocess.run(
-["pdftoppm", "-r", str(dpi), pdf_path, os.path.join(tmpdir, "page")],
+["pdftoppm", "-r", str(dpi), str(pdf_path), os.path.join(tmpdir, "page")],
capture_output=True,
text=True
)
@ -160,7 +160,7 @@ def rasterize_pdf_to_scanned(pdf_path, scanned_pdf_path, dpi=300):
if result.returncode != 0:
print(f" Warning: pdftoppm failed, copying original PDF")
import shutil
-shutil.copy(pdf_path, scanned_pdf_path)
+shutil.copy(str(pdf_path), str(scanned_pdf_path))
return

# Convert images back to PDF
@ -169,13 +169,13 @@ def rasterize_pdf_to_scanned(pdf_path, scanned_pdf_path, dpi=300):
if not images:
print(f" Warning: No images generated, copying original PDF")
import shutil
-shutil.copy(pdf_path, scanned_pdf_path)
+shutil.copy(str(pdf_path), str(scanned_pdf_path))
return

# Convert images to PDF using img2pdf or PIL
try:
import img2pdf
-with open(scanned_pdf_path, "wb") as f:
+with open(str(scanned_pdf_path), "wb") as f:
f.write(img2pdf.convert([str(img) for img in images]))
print(f" Created scanned: {scanned_pdf_path}")
except ImportError:
@ -187,7 +187,7 @@ def rasterize_pdf_to_scanned(pdf_path, scanned_pdf_path, dpi=300):

if pdf_images:
pdf_images[0].save(
-scanned_pdf_path,
+str(scanned_pdf_path),
save_all=True,
append_images=pdf_images[1:],
resolution=dpi
@ -197,7 +197,7 @@ def rasterize_pdf_to_scanned(pdf_path, scanned_pdf_path, dpi=300):
except Exception as e:
print(f" Warning: Rasterization failed ({e}), using original PDF")
import shutil
-shutil.copy(pdf_path, scanned_pdf_path)
+shutil.copy(str(pdf_path), str(scanned_pdf_path))


def generate_all_fixtures():
BIN
tests/fixtures/scanned/multi-page/doc-10page-300dpi-scanned.pdf
vendored
Normal file
Binary file not shown.
265
tests/fixtures/scanned/multi-page/doc-10page-300dpi.pdf
vendored
Normal file
@ -0,0 +1,265 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R /F2 3 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/BaseFont /Times-Roman /Encoding /WinAnsiEncoding /Name /F2 /Subtype /Type1 /Type /Font
>>
endobj
4 0 obj
<<
/Contents 18 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
5 0 obj
<<
/Contents 19 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
6 0 obj
<<
/Contents 20 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
7 0 obj
<<
/Contents 21 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
8 0 obj
<<
/Contents 22 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
9 0 obj
<<
/Contents 23 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
10 0 obj
<<
/Contents 24 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
11 0 obj
<<
/Contents 25 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
12 0 obj
<<
/Contents 26 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
13 0 obj
<<
/Contents 27 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
14 0 obj
<<
/Contents 28 0 R /MediaBox [ 0 0 612 792 ] /Parent 17 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
15 0 obj
<<
/PageMode /UseNone /Pages 17 0 R /Type /Catalog
>>
endobj
16 0 obj
<<
/Author (anonymous) /CreationDate (D:19800101000000+00'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:19800101000000+00'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
17 0 obj
<<
/Count 11 /Kids [ 4 0 R 5 0 R 6 0 R 7 0 R 8 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R
14 0 R ] /Type /Pages
>>
endobj
18 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 539
>>
stream
Garo=6''_R&;BTM/)G9T&sT2J76Qr7$)q)XjD$>p1!W%sNU:EGd.6We:(Wls-rsl(2"/-RVSGm>UAl/;HrKh<UHh[C*DjtXF/Y^5X,$\f]_Pl=,b@XkIp#)u4T<c!6Lbe[(U`"Hs1M="mE60m44L/I:$!obh6,#F?;T01U'dc=j:`a-M'B;HR,D/FRXDIi6DBidYl@VAKL9-`=$*e-Ecp(T!hFG^l"#F%2B61Zl^DA=L<rQV3/+BsWaLESgA35/I]+J2:l,k+Tpk#;F";qY&//!$P<MAIRGQ`-e6#NA>MEu2iFFJ)O'n,teJYuck<4)h43`7aT6d#pl'PIli[cX_e%7"EbP>G.MoF\-18h?cG"1-!P=ZX^m0@U+A"L:CcG=$uM41LjBO;'$e$1gZ./4!ePDnV7RK;;*IaWV>FOAFXC+'Mq=ef\#BWXcLNe1$LXL7/3Q;n=S.Lq1f8I3S+0^<mlJg=W=^n(5<KC^tYLTSR3OHqc1JDFooIsZQKn(sZ,.`[!3I<(Ol\5.@`*"g3.IfpSm+9~>endstream
endobj
19 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 821
>>
stream
Gas1\9lC\"'YO0A]GiF0Th3_?oEJ<G"#Qr[+cq"i]PlC+YJ'U!g?$pBG%e@?"BDG.?GDnR1UAku`S[cpQQ=J/MV`fPd;2QX6aOaR6.HcT<Cc.u^A!$K&)q%n(MB(6F`dOUZqU'AW`D>6qRh.Sr?kmSDlV>^F[FV3Wa>-3SY,&lrU3ME"nl,V5-h)ojXXMQfl&`4*(4%</e0'XQcjujm\?)b&<OSe)>f_rCTWWN=h6\$?f0_2:h6Q,1Th`hM*S^rB$3%8NJe8.D);>Tkb4rBH4_@O/4If^qW_hhQ@M>2#a)_a\5,F@c!)Nk>D3<M^cmhrOTU/:)TJW-ajdSTJKAo4'V_h>n23'UTAb?)kA'Q-S4QlQPUc2&k(+F%j#eqO%Ek\e>8015Be)oXR6oJFT7kU3Qosu&FBrh[YoI@2MF5^R_J"U>;bjg>@chBJH8pSsjl$cr9'"#-LVVUB/2G\9le/uY/pYRW[ac.@6JI;<I,>A)agotf;Fl6)4PV@Q45<cVet,C*4/7,j[^sIG?B$2hL&'&<1sQ3\%\j,OiNTX%[+ljn>R8s.-l+Jk2us5?=4ukFpM?KZ+I`2'IARaNRq+d&)r7A'Y4qj'&d@O;":]t!J_LUG]:%Ibq[s27Z$nSqB`7b=`Qie+"6dD5L:oq$"2)i(hNdaKgtV%_0O\#dHGa5'=10tIW4[H7>t8=A#J"%d`?lJ]LE\`d&%46SqBTSA:YnQiN"hLL#,sBE.c9))`)k;HD0:W;PJ,:RISaiRATJ.\6;jDq^hX2>4^*AV.n[cV;8+;?!OAZC#3']p%pO4+Z$]st*W6-a4/@'~>endstream
endobj
20 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 676
>>
stream
Gau`Q?#SFN'RfGR\;tIS<q8PL]J2sbBL#]SHe:/LDRb4p`(h>N;toTjCtb-e_iQ5<,"7gbHdCfDA(uXErd!Xq'`]<]>[Dd8q*8!=L$qAO@u/6\RJ.Z`]j6Jt&)El`Uk%$TXq?7L>i2?@GN7a_%!.RaL1HFdDh.`go!T^nDf9]df@ms<4-.iDaY8H+rk+'M`7/uHN-_2(?kc&,@#B.c<XP!e++EP]^kaIVR%?kr,Z>5%p]fX7,?o*1@"M$P(M5WB^YE+K4a0tYTqF9E.jYOeORSNs"]\7c^:U+PadLT:Yhe=?0ZZ"HEs:%(\4,)4mY't4)1#LGE_SRkD4CGnG_!;s/3gDq%GeGW%@0YD<HZTZf(*Y\5J`O&\7>IH9eeJrG_n@\O"Q$fam<Ek%2HI86c\()#F+s66PuNFCfYG3FP>\[!dQ3Ag,IuUU3UY+"7'I>\3`s$hIe05/1Hd+4;>H.F(MdAdDDK?-bcWa=eKri6nIkmg.!(<+JZL%NQ2kDUYY#V`GLWhAu3Wq/'mbId;T23m'6L")!Bi^^=+:"([fA['hX<U(mLlDb&6mVG.dbMh1i)MSmNToR-;MWqgK!W+t%cb[9=-c2On?;EgR2f7k>QHY<ub1qQGP;]3hJ/*UUTDF1141eahq;Fmrf'i_T3]89D<%pRXTFE:F2&n5R[~>endstream
endobj
21 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 700
>>
stream
Gau0@?#SFN'RfGR\.;Q5_dWP%j7!gReolZ\K"%]l02\'&-6Q5*<$Cel(WWDg7Pa@^7#I"@q&H#JD_;&q^81ThXT82)hL?e9.&^c$]ZWSM+]MOn3^()M/;6U1@hR8ug>dFL!:PIGqSNjKJI69[r<2f]bRoa*On3pe<cjP/6);A2p^#>V(eu4]`t+:6#::-5VP9A]87"qAKW)sRc<jB1It6"IQff]U*CS"3c.6iYFqFSoe"m8DG\^>DD:V8^<*$rfenq"l(5igU9/(a5(PGs.?a@=H,k4NIE5>gJ1"O--Z2uXEX9+&rNHP:B_Rf/XK+$;o<aqM7rfge'\OHWF@K8j2Oe>,2<>)+N)^RUT,t@-[fAn'I-!Y3Rb^JOi(E<`+i<kRO"P3uT'!=??E@b!C':VdcMG6cH/]+GUeqM`>K&blpS4YKe`kjj-)"XjF+a!1in'YeaFC4J)-RfoLEhC[latC$7Ld@:6r/Yl.Id*`kEaN_WTRu.%$S!RaqPr&_-5J)-Bd@uf.)5(,#/u;/\-ocp3-&WroJtUA-n4`Zl(qHW6qlSMoQ=$O8cG7AX)pM6PH"<M=3TAMg[_\?=utdR96Z`J(]'Mi<)HmG]q[SBCR]ggM9D*KCU/fg,0?0U\BX#E%pEO++^?pl"2Tc)qs[fLW+Q^pZ90IRmM)'FW34Pm8<bp1A>qf9ku#N@oc($VMH9~>endstream
endobj
22 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 970
>>
stream
Gat%!?#Q2d'RfGR\EXa`3X16Kb190:9smOf;P4PaO<t;EAHe$s=k\?igE]%K(o`>pq!/pWj5YK/d+D\AcR9$CnDX8.%_;]?(3EJ4_4SWp^C^@`fm\>[0em&kZGFjQ=bd<8Ylk!TrO&0+pN,iT;g7rR55j/Ll!8l?(]5q!!"=hhS0-2g!IheT_'>!**#1<+.5%.!\ZTn)#f2dj^p6Z`JRqu9^s,244p>[B_N*JTj:4R;=4)A^QPi\@<2jF2OVWcn@jI,jd[S<`>S()h&]L"["WH9S#Sga^P7Re,j_gqaOqn=bG[84uh;#$kEY?1441LH;9J#FGa5]XK"er1[-3H=e[<AHT@JWa"0]L"i7-]`J?F3'.JMJauRX2>-@1Hk]D8(`_Bc'#2GtR9r+2-,48f5YjTSB]:,!f_oP!+P-27!Pb:iEhA6LcTC9.RC,[V,ognSlXC>k,_5"ca2UlnQ:gg/a:'+&6;gLsEm^W<It]N/>@\n.)aP\'$I\%a_TL*b]0eO'a>&dtg8@B\*TU+gA6,]B6'bmP[f)9&*P$4&,,P;t!-R#'$%l0InPD:^)C_UM\'likS+6Jh&an5t-aB!)OhjZW_<aE<XSR;Mrk]eYD&Xm\-FN.f<fs=F5rihu0mr&qn+L=nmP`$=ONY+!`.QU5@\N@el\-J.9D2HGhN3r5G;YErK:i/.iOHK*'F@Y28k2<mG2h4eCJF8l<T'c[j(UI@?nU1;Vn_LAVZ<=kiamI+.<<^LSnMI0.0S0?6$4\Tsu.LG`qS.QeB9BVeO[h+?I74N]dHpibS]Bt=EHcu,b#S,\SF:-*s&=`]XP19NgjBuXC=Ga")+-SbY#m&!"5BqPa0bEn44P@a&9L1Ub;>H!O>j^>ZB$l6j&QYe@C#qlNalSnP-l5(N8YcbR>a\13Y%^<-+feZ(!4kT)'Fq)f$S]E0YVeTPI*NaWcLfj33eY5>ZZYsQfp^bTCS&>~>endstream
endobj
23 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1229
>>
stream
GasIf?#S^^'RcT\EA^O_<L,3NBQs/!"k-Wo!_U\\+O,g9*9q7t+PS?7rUk7!5g)^CY)l!JER4SEpLFQ]p`''B<WUu3pgZ%3i<WNd_sq+C/-<[Pq#/UPqnj)cm0ITJmXS;V[H;f:e'ASfLG/d!^9LJLIK994*L"mY1O9#eHK=&5eUi2Y'*8c:OZ4Wu4sYe48_ah<]c6^_:8_#$3ul%''#k>\M_fB<hRL`[D0A-3B#3&+T*/8i.g'pF$Yb.[[1JU"2XlKab_.aZL:u`l)JJF@32g$Z0)iu8`lJr.pnbrN;[CWlV1?uu$0S"%IgD:iKfsH.TKk31d&H"NljuGhKC07`C1XNk!8-E+MRTf'BONGF99DTDTK,!m!e!F:=,.!r$lgS5Y?ZSrC;&e&rp'RirUm\bDqfBd[;@aNKVU/m1IbGcg5@iae+`-BMtM'5X4i#WmoTGa-`H<8(2X%6k6J(<VOqS=XajYiq1:>1)TQ#GS%m<XLB^qp?pXpJg1,,U%e-@=;jP:ikL+<Z7-,nH=^#I1TfnuO/U(;4m,=oqhU"*kf?RXP(lQL9lh[f+3,i_=#F-O`]A7P5#V1>]*4siAj:1PGKZ]l9Y+MQEpK)c`4700HAC<4)AcOZ",`9[%D/f@&a_FN_[d=;?qc7$WAa.<j9Z_+'"?U;;/DGcBq^:GX1?:3_!&p"Z3C\sa>GSq=0Y^!0?2+$8"W2eBF%`\,DTel5CQ0tpaOb]^oBBoQD2&8QfYOn,<?I@FHWOBgJ$!Wi.uT5'K4l\bf<+e)#DLAV5fm;hfSAaAkWKfCq?q`-d<.PO'I>+L>F<l](56j%P$Z6"?)KaMOhe#jdF@jP$_<JAHjiqYItY"i4g$^0%34IYN48:8NTU>(Deg>m'NPOejZ9+=pLl[I2KKd$^cgDj.u&)QU^T!2$CnjE'IB^\_^K/Y\.%h?7+jJK"_FsV>Td\clS)ah[Y_j(esg3`8kFR4kD1)+G`eX<0>3BX\-M(&/g,QNUWE>tPSB8Jh0XAgOU`0`n0JEL*\P>8Op3Oo'q-EISr]__,NJ#oDb%g2NklCY5E[R3*'1q#LUuaBM)igE%lM(YU-Rj&(,e^KdYk_:aR%]g:S6gC;5c[Lih*?6ZFm[X]Wnl>=GF-9H?H>/lZ<T>K.QqeT]"*am1nT%WDaEt)8PBb>($TCWWq=WC5dj?E:et-?=?TMni?o6n$t[qX':&Y#>tp#+*AT1IOO0m:R;Jt_#~>endstream
endobj
24 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 891
>>
stream
GasbY?#Q2d'Sc)R/'^Sk[1hB#0d#X#9rNe7Z`JBfPBu77DSjC`##<_HqjFWE[+MLC+p@GrF5HM[!s2D-W;0_1!&kV+h',*$kT_5@PlQddTP)Mj4Sg%?#T&,/GsF0q=Ki5>pDRqe\Nc^@IKVCuBlFm4kp=\@5ei9E`Fcf-r1/Z`mYr;[?3'Ni+J<ZWIKd\8bkR'.94!4(%)=35P\2rs;4c`+:32.3`rV1M#h#S0h3TG6Kaa[7U:OD*heLTN!1:',$S5Yo2o37Wo#+F_W1[AXJ;J,fVQ#:ngqip=Wn>pMmq)&&ZqNIgei=l)1MdNESraEtk"damU-*s88JnlF9*]O/%^me]N[8HHSp*<#GVoO)j;D&)Zb6i<$c,^XCAeKO40DU?S#CaHj&XKOO^EODTXIFPR2.kjMLtqfPf]JgR\K5URaMB&HF@6sJg4ST>K!T-\h<;4gS[a7[C#V0jK!rjUrXkKiM-"#`'%A*JPIlSS-rFM!`_frm>._PcSgh"&M2fb[EV1@d$8bE:9mphhPeLS=+ZcAA%bKFVepD43%YT%JP"_Ln<E&rqFn#JUND@S[[PVd<)0`B5,]_qL5#;LZlHIq/^c*_L&Od`Kes3InE2S5If9]=D`&/k_+]_![S5+;,[BA>bB.(:Op*q[*@/[7k[k.p]GF\\kF2+JN?W5@l[S9p[YqU`.9`Xnpem'>(gOaOY`2/Nab`V;[(=4]*R.K3683.#IBiO%.Y/r'S%_^0.g*.p]p"T1M`9[W>c,(:b+=ZV*/6:Q3Gi2e1DFaa[?oP&q;`Zm((]A)]QhUl^+Ibj0<<FD6kordG%Rr6[)b!_-t?,Ur[O@8>!]@JUmLM@$^Tk]NEMYsq.=Hu&_f&\%EohBf4Lb7bV`bFi9W[k.kEF~>endstream
endobj
25 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 154
>>
stream
GarW10aki`'SH/ZM@9tpMSAgg\?8mqiPEnc`kbes&H)R/ol0,6$"QTbMaCH[al`nrLsned#/6bJE<%d-S'kNKf0:Gr69g&PKnmf_o%R>#N#C=/jOre^L,bffY'9%276Bm_$f*5:#N'Zkf]EIa(G<<KF8~>endstream
endobj
26 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1110
>>
stream
Gas1^9lo#Z&A@7.oO/9`/OLd,J`*mF-FHFa%Y[p)Gq9jjCMAOILC*kD[sKA\].&f!.LIoq2r1[plV8Us:E$t5%cDa=mlmP/Al'Kjc)3R$T4([3$3+WkrgsZ3M[UB-&ulGuDT*3]>f$9[DR[16I_!)DgoWNM6hNB;J%Q>Y,olXL=crs0)\`LjMUpf/f%rIDTt-%)cHA>-31nM!i\q!K&C*a+WK,mn89N:VK8]Gr%j8KJ2GPY*rj3IO/\aVb"tZ2j=]9QNCJX,r&J!*lD8D'!"?[[.G+]Rto'HTYl`M@'ZM>37"t`0t#^%CP^AT=9j3&9?X/"fNC>F$\Fi_WB:.@Ii4bg&`dPZ+c66hNojl!`eX`To%)6#c$J)\DEEH:V*:<kDtipHD8g'p(d>J#cFV1!h?0S\BYmK(Q>Y1MuS_KAWiAW^V&KOU>[#kp\$gK(6fC^kR9GOXHnWR0?Y^g]J-XM$>^gT=MZ+mO:$[MA^@/uQm(4M3(tn2ic4)S4."%F*qM/[Ct8D_)B4qlWkEj&?f98An!Fb;Ca)@dOU`\C4&2D-.h^V_j5.Soh#uBLsl.77$Nc,U^dm/P&T6#JbZTcmY5?^jMqjUrHWI\"KcT:=^0h1'"-da!d+UZ6'S,SgksT$&YN+Xo])&G<jk,_(QH_]YrQqca"t`H(#97%B)Adm\RaJSFbF>Ve=bY?u'M"OSoG=Ci;ErD?N&N"n.JHi.4!<O"(PJnft0GeX">jhDI^j5GG?=`sI*U$M>aY/KMl\nd6Nl>Vf-0O,%-47Q`E3(G[o%Rij7q_cbN'PeD[tF.,knLnQAL2KEbt6d`lt5?l(N5F_Ca)X$Fm+kQ[@YL&Yt;de?n7E.nHAWf`EU6m4j.gDGRc][a1cE'Jmr](ZK`?1hLU8Kg]iulGln\9kk]4.IsJuCCnqdT^MN24KE>"D3df=.idn8ipaS,4ZBS!@ujZ:3)=76Z,.SYK6[Wo;f*`G=aI(gL"4<V+/0X+/#p)rtEHGm`hM[=2Y:ZS6aO/eh)b@9fd`^b3"n_?(o*0](WeLjLA9oj$BI<YMp.q"i4_q>Z/Q)cNVf^ak>0;`$O@p(Lq8cb44E+8f!dn_DUa^V^cBr!O1jB%6~>endstream
endobj
27 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1134
>>
stream
Gas1^9on9n&A?DnW67g*MIB'^&@hD;TP\D@Hc/>;$Q%h;M.[G_Z5n8_n#'tm2`nkg&d68JWnse[#c@6Nr&:e[^1?`b/qPL+li^0@8KdmH[b3$X*oB,AXq8bnQEq:2)FiUdJZr=qpW_1\3IBICD.)u+SC-p1A3V46o8[NIFf+oj9k+_2GViGWDV)*UZ`7S;-jf/A0:P*eEo,6V%_e[fJt&F.7@F?"/J\`_-nBuPA)"YK7\`7`%5?_?VLKma,L-Si^CX_:c=*c]d]d@"i@oT6mLKiDS7/0]c#a2u%h29UM&Crc#[+r9@g-K[CeEj"+TW]eKbPT^FN.c!%;N2KRH->Y=/8Z'hU:L*n3;4Hl:Zq<.q@Mu_TN[4h'm:MVbgQi)3a5adheRH==l"P`afRfe;.KlPL>.QPZ?6ps(2np0hK8Dc?qB;P`&H$W_u9p%Kf.dlt0o^Cm8g@Z\G%2:pr.2`3JJl3?\Ho:/4uT..+8`3HJ1V+e]OE*ODlfTVqV.Vge7U?K;@fZ)'QZ12bke7RE]3i(bAR*(=ANHFc"iLG8XHE<k>E0;["a/\Yq\M\@+XQ,c.hD-8`P5#:X??W^6D]d`p/!D@RTM?&JElNZmBkoDn+=cPA=g+]NW#P`osQP;U3<2UH@6H8Q/&@)"i4B8TR@on'^<LuDs_OZEc2G#Qrmp&kBN.j,OC;Ho)ACMP&T(*9`<$u\ps'Tt=f5b@N[[-bV#Q6!-Mrad0IK3HFe08[aVc3BfgX@!OrBkUN#c#Y^'/#]l]r/<E-JU.DF=CXe/o^0mZei6!mRYrF.#3S^rG:$9W7PYkd3SZ8I!aPugi"q\s%,%T\*.Vn?#Ab=2.2R=V/o@go8jC9O2E7uRkAS9]#if;jB;?l32BHBp:RSC?:-K)_pQEGT'rbq5H&?[n(B$0pZDYQB485gFCDdYP#jK_W:kj'd42it2`L4&"ss;VU=<?,P[[i@-sB2"a`M'YgGj.?qh?M53*^[/gDOP<Y:C#W1s#kmHK00QX_<g2miU:ABI<5>[a`D,DeNdKi[?@F8ee+udOj^M(8"6e/K&Cg-I$2p@53>:8h/:D$K_Cu9(h=Voqpd\rU-Kc5TK>1_(+C3)qA*&5ZrLL74W0j?>EDagSOE((\ZsR70~>endstream
endobj
28 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1156
>>
stream
GasIfgJ6d"&:L1SW0`X6eWTSW2mca;#p:B'@#<9G,[NP!9&,?l[bC+6k$c#QX1FLG8D#/_@("ggWm7S)'gb08U&1*$^c[<8Y!WI>82h),K*9V,\4XNGoB=Zpn,P2'?lVrb/`H%^C_=PB!`i+?jm;M!bjI)lEIYI-I9PI^hgL.L?V#\DjF.g4rQj@)Oi'q8eUYX(]]L=Uok%E\OD3C6=BW^6!u[E9C9/!_1U&lfV2I[TWbMo>=-c(Wq\gD72fQNV,e1kBgPa@q7['W(Z-d7G&mH?K0,91AlddOkocPTQWc.qsPr1U<-!pXLAT5COZW$YQI5uI5SOEWOn]dZg.!Br`P^0+$.,a[3L1&)?PgD3)&)^W\&mgjJiEBZP2(jkWNq-b`hlu?u(eu))B*fk.enlQ=>RDtk:]r7!'AIqbm("&F@.nR2Tn6:(;JkW[/*MgSoIQ",E/Rr3Q_Go>g$:+d^UThZ)TP6rW^J-J+j<A@:C+!@d<8lV4EltibhGTefJF;i(i3rEMZhCT2=R,1#I'AK<8orI[cW=0f&aV6S*N`aHFrX!8LC%?0Hcmo/;;b!l]q^Ne5plMRWPuZ1A%A)T\6Q"bf-(Zask2#e(rABlkqg]:,#nZn6iUK>'T=1<T42[$.QBmdKia'4]C@U,>;iTpb2XBU.41gT>,N($Brq8Bl8%)c0Nqe^g@WnJJkIe<GN1[ZBB:.aXIua@+qVIdg7*l8#PiLne5L,W?G4n@i,?QS3-LID-lFubmfQ3)f[u#7Ju?6dIH`A"O/tHR?0m:%uI%Ma`h6S\ihfD]EWkpU/6$ZI:ST/&M9$'9J-YdQ#i.!Em[r$aV3pY\ZsXj_IQJAVLA$R_)m/4Jf%!-C*mLbrOlipVF-*H#K+X80O1+hb\2$-crlWTWDh#k0tCfmpsV'^UXEnuIdVqsr,H\kBCf[LNZS$_X62:<bM$?a<MO^C-&RF$oYD$4:9gPGF+;6#Y:S6f3^oa?P_t7F*-co!8F=V]jmO`1:E=R'Xi]UHCu#fr-@?V,eQ@oO3O0(Eg/r6+^K%7DrmK4AU\:AjPrY=CK@q>Y$f<4=`ip^8J'Y`kgk%SJi-9T/jBD.MA7)1'lZYkG[8j("iK>7(Q5r`4I4Cb3`NCVHhGEAnQJ04-Hdo2'H1XrNq!pXOqh*T~>endstream
endobj
xref
0 29
0000000000 65535 f
0000000073 00000 n
0000000114 00000 n
0000000221 00000 n
0000000330 00000 n
0000000525 00000 n
0000000720 00000 n
0000000915 00000 n
0000001110 00000 n
0000001305 00000 n
0000001500 00000 n
0000001696 00000 n
0000001892 00000 n
0000002088 00000 n
0000002284 00000 n
0000002480 00000 n
0000002550 00000 n
0000002847 00000 n
0000002976 00000 n
0000003606 00000 n
0000004518 00000 n
0000005285 00000 n
0000006076 00000 n
0000007137 00000 n
0000008458 00000 n
0000009440 00000 n
0000009685 00000 n
0000010887 00000 n
0000012113 00000 n
trailer
<<
/ID
[<30157dc3b9cf65b8d1eaf3493559908e><30157dc3b9cf65b8d1eaf3493559908e>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)

/Info 16 0 R
/Root 15 0 R
/Size 29
>>
startxref
13361
%%EOF
BIN
tests/fixtures/scanned/receipt/receipt-300dpi-scanned.pdf
vendored
Normal file
Binary file not shown.
68
tests/fixtures/scanned/receipt/receipt-300dpi.pdf
vendored
Normal file
@@ -0,0 +1,68 @@
%PDF-1.3
%“Œ‹ž ReportLab Generated PDF document http://www.reportlab.com
1 0 obj
<<
/F1 2 0 R
>>
endobj
2 0 obj
<<
/BaseFont /Helvetica /Encoding /WinAnsiEncoding /Name /F1 /Subtype /Type1 /Type /Font
>>
endobj
3 0 obj
<<
/Contents 7 0 R /MediaBox [ 0 0 612 792 ] /Parent 6 0 R /Resources <<
/Font 1 0 R /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ]
>> /Rotate 0 /Trans <<

>>
/Type /Page
>>
endobj
4 0 obj
<<
/PageMode /UseNone /Pages 6 0 R /Type /Catalog
>>
endobj
5 0 obj
<<
/Author (anonymous) /CreationDate (D:19800101000000+00'00') /Creator (ReportLab PDF Library - www.reportlab.com) /Keywords () /ModDate (D:19800101000000+00'00') /Producer (ReportLab PDF Library - www.reportlab.com)
/Subject (unspecified) /Title (untitled) /Trapped /False
>>
endobj
6 0 obj
<<
/Count 1 /Kids [ 3 0 R ] /Type /Pages
>>
endobj
7 0 obj
<<
/Filter [ /ASCII85Decode /FlateDecode ] /Length 1055
>>
stream
Gau1-gQ%aW&;KZN'N:ur[T1NI#UFUhiapT#5f$;.l4$A=KQ[GJ@!d.q)B@7-MJ6-lM*`3"HlgTu[Kco_+5Gat7t@"Zb:=!gJ@Yd**!RkqW?)NIW:))uHgqD-?1>?3l3J3j2O;Hg,V,i:O`S'leMC$o5Wq>A_anE]rYb58YH@);$<MB,*$D2\?!\=m"7*&fS@<YsB"bVIpe_tg;scHm2FRpd>N^;/!^ujtXhL)S@%.Kb#,*(O+0&4#n_o(@ro$ubHc`e`=m]JO[8+s=qFP$FZ;r1^^Eie)6,_;l(J!;6fp)%V,Bc%*O\,+ldBn0^"fcG*XDG1J>?%Fg+@`jG0[H1TdCm?*S]7&Q<GJDG<,rFt^0[jaqAq$-\a9<q8;Y\W/iNOrnfKO^!6E*(]aRt-SjFR:0\Y0ma85[d3Fi1Y_i!Tj;.,\\Z%+db;(2*C$8mPbcGZBfL)oZL`0EWM67A!^_BO:ZXa^"Xl7dkY)KrK'e=;ATrb2.E>Y`]ZOi%VH44q$;R1WB9E@B9ge7',Q%FH-AhMBe5a>gq@c<!%:?-H6DS[7g4[-p\9(`IQS2DW8j-m4L<E%R>YlA+QU`&LBo)>OOV)&ZG](+Oo8D"X5>%&3J@YpKr%P)n>ECk>C_OYW2pIhiESJ8\qi1;k+eA0faM1GZ\&Jo.5)8>KJkK6T8J+s1idO6%F(:F\h8=&NdR[,!:EUL_S_DceaNfSX*R:f/;sHII-a_t:UV^S(N+m>n!@;/mbEe15O6+C)Y;/f.^JZ]=&u5HmGCL)j"8s5!Yk)U@AG^CcN\jS.tsX5cpV?og#?.e+:%LHk`aE0hf@rQq@3s*mC&h-j/E-m"r>SmF@s%d(iYL/Mc.R#CV56"B386k1+QPRJM+fECb2@+V(s>i]*:6RJ]:*jY>mI';tsVQ-=M<-NOT\VI@WH[u"UZ'A&];a#8mB2&kCMPb(GVCV'-6Y5HAhm2:uD=6bjUS2=-1qRbo:tJ*[;aHt`pN>0]N`>a)<->H!b3W.nf]C(X3s+C%B=Xul`A61lL"!*fTM*;"`n;s.YOR3756.L]9\0~>endstream
endobj
xref
0 8
0000000000 65535 f
0000000073 00000 n
0000000104 00000 n
0000000211 00000 n
0000000404 00000 n
0000000472 00000 n
0000000768 00000 n
0000000827 00000 n
trailer
<<
/ID
[<30157dc3b9cf65b8d1eaf3493559908e><30157dc3b9cf65b8d1eaf3493559908e>]
% ReportLab generated PDF document -- digest (http://www.reportlab.com)

/Info 5 0 R
/Root 4 0 R
/Size 8
>>
startxref
1973
%%EOF
7
tests/fixtures/scanned/run_gen.sh
vendored
Executable file
@@ -0,0 +1,7 @@
#!/run/current-system/sw/bin/bash
# Wrapper script to run generate_scanned_fixtures.py with nix-shell dependencies

nix shell nixpkgs#python3Packages.reportlab nixpkgs#python3Packages.pillow nixpkgs#python3Packages.img2pdf nixpkgs#poppler_utils --command bash -c '
cd "$(dirname "$0")"
python3 generate_scanned_fixtures.py "$@"
' "$0" "$@"