pdftract/notes/bf-49wmw.md
jedarden e94f2abec4 fix(bf-49wmw): fix PNG-predictor unbounded pre-allocation
- Remove Vec::with_capacity(num_rows * row_size) pre-allocation in apply_png_predictors
- Remove Vec::with_capacity(data.len()) pre-allocation in apply_tiff_predictor_2
- Add MAX_ROW_BYTES (64 KB) to bound row size calculation
- Add is_row_size_clamped() check to detect suspicious PDF parameters
- Add max_output parameter to predictor functions for budget enforcement
- Track flate output separately, count predictor output against doc_counter
- Lower DEFAULT_MAX_DECOMPRESS_BYTES from 2GB to 512MiB

Row-by-row processing keeps working memory at 2x stride (previous + current
row) regardless of image height, and the output is capped by the
decompression budget, preventing OOM from malicious PDF parameters.

Co-Authored-By: Claude Code <noreply@anthropic.com>
2026-05-22 17:26:27 -04:00


bf-49wmw: Fix PNG-predictor unbounded pre-allocation

Summary

Fixed OOM root cause in PNG/TIFF predictor application by removing unbounded pre-allocation and implementing row-by-row processing with budget enforcement.

Changes Made

1. Added MAX_ROW_BYTES constant (64 KB)

  • Bounds row size calculation to prevent OOM from malicious PDF parameters
  • Keeps peak row-buffer memory at 2x stride (prev_row + current_row), i.e. at most 2 × 64 KB
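A minimal sketch of a bounded row-size calculation. Only `MAX_ROW_BYTES` comes from this note; the helpers `row_size` and `row_buffer_bytes` are hypothetical, and "64 KB" is assumed to mean 64 × 1024 bytes. Checked multiplication means an absurd `/Columns` value saturates to the cap instead of wrapping around to a small number.

```rust
/// Per-row cap from this change; "64 KB" is assumed to mean 64 * 1024 bytes.
const MAX_ROW_BYTES: usize = 64 * 1024;

/// Hypothetical helper: ceil(columns * colors * bits_per_component / 8),
/// computed with checked arithmetic, then clamped to MAX_ROW_BYTES.
fn row_size(columns: usize, colors: usize, bits_per_component: usize) -> usize {
    let bits = columns
        .checked_mul(colors)
        .and_then(|n| n.checked_mul(bits_per_component));
    match bits {
        // Round up to whole bytes without the overflow risk of (bits + 7) / 8.
        Some(bits) => (bits / 8 + usize::from(bits % 8 != 0)).min(MAX_ROW_BYTES),
        // Overflow means the parameters are hostile: treat as "too large".
        None => MAX_ROW_BYTES,
    }
}

/// Hypothetical helper: working memory for the two row buffers
/// (previous row + current row), independent of image height.
fn row_buffer_bytes(columns: usize, colors: usize, bits_per_component: usize) -> usize {
    2 * row_size(columns, colors, bits_per_component)
}
```

Image height never appears in either function, which is the point: the row buffers cost the same for a 10-row image and a 10-million-row one.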

2. Added is_row_size_clamped() check

  • Returns true when calculated row_size exceeds MAX_ROW_BYTES
  • When it returns true, the predictor functions return the input data unchanged rather than risk an incorrect decode from suspicious parameters
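The guard can be sketched as a check on the unclamped size plus an early return. The names `calculated_row_size` and `with_row_guard`, and the exact signature of `is_row_size_clamped`, are assumptions for illustration; the "exceeds" comparison (`>`) follows the wording above.

```rust
const MAX_ROW_BYTES: usize = 64 * 1024; // assumed: 64 KB = 64 * 1024 bytes

/// Hypothetical: unclamped bytes per row, or None if the multiplication overflows.
fn calculated_row_size(columns: usize, colors: usize, bpc: usize) -> Option<usize> {
    let bits = columns.checked_mul(colors)?.checked_mul(bpc)?;
    Some(bits / 8 + usize::from(bits % 8 != 0))
}

/// True when the parameters imply a row larger than MAX_ROW_BYTES,
/// or overflow outright (signature assumed for this sketch).
fn is_row_size_clamped(columns: usize, colors: usize, bpc: usize) -> bool {
    calculated_row_size(columns, colors, bpc).map_or(true, |n| n > MAX_ROW_BYTES)
}

/// Hypothetical wrapper: run `decode` only when the parameters look sane;
/// otherwise hand back the input unchanged instead of guessing.
fn with_row_guard<F>(data: &[u8], columns: usize, colors: usize, bpc: usize, decode: F) -> Vec<u8>
where
    F: FnOnce(&[u8], usize) -> Vec<u8>,
{
    if is_row_size_clamped(columns, colors, bpc) {
        return data.to_vec();
    }
    // Safe: is_row_size_clamped returned false, so the size computed without overflow.
    let row_size = calculated_row_size(columns, colors, bpc).unwrap();
    decode(data, row_size)
}
```

Passing the data through is a deliberate trade-off: a clamped row size would silently misalign every row, so undecoded bytes are the less misleading result.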

3. apply_png_predictors() changes

  • Removed Vec::with_capacity(num_rows * row_size) pre-allocation
  • Now uses Vec::new() and grows row-by-row
  • Added max_output parameter for budget enforcement
  • Checks budget before processing each row
  • Returns partial data when budget exceeded
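A sketch of row-by-row PNG predictor decoding under these rules, assuming the PDF convention that each row starts with one filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth). The signature is hypothetical, not pdftract's. Only two row buffers are ever allocated, the output starts empty, and the budget is checked before each row.

```rust
const MAX_ROW_BYTES: usize = 64 * 1024; // assumed: 64 KB = 64 * 1024 bytes

/// Paeth predictor as defined in the PNG specification.
fn paeth(a: u8, b: u8, c: u8) -> u8 {
    let (ia, ib, ic) = (a as i16, b as i16, c as i16);
    let p = ia + ib - ic;
    let (pa, pb, pc) = ((p - ia).abs(), (p - ib).abs(), (p - ic).abs());
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

/// Hypothetical sketch. `row_size`: data bytes per row, excluding the filter byte.
/// `bpp`: bytes per complete pixel (at least 1). `max_output`: output budget.
fn apply_png_predictors(data: &[u8], row_size: usize, bpp: usize, max_output: usize) -> Vec<u8> {
    if row_size == 0 || row_size > MAX_ROW_BYTES {
        return data.to_vec(); // suspicious parameters: pass through unchanged
    }
    let bpp = bpp.max(1);
    let mut out = Vec::new(); // grows per row; never sized from num_rows * row_size
    let mut prev = vec![0u8; row_size]; // the row above the first row is all zeros
    let mut cur = vec![0u8; row_size];
    // chunks_exact drops a trailing partial row (an assumption of this sketch).
    for chunk in data.chunks_exact(row_size + 1) {
        if out.len() + row_size > max_output {
            break; // budget check before each row: return the partial output
        }
        let (filter, src) = (chunk[0], &chunk[1..]);
        for i in 0..row_size {
            let a = if i >= bpp { cur[i - bpp] } else { 0 }; // left
            let b = prev[i]; // up
            let c = if i >= bpp { prev[i - bpp] } else { 0 }; // upper-left
            cur[i] = match filter {
                1 => src[i].wrapping_add(a),
                2 => src[i].wrapping_add(b),
                3 => src[i].wrapping_add(((a as u16 + b as u16) / 2) as u8),
                4 => src[i].wrapping_add(paeth(a, b, c)),
                _ => src[i], // 0 = None; unknown types copied as-is (assumption)
            };
        }
        out.extend_from_slice(&cur);
        std::mem::swap(&mut prev, &mut cur); // reuse the two buffers, no new allocation
    }
    out
}
```

Stopping before a row rather than mid-row keeps the partial output aligned to whole rows, so a truncated image is still a valid prefix of the full one.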

4. apply_tiff_predictor_2() changes

  • Removed Vec::with_capacity(data.len()) pre-allocation
  • Now uses Vec::new() and grows row-by-row
  • Added max_output parameter for budget enforcement
  • Checks budget before processing each row
  • Returns partial data when budget exceeded
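The same discipline applied to TIFF Predictor 2 (horizontal differencing), as a hypothetical sketch. It handles only 8-bit components and passes other bit depths through unchanged; that restriction belongs to this sketch, not necessarily to pdftract. Each sample is stored as a delta from the same component one pixel to the left, so a single reusable row buffer is enough.

```rust
const MAX_ROW_BYTES: usize = 64 * 1024; // assumed: 64 KB = 64 * 1024 bytes

/// Hypothetical sketch of row-by-row TIFF Predictor 2 decoding (8-bit only).
fn apply_tiff_predictor_2(
    data: &[u8],
    columns: usize,
    colors: usize,
    bits_per_component: usize,
    max_output: usize,
) -> Vec<u8> {
    let row_size = match columns.checked_mul(colors) {
        Some(n) if bits_per_component == 8 && n > 0 && n <= MAX_ROW_BYTES => n,
        _ => return data.to_vec(), // unsupported or suspicious: pass through
    };
    let mut out = Vec::new(); // no Vec::with_capacity(data.len())
    let mut row = vec![0u8; row_size]; // single reusable row buffer
    for chunk in data.chunks(row_size) {
        if out.len() + chunk.len() > max_output {
            break; // budget check before each row
        }
        let n = chunk.len(); // the final row may be short
        row[..n].copy_from_slice(chunk);
        // Undo differencing: add the same component of the previous pixel.
        for i in colors..n {
            row[i] = row[i].wrapping_add(row[i - colors]);
        }
        out.extend_from_slice(&row[..n]);
    }
    out
}
```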

5. FlateDecoder changes

  • Tracks flate output separately from predictor output
  • Counts final predictor output against doc_counter (not flate bytes)
  • Passes remaining budget to predictor via predictor_budget
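The accounting order can be sketched as follows: the remaining document budget becomes the predictor's `max_output`, and only the final predictor output is charged. `DocCounter`, `StreamStats`, and `finish_stream` are hypothetical stand-ins for `doc_counter` and the FlateDecoder internals; the predictor is passed in as a closure so the sketch stays self-contained.

```rust
/// Hypothetical stand-in for the document-wide `doc_counter`.
struct DocCounter {
    used: usize,
    max_decompress_bytes: usize,
}

/// Hypothetical per-stream stats: inflate output is recorded but not charged.
#[derive(Debug, PartialEq)]
struct StreamStats {
    flate_bytes: usize,   // what inflate produced (tracked separately)
    charged_bytes: usize, // final predictor output, counted against doc_counter
}

/// Sketch: pass the remaining budget in, charge the final output afterwards.
fn finish_stream<P>(inflated: &[u8], doc: &mut DocCounter, predictor: P) -> (Vec<u8>, StreamStats)
where
    P: FnOnce(&[u8], usize) -> Vec<u8>,
{
    // Remaining document budget becomes the predictor's max_output.
    let predictor_budget = doc.max_decompress_bytes.saturating_sub(doc.used);
    let out = predictor(inflated, predictor_budget);
    debug_assert!(out.len() <= predictor_budget, "predictor overran its budget");
    doc.used += out.len();
    let stats = StreamStats { flate_bytes: inflated.len(), charged_bytes: out.len() };
    (out, stats)
}
```

Charging the predictor output rather than the inflated bytes matches what the caller actually keeps; for PNG predictors the two differ by at least one filter byte per row.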

6. Lowered DEFAULT_MAX_DECOMPRESS_BYTES

  • Changed from 2 GB to 512 MiB for a more reasonable default
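For scale, 512 MiB is 536,870,912 bytes, roughly a quarter of the old 2 GB default. A minimal sketch of the constant, assuming it is a `u64`; the override helper `effective_max_decompress_bytes` is hypothetical and only illustrates that a default still leaves room for a caller-supplied limit.

```rust
/// Default per-document decompression budget after this change (512 MiB).
const DEFAULT_MAX_DECOMPRESS_BYTES: u64 = 512 * 1024 * 1024;

/// Hypothetical: a caller-supplied limit takes precedence over the default.
fn effective_max_decompress_bytes(user_limit: Option<u64>) -> u64 {
    user_limit.unwrap_or(DEFAULT_MAX_DECOMPRESS_BYTES)
}
```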

Verification

Tests PASS

  • All 31 predictor tests pass
  • test_flate_decode_bomb_limit_with_predictor - verifies budget enforcement
  • test_flate_decode_performance_100mb - verifies performance characteristics
  • test_predictor_multiple_rows_tiff - verifies TIFF predictor
  • All PNG predictor tests (Predictor values 10, 11, 12, 13, 14, 15) - verify row-by-row processing

Code Review

  • No remaining Vec::with_capacity(data.len()) or Vec::with_capacity(num_rows * row_size) patterns
  • All other decoders (ASCII85, ASCIIHex, Passthrough) already use incremental growth
  • Row-buffer memory bounded to 2x stride (at most 2 × MAX_ROW_BYTES = 2 × 64 KB) regardless of image height; output size is capped separately by max_decompress_bytes

Acceptance Criteria

  • Row-by-row processing (peak 2x stride)
  • Output counted against max_decompress_bytes
  • Never pre-size to claimed/decompressed length
  • Same discipline for apply_tiff_predictor_2
  • All Vec::with_capacity(data.len()) sites addressed