jedarden 1c6f26ecaa fix(bf-4mkhv): clean up unused imports in hash.rs The bead description mentioned compile errors in hash.rs from API drift, but those errors were either already fixed or misattributed. The API usage was already correct: - compute_fingerprint already takes 3 arguments with source - len() already propagates Result with ? - read_at method already used correctly - Catalog fields accessed via trailer correctly Only cleanup: removed unused std::fs::File and std::io imports. Verification: notes/bf-4mkhv.md		2026-06-01 09:43:48 -04:00
ci	docs(pdftract-5l9m): add CI validation script and verification note	2026-05-18 01:05:33 -04:00
analyze-docs.sh	fix(bf-4mkhv): clean up unused imports in hash.rs	2026-06-01 09:43:48 -04:00
analyze_doc_coverage.py	wip: intermediate state from previous work	2026-05-29 08:25:23 -04:00
audit_doc_coverage.py	wip: intermediate state from previous work	2026-05-29 08:25:23 -04:00
check-provenance.sh	fix(pdftract-5z5d8): fix provenance validation script	2026-05-17 23:43:37 -04:00
check-secrets.sh	feat(pdftract-59zz): implement MCP bearer token ingress channels and TH-03 enforcement	2026-05-18 02:47:54 -04:00
check_doc_coverage.sh	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
count_doc_coverage.sh	fix(bf-4mkhv): clean up unused imports in hash.rs	2026-06-01 09:43:48 -04:00
count_rustdoc_coverage.rs	fix(bf-4mkhv): clean up unused imports in hash.rs	2026-06-01 09:43:48 -04:00
debug_stream_fixtures.py	feat(pdftract-91e1i): HTTP fetch sequence implementation	2026-05-28 13:17:00 -04:00
doc_coverage.py	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
doc_coverage.rs	feat(pdftract-91e1i): HTTP fetch sequence implementation	2026-05-28 13:17:00 -04:00
doc_coverage.sh	fix(pyo3): correct extract_text_fn call in extract_markdown stub	2026-05-28 20:28:25 -04:00
fetch-shape-corpus.sh	feat(glyph-shape): implement font corpus fetch script and shape DB generation	2026-05-24 09:48:29 -04:00
generate-minimal-pdf.sh	feat(bf-1g1fd): implement CI memory-ceiling gate with cgroup MemoryMax enforcement	2026-05-23 13:22:55 -04:00
generate_document_model_fixtures.sh	fix(pdftract-2uk9z): wrap native module results in typed Python objects	2026-05-28 21:18:38 -04:00
generate_test_corpus.py	test(classifier): add 200-document labeled corpus for Phase 5.6	2026-05-17 07:16:02 -04:00
measure-doc-coverage.sh	fix(bf-4mkhv): clean up unused imports in hash.rs	2026-06-01 09:43:48 -04:00
measure-public-api-coverage.py	wip: intermediate state from previous work	2026-05-29 08:25:23 -04:00
README.md	test(bf-5dnh1): add memory ceiling enforcement for proptests	2026-05-23 13:39:04 -04:00
run-fuzz-with-limits.sh	feat(bf-1g1fd): implement CI memory-ceiling gate with cgroup MemoryMax enforcement	2026-05-23 13:22:55 -04:00
run-proptest-with-limits.sh	test(bf-5dnh1): add memory ceiling enforcement for proptests	2026-05-23 13:39:04 -04:00
rustdoc_coverage.py	feat(pdftract-91e1i): HTTP fetch sequence implementation	2026-05-28 13:17:00 -04:00
rustdoc_coverage.sh	wip: intermediate state from previous work	2026-05-29 08:25:23 -04:00

README.md

Scripts

This directory contains utility scripts for pdftract development and testing.

Memory Ceiling Enforcement

Fuzz Tests (`run-fuzz-with-limits.sh`)

Runs cargo-fuzz targets with memory limits to ensure pathological inputs fail fast:

scripts/run-fuzz-with-limits.sh [target]

Memory limits:

Cgroup MemoryMax: 1536 MB (hard ceiling)
libFuzzer RSS limit: 1024 MB (per-execution)
libFuzzer malloc limit: 1024 MB (per single allocation)

Environment:

FUZZ_TIME_SECONDS: Time per target (default: 60)
MEMORY_MAX_MB: Cgroup limit in MB (default: 1536)
RSS_LIMIT_MB: libFuzzer RSS limit in MB (default: 1024)
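
The defaults listed above can be resolved with ordinary shell parameter expansion. The following is a minimal sketch, assuming the script reads these three variables directly from the environment; the function name `resolve_fuzz_limits` is hypothetical and is not taken from `run-fuzz-with-limits.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the real script): prints the effective
# fuzz limits, one KEY=VALUE per line. Defaults mirror the documented values.
resolve_fuzz_limits() {
    local fuzz_time="${FUZZ_TIME_SECONDS:-60}"
    local memory_max="${MEMORY_MAX_MB:-1536}"
    local rss_limit="${RSS_LIMIT_MB:-1024}"

    # Reject non-numeric overrides early instead of passing them on.
    local value
    for value in "$fuzz_time" "$memory_max" "$rss_limit"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "error: limit must be a non-negative integer: '$value'" >&2
                return 1
                ;;
        esac
    done

    printf 'FUZZ_TIME_SECONDS=%s\n' "$fuzz_time"
    printf 'MEMORY_MAX_MB=%s\n' "$memory_max"
    printf 'RSS_LIMIT_MB=%s\n' "$rss_limit"
}
```

With a helper like this, an override such as `FUZZ_TIME_SECONDS=300 scripts/run-fuzz-with-limits.sh` would change only the time budget and leave both memory limits at their defaults.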

Implementation: Uses cgroup v2 MemoryMax when available; otherwise falls back to cgroup v1 memory.limit_in_bytes with the OOM killer disabled, so that hitting the limit produces a clean failure.
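
The choice between the two cgroup versions can be illustrated as follows. This is a sketch under stated assumptions, not the script's actual logic: the helper names `detect_cgroup_version` and `mb_to_bytes` are hypothetical, the check relies on the common convention that a cgroup v2 mount exposes a `cgroup.controllers` file at its root, and MB is taken to mean 1024 * 1024 bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names and structure are illustrative assumptions.

# Prints "v2" if the unified hierarchy is mounted at the given root,
# "v1" if a legacy memory controller directory is present, else "none".
# The root is a parameter so the check can be exercised against a temp dir.
detect_cgroup_version() {
    local root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo "v2"
    elif [ -d "$root/memory" ]; then
        echo "v1"
    else
        echo "none"
    fi
}

# Converts a limit in MB (as used by MEMORY_MAX_MB) to bytes, the unit
# expected by cgroup v1 memory.limit_in_bytes. Assumes 1 MB = 1024 * 1024 bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}
```

On a cgroup v2 host with systemd, one common way to apply such a ceiling is a transient scope, for example `systemd-run --user --scope -p MemoryMax=1536M -- cargo fuzz run <target>`; whether the script takes this route is not stated here.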

Property Tests (`run-proptest-with-limits.sh`)

Runs proptest modules with memory limits:

scripts/run-proptest-with-limits.sh [test_name]

Memory limits:

Cgroup MemoryMax: 2048 MB (hard ceiling)

Environment:

PROPTEST_CASES: Test cases per module (default: 1000)
MEMORY_MAX_MB: Cgroup limit in MB (default: 2048)
PROPTEST_SEED: Proptest seed (default: random)

Proptest modules: lexer, object_parser, xref, stream, cmap_parser
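
Because the optional `[test_name]` argument is expected to be one of these modules, a wrapper could reject unknown names before starting a run. A minimal sketch, assuming the list above is exhaustive; the function name `is_known_proptest_module` is hypothetical and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only for the proptest modules listed above.
is_known_proptest_module() {
    case "$1" in
        lexer|object_parser|xref|stream|cmap_parser) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller might use it as `is_known_proptest_module "$1" || { echo "unknown module: $1" >&2; exit 2; }` so that a typo fails immediately rather than after the memory ceiling has been set up.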

Input size caps: All proptest strategies are bounded:

Lexer/object parser: up to 10 KB inputs
Xref/stream parsers: up to 100 KB inputs
Nested structures: depth-limited (e.g., 500 for parser depth checks)

These bounds ensure tests complete quickly while still exercising edge cases.

Why Memory Ceilings?

Per bf-1g1fd and the Quality Targets (plan.md Phase 0.4), adversarial inputs must not OOM the host. Memory ceilings enforce:

Clean failure mode - Allocation errors instead of host OOM
Fast failure - Pathological cases abort immediately at the limit
Regressions as test failures - Memory growth is caught in CI

CI enforces these limits via cgroup MemoryMax in .ci/argo-workflows/pdftract-ci.yaml (proptests) and .ci/argo-workflows/pdftract-nightly-fuzz.yaml (fuzz).

Other Scripts

`generate-minimal-pdf.sh`

Generates minimal valid PDF documents for testing.

`check-provenance.sh`

Verifies binary provenance and SBOM signatures.

`check-secrets.sh`

Scans for accidental secrets in committed code.

`generate_test_corpus.py`

Generates a synthetic PDF test corpus.
