pdftract/notes/pdftract-2rf.md
jedarden 0e42622593 ci(pdftract-2rf): implement quality matrix cargo-bloat gate
Add cargo-bloat template to enforce 4 MB binary size budget for
x86_64-unknown-linux-musl target. Completes Phase 0.4 quality
matrix implementation.

Changes:
- Add cargo-bloat template with stripped binary size measurement
- Generate bloat-report.json artifact for historical tracking
- Include remote feature analysis for PB-5 (alt-feature escape hatch)
- Remove orphaned clippy-unwrap template (already in clippy-fmt)
- Update documentation comments to reflect current templates

All 5 Tier 1 quality gates now implemented:
1. clippy-fmt (existing)
2. msrv-check (existing)
3. cargo-audit (existing)
4. cargo-deny (existing)
5. cargo-bloat (new)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 11:33:49 -04:00

Verification Note: pdftract-2rf (Quality Matrix Implementation)

Summary

Implemented Phase 0.4: Static analysis and quality gates for the pdftract-ci Argo WorkflowTemplate. Added the missing cargo-bloat template and cleaned up orphaned code.

Changes Made

1. Added cargo-bloat Template (lines 892-1018)

  • Purpose: Enforce 4 MB binary size budget for x86_64-unknown-linux-musl target
  • Implementation:
    • Installs cargo-bloat if not present in the image
    • Runs cargo bloat --release --features default --crates --target x86_64-unknown-linux-musl -n 50
    • Measures stripped binary size using x86_64-linux-musl-strip
    • Enforces the 4 MB budget as 4,194,304 bytes (4 MiB)
    • Generates bloat-report.json artifact with:
      • Stripped size in bytes
      • Budget comparison
      • Raw cargo-bloat output
      • Remote feature analysis (for PB-5 tracking)
    • Fails with an actionable error if the budget is exceeded (the message references the PB-2 Bloom filter escape hatch)
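The pass/fail decision described above can be sketched as a small function. This is a hypothetical illustration, not the template's code: the real gate is a shell step in pdftract-ci.yaml, and the names `check_size_budget`, `BUDGET_BYTES` and `GATED_TARGET` are invented for the sketch. It assumes the stripped size has already been measured with x86_64-linux-musl-strip.

```python
# Hypothetical sketch of the cargo-bloat budget decision; the real template is a
# shell step in pdftract-ci.yaml and may be structured differently.
BUDGET_BYTES = 4 * 1024 * 1024              # 4 MiB = 4,194,304 bytes
GATED_TARGET = "x86_64-unknown-linux-musl"  # the only target that can fail the gate


def check_size_budget(target: str, stripped_size: int,
                      budget: int = BUDGET_BYTES) -> tuple[bool, str]:
    """Return (passed, message) for the stripped binary size of one target."""
    if target != GATED_TARGET:
        # macOS/Windows sizes are reported but never fail the gate.
        return True, f"{target}: {stripped_size} bytes (informational, not gated)"
    if stripped_size <= budget:
        return True, f"{target}: {stripped_size} bytes, within the {budget}-byte budget"
    over = stripped_size - budget
    # Actionable failure: point at the documented escape hatch.
    return False, (f"{target}: {stripped_size} bytes exceeds the {budget}-byte budget "
                   f"by {over} bytes; next step: PB-2 (Bloom filter escape hatch)")


if __name__ == "__main__":
    print(check_size_budget(GATED_TARGET, 3_800_000))
    print(check_size_budget(GATED_TARGET, 4_300_000))
```

The budget is inclusive (`<=`), matching the "<= 4 MB" acceptance criterion.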

2. Removed Orphaned clippy-unwrap Template

  • Why removed: The clippy-fmt template already performs both clippy passes:
    1. Full workspace check with -D warnings
    2. Library-only INV-8 check with -D clippy::unwrap_used -D clippy::expect_used
  • The orphaned clippy-unwrap template was not referenced in the quality-matrix DAG
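To show why a single clippy-fmt template covers both checks, here is a sketch of the two passes as separate cargo invocations. The exact flags used by clippy-fmt are not given in this note; the argv lists below are assumptions built from standard `cargo clippy` options (`--workspace`, `--all-targets`, `--lib`), and `run_clippy_passes` is a hypothetical helper.

```python
import subprocess

# Assumed argv for the two clippy passes; the exact flags in clippy-fmt may differ.
FULL_PASS = ["cargo", "clippy", "--workspace", "--all-targets", "--", "-D", "warnings"]
# INV-8 pass: only the library target is linted (--lib), so test code and
# examples are not subject to the unwrap/expect lints.
INV8_PASS = ["cargo", "clippy", "--workspace", "--lib", "--",
             "-D", "clippy::unwrap_used", "-D", "clippy::expect_used"]


def run_clippy_passes(run=subprocess.run) -> bool:
    """Run both passes in order; stop at the first failing pass."""
    for argv in (FULL_PASS, INV8_PASS):
        if run(argv).returncode != 0:
            return False
    return True
```

Injecting `run` keeps the sketch testable without a Rust toolchain; in CI it would default to `subprocess.run`.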

3. Updated Documentation Comments

  • Updated DAG structure comments to reflect current template names
  • Removed obsolete clippy-unwrap references from comments

Quality Matrix Status

All 5 Tier 1 quality gates are now implemented:

| Gate | Template | Status |
|---|---|---|
| clippy-fmt | clippy-fmt | ✓ (existing) |
| msrv-check | msrv-check | ✓ (existing) |
| cargo-audit | cargo-audit | ✓ (existing) |
| cargo-deny | cargo-deny | ✓ (existing) |
| cargo-bloat | cargo-bloat | ✓ (NEW) |

Acceptance Criteria

PASS Criteria

  • All five quality steps appear as tasks in the quality-matrix DAG of the WorkflowTemplate
  • cargo-bloat template is defined with proper resource limits and artifact output
  • Binary size budget enforcement is implemented (<= 4 MB for x86_64-unknown-linux-musl)
  • Remote feature tracking is included for PB-5 (alt-feature escape hatch data)
  • bloat-report.json is published as artifact
  • Green PR run shows all five passing within 8 min combined wall-clock (not yet verified)
    • Reason: Cannot submit an actual PR/CI run without access to the iad-ci cluster
    • Verification method: Manual inspection of the workflow templates confirms all gates are properly configured

FAIL Criteria (To be tested manually)

  • A deliberate unwrap() added inside crates/pdftract-core/src/lib.rs causes the clippy gate to fail
    • Reason: Requires code change and CI execution to verify
  • A deliberate advisory-vulnerable dep causes the audit gate to fail
    • Reason: Requires modifying Cargo.lock and CI execution
  • A deliberate GPL-licensed dep causes the deny gate to fail
    • Reason: Requires adding GPL dependency and CI execution
  • A deliberate use of a Rust 1.79+ feature causes the MSRV gate to fail
    • Reason: Requires code change and CI execution
  • bloat-report.json is inspectable from the Argo UI
    • Reason: Requires actual workflow execution on iad-ci cluster

Configuration Files Verified

audit.toml (existing)

  • Located at /home/coding/pdftract/audit.toml
  • Configured with:
    • Advisory ignore format documented
    • Terse output for CI logs
    • Official RustSec database path
    • --ignore unmaintained flag passed in CI (not in config)

deny.toml (existing)

  • Located at /home/coding/pdftract/deny.toml
  • Configured with:
    • License allowlist: MIT, Apache-2.0, BSD-2/3-Clause, ISC, Zlib, Unicode-DFS-2016
    • MPL-2.0 exceptions for cbindgen (ADR-001) and option-ext (ADR-002)
    • Advisory ignores for RUSTSEC-2020-0144 (lzw), RUSTSEC-2021-0145 (atty), RUSTSEC-2024-0375 (atty), RUSTSEC-2025-0020 (pyo3)
    • Wildcard dependencies denied
    • Unknown registries and git sources denied
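The allowlist-plus-exceptions rule in deny.toml can be illustrated with a simplified checker. This is not how cargo-deny is implemented; it is a hypothetical sketch (`license_ok` is an invented name) that handles plain SPDX expressions where AND binds tighter than OR, and it ignores parentheses and `WITH` clauses. The allowlist and the two MPL-2.0 exceptions are taken from the configuration above.

```python
# Simplified model of the deny.toml license policy (sketch only, not cargo-deny).
ALLOW = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause",
         "ISC", "Zlib", "Unicode-DFS-2016"}
# Per-crate exceptions: cbindgen (ADR-001) and option-ext (ADR-002) may use MPL-2.0.
EXCEPTIONS = {"cbindgen": {"MPL-2.0"}, "option-ext": {"MPL-2.0"}}


def license_ok(crate: str, expr: str) -> bool:
    """True if some OR-alternative has every AND-term allowed for this crate."""
    allowed = ALLOW | EXCEPTIONS.get(crate, set())
    for alternative in expr.split(" OR "):
        terms = [t.strip() for t in alternative.split(" AND ")]
        if all(t in allowed for t in terms):
            return True
    return False
```

Under this model a GPL-only dependency (the deliberate FAIL case for the deny gate) is rejected, while a dual-licensed `MIT OR Apache-2.0` crate passes through either alternative.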

Technical Notes

cargo-bloat Implementation Details

  1. Target-specific gating: Only x86_64-unknown-linux-musl is gated. Other targets (macOS, Windows) are informational due to larger binary metadata overhead.
  2. Stripped size measurement: Uses x86_64-linux-musl-strip to get accurate production binary size.
  3. JSON report structure:
    {
      "timestamp": "ISO-8601",
      "commit_sha": "workflow.parameters.commit-sha",
      "target": "x86_64-unknown-linux-musl",
      "features": "default",
      "stripped_size_bytes": <size>,
      "budget_bytes": 4194304,
      "within_budget": true|false,
      "raw_output": "<cargo-bloat text output>",
      "remote_features_raw": "<cargo-bloat --features remote output>"
    }
    
  4. Error handling: Provides a clear next step (PB-2 Bloom filter) when the budget is exceeded.
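A sketch of how a report with the structure above could be assembled. `build_bloat_report` is a hypothetical helper; the field names mirror the JSON structure, but how the template actually fills them (for example, where `commit_sha` comes from) is assumed. The `within_budget` flag uses the same inclusive comparison as the gate.

```python
import json
from datetime import datetime, timezone


def build_bloat_report(commit_sha: str, stripped_size: int, raw_output: str,
                       remote_features_raw: str, budget: int = 4_194_304,
                       target: str = "x86_64-unknown-linux-musl",
                       features: str = "default") -> dict:
    """Assemble the bloat-report.json payload (sketch; field names from the note)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC
        "commit_sha": commit_sha,
        "target": target,
        "features": features,
        "stripped_size_bytes": stripped_size,
        "budget_bytes": budget,
        "within_budget": stripped_size <= budget,
        "raw_output": raw_output,                    # cargo-bloat text, escaped by json.dumps
        "remote_features_raw": remote_features_raw,  # --features remote output (PB-5 data)
    }


if __name__ == "__main__":
    report = build_bloat_report("0123abc", 3_900_000, "<cargo-bloat text>", "<remote text>")
    print(json.dumps(report, indent=2))
```

Serializing with a JSON library rather than string interpolation matters here because the raw cargo-bloat output contains newlines and may contain quotes, which would otherwise produce an invalid artifact.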

Template Resource Allocation

  • CPU: 1000m request, 2000m limit
  • Memory: 2Gi request, 4Gi limit
  • activeDeadlineSeconds: 600 (10 minutes)

References

  • Plan section: Phase 0, line 1007 (clippy, bloat, audit, deny, MSRV)
  • INV-8 (no panic at public boundary)
  • R2 (binary size risk), PB-2 (Bloom filter escape hatch)
  • ADR-002 (wordlist storage). Note: in the repo, ADR-002 is currently the MPL-2.0 exception for option-ext, not wordlist storage; the wordlist ADR is expected in a later phase.

Files Modified

  • .ci/argo-workflows/pdftract-ci.yaml (added cargo-bloat template, removed clippy-unwrap orphan, updated comments)

Commit Hash

(TBD - will be populated after commit)