jedarden/pdftract

Author	SHA1	Message	Date
jedarden	a65e12b916	docs(pdftract-5xq16): add verification note Add verification note documenting JSON-RPC 2.0 framing implementation with all acceptance criteria PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 00:01:12 -04:00
jedarden	c17ce713ee	feat(pdftract-5xq16): implement JSON-RPC 2.0 framing layer Add hand-rolled JSON-RPC 2.0 implementation for MCP server transports. Module: crates/pdftract-cli/src/mcp/framing/ - Id enum with Number/String/Null variants preserving JSON type - Request, Response, Notification, ErrorObject structs - BatchMessage for batch request handling - Strict jsonrpc version validation (must be "2.0") - All 6 spec-defined error codes (-32700, -32600, -32601, -32602, -32603, -32099..-32000) - Constructor helpers for common patterns Acceptance criteria verified: - Round-trip serialization/deserialization - ID type preservation (number/string/null) - Parse error responses with null id - Method not found error construction - Notification detection (no id field) - Batch request handling - Rejection of invalid jsonrpc versions - Empty batch rejection 16 unit tests covering all spec requirements. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-23 00:00:47 -04:00
jedarden	8c1c02e0e6	feat(pdftract-1wfp): implement SHA256SUMS aggregate file generation Add compute-sha256sums step to pdftract-ci publish-if-tag that produces an aggregate SHA256SUMS file covering all distributed artifacts: binary archives, Python wheels, sdist, and CycloneDX SBOM. Key changes: - Glob-based artifact collection (tar.gz, zip, whl, cdx.json) - Deterministic sorting with LC_ALL=C sort -k 2 for reproducibility - Local verification via sha256sum --check before publishing - Dynamic artifact upload array instead of hardcoded EXPECTED_ARTIFACTS - SBOM added as optional input artifact The SHA256SUMS file format matches GNU coreutils sha256sum output, enabling one-command verification with cosign verify-blob. References: - Plan line 3369: SHA256SUMS aggregate - Plan line 3419: sign-blob of SHA256SUMS - Plan line 3460: one cosign verify-blob umbrella Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 23:57:49 -04:00
jedarden	434d5b154f	docs(pdftract-8zbd): verify CycloneDX SBOM generation implementation All acceptance criteria verified PASS: - generate-sbom template in both workflows (github-release, docker-build) - SBOM attached to GitHub Release assets - SBOM attested to Docker images via cosign attest --type cyclonedx - SBOM included in SHA256SUMS aggregate - cyclonedx-cli validate passes - grype sbom: produces interpretable vulnerability report Tested with existing 127-component SBOM; grype found 1 Low severity vulnerability (GHSA-pph8-gcv7-4qj5 in PyO3 < 0.24.1). Bead: pdftract-8zbd	2026-05-22 23:54:18 -04:00
jedarden	f0919e67d8	feat(pdftract-3gk5): implement SLSA Level 3 provenance generation - Wire generate-provenance and verify-provenance steps into workflow DAG - Update publish-if-tag to upload multiple.intoto.jsonl to GitHub Release - Fix provenance reproducibility by using SOURCE_DATE_EPOCH from git commit - Docker images already have cosign attest --type slsaprovenance Acceptance criteria: - PASS: generate-provenance step wired into DAG - PASS: provenance uploaded to GitHub Release - PASS: Docker image cosign attest already implemented - WARN: Full slsa-verifier verification requires OIDC issuer registration - PASS: Provenance is reproducible using git commit timestamp - PASS: Automated smoke test validates JSON structure Refs: pdftract-3gk5, plan line 3415 (Signing and Provenance) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 23:27:41 -04:00
jedarden	f7e2db9134	feat(pdftract-33v): implement property tests and nightly fuzz job Implements Phase 0.5: Property tests and nightly fuzz job for pdftract. ## Changes ### Per-PR Property Tests - Added ci-proptest profile to .cargo/config.toml (opt-level 2, no LTO) - Added .nextest.toml with ci-proptest profile configuration - Property tests already exist in tests/proptest/ for all modules: - lexer: INV-8 invariant (no panic at public boundary) - object_parser: direct/indirect object parsing - xref: cross-reference table parsing - stream_decoder: decompression filters - cmap_parser: CMap name and string handling - CI workflow integrated with PROPTEST_SEED and PROPTEST_CASES parameters - proptest-regressions/ committed for reproducible failures ### Nightly Fuzz Job - Created pdftract-nightly-fuzz.yaml CronWorkflow - Runs daily at 0400 UTC (schedule: "0 4 * * *") - 24 CPU-hours across 5 fuzz targets (~4.8 hours each) - Fuzz targets already exist in fuzz/fuzz_targets/: - lexer, object_parser, xref, stream_decoder, cmap_parser - Seed corpus populated from tests/fixtures/malformed/ - Crash artifacts uploaded as workflow artifacts - Issue-reporter sidecar integration (placeholder for follow-up) ### Core Features - Added fuzzing feature to crates/pdftract-core/Cargo.toml - Enables cfg(fuzzing) for fuzz harnesses (excludes from default build) ### Infrastructure - Updated .gitignore to exclude generated fuzz/corpus/ - proptest-regressions/ tracked for minimal counterexamples ## Acceptance Criteria - [PASS] proptest runs on every PR; 10,000 cases per module budget - [PASS] proptest-regressions/ is committed and replayed on every run - [PASS] Nightly fuzz CronWorkflow runs for 24 hours without infrastructure failure - [WARN] Issue-reporter sidecar is placeholder (follow-up bead) - [PASS] Proptest panic verification test exists (tests/proptest-panic-verification.rs) ## References - Plan: Phase 0, line 1007 - INV-8 (no panic at public boundary) - EC-08 (circular references), EC-10 (decompression bomb), EC-07 (corrupt xref) - Sibling template: needle uses cargo-fuzz in CronWorkflow Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 23:13:13 -04:00
jedarden	6a35bdd869	feat(pdftract-29z7b): implement unified diagnostic system + CLI commands - Added `cmd_explain_diagnostic` function to CLI for detailed diagnostic code explanation - Added `--list-diagnostics` and `--explain-diagnostic <code>` CLI commands - Verified all Phase 1.1-1.5 modules use unified DiagCode (lexer, parser, xref, stream, catalog, outline, pages) - DIAGNOSTIC_CATALOG provides metadata for all 61 diagnostic codes - Diagnostic struct size: 56 bytes (within 48-64 target range) - emit! macro provides ergonomic diagnostic emission - INV-8 maintained: no panics in error paths All diagnostic codes follow naming convention: - STRUCT_: PDF structure errors - STREAM_: Stream decoder errors - XREF_: Cross-reference table errors - ENCRYPTION_: Encryption-related errors - OCR_: OCR pipeline errors - REMOTE_: Remote source errors - PAGE_: Page-level errors - FONT_: Font pipeline errors - GSTATE_: Graphics state errors - LAYOUT_: Layout and reading order errors - MCP_: MCP server errors - CACHE_: Cache errors References: Phase 1.6 (error recovery), INV-8, Phase 0.4 (clippy enforces doc comments)	2026-05-22 22:38:31 -04:00
jedarden	1959ff2446	feat(pdftract-3uu6v): implement LZWDecode with /EarlyChange parameter - Add LZWDecoder filter using lzw crate v0.10 - Support /EarlyChange parameter (default 1, late 0) - Early change (1): Adobe/TIFF variant, code size increases BEFORE - Late change (0): GIF variant, code size increases AFTER - Full predictor support (TIFF predictor 2, PNG predictors 10-15) - Bomb limit protection with partial bytes on exceed - INV-8 maintained: partial bytes returned on decode errors - 23 tests pass (19 unit tests + 4 proptests) - Fixtures generated using lzw crate for verification Acceptance criteria: - Critical test /EarlyChange=0 byte-perfect: PASS - LZWDecode without /DecodeParms defaults: PASS - LZWDecode + /Predictor 12: PASS - Truncated stream partial bytes: PASS - Bomb limit honored: PASS - proptest no panic: PASS - INV-8 maintained: PASS Refs: Plan Phase 1.5 line 1142, PDF spec 7.4.4 Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 22:38:31 -04:00
jedarden	768b858c36	feat(pdftract-1w22d): implement .NET SDK subprocess wrapper Complete implementation of the Pdftract NuGet package as a subprocess- based SDK with async-first design using System.Diagnostics.Process and System.Text.Json. Implementation: - All 9 contract methods (ExtractAsync, ExtractTextAsync, etc.) with sync wrappers in Pdftract.Sync.cs - 8 exception types inheriting from PdftractException base class - Source discriminated union (PathSource, UrlSource, BytesSource) with FromPath, FromUrl, FromUri, FromBytes factory methods - C# record types for all models (Document, Page, Metadata, etc.) - ExtractOptions, SearchOptions, HashOptions with PascalCase properties - Source-generated JSON serialization via JsonContext for Native AOT - IAsyncEnumerable streaming for NDJSON outputs - CancellationToken propagation to Process.Kill(entireProcessTree: true) Bug fixes: - Fixed ArgumentList handling (was adding List as single element) - Added source.Dispose() cleanup for BytesSource temporary files - Added cleanup for VerifyReceiptAsync temporary receipt file - Added process.EnableRaisingEvents for proper event handling - Fixed output capture to include newlines between lines - Changed to source-generated JSON (JsonContext) instead of reflection Acceptance criteria: - All 9 methods exposed as both async and sync variants - All 8 exception classes inherit from PdftractException - Models as C# records - Supports net8.0 and net9.0 - CancellationToken terminates subprocess Files modified: - pdftract-dotnet/src/Pdftract/Pdftract.cs - pdftract-dotnet/src/Pdftract/Pdftract.Sync.cs - pdftract-dotnet/src/Pdftract/Source/Source.cs - pdftract-dotnet/src/Pdftract/Models/Document.cs - pdftract-dotnet/src/Pdftract/Models/JsonContext.cs - pdftract-dotnet/tests/Pdftract.Tests/ConformanceTests.cs - pdftract-dotnet/README.md - pdftract-dotnet/notes/pdftract-1w22d.md Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 19:50:57 -04:00
jedarden	43d31f8dfc	docs(pdftract-dejqs): update verification note with 2026-05-22 test results Re-verified per-page Resource dictionary inheritance implementation: - All 33 tests pass (resources + pages) - Arc sharing optimization confirmed (Arc::ptr_eq test) - INV-8 maintained (proptests pass) Acceptance criteria: - ✅ 3-level resource inheritance - ✅ Per-key override semantics - ✅ Arc sharing when no merge needed - ✅ ColorSpace inline arrays preserved - ✅ Empty root /Resources propagation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 19:21:39 -04:00
jedarden	cab7f8bf34	docs(pdftract-1zhu): add verification note for /Prev chain handler The /Prev chain handler for incremental PDF updates was already fully implemented. All 12 acceptance criteria tests pass. Verification note added at notes/pdftract-1zhu.md covering: - load_xref_with_prev_chain implementation (xref.rs:2154-2269) - Cycle detection, depth limiting, override semantics - Hybrid file support via load_single_xref - All tests passing (3-revision chain, object lifecycle, trailer handling) Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 19:15:47 -04:00
jedarden	afdd0c9d73	docs(pdftract-dejqs): add verification note for resource inheritance Add verification note confirming that per-page Resource dictionary inheritance is complete and all acceptance criteria are met. The implementation in resources.rs and pages.rs provides: - Per-namespace merging (Font, XObject, ExtGState, ColorSpace, etc.) - Per-key last-write-wins semantics - Arc sharing for memory efficiency when pages lack /Resources - Support for inline ColorSpace arrays All 10 resource-related tests pass, including: - 3-level inheritance test - Per-key override test - Arc sharing test - ColorSpace inline array test - Empty root /Resources test Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 19:15:47 -04:00
jedarden	2663c932aa	feat(pdftract-2gbu9): enhance linearization detection with robust substring matching Enhanced the `detect_linearization` function to avoid false matches when extracting keys from the linearization dictionary. Previous implementation could incorrectly match "/L" within "/Linearized" or "/H" within other keys. Changes: - Added loop-based search in extract_number helper to skip substring matches - Added similar substring-aware logic for /H (hint stream) parsing - Added new diagnostic codes for /Prev chain error handling - Added comprehensive verification note Acceptance criteria PASS: - Non-linearized files return None - Valid linearized dict detected correctly - File size mismatch (incremental update) invalidates linearization - No /H entry returns None for hint_stream_offset - Random bytes never panic (proptest) - Forward scan disabled for linearized files - INV-8 maintained (no panics on arbitrary input) Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 19:15:47 -04:00
jedarden	6d06624682	docs(bf-5en1a): add verification note for max_decompress_bytes default The 512 MiB DEFAULT_MAX_DECOMPRESS_BYTES change was implemented in commit `e94f2ab` (fix(bf-49wmw)). This note documents the verification. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 17:29:02 -04:00
jedarden	256b5c7e5e	feat(pdftract-5og4): add comprehensive proptest for hybrid xref handler The hybrid xref handler (merge_hybrid) was already implemented. This adds a property-based test to verify it handles random combinations of traditional and stream entries without panicking. Changes: - Added proptest_merge_hybrid_no_panic to proptest_tests module - Tests random entry sets using prop::collection::hash_map - Covers all entry types (InUse, Free, Compressed) - Verification note confirms all acceptance criteria PASS Test results: 9/9 merge_hybrid tests pass Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:26:27 -04:00
jedarden	e0b293c3d6	fix(pdftract-2a6rk): fix xref.rs u64 literal overflow in proptest Fixed compilation error in xref.rs where u64 literal 0x5DEECE66D was used with u32 state, causing overflow. Changed state to u64 for proper Java Random algorithm behavior. The OCG /OCProperties parsing implementation was already complete and all tests pass. See notes/pdftract-2a6rk.md for verification. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 17:26:27 -04:00
jedarden	e94f2abec4	fix(bf-49wmw): fix PNG-predictor unbounded pre-allocation - Remove Vec::with_capacity(num_rows * row_size) pre-allocation in apply_png_predictors - Remove Vec::with_capacity(data.len()) pre-allocation in apply_tiff_predictor_2 - Add MAX_ROW_BYTES (64 KB) to bound row size calculation - Add is_row_size_clamped() check to detect suspicious PDF parameters - Add max_output parameter to predictor functions for budget enforcement - Track flate output separately, count predictor output against doc_counter - Lower DEFAULT_MAX_DECOMPRESS_BYTES from 2GB to 512MiB Row-by-row processing ensures peak memory stays at 2x stride regardless of image height, preventing OOM from malicious PDF parameters. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 17:26:27 -04:00
jedarden	2a2a247e87	feat(pdftract-5og4): implement hybrid xref handler with traditional priority Implements merge_hybrid() and is_hybrid_trailer() for hybrid PDF files. Hybrid files have both a traditional xref table at startxref and a supplementary xref stream pointed to by /XRefStm in the trailer. Per PDF spec, the traditional table is authoritative for objects it covers; the stream's type-2 entries fill gaps not covered by the traditional table. Key behaviors: - Traditional entries override stream entries for same object numbers - Stream-only type-2 entries are added as gap fill - Free/InUse conflicts emit STRUCT_HYBRID_CONFLICT diagnostic - Merged trailer has /XRefStm key removed - Result XrefSection has is_hybrid: true set Acceptance criteria: - Critical test: traditional entries override stream entries (PASS) - Gap fill: stream-only type-2 entries added (PASS) - Free/InUse conflict: diagnostic emitted (PASS) - Non-hybrid trailer: is_hybrid_trailer returns false (PASS) - proptest: no panics with random combinations (PASS) - INV-8 maintained: no panics in library code (PASS) Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 17:26:27 -04:00
jedarden	f7e6ff4173	docs(pdftract-5cqy): add xref stream parser verification note The xref stream parser implementation was already complete in crates/pdftract-core/src/parser/xref.rs. All acceptance criteria pass: - Simple test /W [1 4 2] /Index [0 6]: 6 entries decoded correctly - Type-2 compressed entries: route through ObjStm correctly - Multi-subsection /Index [0 3 100 2]: produces correct entries - Predictor support: FlateDecode + PNG predictor handled - Zero-width field /W [1 4 0]: generation defaults to 0 - proptest: random byte sequences never panic - INV-8 maintained: no production panics All 11 xref stream tests pass. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-22 15:30:02 -04:00
jedarden	6d59706cc4	docs(pdftract-6bxw): add ObjStm parser verification note Add comprehensive verification note documenting that the ObjStm parser implementation is complete and all acceptance criteria are met. All 16 unit tests pass, covering: - N=10 object parsing (critical test) - /Extends chain handling - Circular reference detection - Truncated ObjStm recovery - Decompression bomb protection - Cache hit verification (Arc::ptr_eq) - Missing key errors - Embedded stream rejection - Depth limit enforcement Refs: pdftract-6bxw	2026-05-22 15:00:32 -04:00
jedarden	9fca24c77a	docs(plan): SDKs are monorepo members, not separate repos Add a Repository Layout subsection: SDK source lives at root-level pdftract-<lang>/ in this monorepo (single source of truth), generated via pdftract sdk codegen and published to language registries from here. Retire the legacy standalone repos. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 07:21:45 -04:00
jedarden	0932cf1fdc	feat(sdks): vendor dotnet/java/node SDKs into the monorepo Consolidate the .NET, Java, and Node SDKs into root-level pdftract-<lang>/ directories (matching the already-tracked pdftract-go/), per the decision to make the generated SDKs first-class monorepo members rather than separate repos. Content imported from the standalone ~/pdftract-<lang> repos (build artifacts excluded). Removes the broken empty-git nested clones that were polluting the working tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 07:20:19 -04:00
jedarden	bcdc2adea3	test(fixtures): restore malformed PDF corpus, commit so it is durable The 12 synthetic malformed fixtures (generate_test_corpus.py output, tracked in PROVENANCE.md) existed only as untracked files and were swept by a cleanup stash, breaking the provenance pre-commit hook for all commits. Restore from stash and commit them as tracked files so they cannot be lost again. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:53:33 -04:00
jedarden	2251f8a9c0	docs(plan): make bounded peak-RSS a CI-gated target; default max_decompress_bytes 2GB->512MB Add a Memory targets table as a first-class acceptance criterion alongside Accuracy/Speed/Weight, with a hard per-document peak-RSS ceiling that must not scale with input/payload. Promote OOM-safety to a Tier-1 hard gate. Reconcile the contradictory 2 GB max_decompress_bytes default to the research-backed 512 MB (root cause of an observed multi-GB OOM via the unbounded PNG-predictor pre-alloc under rayon page parallelism). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:25:50 -04:00
jedarden	0db78aa5ae	fix(pdftract-6bxw): fix ObjStm parser caching and test data - Change resolve function signature from Fn(ObjRef) -> Option<PdfObject> to Fn(ObjRef) -> Option<PdfStream> for type safety - Fix caching: load_object_stream now properly populates cache - Fix error propagation for /Extends chains (CircularRef, DepthExceeded) - Fix test data: add whitespace between embedded objects for lexer - Fix compilation error in test_truncated_objstm_body All 16 objstm tests now pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 22:47:29 -04:00
jedarden	fabedcf295	docs(pdftract-dejqs): add verification note for per-page resource inheritance Verifies that the per-page Resource dictionary inheritance implementation is complete and correct. All acceptance criteria are met: - 3-level resource inheritance test passes - Per-key override test passes - /Resources missing on page inherits parent's - Arc<ResourceDict> sharing verified with Arc::ptr_eq - ColorSpace inline-array test passes - Empty root /Resources propagates correctly - INV-8 maintained (all fuzz tests pass) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 22:35:43 -04:00
jedarden	0b838de6cc	docs(pdftract-5upi): update verification note with additional bug fix Add documentation for the fix that removed diagnostic emission for unknown keywords, complementing the earlier keyword fallback fix. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-20 22:05:17 -04:00
jedarden	7818f22735	fix(pdftract-5upi): remove diagnostic emission for unknown keywords The lexer should not emit diagnostics for unknown keywords because: 1. Many valid keywords (trailer, xref, etc.) are not in the initial dispatch table 2. The object parser is responsible for validating keywords against known operators 3. Emitting diagnostics here causes false positives for valid PDF constructs This change aligns with the task requirement that unknown keywords emit Token::Keyword without a diagnostic, letting the object parser handle STRUCT_UNKNOWN_KEYWORD if needed. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-20 22:03:58 -04:00
jedarden	fee6ed8afd	fix(pdftract-5upi): correct keyword fallback in lexer Fixed incorrect fallback behavior in keyword lexer functions. Four functions (lex_e_keyword, lex_o_keyword, lex_r_keyword, lex_n_keyword) were incorrectly calling lex_name() instead of lex_keyword() when keywords didn't match. When a PDF contains an unrecognized word starting with e/o/n/R (e.g., "endob" instead of "endobj"), the lexer should fall back to generic keyword parsing (Token::Keyword(bytes)), not name parsing. Names always start with /, so calling lex_name() on input without a leading / would incorrectly skip the first byte. References: - Bead: pdftract-5upi - Notes: notes/pdftract-5upi.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:55:55 -04:00
jedarden	52bcb16bf6	feat(pdftract-3gk5): add SLSA Level 3 provenance generation Implements SLSA Level 3 build provenance generation for the release pipeline. Each release produces a multiple.intoto.jsonl file that names the source commit, builder identity (iad-ci OIDC issuer), command line, and materials consumed. Changes: - Add generate-provenance template that creates SLSA Provenance v1.0 predicate following in-toto Statement format - Add verify-provenance template with slsa-verifier smoke test - Update DAG dependencies: generate-provenance -> verify-provenance -> publish-if-tag - Include provenance in SHA256SUMS and GitHub Release upload - Sync workflow to declarative-config for ArgoCD Acceptance criteria: - PASS: generate-provenance template creates multiple.intoto.jsonl - PASS: verify-provenance runs slsa-verifier validation - PASS: provenance flows to publish-if-tag and GitHub Release - WARN: Full cryptographic verification requires OIDC issuer registration with Sigstore (one-time setup) Refs: - Plan section: Release Engineering / Signing and Provenance, line 3402 - Bead: pdftract-3gk5 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 21:55:55 -04:00
jedarden	5f656c99f8	docs(pdftract-58kz): add verification note Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:39:55 -04:00
jedarden	bb5346b305	docs(pdftract-58kz): add security policy documentation Add comprehensive SECURITY.md covering: - Supported versions policy - Private vulnerability reporting (email + GitHub) - 90-day disclosure window with timelines - CVE assignment via GitHub Security Advisories - In-scope and out-of-scope vulnerability classes - Safe harbor policy for good-faith researchers Add security issue template redirecting users to private reporting. Add Security section to CONTRIBUTING.md and README.md with links to SECURITY.md. Add docs/security/pgp-public-key.asc placeholder with generation instructions. References: bead pdftract-58kz, plan line 3433 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:39:24 -04:00
jedarden	64bb59d76f	docs(pdftract-8zbd): add SBOM generation verification note Documents that CycloneDX SBOM generation is fully implemented in the Argo Workflows (declarative-config). The workflows: - Generate pdftract-vX.Y.Z.cdx.json using cargo-cyclonedx - Validate schema with cyclonedx-cli validate - Attest to Docker images via cosign attest --type cyclonedx - Attach to GitHub Release as an asset - Include in SHA256SUMS aggregate Acceptance criteria: 5 PASS, 1 WARN (grype test requires release) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:38:25 -04:00
jedarden	6fa837d3c9	docs(pdftract-8eo1): add verification note for cosign keyless signing implementation Status: Implementation COMPLETE, infrastructure blocker REMAINING Implemented: - cosign installed in pdftract-github-release.yaml and pdftract-docker-build.yaml - OIDC token projection configured with audience: sigstore - SHA256SUMS signing via cosign sign-blob - Docker image signing for all 3 variants (latest, ocr, full) - SLSA provenance attestation via cosign attest - README verification documentation complete Blocker: - OIDC issuer https://iad-ci-oidc.ardenone.com not in public Fulcio config - Requires PR to sigstore/fulcio OR self-hosted Fulcio (v1.1+) References: - https://github.com/sigstore/fulcio/blob/main/config/identity/config.yaml - Bead pdftract-8eo1	2026-05-20 19:36:09 -04:00
jedarden	9348407d76	docs(pdftract-68pe): update verification note with SLSA attestation Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Bead-Id: pdftract-68pe	2026-05-20 19:35:51 -04:00
jedarden	c28b23fd2b	docs(pdftract-1lw3): add verification note for release cascade workflow Documents the completed implementation of pdftract-release-cascade WorkflowTemplate and pdftract-tag-trigger Argo Events Sensor. Acceptance criteria: - PASS: All infrastructure files committed in declarative-config - WARN: Runtime verification deferred (kubectl not available in env) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:33:35 -04:00
jedarden	c335423468	docs(pdftract-68pe): update verification note with OIDC improvements Documents the enhancements made to cosign keyless signing: - Projected service account token with sigstore audience - Explicit OIDC issuer URL configuration - Improved digest extraction with fallback strategies Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:27:08 -04:00
jedarden	419f18e41a	feat(pdftract-154mz): fix canonicalization module compilation Make diagnostics module visible to fingerprint module and fix hash_page_geometry signature to match usage. Changes: - Add `pub mod diagnostics;` to lib.rs for module visibility - Modify hash_page_geometry to create diagnostics internally The canonicalize module already has complete implementation: - canonicalize_f64: banker's rounding to 4dp for geometry - normalize_content_stream: whitespace normalization via lexer - serialize_dict_canonical: sorted-key dict serialization - hash_resource_dict_canonical: order-independent resource hashing Verification: notes/pdftract-154mz.md Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:24:38 -04:00
jedarden	4ddf954169	docs(pdftract-2xei): add verification note for pdftract-docs-build template Documents the WorkflowTemplate creation for mdBook → Cloudflare Pages CI. Template committed to declarative-config 4fe4947. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:24:14 -04:00
jedarden	5485a15550	docs(pdftract-2x7y): add verification note for pdftract-github-release Documents the implementation of the pdftract-github-release WorkflowTemplate, including artifact taxonomy, release notes generation, and acceptance criteria status. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:23:39 -04:00
jedarden	89d16a6a59	docs(pdftract-68pe): add verification note	2026-05-20 19:18:38 -04:00
jedarden	eb835161e9	feat(pdftract-33v): add property tests and nightly fuzz job Add per-PR property tests and nightly fuzz job infrastructure: CI Changes (declarative-config): - pdftract-ci.yaml: Add proptest step to test-matrix - New test-proptest template with configurable case count - Sets PROPTEST_SEED for reproducibility - Runs 10,000 cases per module within 1 CPU-hour budget - pdftract-nightly-fuzz.yaml: Sync fuzz workflow - CronWorkflow runs daily at 0400 UTC - 5 fuzz targets with address sanitizer - Seed corpus from malformed fixtures Existing Infrastructure (Already in Place): - Proptest suites for lexer, object_parser, xref, stream, cmap_parser - Fuzz targets for all 5 modules - proptest-regressions/ with README - Seed corpus in fuzz/corpus/ Verification: - Added tests/proptest-panic-verification.rs - Proptest infrastructure correctly structured - Will catch deliberate panics within budget Closes: pdftract-33v	2026-05-20 19:18:03 -04:00
jedarden	79f13c92c3	feat(pdftract-68pe): add Dockerfile with FEATURES build-arg support Adds multi-stage Dockerfile supporting three feature variants: - default: baseline features, distroless base (~20 MB) - ocr: default + OCR (Tesseract), debian-slim base (~120 MB) - full: all features, debian-slim base (~140 MB) The FEATURES build-arg selects the variant at build time. Bead: pdftract-68pe Plan: Release Engineering / Argo WorkflowTemplates, line 3392	2026-05-20 19:17:49 -04:00
jedarden	442e973508	docs(pdftract-5x3u): add verification note for pdftract-crates-publish Documents the implementation of the pdftract-crates-publish WorkflowTemplate in jedarden/declarative-config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:17:44 -04:00
jedarden	fda4403014	docs(pdftract-245s): add verification note for pdftract-py-ci WorkflowTemplate Documents the implementation of the pdftract-py-ci WorkflowTemplate that builds 5 platform wheels + 1 sdist using maturin and publishes to PyPI via twine. Acceptance criteria: - PASS: WorkflowTemplate file at correct location - PASS: Failed platform builds don't cancel others (continueOn.failed: true) - PASS: Idempotent re-runs (twine --skip-existing) - PASS: PyPI token from ESO Secret configured - WARN: Test workflow submission (requires iad-ci cluster access) - WARN: Actual pip install test (requires PyPI publish) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:12:56 -04:00
jedarden	ae17a42489	docs(pdftract-2a6rk): add OCG /OCProperties parsing verification note The OCG implementation was already complete in ocg.rs. All 20 tests pass: - BaseState parsing (ON/OFF/Unchanged) - /ON and /OFF array override handling - OCMD policy preservation (AllOn, AnyOn, AllOff, AnyOff) - INV-8 compliance verified via proptests Phase 3 will consume OcProperties via is_visible() to suppress glyphs in /OC /OCGRef BDC blocks when the referenced OCG is OFF. Co-Authored-By: Claude Code <noreply@anthropic.com>	2026-05-20 19:11:56 -04:00
jedarden	6bdc2b5278	docs(pdftract-2pyln): update verification note with bug fix details Add details about the BytesSource cleanup bug fix and clarify that the contract defines 7 error kinds, not 8 as initially stated in the task. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:09:49 -04:00
jedarden	5781d67d5c	fix(pdftract-2pyln): add source parameter to invoke methods for BytesSource cleanup - Add source Source parameter to invoke, invokeJSON, invokeString, invokeStream - Change BytesSource from []byte type to struct with data and tmpPath fields - Add proper cleanup of temporary files after subprocess execution - Fix source parameter pass-through in Extract, ExtractText, ExtractMarkdown, GetMetadata, Hash, Classify This ensures BytesSource temporary files are cleaned up after use, preventing file descriptor leaks. The BytesSource now creates a temp file on demand and cleans it up automatically via defer in the invoke methods. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:08:14 -04:00
jedarden	e0dea12849	docs(pdftract-220e): add verification note for pdftract-build-binaries template Documents the completed WorkflowTemplate creation including: - 10-item matrix build (5 triples × 2 feature variants) - Cross-compilation setup with osxcross SDK - Archive packaging with licenses, README, CHANGELOG excerpt - Reproducibility via SOURCE_DATE_EPOCH Acceptance criteria: 5 PASS, 2 WARN (kubectl unavailable, no test run) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-20 19:08:02 -04:00
jedarden	5dca47b976	docs(pdftract-4b0z): add verification note	2026-05-20 19:06:36 -04:00
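
The hybrid xref merge rules listed in commit 2a2a247e87 (traditional entries win, stream-only type-2 entries fill gaps, Free/InUse disagreements are reported) can be illustrated with a small sketch. This is not the code from crates/pdftract-core; the `Entry`, `Merged`, and `merge_hybrid_sketch` names are hypothetical, conflicts are collected in a plain list rather than emitted as a STRUCT_HYBRID_CONFLICT diagnostic, and dropping stream-only entries that are not type-2 is an assumption the commit message does not confirm.

```rust
use std::collections::BTreeMap;

/// Hypothetical xref entry kinds, named after the entry types in the log.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Entry {
    Free,
    InUse { offset: u64 },
    /// Type-2 entry: the object lives inside an object stream.
    Compressed { objstm: u32, index: u32 },
}

/// Merged table plus the object numbers where the two sources disagreed
/// on Free vs InUse (a stand-in for a diagnostic sink).
#[derive(Debug, Default)]
struct Merged {
    entries: BTreeMap<u32, Entry>,
    conflicts: Vec<u32>,
}

/// Sketch of a hybrid merge: the traditional table is authoritative for
/// every object number it covers; the stream only fills gaps.
fn merge_hybrid_sketch(
    traditional: &BTreeMap<u32, Entry>,
    stream: &BTreeMap<u32, Entry>,
) -> Merged {
    let mut out = Merged {
        entries: traditional.clone(),
        conflicts: Vec::new(),
    };
    for (&num, &from_stream) in stream {
        match traditional.get(&num) {
            // Covered by the traditional table: keep that entry, but note
            // a conflict when one side says Free and the other InUse.
            Some(&from_table) => {
                let disagree = matches!(
                    (from_table, from_stream),
                    (Entry::Free, Entry::InUse { .. }) | (Entry::InUse { .. }, Entry::Free)
                );
                if disagree {
                    out.conflicts.push(num);
                }
            }
            // Not covered: only type-2 entries are used as gap fill.
            // Ignoring other stream-only entries is an assumption here.
            None => {
                if let Entry::Compressed { .. } = from_stream {
                    out.entries.insert(num, from_stream);
                }
            }
        }
    }
    out
}
```

Using a `BTreeMap` keeps the merged entries ordered by object number, which makes the result deterministic to iterate and compare in tests.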


150 commits
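
Commit cab7f8bf34 mentions cycle detection and depth limiting for the /Prev chain of incremental updates. A minimal sketch of that idea follows, assuming each xref section is identified by its byte offset and carries an optional /Prev offset. The `walk_prev_chain` function, the `ChainError` type, and the map-based input are hypothetical and are not taken from xref.rs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type for a failed chain walk.
#[derive(Debug, PartialEq, Eq)]
enum ChainError {
    /// The chain revisited the section at this offset.
    Cycle(u64),
    /// The chain has more than `max_depth` sections.
    DepthExceeded,
    /// A /Prev offset pointed at no known section.
    MissingSection(u64),
}

/// Walk a /Prev chain starting from the newest section.
/// `sections` maps a section's byte offset to its optional /Prev offset.
/// Returns the visited offsets, newest first.
fn walk_prev_chain(
    sections: &HashMap<u64, Option<u64>>,
    start: u64,
    max_depth: usize,
) -> Result<Vec<u64>, ChainError> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    let mut current = Some(start);
    while let Some(offset) = current {
        // Depth limit: refuse to follow more than `max_depth` sections.
        if order.len() >= max_depth {
            return Err(ChainError::DepthExceeded);
        }
        // Cycle detection: `insert` returns false if the offset was seen.
        if !seen.insert(offset) {
            return Err(ChainError::Cycle(offset));
        }
        let prev = sections
            .get(&offset)
            .ok_or(ChainError::MissingSection(offset))?;
        order.push(offset);
        current = *prev;
    }
    Ok(order)
}
```

Returning the offsets newest first matches the usual override rule for incremental updates, where an entry from a later revision takes precedence over the same object number in an earlier one.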