Commit graph

2 commits

jedarden
870d7073f0 feat(pdftract-1tswa): implement GIL release with py.allow_threads on extraction entry points
This releases the GIL around all blocking extraction calls so that other
Python threads can run concurrently while a PDF is being processed.

Changes:
- extract_py: Wrap the extract_pdf call in py.allow_threads
- extract_stream: Release the GIL while sleeping between recv attempts
- Add a Python multi-threading test to verify parallelism
- Add rlib to crate-type for unit test support
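A Python-side parallelism test for this kind of change typically runs several extractions on worker threads and checks that wall-clock time stays well under the serial total. The sketch below is an assumption, not the test added in this commit: `fake_extract` and `run_concurrently` are hypothetical names, and `fake_extract` uses `time.sleep` (which also releases the GIL in CPython) to mimic a native call wrapped in `py.allow_threads`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for the native extraction entry point.
# time.sleep releases the GIL, much like a Rust call wrapped in py.allow_threads.
def fake_extract(path: str, delay: float = 0.2) -> str:
    time.sleep(delay)
    return f"extracted:{path}"


def run_concurrently(paths, delay=0.2):
    """Run fake_extract with one thread per path; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        results = list(pool.map(lambda p: fake_extract(p, delay), paths))
    return results, time.perf_counter() - start


def test_extraction_runs_in_parallel():
    paths = [f"doc{i}.pdf" for i in range(4)]
    results, elapsed = run_concurrently(paths)
    serial_time = 0.2 * len(paths)
    assert results == [f"extracted:{p}" for p in paths]
    # If the call held the GIL throughout, elapsed would approach serial_time.
    assert elapsed < serial_time / 2, f"threads did not overlap: {elapsed:.2f}s"


if __name__ == "__main__":
    test_extraction_runs_in_parallel()
    print("ok")
```

Swapping `fake_extract` for the real SDK call (and a real PDF fixture) turns this into an end-to-end check; the timing threshold is deliberately loose to avoid flakiness on busy CI machines.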

Acceptance criteria:
- PASS: GIL is released during extraction via py.allow_threads
- PASS: Multi-threading test added to Python test suite
- PASS: Code compiles and formatting is verified

Closes: pdftract-1tswa

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 21:23:00 -04:00
jedarden
fca8966f45 feat(pdftract-2nu0s): implement Python SDK contract conformance
Implements the Python SDK with all 9 contract methods, 8 exception
classes, type definitions, asyncio wrappers, and subprocess fallback.
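The subprocess fallback can be sketched as an import guard: try the compiled extension first and, if it is missing, shell out to a command-line tool. Everything named here is hypothetical: the module `pdftract._native`, the `pdftract extract <path>` command and its JSON output, and the `PdftractError` stand-in are assumptions for illustration, not the SDK's actual names.

```python
import json
import subprocess


class PdftractError(Exception):
    """Stand-in for the SDK's base exception class."""


try:
    # Hypothetical compiled extension module.
    from pdftract import _native  # type: ignore
except ImportError:
    _native = None  # native module unavailable: use the subprocess path


def extract(path, runner=subprocess.run):
    """Extract with the native module if it imported, else via a CLI subprocess."""
    if _native is not None:
        return _native.extract(path)
    # Hypothetical CLI invocation; assumes the tool prints JSON to stdout.
    proc = runner(["pdftract", "extract", path], capture_output=True, text=True)
    if proc.returncode != 0:
        raise PdftractError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return json.loads(proc.stdout)
```

Injecting `runner` keeps the fallback testable without the CLI installed; a fuller version would map exit codes to the specific exception subclasses.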

Changes:
- Add Python wrapper module with extract, extract_text, extract_markdown,
  extract_stream, search, get_metadata, hash, classify, verify_receipt
- Add exception hierarchy: PdftractError base class with 7 subclasses
- Add dataclass type definitions: Document, Page, Span, Block, Match,
  Fingerprint, Classification, Metadata
- Add asyncio module with async wrappers for 4 long-running methods
- Add subprocess fallback, used when the native module fails to import
- Add conformance test runner in tests/test_conformance.py
- Update pyproject.toml with dynamic version from Cargo
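One common way to build asyncio wrappers around blocking methods is to run each call on a worker thread with `asyncio.to_thread` (Python 3.9+); this keeps the event loop responsive only if the underlying call releases the GIL. This is a sketch under that assumption: `_sync_extract` is a hypothetical placeholder for the synchronous SDK function, and the SDK's own wrappers may be implemented differently.

```python
import asyncio
import time


# Hypothetical synchronous SDK function standing in for the native extract().
def _sync_extract(path: str) -> dict:
    time.sleep(0.1)  # simulate blocking work that releases the GIL
    return {"path": path, "pages": 1}


async def extract(path: str) -> dict:
    """Async wrapper: run the blocking call on the default thread pool."""
    return await asyncio.to_thread(_sync_extract, path)


async def main():
    # Several extractions awaited together overlap on worker threads;
    # gather returns results in the order the awaitables were passed.
    return await asyncio.gather(*(extract(f"doc{i}.pdf") for i in range(3)))


if __name__ == "__main__":
    print(asyncio.run(main()))
```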

Closes: pdftract-2nu0s
2026-05-24 08:55:11 -04:00