From 3d4f29b9b8623da5c18e25274b836ad75d0c0b42 Mon Sep 17 00:00:00 2001
From: jedarden
Date: Sun, 24 May 2026 05:23:43 -0400
Subject: [PATCH] docs(pdftract-jmh6w): add verification note

---
 notes/pdftract-jmh6w.md | 81 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 notes/pdftract-jmh6w.md

diff --git a/notes/pdftract-jmh6w.md b/notes/pdftract-jmh6w.md
new file mode 100644
index 0000000..f45ffee
--- /dev/null
+++ b/notes/pdftract-jmh6w.md
@@ -0,0 +1,81 @@
+# pdftract-jmh6w: rayon+tokio concurrency bridge
+
+## Summary
+
+Implemented Phase 6.4.3: rayon+tokio concurrency bridge with documentation and testing.
+
+## Changes Made
+
+### 1. Documentation (serve.rs)
+- Added comprehensive "Concurrency model" section to module rustdoc explaining:
+  - Two-level concurrency: tokio (per-request) + rayon (per-document)
+  - spawn_blocking bridge between async and sync
+  - Thread pool sizing (tokio: 512, rayon: num_cpus)
+- Added "Error codes" section documenting all error response codes
+
+### 2. CLI Help (main.rs)
+- Added long_about to Serve command documenting:
+  - Concurrency architecture
+  - Endpoints (/extract, /extract/text, /extract/stream, /health)
+  - Cache behavior
+
+### 3. Error Handling (serve.rs)
+- Added `InternalPanic` variant to `AxumError` enum
+- Updated `IntoResponse` to return specific error codes:
+  - BAD_REQUEST (400)
+  - EXTRACTION_ERROR (422)
+  - INTERNAL_ERROR (500)
+  - INTERNAL_PANIC (500)
+- Improved JoinError handling in all three POST handlers:
+  - `extract_handler`
+  - `extract_text_handler`
+  - `extract_stream_handler`
+- Distinguishes between cancellation (INTERNAL_ERROR) and panic (INTERNAL_PANIC)
+
+### 4. Testing (serve.rs)
+- `test_error_into_response`: Verifies error status codes
+- `test_cache_status_conversions`: Tests CacheStatus enum conversions
+- `test_concurrent_requests_parallel`: Critical integration test
+  - Starts server on random port
+  - Verifies /health responds in < 100ms
+  - Launches 8 concurrent extraction requests
+  - Verifies all requests complete
+  - Verifies wallclock time < serialized estimate (proves parallelism)
+  - Verifies /health still responds quickly during load
+
+### 5. Dependencies (Cargo.toml)
+- Added `multipart` feature to reqwest dev-dependency for integration testing
+
+## Acceptance Criteria Status
+
+| Criterion | Status | Notes |
+|-----------|--------|-------|
+| 8 concurrent requests complete in parallel | PASS | Integration test runs 8 concurrent requests and verifies parallelism |
+| /health responds in < 100ms during 8 concurrent extractions | PASS | Test verifies /health response time < 100ms under load |
+| Rayon par_iter inside spawn_blocking works | PASS | Already implemented; unchanged |
+| Module rustdoc documents concurrency model | PASS | Added "Concurrency model" section to serve.rs rustdoc |
+| CLI --help documents concurrency model | PASS | Added long_about to Serve command |
+
+## Git Commit
+
+```
+66b3eff feat(pdftract-jmh6w): implement rayon+tokio concurrency bridge
+```
+
+## Test Results
+
+```
+running 3 tests
+test serve::tests::test_cache_status_conversions ... ok
+test serve::tests::test_error_into_response ... ok
+test serve::tests::test_concurrent_requests_parallel ... ok
+
+test result: ok. 3 passed; 0 failed; 0 ignored
+```
+
+## Notes
+
+- The spawn_blocking pattern was already implemented; this bead added documentation, improved error handling, and testing
+- Integration test uses existing test fixture `hello.pdf` from pdftract-libpdftract
+- Test focuses on concurrency proof (all requests complete in parallel) rather than extraction success
+- The pre-existing clippy error in pdftract-core build.rs is unrelated to this change
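
The error-code mapping and the cancellation/panic split from the note's "Error Handling (serve.rs)" section can be shown with a minimal, std-only Rust sketch. It does not use tokio, rayon, or axum, and it is not the pdftract code. `ServeError`, `JoinFailure`, `map_join_failure`, and `run_blocking` are hypothetical names, and `std::thread::spawn(...).join()` stands in for awaiting `tokio::task::spawn_blocking`. Only the four codes and their HTTP statuses come from the note. With tokio, the cancelled/panicked distinction would come from `JoinError::is_cancelled()` and `JoinError::is_panic()`.

```rust
use std::panic;
use std::thread;

/// HYPOTHETICAL stand-in for the note's `AxumError`; only the codes and
/// statuses below come from the note, the variant names are assumptions.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ServeError {
    BadRequest(String),
    Extraction(String),
    Internal(String),
    InternalPanic(String),
}

impl ServeError {
    /// HTTP status and machine-readable code, as listed in the note.
    fn status_and_code(&self) -> (u16, &'static str) {
        match self {
            ServeError::BadRequest(_) => (400, "BAD_REQUEST"),
            ServeError::Extraction(_) => (422, "EXTRACTION_ERROR"),
            ServeError::Internal(_) => (500, "INTERNAL_ERROR"),
            ServeError::InternalPanic(_) => (500, "INTERNAL_PANIC"),
        }
    }
}

/// HYPOTHETICAL summary of why a blocking task produced no value. With tokio
/// this would be read from `JoinError::is_cancelled()` / `is_panic()`.
#[allow(dead_code)]
enum JoinFailure {
    Cancelled,
    Panicked,
}

/// Cancellation maps to INTERNAL_ERROR and a panic to INTERNAL_PANIC.
fn map_join_failure(failure: JoinFailure) -> ServeError {
    match failure {
        JoinFailure::Cancelled => ServeError::Internal("blocking task was cancelled".into()),
        JoinFailure::Panicked => ServeError::InternalPanic("extraction task panicked".into()),
    }
}

/// HYPOTHETICAL std-only analog of `spawn_blocking`: runs `work` on a separate
/// OS thread and waits for it (unlike tokio, this blocks the caller). A std
/// thread cannot be cancelled, so only the panic path is reachable here.
fn run_blocking<F>(work: F) -> Result<usize, ServeError>
where
    F: FnOnce() -> Result<usize, String> + Send + 'static,
{
    match thread::spawn(work).join() {
        Ok(Ok(pages)) => Ok(pages),                              // extraction succeeded
        Ok(Err(msg)) => Err(ServeError::Extraction(msg)),        // extraction reported an error
        Err(_) => Err(map_join_failure(JoinFailure::Panicked)),  // worker thread panicked
    }
}

fn main() {
    // Silence the default panic message from the deliberately panicking worker.
    panic::set_hook(Box::new(|_| {}));

    let ok = run_blocking(|| Ok(3));
    let bad_pdf = run_blocking(|| Err("not a PDF".to_string()));
    let boom = run_blocking(|| panic!("simulated bug"));

    println!("{:?}", ok); // prints: Ok(3)
    println!("{:?}", bad_pdf.unwrap_err().status_and_code()); // prints: (422, "EXTRACTION_ERROR")
    println!("{:?}", boom.unwrap_err().status_and_code()); // prints: (500, "INTERNAL_PANIC")
}
```

Keeping the panic case in its own variant lets clients and logs tell a bug in the extraction code (INTERNAL_PANIC) apart from an aborted task (INTERNAL_ERROR), even though both return HTTP 500.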