docs(pdftract-jmh6w): add verification note

jedarden 2026-05-24 05:23:43 -04:00
parent 66b3eff9cb
commit 3d4f29b9b8

notes/pdftract-jmh6w.md

# pdftract-jmh6w: rayon+tokio concurrency bridge
## Summary
Implemented Phase 6.4.3: rayon+tokio concurrency bridge with documentation and testing.
## Changes Made
### 1. Documentation (serve.rs)
- Added a comprehensive "Concurrency model" section to the module rustdoc explaining:
  - Two-level concurrency: tokio (per request) + rayon (per document)
  - The `spawn_blocking` bridge between async and sync code
  - Thread pool sizing (tokio blocking pool: 512 threads, rayon: num_cpus)
- Added an "Error codes" section documenting all error response codes
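The two levels can be modeled with the standard library alone. This is a minimal sketch, not the serve.rs code: `std::thread::spawn` plus a channel stands in for `tokio::task::spawn_blocking`, and `std::thread::scope` stands in for rayon's `par_iter` (unlike rayon, it starts one thread per page instead of reusing a pool of about num_cpus workers). `extract_page`, `extract_document`, and `handle_request` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for per-page extraction (CPU-bound, synchronous).
fn extract_page(page: usize) -> String {
    format!("text of page {page}")
}

// Level 2 (per document): fan pages out in parallel and keep page order.
// serve.rs uses rayon's par_iter here; std::thread::scope is a crate-free substitute.
fn extract_document(pages: usize) -> Vec<String> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..pages)
            .map(|p| s.spawn(move || extract_page(p)))
            .collect();
        // Joining in spawn order preserves page order in the output.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

// Level 1 (per request): move the blocking job off the request-handling thread.
// serve.rs uses tokio::task::spawn_blocking; a thread plus a channel stands in here.
fn handle_request(pages: usize) -> mpsc::Receiver<Vec<String>> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let _ = tx.send(extract_document(pages));
    });
    rx
}

fn main() {
    // rayon's global pool defaults to roughly the number of logical CPUs.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("available parallelism: {cpus}");

    // Eight "requests" in flight at once, each extracting a 4-page document.
    let pending: Vec<_> = (0..8).map(|_| handle_request(4)).collect();
    for rx in pending {
        let pages = rx.recv().expect("worker dropped its sender");
        assert_eq!(pages.len(), 4);
    }
}
```

The point of the split is that the thread accepting requests never runs CPU-bound work itself, which is what keeps a cheap endpoint such as `/health` responsive while extractions are in progress.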
### 2. CLI Help (main.rs)
- Added `long_about` to the `Serve` command documenting:
  - Concurrency architecture
  - Endpoints (`/extract`, `/extract/text`, `/extract/stream`, `/health`)
  - Cache behavior
### 3. Error Handling (serve.rs)
- Added an `InternalPanic` variant to the `AxumError` enum
- Updated `IntoResponse` to return specific error codes:
  - `BAD_REQUEST` (400)
  - `EXTRACTION_ERROR` (422)
  - `INTERNAL_ERROR` (500)
  - `INTERNAL_PANIC` (500)
- Improved `JoinError` handling in all three POST handlers, so that a cancelled task maps to `INTERNAL_ERROR` and a panicked task maps to `INTERNAL_PANIC`:
  - `extract_handler`
  - `extract_text_handler`
  - `extract_stream_handler`
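The variant-to-response mapping, and the mapping from a failed blocking task to a variant, can be sketched without axum or tokio. `ServeError` is a hypothetical stand-in for `AxumError`, and `JoinFailure` is a hypothetical enum modeling the two cases tokio's `JoinError` reports through `is_cancelled()` and `is_panic()`. The real handlers may carry different payloads or extract the panic message differently.

```rust
use std::any::Any;

// Hypothetical mirror of the AxumError variants named in this note.
#[derive(Debug)]
enum ServeError {
    BadRequest(String),
    Extraction(String),
    Internal(String),
    InternalPanic(String),
}

impl ServeError {
    // (HTTP status, machine-readable code) pairs as listed above.
    fn status_and_code(&self) -> (u16, &'static str) {
        match self {
            ServeError::BadRequest(_) => (400, "BAD_REQUEST"),
            ServeError::Extraction(_) => (422, "EXTRACTION_ERROR"),
            ServeError::Internal(_) => (500, "INTERNAL_ERROR"),
            ServeError::InternalPanic(_) => (500, "INTERNAL_PANIC"),
        }
    }
}

// Hypothetical model of a tokio JoinError: either cancelled or panicked.
enum JoinFailure {
    Cancelled,
    Panicked(Box<dyn Any + Send>),
}

// Turn a failed blocking task into the matching error variant.
fn from_join_failure(err: JoinFailure) -> ServeError {
    match err {
        JoinFailure::Cancelled => {
            ServeError::Internal("extraction task was cancelled".into())
        }
        JoinFailure::Panicked(payload) => {
            // Panic payloads are usually &'static str or String.
            let msg = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "unknown panic".to_string());
            ServeError::InternalPanic(msg)
        }
    }
}

fn main() {
    // std's JoinHandle::join returns the panic payload on Err, which
    // exercises the panic branch without tokio. (The panic is printed to stderr.)
    let payload = std::thread::spawn(|| -> () { panic!("bad xref table") })
        .join()
        .unwrap_err();
    let err = from_join_failure(JoinFailure::Panicked(payload));
    assert_eq!(err.status_and_code(), (500, "INTERNAL_PANIC"));
    println!("{err:?}");
}
```

Both internal variants share status 500. The distinct code is what lets a client or log reader tell a crashed extraction from a cancelled one.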
### 4. Testing (serve.rs)
- `test_error_into_response`: verifies error status codes
- `test_cache_status_conversions`: tests `CacheStatus` enum conversions
- `test_concurrent_requests_parallel`: the key integration test, which:
  - Starts the server on a random port
  - Verifies `/health` responds in < 100 ms
  - Launches 8 concurrent extraction requests
  - Verifies all requests complete
  - Verifies total wall-clock time is below the serialized estimate (evidence that the requests ran in parallel)
  - Verifies `/health` still responds quickly under load
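The parallelism check relies on a simple comparison. If N jobs that each take time T finish in well under N × T, they must have overlapped. Below is a std-only sketch of that comparison. It assumes the serialized estimate is per-request time multiplied by request count. `fake_extraction` and `run_concurrently` are hypothetical, and a fixed sleep stands in for an HTTP extraction request.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical per-request cost; the real test sends /extract requests instead.
const JOB: Duration = Duration::from_millis(200);
const REQUESTS: u32 = 8;

// Stand-in for one extraction request: a fixed amount of blocking work.
fn fake_extraction() {
    thread::sleep(JOB);
}

// Run `n` jobs at once and return how long the whole batch took.
fn run_concurrently(n: u32) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(fake_extraction)).collect();
    for h in handles {
        h.join().expect("job panicked");
    }
    start.elapsed()
}

fn main() {
    let wall = run_concurrently(REQUESTS);
    // Run one after another, the jobs would take about JOB * REQUESTS.
    let serialized_estimate = JOB * REQUESTS;
    assert!(
        wall < serialized_estimate,
        "requests appear to have run serially: {wall:?}"
    );
    println!("wall-clock {wall:?} vs serialized estimate {serialized_estimate:?}");
}
```

This comparison only shows that the jobs overlapped. It says nothing about extraction throughput, which matches the test's focus on concurrency rather than extraction results.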
### 5. Dependencies (Cargo.toml)
- Added `multipart` feature to reqwest dev-dependency for integration testing
## Acceptance Criteria Status
| Criterion | Status | Notes |
|-----------|--------|-------|
| 8 concurrent requests complete in parallel | PASS | Integration test runs 8 concurrent requests and verifies parallelism |
| /health responds in < 100ms during 8 concurrent extractions | PASS | Test verifies /health response time < 100ms under load |
| Rayon par_iter inside spawn_blocking works | PASS | Already implemented; unchanged |
| Module rustdoc documents concurrency model | PASS | Added "Concurrency model" section to serve.rs rustdoc |
| CLI --help documents concurrency model | PASS | Added long_about to Serve command |
## Git Commit
```
66b3eff feat(pdftract-jmh6w): implement rayon+tokio concurrency bridge
```
## Test Results
```
running 3 tests
test serve::tests::test_cache_status_conversions ... ok
test serve::tests::test_error_into_response ... ok
test serve::tests::test_concurrent_requests_parallel ... ok
test result: ok. 3 passed; 0 failed; 0 ignored
```
## Notes
- The spawn_blocking pattern was already implemented; this bead added documentation, improved error handling, and testing
- Integration test uses existing test fixture `hello.pdf` from pdftract-libpdftract
- The integration test checks concurrency (all requests complete, and they run in parallel) rather than whether extraction succeeds
- The pre-existing clippy error in pdftract-core build.rs is unrelated to this change