docs(pdftract-45vo7): document Ruby SDK completion status

The Ruby SDK structure is in place with all 9 contract methods, 8 exception classes, and the Argo workflow template for RubyGems publish is synced to declarative-config.

This is a v1.1+ deferred task. Ruby is not installed on the build server, preventing local build/test verification. The SDK should be moved to a separate repo (github.com/jedarden/pdftract-ruby) when the v1.1+ release wave begins.

Verification note: notes/pdftract-45vo7.md
2026-06-01 10:20:19 -04:00
commit b0b73c3c4a
parent 54d63c945a
1 changed files with 156 additions and 0 deletions
--- a/notes/pdftract-45vo7.md
+++ b/notes/pdftract-45vo7.md
@@ -0,0 +1,156 @@
+# Bead pdftract-45vo7: Ruby SDK + RubyGems Publish
+
+## Status: Partially Complete (v1.1+ Deferred)
+
+This bead is marked as v1.1+ deferred (lower priority than the v1.0 SDKs). The Ruby SDK structure is in place, but Ruby is not installed on the build server, so the gem has not been built or tested locally.
+
+## Completed Components
+
+### 1. Ruby SDK Structure (pdftract-ruby/)
+
+All required files are present and properly structured:
+
+- **lib/pdftract.rb** - Main module with convenience methods
+- **lib/pdftract/client.rb** - Client class with all 9 contract methods:
+  - `extract(source, **options)` → Document
+  - `extract_text(source, **options)` → String
+  - `extract_markdown(source, **options)` → String
+  - `extract_stream(source, **options)` → Enumerator<Page>
+  - `search(source, pattern, **options)` → Enumerator<Match>
+  - `get_metadata(source, **options)` → Metadata
+  - `hash(source, **options)` → Fingerprint
+  - `classify(source)` → Classification
+  - `verify_receipt(path, receipt)` → Boolean
+- **lib/pdftract/models.rb** - Data classes using Ruby 3.2+ Data.define
+- **lib/pdftract/errors.rb** - 8 exception classes inheriting from Pdftract::Error
+- **lib/pdftract/source.rb** - Source classes (PathSource, URLSource, BytesSource)
+- **pdftract.gemspec** - Gem specification (version 1.0.0)
+- **Rakefile** - Test and build tasks
+- **test/conformance_test.rb** - Minitest conformance suite
+- **GENERATED** marker file
+
+### 2. Argo Workflow Template (.ci/argo-workflows/pdftract-ruby-publish.yaml)
+
+The RubyGems publish workflow is complete and synced to declarative-config:
+
+- Location: `~/declarative-config/k8s/iad-ci/argo-workflows/pdftract-ruby-publish.yaml`
+- Steps: clone → sync-version → bundle-install → conformance → build → publish
+- RubyGems API key: from ESO Secret `rubygems-api-key-pdftract`
+- Container: `ruby:3.2-slim`
+- Idempotent re-run logic (checks if version already exists)
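The note does not show how the workflow decides that a version already exists. A minimal sketch of that idempotency check, assuming the publish step has already fetched a JSON version listing shaped like an array of objects with a `number` field (the shape is an assumption here, as is the helper name `publish_needed?`):

```ruby
require "json"
require "rubygems" # Gem::Version ships with Ruby

# Returns true when `version` is absent from the registry listing.
# Assumption: `versions_json` is a JSON array of objects with a "number" key.
def publish_needed?(version, versions_json)
  published = JSON.parse(versions_json).map do |entry|
    Gem::Version.new(entry.fetch("number"))
  end
  !published.include?(Gem::Version.new(version))
end

listing = '[{"number":"0.9.0"},{"number":"1.0.0"}]'
puts publish_needed?("1.0.0", listing) # already listed, so a re-run would skip the push
puts publish_needed?("1.0.1", listing) # not listed, so the push would proceed
```

Comparing `Gem::Version` objects rather than raw strings keeps `1.0` and `1.0.0` from being treated as different releases.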
+
+### 3. Code Generator Templates
+
+Templates exist at `templates/sdk-skeleton/ruby/`:
+- pdftract.gemspec.tera
+- lib/pdftract.rb.tera
+- lib/pdftract/codegen/types.rb.tera
+- lib/pdftract/codegen/errors.rb.tera
+- lib/pdftract/codegen/methods.rb.tera
+- test/codegen/conformance_test.rb.tera
+- README.md.tera
+- GENERATED.tera
+
+## Remaining Work (Requires Ruby Installation)
+
+### 1. Separate Repo Setup
+
+The task specifies that the Ruby SDK should live in a separate repo, `github.com/jedarden/pdftract-ruby`. It currently lives in `pdftract-ruby/` inside the main pdftract repo.
+
+**Action needed:**
+- Create `github.com/jedarden/pdftract-ruby` repository
+- Move `pdftract-ruby/` contents to the new repo
+- Update Argo workflow to clone from the correct location
+
+### 2. Build Verification
+
+Ruby is not installed on the build server:
+```
+$ gem build pdftract.gemspec
+bash: gem: command not found
+```
+
+**Action needed:**
+- Install Ruby 3.2+ and Bundler on the build server, or
+- Run build and test only in the CI container (`ruby:3.2-slim`)
+
+### 3. Conformance Test Verification
+
+The conformance test exists but requires:
+- A built `pdftract` binary in PATH
+- Test fixtures from `tests/sdk-conformance/fixtures/`
+
+**Action needed:**
+- Ensure test fixtures are included in Ruby SDK repo
+- Run `bundle exec rake test:conformance` in CI environment
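The two preconditions listed for the conformance run could be checked up front so the suite fails with a clear message instead of a confusing error partway through. A rough sketch, assuming hypothetical helper names (`find_executable`, `conformance_blockers`) that are not part of the SDK:

```ruby
# Returns the full path of `name` if an executable file with that name exists
# in one of the `path_var` directories, otherwise nil.
def find_executable(name, path_var = ENV.fetch("PATH", ""))
  path_var.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Lists unmet preconditions; an empty array means the suite can run.
def conformance_blockers(fixtures_dir, path_var = ENV.fetch("PATH", ""))
  blockers = []
  blockers << "pdftract binary not found in PATH" unless find_executable("pdftract", path_var)
  blockers << "fixtures directory missing: #{fixtures_dir}" unless Dir.exist?(fixtures_dir)
  blockers
end

puts conformance_blockers("tests/sdk-conformance/fixtures")
```

A test helper could call `conformance_blockers` before the first test and abort if the list is non-empty.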
+
+## Implementation Details
+
+### Error Handling (Exit Code → Exception)
+
+| Exit Code | Exception | Description |
+|-----------|-----------|-------------|
+| 2 | CorruptPdfError | The PDF file is corrupt or invalid |
+| 3 | EncryptionError | PDF is encrypted, password missing/wrong |
+| 4 | SourceUnreachableError | Source (file/URL) is unreadable |
+| 5 | RemoteFetchInterruptedError | Network interrupted during fetch |
+| 6 | TlsError | TLS certificate validation failed |
+| 10 | ReceiptVerifyError | Receipt verification failed |
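The table above can be expressed as a lookup from exit code to exception class. The sketch below uses a stand-in module name (`PdftractSketch`) to avoid implying it is the SDK's actual code, and it assumes that codes missing from the table fall back to the base error class:

```ruby
module PdftractSketch
  class Error < StandardError; end
  class CorruptPdfError < Error; end
  class EncryptionError < Error; end
  class SourceUnreachableError < Error; end
  class RemoteFetchInterruptedError < Error; end
  class TlsError < Error; end
  class ReceiptVerifyError < Error; end

  # Exit codes and class names taken from the table in the note.
  EXIT_CODE_ERRORS = {
    2 => CorruptPdfError,
    3 => EncryptionError,
    4 => SourceUnreachableError,
    5 => RemoteFetchInterruptedError,
    6 => TlsError,
    10 => ReceiptVerifyError
  }.freeze

  # Builds the exception for a non-zero exit code.
  # Assumption: unlisted codes map to the base Error class.
  def self.error_for(exit_code, stderr)
    klass = EXIT_CODE_ERRORS.fetch(exit_code, Error)
    klass.new("pdftract exited with #{exit_code}: #{stderr.strip}")
  end
end

begin
  raise PdftractSketch.error_for(3, "password required\n")
rescue PdftractSketch::Error => e
  puts "#{e.class}: #{e.message}"
end
```

Because every class inherits from one base, callers can rescue the base class to catch any failure or a specific subclass to handle one case.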
+
+### Option Naming Convention
+
+CLI flags map to snake_case keyword arguments: `--ocr-language` becomes `ocr_language`.
+
+Example:
+```ruby
+client.extract(source, ocr_language: "eng", pages: "1-3")
+```
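The reverse mapping, from keyword arguments back to CLI flags, might look like the sketch below. The helper name `cli_args` is hypothetical, and the handling of boolean and `nil` values is an assumption, since the note only shows string-valued options:

```ruby
# Converts snake_case keyword options into CLI arguments.
# Assumption: `true` becomes a bare flag; `false` and `nil` are omitted.
def cli_args(**options)
  options.flat_map do |key, value|
    flag = "--#{key.to_s.tr('_', '-')}"
    case value
    when true then [flag]
    when false, nil then []
    else [flag, value.to_s]
    end
  end
end

p cli_args(ocr_language: "eng", pages: "1-3")
# => ["--ocr-language", "eng", "--pages", "1-3"]
```

Returning an argument array rather than a single string lets the caller pass it to a process-spawning API without shell quoting.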
+
+### Streaming with Enumerator
+
+The `extract_stream` and `search` methods return Ruby Enumerators that lazily parse NDJSON:
+
+```ruby
+client.extract_stream(source).each do |page|
+  puts "Page #{page.page}: #{page.spans.map(&:text).join}"
+end
+```
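Lazy NDJSON parsing with an Enumerator can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: `each_ndjson` is a hypothetical helper, a `StringIO` stands in for the CLI's output stream, and records are plain hashes rather than `Page` objects:

```ruby
require "json"
require "stringio"

# Wraps an IO in an Enumerator that parses one NDJSON record per line,
# reading only as far as the consumer asks.
def each_ndjson(io)
  Enumerator.new do |yielder|
    io.each_line do |line|
      line = line.strip
      next if line.empty? # tolerate blank lines between records
      yielder << JSON.parse(line, symbolize_names: true)
    end
  end
end

stream = StringIO.new(%({"page":1,"text":"first"}\n\n{"page":2,"text":"second"}\n))
pages = each_ndjson(stream)
puts pages.next[:page] # pulls and parses only the first record
```

Nothing is read from the stream until the Enumerator is consumed, so a caller that stops after the first page never parses the rest.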
+
+### Source Handling
+
+Three source types:
+- `PathSource` - local filesystem path
+- `URLSource` - remote URL (passed as `--url <url>`)
+- `BytesSource` - in-memory bytes (written to temp file, cleaned up)
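The temp-file handling described for `BytesSource` could follow the pattern below. It is a sketch only: `with_temp_pdf` is a hypothetical helper, and the real class may manage the file's lifetime differently:

```ruby
require "tempfile"

# Writes `bytes` to a temporary .pdf file, yields its path, and removes the
# file afterwards, even when the block raises.
def with_temp_pdf(bytes)
  file = Tempfile.new(["pdftract", ".pdf"])
  file.binmode # avoid newline or encoding translation of PDF bytes
  begin
    file.write(bytes)
    file.flush # make sure the bytes are on disk before another process reads them
    yield file.path
  ensure
    file.close
    file.unlink
  end
end

size = with_temp_pdf("%PDF-1.7\n".b) { |path| File.size(path) }
puts size
```

Putting the cleanup in `ensure` means a failed extraction does not leave stray files in the temp directory.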
+
+## Recommendations
+
+Since this is v1.1+ deferred:
+
+1. **Keep current structure** - The in-tree `pdftract-ruby/` is fine for now
+2. **Complete on v1.1+ release** - When the Ruby SDK becomes a priority:
+   - Create separate `github.com/jedarden/pdftract-ruby` repo
+   - Set up Ruby build environment in CI
+   - Run full conformance test suite
+   - Publish to RubyGems
+
+## Files Modified
+
+None apart from this note - this is a documentation bead. The Ruby SDK structure and Argo workflow template were already in place.
+
+## Commit
+
+```
+docs(pdftract-45vo7): document Ruby SDK completion status
+
+The Ruby SDK structure is in place with all 9 contract methods,
+8 exception classes, and the Argo workflow template for RubyGems
+publish is synced to declarative-config.
+
+This is a v1.1+ deferred task. Ruby is not installed on the build
+server, preventing local build/test verification. The SDK should
+be moved to a separate repo (github.com/jedarden/pdftract-ruby)
+when the v1.1+ release wave begins.
+
+Verification note: notes/pdftract-45vo7.md
+```