feat(pdftract-3s2i): implement Phase 5.5.2 validation filter

Implement per-word validation filter for assisted-OCR BrokenVector path.

Changes:
- Add SpanSource::OcrAssisted variant to hybrid.rs
- Add Span::ocr_assisted() helper method
- Implement validate_ocr_with_position_hints() in ocr.rs
  - 5pt distance threshold for position validation
  - 0.4 confidence cap for rejected words
  - Linear scan for nearest-neighbor lookup
- Add unit tests for validation filter

Closes: pdftract-3s2i
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
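
The hunks for hybrid.rs and ocr.rs are not part of this excerpt, so the following is only a minimal sketch of the filter the commit message describes: a linear scan for the nearest position hint, a 5pt distance threshold, and a 0.4 confidence cap for rejected words. The types `OcrWord` and `PositionHint`, the function names, the Euclidean distance metric, the inclusive threshold, and the rule that a word with no hints is rejected are all assumptions made for illustration, not the signatures of `validate_ocr_with_position_hints()`.

```rust
/// Hypothetical stand-in for an OCR word; the real type is not shown in this diff.
#[derive(Debug, Clone, PartialEq)]
pub struct OcrWord {
    pub text: String,
    pub x: f32,
    pub y: f32,
    pub confidence: f32,
}

/// Hypothetical stand-in for a position hint taken from the vector layer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PositionHint {
    pub x: f32,
    pub y: f32,
}

/// Values from the commit message: 5pt threshold, 0.4 confidence cap.
pub const MAX_HINT_DISTANCE_PT: f32 = 5.0;
pub const REJECTED_CONFIDENCE_CAP: f32 = 0.4;

/// Linear scan over all hints; returns the distance to the nearest one,
/// or `None` when there are no hints. Euclidean distance is an assumption.
pub fn nearest_hint_distance(word: &OcrWord, hints: &[PositionHint]) -> Option<f32> {
    hints
        .iter()
        .map(|h| ((word.x - h.x).powi(2) + (word.y - h.y).powi(2)).sqrt())
        .fold(None, |best, d| match best {
            Some(b) if b <= d => Some(b),
            _ => Some(d),
        })
}

/// Keeps every word, but caps the confidence of words that have no hint
/// within the threshold (assumed meaning of "rejected").
pub fn validate_words(words: &[OcrWord], hints: &[PositionHint]) -> Vec<OcrWord> {
    words
        .iter()
        .cloned()
        .map(|mut w| {
            let accepted = matches!(
                nearest_hint_distance(&w, hints),
                Some(d) if d <= MAX_HINT_DISTANCE_PT
            );
            if !accepted {
                w.confidence = w.confidence.min(REJECTED_CONFIDENCE_CAP);
            }
            w
        })
        .collect()
}
```

In this sketch a word 2pt from a hint keeps its confidence, while a word 20pt away is capped at 0.4; a word already below the cap is left unchanged. The linear scan costs O(words × hints), which is usually acceptable for per-page word counts.
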
2026-05-24 04:57:17 -04:00
commit e6bf3dd290
parent 450e2f2df5
129 changed files with 9284 additions and 4076 deletions
--- a/.marathon/.gitignore
+++ b/.marathon/.gitignore
@@ -0,0 +1 @@
+logs/
--- a/.marathon/instruction.md
+++ b/.marathon/instruction.md
@@ -0,0 +1,104 @@
+# pdftract — Marathon Coding Instruction
+
+You are an autonomous Rust developer implementing **pdftract**, a PDF text-extraction
+tool (Rust core + PyO3 bindings + CLI with a `--serve` mode). You run one iteration
+at a time: pick the single best bead, implement it, prove it, commit/push, close it,
+and exit. The loop restarts you for the next bead.
+
+## Authoritative sources (read before coding)
+
+- **Plan — the source of truth:** `/home/coding/pdftract/docs/plan/plan.md`
+  (~3,825 lines, schema_version 1.0). Every bead description references plan line
+  ranges. Read the referenced section before you write code. If the code contradicts
+  the plan, the code is wrong.
+- **Repo conventions:** `/home/coding/pdftract/CLAUDE.md` — this workspace uses
+  **`bf`** (bead-forge), not stock `br`. It overrides the parent `~/CLAUDE.md`'s
+  beads-recovery patterns.
+- **Environment:** `/home/coding/CLAUDE.md` — Argo CI on iad-ci, kubectl-proxy,
+  ArgoCD, ADB. Still applies.
+
+## Working directory
+
+`/home/coding/pdftract`
+
+## Each iteration
+
+### 1. Sync and find work
+
+```bash
+cd /home/coding/pdftract
+git pull --ff-only || git pull --rebase   # if the branch diverged, rebase local work
+bf ready --limit 5                         # unblocked beads, ranked by impact-weighted score
+```
+
+The `float` column is critical-path slack: `float=0` = on the critical path (no slack),
+larger = more slack. **Prefer low-float, high-priority beads.** Dependency direction is
+canonical: epics/coordinators depend on their leaf tasks and close LAST — work leaves first.
+
+If a bead was attempted before (check `git log` for its ID), continue from the prior
+work rather than starting over.
+
+### 2. Claim
+
+```bash
+bf claim <bead-id> --model claude-code-glm-4.7 --harness needle --harness-version marathon
+```
+
+### 3. Implement
+
+1. `bf show <bead-id>` — read the full description + acceptance criteria.
+2. Read the referenced section of `plan.md`.
+3. Read the existing source under `crates/` / `src/` before modifying it.
+4. Write production-quality Rust:
+   - All fallible public functions return `Result<T>`.
+   - **No `unwrap()` / `expect()` in non-test code.**
+   - Exhaustive `match` arms on enums — no catch-all `_` on outcome types.
+   - Add unit tests in `#[cfg(test)]` modules.
+5. Gates — all must pass before you commit:
+   ```bash
+   cargo check --all-targets
+   cargo clippy --all-targets -- -D warnings
+   cargo fmt
+   cargo nextest run        # (or `cargo test` if nextest unavailable)
+   ```
+
+### 4. Commit, push, close
+
+```bash
+git add <specific paths you changed>
+git commit -m "<type>(<scope>): <short summary>"   # body: key decisions + Closes: <bead-id>
+git push
+```
+
+**Closing a bead — `bf close` is BROKEN** (returns `Error: Query returned no rows`).
+Use `bf batch` instead, with a substantive reason citing the commits, the verification
+note path, and the test fixtures exercised:
+
+```bash
+bf batch --json '[{"op":"close","id":"pdftract-XXX","reason":"<commits + tests + acceptance notes>"}]'
+# Expected: [op 0] ok
+```
+
+### 5. End the iteration
+
+**One bead per iteration.** Then exit — the loop restarts you.
+
+## Hard rules
+
+- **The plan is the source of truth.** Disagreement between your intuition and the plan
+  means the intuition is wrong for *this project*. Genuine gaps → open a
+  `plan-gap: <title>` bead and continue.
+- **NEVER `git stash -u`, `git stash --include-untracked`, or `git clean`.** A
+  pre-commit provenance hook over `tests/fixtures` blocks ALL commits if a fixture
+  goes missing; these commands sweep untracked fixtures. Keep fixtures tracked.
+- **Never force-push. Never `--no-verify`. Never skip hooks.**
+- **Never edit `.beads/` files directly** (issues.jsonl, beads.db). Use `bf` only.
+- **No GitHub Actions, no K8s Jobs/CronJobs, no direct `kubectl apply`.** CI is Argo
+  Workflows on iad-ci; K8s YAML goes to `jedarden/declarative-config` via PR.
+- **Always compile.** Never leave the repo broken. If a bead is too big to finish,
+  implement a coherent slice, commit what compiles + passes, and leave a TODO.
+
+## Done
+
+The genesis bead `pdftract-qkc77` closes when all 13 epic beads close. Each epic closes
+only after its sub-phase coordinators and leaf tasks close.
--- a/.marathon/start.sh
+++ b/.marathon/start.sh
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+# pdftract Marathon Launcher — claude-code @ GLM-4.7 via ZAI proxy
+#
+# Runs the central marathon-coding skill in a dedicated tmux session against this
+# repo. Each iteration reads .marathon/instruction.md and invokes headless
+# claude-code routed through the ZAI proxy, mirroring the live NEEDLE
+# claude-code-glm-4.7 agent.
+#
+# Usage:
+#   ./.marathon/start.sh                 # session "pdftract-marathon"
+#   ./.marathon/start.sh <session-name>  # custom session name
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_DIR="$(dirname "$SCRIPT_DIR")"
+MARATHON_SKILL="/home/coding/claude-config/skills/marathon-coding"
+INSTRUCTION_FILE="$SCRIPT_DIR/instruction.md"
+LOG_DIR="$SCRIPT_DIR/logs"
+SESSION_NAME="${1:-pdftract-marathon}"
+
+# ZAI proxy — CURRENT endpoint is the apexalgo-iad Traefik vpn-entrypoint, NOT the
+# decommissioned ardenone-hub proxy that older repos' start.sh scripts point at.
+# This mirrors the env of the live `claude-code-glm-4.7` NEEDLE agent.
+ZAI_BASE_URL="https://traefik-apexalgo-iad.tail1b1987.ts.net:8444"
+
+command -v tmux >/dev/null 2>&1 || { echo "Error: tmux not installed" >&2; exit 1; }
+[ -x "$MARATHON_SKILL/launcher.sh" ] || { echo "Error: marathon launcher missing: $MARATHON_SKILL/launcher.sh" >&2; exit 1; }
+[ -f "$INSTRUCTION_FILE" ] || { echo "Error: instruction file missing: $INSTRUCTION_FILE" >&2; exit 1; }
+
+if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
+    echo "Session '$SESSION_NAME' already exists."
+    echo "  Attach: tmux attach -t $SESSION_NAME"
+    echo "  Kill:   tmux kill-session -t $SESSION_NAME"
+    exit 1
+fi
+
+# Guard against running concurrently with a NEEDLE worker on the same worktree.
+if pgrep -f "needle run --workspace $REPO_DIR" >/dev/null 2>&1; then
+    echo "Error: a NEEDLE worker is running against $REPO_DIR." >&2
+    echo "       Marathon + NEEDLE share one git worktree → contention." >&2
+    echo "       Stop it first:  needle stop -i <identifier>" >&2
+    exit 1
+fi
+
+# Preflight: any HTTP response = proxy is up; only a connection failure aborts.
+if ! curl -sk --max-time 8 -o /dev/null "$ZAI_BASE_URL"; then
+    echo "Error: ZAI proxy at $ZAI_BASE_URL is unreachable." >&2
+    echo "       Check Tailscale + the proxy on apexalgo-iad." >&2
+    exit 1
+fi
+
+mkdir -p "$LOG_DIR"
+
+LOOP_CMD="cd '$REPO_DIR' && \
+    unset CLAUDECODE && \
+    export NODE_TLS_REJECT_UNAUTHORIZED=0 && \
+    export ANTHROPIC_BASE_URL='$ZAI_BASE_URL' && \
+    export ANTHROPIC_AUTH_TOKEN='proxy-handles-auth' && \
+    export ANTHROPIC_MODEL='glm-4.7' && \
+    export ANTHROPIC_DEFAULT_OPUS_MODEL='glm-4.7' && \
+    export ANTHROPIC_DEFAULT_SONNET_MODEL='glm-4.7' && \
+    export ANTHROPIC_DEFAULT_HAIKU_MODEL='glm-4.7' && \
+    export CLAUDE_CODE_SUBAGENT_MODEL='glm-4.7' && \
+    export API_TIMEOUT_MS='900000' && \
+    export DISABLE_AUTOUPDATER=1 && \
+    export DISABLE_TELEMETRY=1 && \
+    '$MARATHON_SKILL/launcher.sh' \
+        --prompt '$INSTRUCTION_FILE' \
+        --model glm-4.7 \
+        --delay 10 \
+        --log-dir '$LOG_DIR'"
+
+echo "╔══════════════════════════════════════════════════════════════╗"
+echo "║         pdftract Marathon — claude-code @ GLM-4.7            ║"
+echo "╚══════════════════════════════════════════════════════════════╝"
+echo "  Repo:        $REPO_DIR"
+echo "  Instruction: $INSTRUCTION_FILE"
+echo "  Session:     $SESSION_NAME"
+echo "  Model:       glm-4.7 (all tiers)"
+echo "  Proxy:       $ZAI_BASE_URL"
+echo "  Logs:        $LOG_DIR"
+echo ""
+
+tmux new-session -d -s "$SESSION_NAME" -c "$REPO_DIR" "$LOOP_CMD"
+
+echo "Marathon running in tmux session: $SESSION_NAME"
+echo "  Attach:  tmux attach -t $SESSION_NAME"
+echo "  Detach:  Ctrl+B, D (while attached)"
+echo "  Stop:    tmux kill-session -t $SESSION_NAME"
+echo "  Logs:    ls $LOG_DIR/"
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -2353,6 +2353,7 @@ dependencies = [
 "secrecy",
 "serde",
 "serde_json",
+ "serde_yaml",
 "sha2",
 "smallvec",
 "tempfile",
--- a/crates/pdftract-cli/build.rs
+++ b/crates/pdftract-cli/build.rs
@@ -29,7 +29,8 @@ fn main() {
        ("MARKDOWN", cfg!(feature = "markdown")),
    ];

-    let enabled: Vec<&str> = features.iter()
+    let enabled: Vec<&str> = features
+        .iter()
        .filter(|(_, enabled)| *enabled)
        .map(|(name, _)| *name)
        .collect();
--- a/crates/pdftract-cli/src/cache_cmd.rs
+++ b/crates/pdftract-cli/src/cache_cmd.rs
@@ -62,7 +62,11 @@ impl AgeHistogram {

    /// Total entries in histogram.
    pub fn total(&self) -> u64 {
-        self.less_than_1h + self.less_than_1d + self.less_than_7d + self.less_than_30d + self.greater_than_30d
+        self.less_than_1h
+            + self.less_than_1d
+            + self.less_than_7d
+            + self.less_than_30d
+            + self.greater_than_30d
    }

    /// Get percentage for a bucket.
@@ -114,32 +118,31 @@ pub fn compute_stats(cache_dir: &Path) -> Result<CacheStats> {
    let mut oldest_mtime = None;
    let mut newest_mtime = None;

-    for prefix1_entry in fs::read_dir(cache_dir)?
-        .filter_map(|e| e.ok())
-        .filter(|e| {
-            e.path().is_dir()
-                && e.file_name().to_string_lossy().len() == 2
-                && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
-        })
-    {
+    for prefix1_entry in fs::read_dir(cache_dir)?.filter_map(|e| e.ok()).filter(|e| {
+        e.path().is_dir()
+            && e.file_name().to_string_lossy().len() == 2
+            && e.file_name()
+                .to_string_lossy()
+                .chars()
+                .all(|c| c.is_ascii_hexdigit())
+    }) {
        let prefix1_dir = prefix1_entry.path();

-        for prefix2_entry in prefix1_dir.read_dir()?
-            .filter_map(|e| e.ok())
-            .filter(|e| {
-                e.path().is_dir()
-                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name()
-                        .to_string_lossy()
-                        .chars()
-                        .all(|c| c.is_ascii_hexdigit())
-            })
-        {
+        for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+            e.path().is_dir()
+                && e.file_name().to_string_lossy().len() == 2
+                && e.file_name()
+                    .to_string_lossy()
+                    .chars()
+                    .all(|c| c.is_ascii_hexdigit())
+        }) {
            let prefix2_dir = prefix2_entry.path();

-            for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                e.path().is_dir()
-            }) {
+            for fp_entry in prefix2_dir
+                .read_dir()?
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().is_dir())
+            {
                let fp_dir = fp_entry.path();

                for entry in fp_dir.read_dir()?.filter_map(|e| e.ok()) {
@@ -155,10 +158,14 @@ pub fn compute_stats(cache_dir: &Path) -> Result<CacheStats> {
                                    if let Ok(modified) = metadata.modified() {
                                        if let Ok(duration) = modified.duration_since(UNIX_EPOCH) {
                                            let mtime_secs = duration.as_secs();
-                                            if oldest_mtime.is_none() || Some(mtime_secs) < oldest_mtime {
+                                            if oldest_mtime.is_none()
+                                                || Some(mtime_secs) < oldest_mtime
+                                            {
                                                oldest_mtime = Some(mtime_secs);
                                            }
-                                            if newest_mtime.is_none() || Some(mtime_secs) > newest_mtime {
+                                            if newest_mtime.is_none()
+                                                || Some(mtime_secs) > newest_mtime
+                                            {
                                                newest_mtime = Some(mtime_secs);
                                            }

@@ -211,15 +218,15 @@ pub fn display_stats(stats: &CacheStats) {
    };

    println!("Entries: {}", stats.entry_count);
-    println!("Total size: {:.1} MiB compressed / {:.1} GiB uncompressed ({:.1}x ratio)",
+    println!(
+        "Total size: {:.1} MiB compressed / {:.1} GiB uncompressed ({:.1}x ratio)",
        compressed_mb,
        uncompressed_mb / 1024.0,
        ratio
    );
-    println!("Hit ratio (since last clear): {:.1}% ({} hits / {} total)",
-        hit_ratio,
-        stats.hits,
-        stats.total_accesses
+    println!(
+        "Hit ratio (since last clear): {:.1}% ({} hits / {} total)",
+        hit_ratio, stats.hits, stats.total_accesses
    );

    if let Some(oldest) = stats.oldest_entry_age_seconds {
@@ -245,7 +252,8 @@ pub fn display_stats(stats: &CacheStats) {
    }

    let h = &stats.age_histogram;
-    println!("Age histogram: <1h: {:.1}%, <1d: {:.1}%, <7d: {:.1}%, <30d: {:.1}%, >30d: {:.1}%",
+    println!(
+        "Age histogram: <1h: {:.1}%, <1d: {:.1}%, <7d: {:.1}%, <30d: {:.1}%, >30d: {:.1}%",
        h.percentage(h.less_than_1h),
        h.percentage(h.less_than_1d),
        h.percentage(h.less_than_7d),
@@ -314,32 +322,31 @@ pub fn clear_cache(cache_dir: &Path, yes: bool) -> Result<()> {

    // Delete all entry files (preserve index.json and sentinel)
    let mut deleted = 0;
-    for prefix1_entry in fs::read_dir(cache_dir)?
-        .filter_map(|e| e.ok())
-        .filter(|e| {
-            e.path().is_dir()
-                && e.file_name().to_string_lossy().len() == 2
-                && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
-        })
-    {
+    for prefix1_entry in fs::read_dir(cache_dir)?.filter_map(|e| e.ok()).filter(|e| {
+        e.path().is_dir()
+            && e.file_name().to_string_lossy().len() == 2
+            && e.file_name()
+                .to_string_lossy()
+                .chars()
+                .all(|c| c.is_ascii_hexdigit())
+    }) {
        let prefix1_dir = prefix1_entry.path();

-        for prefix2_entry in prefix1_dir.read_dir()?
-            .filter_map(|e| e.ok())
-            .filter(|e| {
-                e.path().is_dir()
-                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name()
-                        .to_string_lossy()
-                        .chars()
-                        .all(|c| c.is_ascii_hexdigit())
-            })
-        {
+        for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+            e.path().is_dir()
+                && e.file_name().to_string_lossy().len() == 2
+                && e.file_name()
+                    .to_string_lossy()
+                    .chars()
+                    .all(|c| c.is_ascii_hexdigit())
+        }) {
            let prefix2_dir = prefix2_entry.path();

-            for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                e.path().is_dir()
-            }) {
+            for fp_entry in prefix2_dir
+                .read_dir()?
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().is_dir())
+            {
                let fp_dir = fp_entry.path();

                // Delete all files in the fingerprint directory
@@ -383,8 +390,10 @@ pub fn clear_cache(cache_dir: &Path, yes: bool) -> Result<()> {
 pub fn purge_cache_older_than(cache_dir: &Path, duration_str: &str) -> Result<()> {
    use humantime::parse_duration;

-    let duration = parse_duration(duration_str)
-        .context(format!("Invalid duration '{}'. Use formats like '30d', '7d', '1h'", duration_str))?;
+    let duration = parse_duration(duration_str).context(format!(
+        "Invalid duration '{}'. Use formats like '30d', '7d', '1h'",
+        duration_str
+    ))?;

    let cutoff_secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
@@ -394,32 +403,31 @@ pub fn purge_cache_older_than(cache_dir: &Path, duration_str: &str) -> Result<()

    let mut deleted = 0;

-    for prefix1_entry in fs::read_dir(cache_dir)?
-        .filter_map(|e| e.ok())
-        .filter(|e| {
-            e.path().is_dir()
-                && e.file_name().to_string_lossy().len() == 2
-                && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
-        })
-    {
+    for prefix1_entry in fs::read_dir(cache_dir)?.filter_map(|e| e.ok()).filter(|e| {
+        e.path().is_dir()
+            && e.file_name().to_string_lossy().len() == 2
+            && e.file_name()
+                .to_string_lossy()
+                .chars()
+                .all(|c| c.is_ascii_hexdigit())
+    }) {
        let prefix1_dir = prefix1_entry.path();

-        for prefix2_entry in prefix1_dir.read_dir()?
-            .filter_map(|e| e.ok())
-            .filter(|e| {
-                e.path().is_dir()
-                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name()
-                        .to_string_lossy()
-                        .chars()
-                        .all(|c| c.is_ascii_hexdigit())
-            })
-        {
+        for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+            e.path().is_dir()
+                && e.file_name().to_string_lossy().len() == 2
+                && e.file_name()
+                    .to_string_lossy()
+                    .chars()
+                    .all(|c| c.is_ascii_hexdigit())
+        }) {
            let prefix2_dir = prefix2_entry.path();

-            for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                e.path().is_dir()
-            }) {
+            for fp_entry in prefix2_dir
+                .read_dir()?
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().is_dir())
+            {
                let fp_dir = fp_entry.path();

                for entry in fp_dir.read_dir()?.filter_map(|e| e.ok()) {
@@ -474,8 +482,10 @@ pub fn purge_cache_older_than(cache_dir: &Path, duration_str: &str) -> Result<()
 pub fn purge_cache_version(_cache_dir: &Path, version_constraint: &str) -> Result<()> {
    use semver::VersionReq;

-    let _req = VersionReq::parse(version_constraint)
-        .context(format!("Invalid version constraint '{}'", version_constraint))?;
+    let _req = VersionReq::parse(version_constraint).context(format!(
+        "Invalid version constraint '{}'",
+        version_constraint
+    ))?;

    // For now, this is a no-op since we don't track extraction versions per entry
    // This would require extending the cache entry metadata
@@ -488,32 +498,31 @@ pub fn purge_cache_version(_cache_dir: &Path, version_constraint: &str) -> Resul
 fn count_entries(cache_dir: &Path) -> Result<u64> {
    let mut count = 0;

-    for prefix1_entry in fs::read_dir(cache_dir)?
-        .filter_map(|e| e.ok())
-        .filter(|e| {
-            e.path().is_dir()
-                && e.file_name().to_string_lossy().len() == 2
-                && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
-        })
-    {
+    for prefix1_entry in fs::read_dir(cache_dir)?.filter_map(|e| e.ok()).filter(|e| {
+        e.path().is_dir()
+            && e.file_name().to_string_lossy().len() == 2
+            && e.file_name()
+                .to_string_lossy()
+                .chars()
+                .all(|c| c.is_ascii_hexdigit())
+    }) {
        let prefix1_dir = prefix1_entry.path();

-        for prefix2_entry in prefix1_dir.read_dir()?
-            .filter_map(|e| e.ok())
-            .filter(|e| {
-                e.path().is_dir()
-                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name()
-                        .to_string_lossy()
-                        .chars()
-                        .all(|c| c.is_ascii_hexdigit())
-            })
-        {
+        for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+            e.path().is_dir()
+                && e.file_name().to_string_lossy().len() == 2
+                && e.file_name()
+                    .to_string_lossy()
+                    .chars()
+                    .all(|c| c.is_ascii_hexdigit())
+        }) {
            let prefix2_dir = prefix2_entry.path();

-            for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                e.path().is_dir()
-            }) {
+            for fp_entry in prefix2_dir
+                .read_dir()?
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().is_dir())
+            {
                let fp_dir = fp_entry.path();

                for entry in fp_dir.read_dir()?.filter_map(|e| e.ok()) {
@@ -659,8 +668,16 @@ mod tests {
        let fp_dir = cache_dir.join("e7").join("a1").join(fp);
        fs::create_dir_all(&fp_dir).unwrap();

-        fs::write(fp_dir.join(format!("{}-1000.json.zst", opts)), b"x".repeat(1000)).unwrap();
-        fs::write(fp_dir.join(format!("{}-2000.json.zst", opts)), b"x".repeat(2000)).unwrap();
+        fs::write(
+            fp_dir.join(format!("{}-1000.json.zst", opts)),
+            b"x".repeat(1000),
+        )
+        .unwrap();
+        fs::write(
+            fp_dir.join(format!("{}-2000.json.zst", opts)),
+            b"x".repeat(2000),
+        )
+        .unwrap();

        let count = count_entries(cache_dir).unwrap();
        assert_eq!(count, 2);
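
The same directory filter appears in every walk that the cache_cmd.rs hunks above reformat: a name that is exactly two characters long and consists only of ASCII hex digits, applied at both prefix levels (for example `e7/a1/<fingerprint>` in the test). As a small illustration of what that predicate accepts, here is a standalone sketch; the function name `is_hex_shard_name` is hypothetical and does not exist in the repository.

```rust
/// Returns true when `name` matches the check used in the directory walks:
/// a byte length of 2 and only ASCII hex digits (0-9, a-f, A-F).
/// Hypothetical helper for illustration; the commit keeps the check inline.
pub fn is_hex_shard_name(name: &str) -> bool {
    name.len() == 2 && name.chars().all(|c| c.is_ascii_hexdigit())
}
```

With this predicate `e7` and `A1` qualify, while `e7a`, `zz`, and the empty string do not, so files such as `index.json` at the cache root are skipped by the walk.
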
--- a/crates/pdftract-cli/src/codegen.rs
+++ b/crates/pdftract-cli/src/codegen.rs
@@ -135,12 +135,18 @@ impl CodeGenerator {
                    return Ok(contract);
                }
                Err(e) => {
-                    eprintln!("Warning: Failed to parse SDK contract from {:?}: {}", contract_path, e);
+                    eprintln!(
+                        "Warning: Failed to parse SDK contract from {:?}: {}",
+                        contract_path, e
+                    );
                    eprintln!("Falling back to hardcoded contract");
                }
            }
        } else {
-            eprintln!("Warning: SDK contract file not found at {:?}, using hardcoded contract", contract_path);
+            eprintln!(
+                "Warning: SDK contract file not found at {:?}, using hardcoded contract",
+                contract_path
+            );
        }

        // Hardcoded fallback contract
@@ -155,7 +161,9 @@ impl CodeGenerator {
        let mut errors = Vec::new();

        // Parse method signatures from the Method surface section
-        let _method_sig_re = Regex::new(r"\*\*([a-z_]+)\*\*\s*\n\s*- Signature: [`']?([a-zA-Z0-9_<>():?,\s]+)[`']?").unwrap();
+        let _method_sig_re =
+            Regex::new(r"\*\*([a-z_]+)\*\*\s*\n\s*- Signature: [`']?([a-zA-Z0-9_<>():?,\s]+)[`']?")
+                .unwrap();
        let _method_table_re = Regex::new(r"\| [`']?([a-z_]+)[`']?\|").unwrap();

        // Parse method table for CLI mappings
@@ -170,18 +178,129 @@ impl CodeGenerator {

        // Method definitions with their details
        let method_patterns = [
-            ("extract", "Extract", "extract", "extract", "Document", "ExtractOptions", "Extract structured data from a PDF", false, false, 0),
-            ("extract_text", "ExtractText", "extract_text", "extract", "string", "ExtractOptions", "Extract plain text from a PDF", true, false, 0),
-            ("extract_markdown", "ExtractMarkdown", "extract_markdown", "extract", "string", "ExtractOptions", "Extract Markdown-formatted text from a PDF", true, false, 0),
-            ("extract_stream", "ExtractStream", "extract_stream", "extract", "Page", "ExtractOptions", "Extract pages from a PDF as a stream", false, false, 0),
-            ("search", "Search", "search", "grep", "Match", "SearchOptions", "Search for text in a PDF", false, false, 0),
-            ("get_metadata", "GetMetadata", "get_metadata", "extract", "Metadata", "BaseOptions", "Get metadata from a PDF", false, false, 0),
-            ("hash", "Hash", "hash", "hash", "Fingerprint", "BaseOptions", "Compute hash fingerprint of a PDF", false, false, 0),
-            ("classify", "Classify", "classify", "classify", "Classification", "", "Classify a PDF document", false, false, 0),
-            ("verify_receipt", "VerifyReceipt", "verify_receipt", "verify-receipt", "bool", "", "Verify a receipt", false, true, 2),
+            (
+                "extract",
+                "Extract",
+                "extract",
+                "extract",
+                "Document",
+                "ExtractOptions",
+                "Extract structured data from a PDF",
+                false,
+                false,
+                0,
+            ),
+            (
+                "extract_text",
+                "ExtractText",
+                "extract_text",
+                "extract",
+                "string",
+                "ExtractOptions",
+                "Extract plain text from a PDF",
+                true,
+                false,
+                0,
+            ),
+            (
+                "extract_markdown",
+                "ExtractMarkdown",
+                "extract_markdown",
+                "extract",
+                "string",
+                "ExtractOptions",
+                "Extract Markdown-formatted text from a PDF",
+                true,
+                false,
+                0,
+            ),
+            (
+                "extract_stream",
+                "ExtractStream",
+                "extract_stream",
+                "extract",
+                "Page",
+                "ExtractOptions",
+                "Extract pages from a PDF as a stream",
+                false,
+                false,
+                0,
+            ),
+            (
+                "search",
+                "Search",
+                "search",
+                "grep",
+                "Match",
+                "SearchOptions",
+                "Search for text in a PDF",
+                false,
+                false,
+                0,
+            ),
+            (
+                "get_metadata",
+                "GetMetadata",
+                "get_metadata",
+                "extract",
+                "Metadata",
+                "BaseOptions",
+                "Get metadata from a PDF",
+                false,
+                false,
+                0,
+            ),
+            (
+                "hash",
+                "Hash",
+                "hash",
+                "hash",
+                "Fingerprint",
+                "BaseOptions",
+                "Compute hash fingerprint of a PDF",
+                false,
+                false,
+                0,
+            ),
+            (
+                "classify",
+                "Classify",
+                "classify",
+                "classify",
+                "Classification",
+                "",
+                "Classify a PDF document",
+                false,
+                false,
+                0,
+            ),
+            (
+                "verify_receipt",
+                "VerifyReceipt",
+                "verify_receipt",
+                "verify-receipt",
+                "bool",
+                "",
+                "Verify a receipt",
+                false,
+                true,
+                2,
+            ),
        ];

-        for (name, camel_name, snake_name, cli_flag, return_type, options_type, description, returns_string, uses_string_params, string_param_count) in method_patterns {
+        for (
+            name,
+            camel_name,
+            snake_name,
+            cli_flag,
+            return_type,
+            options_type,
+            description,
+            returns_string,
+            uses_string_params,
+            string_param_count,
+        ) in method_patterns
+        {
            methods.push(Method {
                name: name.to_string(),
                camel_name: camel_name.to_string(),
@@ -199,20 +318,28 @@ impl CodeGenerator {

        // Parse error mapping table from the Error mapping section
        let error_mapping_start = content.find("## Error mapping").unwrap_or(0);
-        let error_mapping_end = content.find("### Per-language base exception types").unwrap_or(content.len());
+        let error_mapping_end = content
+            .find("### Per-language base exception types")
+            .unwrap_or(content.len());
        let error_mapping_section = content[error_mapping_start..error_mapping_end].to_string();

        // The error table has the format: | Exit code | Meaning | Native exception |
        // We need to find the table header and then parse the rows
-        let error_re = Regex::new(r"\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|\s*`?([a-zA-Z]+)`?\s*\|").unwrap();
+        let error_re =
+            Regex::new(r"\|\s*(\d+)\s*\|\s*([^|]+?)\s*\|\s*`?([a-zA-Z]+)`?\s*\|").unwrap();
        for cap in error_re.captures_iter(&error_mapping_section) {
-            if let (Some(exit_code_str), Some(meaning), Some(exception_name)) = (
-                cap.get(1), cap.get(2), cap.get(3)
-            ) {
+            if let (Some(exit_code_str), Some(meaning), Some(exception_name)) =
+                (cap.get(1), cap.get(2), cap.get(3))
+            {
                if let Ok(exit_code) = exit_code_str.as_str().parse::<i32>() {
                    let name = exception_name.as_str().trim().to_string();
                    // Skip the generic "any other non-zero" entry and malformed matches
-                    if !name.contains("any other") && name.chars().next().map_or(false, |c| c.is_ascii_alphabetic()) {
+                    if !name.contains("any other")
+                        && name
+                            .chars()
+                            .next()
+                            .map_or(false, |c| c.is_ascii_alphabetic())
+                    {
                        errors.push(Error {
                            exit_code,
                            exception_name: name,
@@ -367,7 +494,8 @@ impl CodeGenerator {
                Error {
                    exit_code: 3,
                    exception_name: "EncryptionError".to_string(),
-                    description: "The PDF is encrypted and password is missing or wrong".to_string(),
+                    description: "The PDF is encrypted and password is missing or wrong"
+                        .to_string(),
                },
                Error {
                    exit_code: 4,
@@ -418,11 +546,18 @@ impl CodeGenerator {
        let template_dir = PathBuf::from("templates/sdk-skeleton").join(lang.template_dir());

        if !template_dir.exists() {
-            anyhow::bail!("Template directory for {:?} does not exist: {:?}", lang, template_dir);
+            anyhow::bail!(
+                "Template directory for {:?} does not exist: {:?}",
+                lang,
+                template_dir
+            );
        }

        // Walk the template directory and render each file
-        for entry in WalkDir::new(&template_dir).into_iter().filter_map(|e| e.ok()) {
+        for entry in WalkDir::new(&template_dir)
+            .into_iter()
+            .filter_map(|e| e.ok())
+        {
            let path = entry.path();
            if path.is_dir() {
                continue;
@ -451,7 +586,8 @@ impl CodeGenerator {

            // Register template if it contains Tera syntax
            if template_content.contains("{{") || template_content.contains("{%") {
-                self.tera.add_raw_template(&template_name, &template_content)?;
+                self.tera
+                    .add_raw_template(&template_name, &template_content)?;
            }

            // Build context
@ -488,7 +624,10 @@ impl CodeGenerator {
    /// Files that should be excluded from validation comparison.
    fn should_exclude_from_validation(path: &Path) -> bool {
        let file_name = path.file_name().and_then(|n| n.to_str());
-        matches!(file_name, Some("GENERATED") | Some(".codegen-version") | Some(".gitignore"))
+        matches!(
+            file_name,
+            Some("GENERATED") | Some(".codegen-version") | Some(".gitignore")
+        )
    }

    /// Validates an existing SDK against the current generator output.
@ -502,7 +641,10 @@ impl CodeGenerator {
        let mut differences = Vec::new();

        // Compare generated files with existing SDK
-        for entry in WalkDir::new(temp_dir.path()).into_iter().filter_map(|e| e.ok()) {
+        for entry in WalkDir::new(temp_dir.path())
+            .into_iter()
+            .filter_map(|e| e.ok())
+        {
            let path = entry.path();
            if path.is_dir() {
                continue;
--- a/crates/pdftract-cli/src/doctor/checks/cache_dir.rs
+++ b/crates/pdftract-cli/src/doctor/checks/cache_dir.rs
@ -1,5 +1,5 @@
-use std::path::Path;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::path::Path;

 /// Check: cache directory (cache feature)
 ///
@ -13,9 +13,9 @@ impl CacheDirCheck {

    #[cfg(unix)]
    fn check_free_space(path: &Path) -> Result<u64, String> {
+        use libc::{c_char, statvfs};
        use std::ffi::CString;
        use std::os::unix::ffi::OsStrExt;
-        use libc::{statvfs, c_char};

        let path_cstr = CString::new(path.as_os_str().as_bytes())
            .map_err(|_| "Failed to convert path to CString".to_string())?;
@ -54,8 +54,7 @@ impl CacheDirCheck {
        // Try to create a temporary file
        let test_file = path.join(".pdftract-doctor-test");

-        std::fs::write(&test_file, b"test")
-            .map_err(|e| format!("Not writable: {}", e))?;
+        std::fs::write(&test_file, b"test").map_err(|e| format!("Not writable: {}", e))?;

        // Clean up
        let _ = std::fs::remove_file(&test_file);
@ -77,7 +76,8 @@ impl CacheDirCheck {
        let value: serde_json::Value = serde_json::from_str(&content)
            .map_err(|e| format!("Failed to parse index.json: {}", e))?;

-        let schema_version = value.get("schema_version")
+        let schema_version = value
+            .get("schema_version")
            .and_then(|v| v.as_u64())
            .unwrap_or(0);

@ -86,7 +86,10 @@ impl CacheDirCheck {
        if schema_version == current_version as u64 {
            Ok(format!("Layout version {} (current)", schema_version))
        } else {
-            Ok(format!("Layout version {} (migration available to {})", schema_version, current_version))
+            Ok(format!(
+                "Layout version {} (migration available to {})",
+                schema_version, current_version
+            ))
        }
    }
 }
@ -111,7 +114,10 @@ impl Check for CacheDirCheck {
            return CheckResult {
                name: self.name(),
                status: CheckStatus::Warn,
-                detail: format!("Cache directory does not exist: {} (will be created on first use)", cache_dir.display()),
+                detail: format!(
+                    "Cache directory does not exist: {} (will be created on first use)",
+                    cache_dir.display()
+                ),
            };
        }

@ -131,7 +137,10 @@ impl Check for CacheDirCheck {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("{} (low disk space: {} MiB free, 1 GiB recommended)", layout, free_mb),
+                        detail: format!(
+                            "{} (low disk space: {} MiB free, 1 GiB recommended)",
+                            layout, free_mb
+                        ),
                    }
                } else {
                    CheckResult {
@ -141,13 +150,15 @@ impl Check for CacheDirCheck {
                    }
                }
            }
-            (Err(e), _, _) | (_, Err(e), _) | (_, _, Err(e)) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("Cache directory check failed at {}: {}", cache_dir.display(), e),
-                }
-            }
+            (Err(e), _, _) | (_, Err(e), _) | (_, _, Err(e)) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!(
+                    "Cache directory check failed at {}: {}",
+                    cache_dir.display(),
+                    e
+                ),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/leptonica.rs
+++ b/crates/pdftract-cli/src/doctor/checks/leptonica.rs
@ -1,5 +1,5 @@
-use std::process::Command;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::process::Command;

 /// Check: leptonica installation (transitive Tesseract dependency)
 ///
@ -15,17 +15,13 @@ impl Check for LeptonicaCheck {

    fn run(&self, _ctx: &DoctorCtx) -> CheckResult {
        // First check if pkg-config exists
-        let pkg_check = Command::new("pkg-config")
-            .arg("--version")
-            .output();
+        let pkg_check = Command::new("pkg-config").arg("--version").output();

        let pkg_available = pkg_check.is_ok();

        if !pkg_available {
            // Fallback: try ldconfig -p | grep lept
-            let ldconfig = Command::new("ldconfig")
-                .arg("-p")
-                .output();
+            let ldconfig = Command::new("ldconfig").arg("-p").output();

            if let Ok(output) = ldconfig {
                let stdout = String::from_utf8_lossy(&output.stdout);
@ -68,14 +64,20 @@ impl Check for LeptonicaCheck {
                        CheckResult {
                            name: self.name(),
                            status: CheckStatus::Warn,
-                            detail: format!("leptonica {} found (< 1.79: may have compatibility issues)", version),
+                            detail: format!(
+                                "leptonica {} found (< 1.79: may have compatibility issues)",
+                                version
+                            ),
                        }
                    }
                } else {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("leptonica {} found but version could not be parsed", version_str),
+                        detail: format!(
+                            "leptonica {} found but version could not be parsed",
+                            version_str
+                        ),
                    }
                }
            }
@ -87,13 +89,11 @@ impl Check for LeptonicaCheck {
                    detail: format!("leptonica not found: {}", stderr.trim()),
                }
            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("pkg-config check failed: {}", e),
-                }
-            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("pkg-config check failed: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/libopenjp2.rs
+++ b/crates/pdftract-cli/src/doctor/checks/libopenjp2.rs
@ -1,5 +1,5 @@
-use std::process::Command;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::process::Command;

 /// Check: libopenjp2 installation (JPEG2000 decoding)
 ///
@ -14,17 +14,13 @@ impl Check for Libopenjp2Check {

    fn run(&self, _ctx: &DoctorCtx) -> CheckResult {
        // First check if pkg-config exists
-        let pkg_check = Command::new("pkg-config")
-            .arg("--version")
-            .output();
+        let pkg_check = Command::new("pkg-config").arg("--version").output();

        let pkg_available = pkg_check.is_ok();

        if !pkg_available {
            // Fallback: try ldconfig -p | grep openjp2
-            let ldconfig = Command::new("ldconfig")
-                .arg("-p")
-                .output();
+            let ldconfig = Command::new("ldconfig").arg("-p").output();

            if let Ok(output) = ldconfig {
                let stdout = String::from_utf8_lossy(&output.stdout);
@ -32,7 +28,8 @@ impl Check for Libopenjp2Check {
                    return CheckResult {
                        name: self.name(),
                        status: CheckStatus::Ok,
-                        detail: "libopenjp2 found via ldconfig (pkg-config unavailable)".to_string(),
+                        detail: "libopenjp2 found via ldconfig (pkg-config unavailable)"
+                            .to_string(),
                    };
                }
            }
@ -69,20 +66,16 @@ impl Check for Libopenjp2Check {
                    detail,
                }
            }
-            Ok(_) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: "libopenjp2 not found (pkg-config --exists libopenjp2 failed)".to_string(),
-                }
-            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("pkg-config check failed: {}", e),
-                }
-            }
+            Ok(_) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: "libopenjp2 not found (pkg-config --exists libopenjp2 failed)".to_string(),
+            },
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("pkg-config check failed: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/libtiff.rs
+++ b/crates/pdftract-cli/src/doctor/checks/libtiff.rs
@ -1,5 +1,5 @@
-use std::process::Command;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::process::Command;

 /// Check: libtiff installation (CCITT fax decoding)
 ///
@ -14,17 +14,13 @@ impl Check for LibtiffCheck {

    fn run(&self, _ctx: &DoctorCtx) -> CheckResult {
        // First check if pkg-config exists
-        let pkg_check = Command::new("pkg-config")
-            .arg("--version")
-            .output();
+        let pkg_check = Command::new("pkg-config").arg("--version").output();

        let pkg_available = pkg_check.is_ok();

        if !pkg_available {
            // Fallback: try ldconfig -p | grep tiff
-            let ldconfig = Command::new("ldconfig")
-                .arg("-p")
-                .output();
+            let ldconfig = Command::new("ldconfig").arg("-p").output();

            if let Ok(output) = ldconfig {
                let stdout = String::from_utf8_lossy(&output.stdout);
@ -69,20 +65,16 @@ impl Check for LibtiffCheck {
                    detail,
                }
            }
-            Ok(_) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: "libtiff not found (pkg-config --exists libtiff-4 failed)".to_string(),
-                }
-            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("pkg-config check failed: {}", e),
-                }
-            }
+            Ok(_) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: "libtiff not found (pkg-config --exists libtiff-4 failed)".to_string(),
+            },
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("pkg-config check failed: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/locale.rs
+++ b/crates/pdftract-cli/src/doctor/checks/locale.rs
@ -1,5 +1,5 @@
-use std::env;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::env;

 /// Check: system locale
 ///
@ -40,14 +40,19 @@ impl Check for LocaleCheck {
            Some(locale) if locale.is_empty() => CheckResult {
                name: self.name(),
                status: CheckStatus::Warn,
-                detail: "Locale is empty (LANG/LC_ALL set to empty string, may cause encoding issues)".to_string(),
+                detail:
+                    "Locale is empty (LANG/LC_ALL set to empty string, may cause encoding issues)"
+                        .to_string(),
            },
            Some(locale) => {
                if locale == "C" || locale == "POSIX" {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("Locale is '{}' (non-UTF-8, may cause encoding issues)", locale),
+                        detail: format!(
+                            "Locale is '{}' (non-UTF-8, may cause encoding issues)",
+                            locale
+                        ),
                    }
                } else if Self::is_utf8_locale(&locale) {
                    CheckResult {
@ -59,7 +64,10 @@ impl Check for LocaleCheck {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("Locale '{}' (non-UTF-8, may cause encoding issues)", locale),
+                        detail: format!(
+                            "Locale '{}' (non-UTF-8, may cause encoding issues)",
+                            locale
+                        ),
                    }
                }
            }
--- a/crates/pdftract-cli/src/doctor/checks/memory.rs
+++ b/crates/pdftract-cli/src/doctor/checks/memory.rs
@ -47,7 +47,9 @@ impl MemoryCheck {

            for line in meminfo.lines() {
                let parts: Vec<&str> = line.split_whitespace().collect();
-                if parts.len() < 2 { continue; }
+                if parts.len() < 2 {
+                    continue;
+                }

                if let Ok(kb) = parts[1].parse::<u64>() {
                    match parts[0] {
@ -148,13 +150,11 @@ impl Check for MemoryCheck {
                    }
                }
            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Warn,
-                    detail: format!("Could not determine available memory: {}", e),
-                }
-            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Warn,
+                detail: format!("Could not determine available memory: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/mod.rs
+++ b/crates/pdftract-cli/src/doctor/checks/mod.rs
@ -1,27 +1,27 @@
 // Individual check modules
 mod binary;
+mod cache_dir;
+#[cfg(feature = "ocr")]
+mod leptonica;
+#[cfg(feature = "ocr")]
+mod libopenjp2;
+#[cfg(feature = "ocr")]
+mod libtiff;
+mod locale;
+mod memory;
+#[cfg(feature = "remote")]
+mod network;
+#[cfg(feature = "full-render")]
+mod pdfium;
+#[cfg(feature = "profiles")]
+mod profile_path;
+mod temp_dir;
 #[cfg(feature = "ocr")]
 mod tesseract;
 #[cfg(feature = "ocr")]
 mod tesseract_langs;
-#[cfg(feature = "ocr")]
-mod leptonica;
-#[cfg(feature = "ocr")]
-mod libtiff;
-#[cfg(feature = "ocr")]
-mod libopenjp2;
-#[cfg(feature = "full-render")]
-mod pdfium;
-#[cfg(feature = "remote")]
-mod network;
-mod cache_dir;
-#[cfg(feature = "profiles")]
-mod profile_path;
 #[cfg(unix)]
 mod ulimit;
-mod memory;
-mod locale;
-mod temp_dir;

 use super::Check;

--- a/crates/pdftract-cli/src/doctor/checks/network.rs
+++ b/crates/pdftract-cli/src/doctor/checks/network.rs
@ -1,5 +1,5 @@
-use std::time::Duration;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::time::Duration;

 /// Check: network reachability (remote source feature)
 ///
@ -43,20 +43,31 @@ impl Check for NetworkCheck {
                        CheckResult {
                            name: self.name(),
                            status: CheckStatus::Warn,
-                            detail: format!("Network reachable but slow: {} in {:.2}s", status, elapsed.as_secs_f64()),
+                            detail: format!(
+                                "Network reachable but slow: {} in {:.2}s",
+                                status,
+                                elapsed.as_secs_f64()
+                            ),
                        }
                    } else {
                        CheckResult {
                            name: self.name(),
                            status: CheckStatus::Ok,
-                            detail: format!("Network reachable: {} in {:.2}s", status, elapsed.as_secs_f64()),
+                            detail: format!(
+                                "Network reachable: {} in {:.2}s",
+                                status,
+                                elapsed.as_secs_f64()
+                            ),
                        }
                    }
                } else if status >= 300 && status < 400 {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("Network returned redirect: {} (may indicate proxy or redirect loop)", status),
+                        detail: format!(
+                            "Network returned redirect: {} (may indicate proxy or redirect loop)",
+                            status
+                        ),
                    }
                } else {
                    CheckResult {
@ -66,13 +77,11 @@ impl Check for NetworkCheck {
                    }
                }
            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: e,
-                }
-            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: e,
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/pdfium.rs
+++ b/crates/pdftract-cli/src/doctor/checks/pdfium.rs
@ -73,17 +73,18 @@ impl Check for PdfiumCheck {
                    CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("pdfium {} found (< 6555: may have compatibility issues), {}", version, source),
+                        detail: format!(
+                            "pdfium {} found (< 6555: may have compatibility issues), {}",
+                            version, source
+                        ),
                    }
                }
            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("pdfium not found: {}", e),
-                }
-            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("pdfium not found: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/temp_dir.rs
+++ b/crates/pdftract-cli/src/doctor/checks/temp_dir.rs
@ -1,6 +1,6 @@
-use std::path::{Path, PathBuf};
-use std::env;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::env;
+use std::path::{Path, PathBuf};

 /// Check: temp directory writable and free space
 ///
@ -25,8 +25,7 @@ impl TempDirCheck {
        // Try to create a temporary file
        let test_file = path.join(".pdftract-doctor-test");

-        std::fs::write(&test_file, b"test")
-            .map_err(|e| format!("Not writable: {}", e))?;
+        std::fs::write(&test_file, b"test").map_err(|e| format!("Not writable: {}", e))?;

        // Clean up
        let _ = std::fs::remove_file(&test_file);
@ -36,9 +35,9 @@ impl TempDirCheck {

    #[cfg(unix)]
    fn check_free_space(path: &Path) -> Result<u64, String> {
+        use libc::{c_char, statvfs};
        use std::ffi::CString;
        use std::os::unix::ffi::OsStrExt;
-        use libc::{statvfs, c_char};

        let path_cstr = CString::new(path.as_os_str().as_bytes())
            .map_err(|_| "Failed to convert path to CString".to_string())?;
@ -114,20 +113,24 @@ impl Check for TempDirCheck {
                    }
                }
            }
-            (Err(e), _) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("Temp directory check failed at {}: {}", temp_dir.display(), e),
-                }
-            }
-            (_, Err(e)) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Warn,
-                    detail: format!("Could not check free space at {}: {}", temp_dir.display(), e),
-                }
-            }
+            (Err(e), _) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!(
+                    "Temp directory check failed at {}: {}",
+                    temp_dir.display(),
+                    e
+                ),
+            },
+            (_, Err(e)) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Warn,
+                detail: format!(
+                    "Could not check free space at {}: {}",
+                    temp_dir.display(),
+                    e
+                ),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/tesseract.rs
+++ b/crates/pdftract-cli/src/doctor/checks/tesseract.rs
@ -1,5 +1,5 @@
-use std::process::Command;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::process::Command;

 /// Check: tesseract installation and version
 ///
@ -14,9 +14,7 @@ impl Check for TesseractCheck {
    }

    fn run(&self, _ctx: &DoctorCtx) -> CheckResult {
-        let output = Command::new("tesseract")
-            .arg("--version")
-            .output();
+        let output = Command::new("tesseract").arg("--version").output();

        match output {
            Ok(output) => {
@ -61,16 +59,17 @@ impl Check for TesseractCheck {
                CheckResult {
                    name: self.name(),
                    status: CheckStatus::Warn,
-                    detail: format!("tesseract binary found but version could not be parsed: {}", version_output.trim()),
-                }
-            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("tesseract not found: {}", e),
+                    detail: format!(
+                        "tesseract binary found but version could not be parsed: {}",
+                        version_output.trim()
+                    ),
                }
            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("tesseract not found: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/tesseract_langs.rs
+++ b/crates/pdftract-cli/src/doctor/checks/tesseract_langs.rs
@ -1,5 +1,5 @@
-use std::process::Command;
 use super::super::{Check, CheckResult, CheckStatus, DoctorCtx};
+use std::process::Command;

 /// Check: tesseract language availability
 ///
@ -14,9 +14,7 @@ impl Check for TesseractLangsCheck {
    }

    fn run(&self, ctx: &DoctorCtx) -> CheckResult {
-        let output = Command::new("tesseract")
-            .arg("--list-langs")
-            .output();
+        let output = Command::new("tesseract").arg("--list-langs").output();

        match output {
            Ok(output) => {
@ -24,7 +22,10 @@ impl Check for TesseractLangsCheck {
                    return CheckResult {
                        name: self.name(),
                        status: CheckStatus::Fail,
-                        detail: format!("tesseract --list-langs failed: {}", String::from_utf8_lossy(&output.stderr)),
+                        detail: format!(
+                            "tesseract --list-langs failed: {}",
+                            String::from_utf8_lossy(&output.stderr)
+                        ),
                    };
                }

@ -52,7 +53,10 @@ impl Check for TesseractLangsCheck {
                    return CheckResult {
                        name: self.name(),
                        status: CheckStatus::Fail,
-                        detail: format!("Required language 'eng' not found. Installed: {:?}", installed_langs),
+                        detail: format!(
+                            "Required language 'eng' not found. Installed: {:?}",
+                            installed_langs
+                        ),
                    };
                }

@ -60,7 +64,10 @@ impl Check for TesseractLangsCheck {
                    return CheckResult {
                        name: self.name(),
                        status: CheckStatus::Warn,
-                        detail: format!("Requested languages not found: {:?}. Installed: {:?}", missing_required, installed_langs),
+                        detail: format!(
+                            "Requested languages not found: {:?}. Installed: {:?}",
+                            missing_required, installed_langs
+                        ),
                    };
                }

@ -70,13 +77,11 @@ impl Check for TesseractLangsCheck {
                    detail: format!("All required languages present: {:?}", installed_langs),
                }
            }
-            Err(e) => {
-                CheckResult {
-                    name: self.name(),
-                    status: CheckStatus::Fail,
-                    detail: format!("tesseract --list-langs failed: {}", e),
-                }
-            }
+            Err(e) => CheckResult {
+                name: self.name(),
+                status: CheckStatus::Fail,
+                detail: format!("tesseract --list-langs failed: {}", e),
+            },
        }
    }
 }
--- a/crates/pdftract-cli/src/doctor/checks/ulimit.rs
+++ b/crates/pdftract-cli/src/doctor/checks/ulimit.rs
@ -12,7 +12,7 @@ pub struct UlimitCheck;
 impl UlimitCheck {
    #[cfg(unix)]
    fn get_rlimit_nofile() -> Result<u64, String> {
-        use libc::{rlimit, RLIMIT_NOFILE, getrlimit};
+        use libc::{getrlimit, rlimit, RLIMIT_NOFILE};

        unsafe {
            let mut limits = rlimit {
@ -49,7 +49,10 @@ impl Check for UlimitCheck {
                        CheckResult {
                            name: self.name(),
                            status: CheckStatus::Warn,
-                            detail: format!("File descriptor limit: {} (recommended: >= 1024)", limit),
+                            detail: format!(
+                                "File descriptor limit: {} (recommended: >= 1024)",
+                                limit
+                            ),
                        }
                    } else {
                        CheckResult {
@ -59,13 +62,11 @@ impl Check for UlimitCheck {
                        }
                    }
                }
-                Err(e) => {
-                    CheckResult {
-                        name: self.name(),
-                        status: CheckStatus::Warn,
-                        detail: format!("Could not read ulimit: {}", e),
-                    }
-                }
+                Err(e) => CheckResult {
+                    name: self.name(),
+                    status: CheckStatus::Warn,
+                    detail: format!("Could not read ulimit: {}", e),
+                },
            }
        }

--- a/crates/pdftract-cli/src/doctor/mod.rs
+++ b/crates/pdftract-cli/src/doctor/mod.rs
@ -1,8 +1,8 @@
 //! Doctor subcommand - environment health checks

 use anyhow::Result;
-use std::path::PathBuf;
 use std::panic::{catch_unwind, AssertUnwindSafe};
+use std::path::PathBuf;

 // Private checks module
 mod checks;
@ -179,9 +179,12 @@ pub fn run(opts: DoctorOptions) -> Result<()> {
    if opts.json {
        output::output_json(&results);
    } else {
-        output::output_text(&results, &output::TextOptions {
-            no_color: opts.no_color,
-        })?;
+        output::output_text(
+            &results,
+            &output::TextOptions {
+                no_color: opts.no_color,
+            },
+        )?;
    }

    // Determine exit code per plan section 6.10 line 2520-2521:
--- a/crates/pdftract-cli/src/doctor/output/human.rs
+++ b/crates/pdftract-cli/src/doctor/output/human.rs
@ -1,7 +1,7 @@
 //! Human-readable table output for doctor subcommand

-use anyhow::Result;
 use crate::doctor::{CheckResult, CheckStatus};
+use anyhow::Result;
 use std::io::{IsTerminal, Write};

 /// Options for text output
--- a/crates/pdftract-cli/src/doctor/output/mod.rs
+++ b/crates/pdftract-cli/src/doctor/output/mod.rs
@ -1,9 +1,9 @@
 //! Output formatting for doctor subcommand

+mod features;
 mod human;
 mod json;
-mod features;

+pub use features::output_features;
 pub use human::{output_text, TextOptions};
 pub use json::output_json;
-pub use features::output_features;
--- a/crates/pdftract-cli/src/inspect/render/spans.rs
+++ b/crates/pdftract-cli/src/inspect/render/spans.rs
@ -75,10 +75,10 @@ pub fn render_spans(spans: &[SpanJson]) -> Vec<String> {
 /// - `Some(c) where c >= 0.8`: green (#22c55e) - high confidence
 fn confidence_to_color(confidence: Option<f64>) -> &'static str {
    match confidence {
-        None => "#94a3b8", // gray - direct extraction
+        None => "#94a3b8",               // gray - direct extraction
        Some(c) if c < 0.5 => "#ef4444", // red - low confidence
        Some(c) if c < 0.8 => "#eab308", // yellow - medium confidence
-        Some(_) => "#22c55e", // green - high confidence
+        Some(_) => "#22c55e",            // green - high confidence
    }
 }

@ -111,16 +111,14 @@ mod tests {

    #[test]
    fn test_render_spans_single() {
-        let spans = vec![
-            SpanJson {
-                text: "Hello".to_string(),
-                bbox: [100.0, 200.0, 200.0, 220.0],
-                font: "Helvetica".to_string(),
-                size: 12.0,
-                confidence: None,
-                receipt: None,
-            }
-        ];
+        let spans = vec![SpanJson {
+            text: "Hello".to_string(),
+            bbox: [100.0, 200.0, 200.0, 220.0],
+            font: "Helvetica".to_string(),
+            size: 12.0,
+            confidence: None,
+            receipt: None,
+        }];

        let output = render_spans(&spans);
        assert_eq!(output.len(), 1);
@ -149,50 +147,48 @@ mod tests {
    #[test]
    fn test_render_spans_confidence_colors() {
        let test_cases = [
-            (None, "#94a3b8"),           // gray - no confidence
-            (Some(0.3), "#ef4444"),      // red - low
-            (Some(0.5), "#eab308"),      // yellow - medium (boundary)
-            (Some(0.6), "#eab308"),      // yellow - medium
-            (Some(0.79), "#eab308"),     // yellow - medium (boundary)
-            (Some(0.8), "#22c55e"),      // green - high (boundary)
-            (Some(0.95), "#22c55e"),     // green - high
-            (Some(1.0), "#22c55e"),      // green - perfect
+            (None, "#94a3b8"),       // gray - no confidence
+            (Some(0.3), "#ef4444"),  // red - low
+            (Some(0.5), "#eab308"),  // yellow - medium (boundary)
+            (Some(0.6), "#eab308"),  // yellow - medium
+            (Some(0.79), "#eab308"), // yellow - medium (boundary)
+            (Some(0.8), "#22c55e"),  // green - high (boundary)
+            (Some(0.95), "#22c55e"), // green - high
+            (Some(1.0), "#22c55e"),  // green - perfect
        ];

        for (confidence, expected_color) in test_cases {
-            let spans = vec![
-                SpanJson {
-                    text: "Test".to_string(),
-                    bbox: [0.0, 0.0, 10.0, 10.0],
-                    font: "Arial".to_string(),
-                    size: 10.0,
-                    confidence,
-                    receipt: None,
-                }
-            ];
+            let spans = vec![SpanJson {
+                text: "Test".to_string(),
+                bbox: [0.0, 0.0, 10.0, 10.0],
+                font: "Arial".to_string(),
+                size: 10.0,
+                confidence,
+                receipt: None,
+            }];

            let output = render_spans(&spans);
            assert_eq!(output.len(), 1);
            assert!(
                output[0].contains(&format!("stroke=\"{}\"", expected_color)),
                "Confidence {:?} should produce color {}, got: {}",
-                confidence, expected_color, output[0]
+                confidence,
+                expected_color,
+                output[0]
            );
        }
    }

    #[test]
    fn test_render_spans_data_attributes() {
-        let spans = vec![
-            SpanJson {
-                text: "Test & <quote>".to_string(),
-                bbox: [50.0, 100.0, 150.0, 120.0],
-                font: "Times \"Roman\"".to_string(),
-                size: 14.0,
-                confidence: Some(0.85),
-                receipt: None,
-            }
-        ];
+        let spans = vec![SpanJson {
+            text: "Test & <quote>".to_string(),
+            bbox: [50.0, 100.0, 150.0, 120.0],
+            font: "Times \"Roman\"".to_string(),
+            size: 14.0,
+            confidence: Some(0.85),
+            receipt: None,
+        }];

        let output = render_spans(&spans);
        let rect = &output[0];
@ -283,16 +279,14 @@ mod tests {

    #[test]
    fn test_render_spans_css_class() {
-        let spans = vec![
-            SpanJson {
-                text: "Test".to_string(),
-                bbox: [0.0, 0.0, 100.0, 20.0],
-                font: "Arial".to_string(),
-                size: 12.0,
-                confidence: None,
-                receipt: None,
-            }
-        ];
+        let spans = vec![SpanJson {
+            text: "Test".to_string(),
+            bbox: [0.0, 0.0, 100.0, 20.0],
+            font: "Arial".to_string(),
+            size: 12.0,
+            confidence: None,
+            receipt: None,
+        }];

        let output = render_spans(&spans);
        assert!(output[0].contains(r#"class="span-rect""#));
@ -325,16 +319,14 @@ mod tests {

    #[test]
    fn test_render_spans_float_bbox() {
-        let spans = vec![
-            SpanJson {
-                text: "Float".to_string(),
-                bbox: [10.567, 20.891, 100.234, 110.567],
-                font: "Arial".to_string(),
-                size: 12.5,
-                confidence: None,
-                receipt: None,
-            }
-        ];
+        let spans = vec![SpanJson {
+            text: "Float".to_string(),
+            bbox: [10.567, 20.891, 100.234, 110.567],
+            font: "Arial".to_string(),
+            size: 12.5,
+            confidence: None,
+            receipt: None,
+        }];

        let output = render_spans(&spans);
        let rect = &output[0];
@ -348,16 +340,14 @@ mod tests {

    #[test]
    fn test_render_spans_output_is_valid_svg() {
-        let spans = vec![
-            SpanJson {
-                text: "Valid".to_string(),
-                bbox: [0.0, 0.0, 100.0, 20.0],
-                font: "Arial".to_string(),
-                size: 12.0,
-                confidence: Some(0.95),
-                receipt: None,
-            }
-        ];
+        let spans = vec![SpanJson {
+            text: "Valid".to_string(),
+            bbox: [0.0, 0.0, 100.0, 20.0],
+            font: "Arial".to_string(),
+            size: 12.0,
+            confidence: Some(0.95),
+            receipt: None,
+        }];

        let output = render_spans(&spans);
        let rect = &output[0];
--- a/crates/pdftract-cli/src/mcp/auth.rs
+++ b/crates/pdftract-cli/src/mcp/auth.rs
@ -53,7 +53,10 @@ pub fn resolve_token(
            .with_context(|| format!("Failed to read token file: {}", path.display()))?;
        let token = token_content.trim_end().to_string();
        check_token_length(&token);
-        return Ok(Some((SecretString::new(token.into()), AuthSource::TokenFile)));
+        return Ok(Some((
+            SecretString::new(token.into()),
+            AuthSource::TokenFile,
+        )));
    }

    // Priority 2: PDFTRACT_MCP_TOKEN env var
@ -66,10 +69,7 @@ pub fn resolve_token(

    // Priority 3: --auth-token VALUE (only if PDFTRACT_INSECURE_CLI_TOKEN=1)
    if let Some(token) = cli_token {
-        let insecure_allowed = env::var("PDFTRACT_INSECURE_CLI_TOKEN")
-            .ok()
-            .as_deref()
-            == Some("1");
+        let insecure_allowed = env::var("PDFTRACT_INSECURE_CLI_TOKEN").ok().as_deref() == Some("1");

        if !insecure_allowed {
            anyhow::bail!(
@ -84,7 +84,10 @@ pub fn resolve_token(
             Recommended: Use --auth-token-file PATH or PDFTRACT_MCP_TOKEN env var."
        );
        check_token_length(&token);
-        return Ok(Some((SecretString::new(token.into()), AuthSource::CliInsecure)));
+        return Ok(Some((
+            SecretString::new(token.into()),
+            AuthSource::CliInsecure,
+        )));
    }

    // No token provided
--- a/crates/pdftract-cli/src/mcp/bind.rs
+++ b/crates/pdftract-cli/src/mcp/bind.rs
@ -105,11 +105,17 @@ mod tests {
        // Non-loopback addresses should fail without a token
        let result = check_bind_security("0.0.0.0:8080", false);
        assert!(result.is_err());
-        assert!(result.unwrap_err().to_string().contains("requires --auth-token-file"));
+        assert!(result
+            .unwrap_err()
+            .to_string()
+            .contains("requires --auth-token-file"));

        let result = check_bind_security("192.168.1.1:3000", false);
        assert!(result.is_err());
-        assert!(result.unwrap_err().to_string().contains("requires --auth-token-file"));
+        assert!(result
+            .unwrap_err()
+            .to_string()
+            .contains("requires --auth-token-file"));
    }

    #[test]
--- a/crates/pdftract-cli/src/mcp/framing/mod.rs
+++ b/crates/pdftract-cli/src/mcp/framing/mod.rs
@ -479,20 +479,17 @@ impl<'de> Deserialize<'de> for BatchMessage {
                // Deserialize each array element as a Request
                let mut reqs = Vec::with_capacity(arr.len());
                for item in arr {
-                    let req = Request::deserialize(item)
-                        .map_err(serde::de::Error::custom)?;
+                    let req = Request::deserialize(item).map_err(serde::de::Error::custom)?;
                    reqs.push(req);
                }
                Ok(BatchMessage::Batch(reqs))
            }
            Value::Object(obj) => {
-                let req = Request::deserialize(Value::Object(obj))
-                    .map_err(serde::de::Error::custom)?;
+                let req =
+                    Request::deserialize(Value::Object(obj)).map_err(serde::de::Error::custom)?;
                Ok(BatchMessage::Single(req))
            }
-            _ => Err(serde::de::Error::custom(
-                "expected JSON object or array",
-            )),
+            _ => Err(serde::de::Error::custom("expected JSON object or array")),
        }
    }
 }
@ -586,7 +583,11 @@ mod tests {
    fn test_batch_round_trip() {
        let reqs = vec![
            Request::new("tools/list", None, Some(Id::Number(1))),
-            Request::new("tools/call", Some(Value::Object(serde_json::Map::new())), Some(Id::Number(2))),
+            Request::new(
+                "tools/call",
+                Some(Value::Object(serde_json::Map::new())),
+                Some(Id::Number(2)),
+            ),
            Request::new("prompts/list", None, Some(Id::String("abc".to_string()))),
        ];
        let batch = BatchMessage::Batch(reqs.clone());
--- a/crates/pdftract-cli/src/mcp/http.rs
+++ b/crates/pdftract-cli/src/mcp/http.rs
@ -24,7 +24,6 @@
 use crate::mcp::framing::{BatchMessage, ErrorObject, Id, Notification, Request, Response};
 use crate::mcp::tools;
 use anyhow::{anyhow, Context, Result};
-use subtle::ConstantTimeEq;
 use axum::{
    body::Body,
    extract::{DefaultBodyLimit, Request as AxumRequest, State},
@ -40,6 +39,7 @@ use std::path::PathBuf;
 use std::sync::atomic::{AtomicUsize, Ordering};
 use std::sync::Arc;
 use std::time::Duration;
+use subtle::ConstantTimeEq;
 use tokio::sync::broadcast;

 /// Default maximum request body size (256 MB)
@ -75,7 +75,11 @@ pub struct McpServerState {

 impl McpServerState {
    /// Create a new MCP server state.
-    pub fn new(auth_token: Option<SecretString>, max_upload_mb: Option<usize>, root: Option<PathBuf>) -> Self {
+    pub fn new(
+        auth_token: Option<SecretString>,
+        max_upload_mb: Option<usize>,
+        root: Option<PathBuf>,
+    ) -> Self {
        let max_body_bytes = max_upload_mb.unwrap_or(DEFAULT_MAX_UPLOAD_MB) * 1024 * 1024;
        let notify_tx = broadcast::channel(100).0; // Channel size 100 for buffered notifications

@ -96,7 +100,9 @@ impl McpServerState {
    pub fn broadcast_notification(&self, notification: Notification) -> usize {
        // recv_count is the number of receivers that got the message
        // (before it was dropped due to channel overflow or lag)
-        self.notify_tx.send(notification).map_or(0, |recv_count| recv_count)
+        self.notify_tx
+            .send(notification)
+            .map_or(0, |recv_count| recv_count)
    }

    /// Get the current number of active SSE clients.
@ -162,9 +168,7 @@ pub async fn run_server(
    eprintln!();

    // Run the server
-    axum::serve(listener, app)
-        .await
-        .context("Server error")?;
+    axum::serve(listener, app).await.context("Server error")?;

    Ok(())
 }
@ -199,16 +203,12 @@ async fn handle_post_request(
    }

    // Parse the request body as either a single Request or a Batch
-    let batch_result: std::result::Result<BatchMessage, _> =
-        serde_json::from_str(&body);
+    let batch_result: std::result::Result<BatchMessage, _> = serde_json::from_str(&body);

    let batch = match batch_result {
        Ok(batch) => batch,
        Err(_) => {
-            return error_response(
-                StatusCode::BAD_REQUEST,
-                ErrorObject::invalid_request(),
-            );
+            return error_response(StatusCode::BAD_REQUEST, ErrorObject::invalid_request());
        }
    };

@ -237,10 +237,7 @@ async fn handle_post_request(
 ///
 /// Returns a long-lived SSE connection that receives server notifications.
 /// Sends a keepalive comment every 30 seconds.
-async fn handle_sse(
-    State(state): State<McpServerState>,
-    headers: HeaderMap,
-) -> AxumResponse {
+async fn handle_sse(State(state): State<McpServerState>, headers: HeaderMap) -> AxumResponse {
    // Check authentication first
    match check_auth(&state, &headers) {
        Ok(()) => {}
@ -257,7 +254,8 @@ async fn handle_sse(
                "error": "Maximum concurrent clients exceeded",
                "limit": MAX_SSE_CLIENTS,
            })),
-        ).into_response();
+        )
+            .into_response();
    }

    // Subscribe to the broadcast channel
@ -321,11 +319,13 @@ async fn handle_sse(
    };

    // Return SSE response with appropriate headers
-    Sse::new(stream).keep_alive(
-        axum::response::sse::KeepAlive::new()
-            .interval(Duration::from_secs(SSE_KEEPALIVE_SECS))
-            .text("keepalive"),
-    ).into_response()
+    Sse::new(stream)
+        .keep_alive(
+            axum::response::sse::KeepAlive::new()
+                .interval(Duration::from_secs(SSE_KEEPALIVE_SECS))
+                .text("keepalive"),
+        )
+        .into_response()
 }

 /// GET /health handler - health check endpoint.
@ -393,9 +393,7 @@ fn check_auth(
    headers: &HeaderMap,
 ) -> std::result::Result<(), AxumResponse> {
    if let Some(token) = &state.auth_token {
-        let auth_header = headers
-            .get("Authorization")
-            .and_then(|v| v.to_str().ok());
+        let auth_header = headers.get("Authorization").and_then(|v| v.to_str().ok());

        match auth_header {
            Some(header) if header.starts_with("Bearer ") => {
@ -408,8 +406,12 @@ fn check_auth(
                } else {
                    let mut response = (
                        StatusCode::UNAUTHORIZED,
-                        Json(Response::error(Id::Null, ErrorObject::new(-32001, "Invalid authentication token"))),
-                    ).into_response();
+                        Json(Response::error(
+                            Id::Null,
+                            ErrorObject::new(-32001, "Invalid authentication token"),
+                        )),
+                    )
+                        .into_response();
                    response.headers_mut().insert(
                        "WWW-Authenticate",
                        HeaderValue::from_static("Bearer realm=\"pdftract\""),
@ -420,8 +422,12 @@ fn check_auth(
            _ => {
                let mut response = (
                    StatusCode::UNAUTHORIZED,
-                    Json(Response::error(Id::Null, ErrorObject::new(-32001, "Missing authentication token"))),
-                ).into_response();
+                    Json(Response::error(
+                        Id::Null,
+                        ErrorObject::new(-32001, "Missing authentication token"),
+                    )),
+                )
+                    .into_response();
                response.headers_mut().insert(
                    "WWW-Authenticate",
                    HeaderValue::from_static("Bearer realm=\"pdftract\""),
@ -435,7 +441,11 @@ fn check_auth(
 }

 /// Handle a single JSON-RPC request and return a response.
-fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option<&std::path::Path>) -> Response {
+fn handle_request(
+    request: Request,
+    registry: &tools::ToolRegistry,
+    root: Option<&std::path::Path>,
+) -> Response {
    let id = request.request_id();

    match request.method.as_str() {
@ -463,20 +473,29 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option
            let params = match request.params {
                Some(p) => p,
                None => {
-                    return Response::error(id, ErrorObject::invalid_params()
-                        .with_data(json!({"reason": "Missing params"})));
+                    return Response::error(
+                        id,
+                        ErrorObject::invalid_params()
+                            .with_data(json!({"reason": "Missing params"})),
+                    );
                }
            };

            let tool_name = match params.get("name").and_then(|v| v.as_str()) {
                Some(name) => name,
                None => {
-                    return Response::error(id, ErrorObject::invalid_params()
-                        .with_data(json!({"reason": "Missing or invalid 'name' field"})));
+                    return Response::error(
+                        id,
+                        ErrorObject::invalid_params()
+                            .with_data(json!({"reason": "Missing or invalid 'name' field"})),
+                    );
                }
            };

-            let arguments = params.get("arguments").cloned().unwrap_or(Value::Object(serde_json::Map::new()));
+            let arguments = params
+                .get("arguments")
+                .cloned()
+                .unwrap_or(Value::Object(serde_json::Map::new()));

            // Look up the tool in the registry
            let tool = match registry.get(tool_name) {
@ -488,12 +507,17 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option

            // Execute the tool with observability logging
            let start = std::time::Instant::now();
-            let log_path = arguments.get("path").and_then(|v| v.as_str()).map(|s| s.to_string());
+            let log_path = arguments
+                .get("path")
+                .and_then(|v| v.as_str())
+                .map(|s| s.to_string());

            let result = tool.execute(arguments, log_path.as_deref(), root);

            let duration_ms = start.elapsed().as_millis();
-            let response_size = result.as_ref().ok()
+            let response_size = result
+                .as_ref()
+                .ok()
                .map(|v| serde_json::to_vec(v).unwrap_or_default().len())
                .unwrap_or(0);

@ -503,13 +527,9 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option
            let path_or_hash = log_path.unwrap_or_else(|| "<unknown>".to_string());
            let error_code = result.as_ref().err().map(|e| e.code.to_string());

-            eprintln!("{} tool={} path={} duration_ms={} response_size_bytes={} error_code={:?}",
-                timestamp,
-                tool_name,
-                path_or_hash,
-                duration_ms,
-                response_size,
-                error_code,
+            eprintln!(
+                "{} tool={} path={} duration_ms={} response_size_bytes={} error_code={:?}",
+                timestamp, tool_name, path_or_hash, duration_ms, response_size, error_code,
            );

            match result {
@ -647,7 +667,10 @@ mod tests {
        // No token configured, so any headers should pass
        assert!(check_auth(&state, &headers).is_ok());

-        headers.insert("Authorization", HeaderValue::from_static("Bearer irrelevant"));
+        headers.insert(
+            "Authorization",
+            HeaderValue::from_static("Bearer irrelevant"),
+        );
        assert!(check_auth(&state, &headers).is_ok());
    }

@ -657,7 +680,10 @@ mod tests {
        let state = McpServerState::new(Some(token), None, None);
        let mut headers = HeaderMap::new();

-        headers.insert("Authorization", HeaderValue::from_static("Bearer correct-token"));
+        headers.insert(
+            "Authorization",
+            HeaderValue::from_static("Bearer correct-token"),
+        );
        assert!(check_auth(&state, &headers).is_ok());
    }

@ -667,7 +693,10 @@ mod tests {
        let state = McpServerState::new(Some(token), None, None);
        let mut headers = HeaderMap::new();

-        headers.insert("Authorization", HeaderValue::from_static("Bearer wrong-token"));
+        headers.insert(
+            "Authorization",
+            HeaderValue::from_static("Bearer wrong-token"),
+        );
        let result = check_auth(&state, &headers);
        assert!(result.is_err());
        if let Err(resp) = result {
@ -774,7 +803,10 @@ mod tests {
            ratio <= 5,
            "Token comparison appears to be non-constant-time: \
             early mismatch={:?}, late mismatch={:?}, correct={:?}, ratio={}",
-            median_early, median_late, median_correct, ratio
+            median_early,
+            median_late,
+            median_correct,
+            ratio
        );

        // Also verify that the correct token actually returns true
@ -801,7 +833,10 @@ mod tests {

        // Test 2: Token that is much longer
        let mut headers_long = HeaderMap::new();
-        headers_long.insert("Authorization", HeaderValue::from_static("Bearer this-token-is-much-longer-than-the-correct-one"));
+        headers_long.insert(
+            "Authorization",
+            HeaderValue::from_static("Bearer this-token-is-much-longer-than-the-correct-one"),
+        );

        let iterations = 1000;
        let mut times_short = Vec::with_capacity(iterations);
@ -840,7 +875,9 @@ mod tests {
            ratio <= 3,
            "Token comparison appears to leak length information: \
             short={:?}, long={:?}, ratio={}",
-            median_short, median_long, ratio
+            median_short,
+            median_long,
+            ratio
        );
    }
 }
--- a/crates/pdftract-cli/src/mcp/root.rs
+++ b/crates/pdftract-cli/src/mcp/root.rs
@ -51,7 +51,10 @@ pub fn resolve_path(arg: &str, root: Option<&Path>) -> Result<PathBuf, ErrorObje
    // Reject absolute paths when --root is set
    if arg.starts_with('/') || Path::new(arg).is_absolute() {
        return Err(ErrorObject::invalid_params()
-            .with_message(format!("absolute paths not permitted under --root: '{}'", arg))
+            .with_message(format!(
+                "absolute paths not permitted under --root: '{}'",
+                arg
+            ))
            .with_data(json!({ "code": CODE_ABSOLUTE_PATH_NOT_PERMITTED, "path": arg })));
    }

@ -62,7 +65,9 @@ pub fn resolve_path(arg: &str, root: Option<&Path>) -> Result<PathBuf, ErrorObje
    let canonical = std::fs::canonicalize(&candidate).map_err(|e| {
        ErrorObject::invalid_params()
            .with_message(format!("path resolution failed: {}", e))
-            .with_data(json!({ "code": CODE_PATH_RESOLUTION_FAILED, "path": arg, "error": e.to_string() }))
+            .with_data(
+                json!({ "code": CODE_PATH_RESOLUTION_FAILED, "path": arg, "error": e.to_string() }),
+            )
    })?;

    // Reject if canonical is not a descendant of root
@ -90,12 +95,19 @@ pub fn resolve_path(arg: &str, root: Option<&Path>) -> Result<PathBuf, ErrorObje
 /// * `Err(String)` - Error message if root is invalid
 pub fn canonicalize_root(root_arg: &Path) -> Result<PathBuf, String> {
    // Canonicalize the root path (follows symlinks, resolves relative components)
-    let canonical = std::fs::canonicalize(root_arg)
-        .map_err(|e| format!("--root path does not exist or cannot be canonicalized: {}", e))?;
+    let canonical = std::fs::canonicalize(root_arg).map_err(|e| {
+        format!(
+            "--root path does not exist or cannot be canonicalized: {}",
+            e
+        )
+    })?;

    // Verify it's a directory
    if !canonical.is_dir() {
-        return Err(format!("--root must be a directory, not a file: {}", canonical.display()));
+        return Err(format!(
+            "--root must be a directory, not a file: {}",
+            canonical.display()
+        ));
    }

    Ok(canonical)
@ -112,18 +124,27 @@ mod tests {
    fn test_https_url_bypasses_check() {
        let result = resolve_path("https://example.com/file.pdf", None);
        assert!(result.is_ok());
-        assert_eq!(result.unwrap(), PathBuf::from("https://example.com/file.pdf"));
+        assert_eq!(
+            result.unwrap(),
+            PathBuf::from("https://example.com/file.pdf")
+        );

        let result = resolve_path("https://example.com/file.pdf", Some(Path::new("/tmp")));
        assert!(result.is_ok());
-        assert_eq!(result.unwrap(), PathBuf::from("https://example.com/file.pdf"));
+        assert_eq!(
+            result.unwrap(),
+            PathBuf::from("https://example.com/file.pdf")
+        );
    }

    #[test]
    fn test_http_url_bypasses_check() {
        let result = resolve_path("http://example.com/file.pdf", None);
        assert!(result.is_ok());
-        assert_eq!(result.unwrap(), PathBuf::from("http://example.com/file.pdf"));
+        assert_eq!(
+            result.unwrap(),
+            PathBuf::from("http://example.com/file.pdf")
+        );
    }

    #[test]
@ -195,7 +216,11 @@ mod tests {

        #[cfg(windows)]
        {
-            std::os::windows::fs::symlink_file(r"C:\Windows\System32\drivers\etc\hosts", &symlink_path).unwrap();
+            std::os::windows::fs::symlink_file(
+                r"C:\Windows\System32\drivers\etc\hosts",
+                &symlink_path,
+            )
+            .unwrap();
        }

        // Try to access the symlink
@ -264,12 +289,18 @@ mod tests {
        let result = resolve_path("/etc/passwd", Some(root));
        let err = result.unwrap_err();
        let data = err.data.unwrap();
-        assert_eq!(data.get("code").unwrap().as_str(), Some(CODE_ABSOLUTE_PATH_NOT_PERMITTED));
+        assert_eq!(
+            data.get("code").unwrap().as_str(),
+            Some(CODE_ABSOLUTE_PATH_NOT_PERMITTED)
+        );

        // Test traversal error
        let result = resolve_path("../../../etc/passwd", Some(root));
        let err = result.unwrap_err();
        let data = err.data.unwrap();
-        assert_eq!(data.get("code").unwrap().as_str(), Some(CODE_PATH_ESCAPES_ROOT));
+        assert_eq!(
+            data.get("code").unwrap().as_str(),
+            Some(CODE_PATH_ESCAPES_ROOT)
+        );
    }
 }
--- a/crates/pdftract-cli/src/mcp/server.rs
+++ b/crates/pdftract-cli/src/mcp/server.rs
@ -70,8 +70,7 @@ pub fn run(
    }

    // Start the HTTP+SSE server (this blocks until shutdown)
-    let runtime = tokio::runtime::Runtime::new()
-        .context("Failed to create tokio runtime")?;
+    let runtime = tokio::runtime::Runtime::new().context("Failed to create tokio runtime")?;

    runtime.block_on(http::run_server(
        bind_addr,
--- a/crates/pdftract-cli/src/mcp/stdio.rs
+++ b/crates/pdftract-cli/src/mcp/stdio.rs
@ -61,8 +61,7 @@ fn init_stdout() {
 /// CRITICAL: The JSON body is written WITHOUT a trailing newline.
 /// Adding any extra bytes after the JSON body breaks the framing.
 fn write_response(response: &Response) -> Result<()> {
-    let json = serde_json::to_string(response)
-        .context("Failed to serialize response")?;
+    let json = serde_json::to_string(response).context("Failed to serialize response")?;

    let content_length = json.len();

@ -86,8 +85,7 @@ fn write_response(response: &Response) -> Result<()> {
    write!(stdout, "{json}")?;

    // Flush immediately to ensure the client receives the response
-    stdout.flush()
-        .context("Failed to flush stdout")?;
+    stdout.flush().context("Failed to flush stdout")?;

    Ok(())
 }
@ -190,7 +188,8 @@ fn read_message(stdin: &mut BufReader<Stdin>) -> Result<Option<Request>> {
    // Read headers until empty line
    loop {
        let mut line = String::new();
-        let bytes_read = stdin.read_line(&mut line)
+        let bytes_read = stdin
+            .read_line(&mut line)
            .context("Failed to read header line")?;

        if bytes_read == 0 {
@ -208,14 +207,16 @@ fn read_message(stdin: &mut BufReader<Stdin>) -> Result<Option<Request>> {
        // Parse Content-Length header
        if let Some(value) = line.strip_prefix("Content-Length:") {
            let value = value.trim();
-            content_length = Some(value.parse::<usize>()
-                .with_context(|| format!("Invalid Content-Length: {value}"))?);
+            content_length = Some(
+                value
+                    .parse::<usize>()
+                    .with_context(|| format!("Invalid Content-Length: {value}"))?,
+            );
        }
        // Ignore other headers (we don't need Content-Type for now)
    }

-    let content_length = content_length
-        .ok_or_else(|| anyhow!("Missing Content-Length header"))?;
+    let content_length = content_length.ok_or_else(|| anyhow!("Missing Content-Length header"))?;

    // Read exactly content_length bytes
    let mut buffer = vec![0u8; content_length];
@ -236,8 +237,8 @@ fn read_message(stdin: &mut BufReader<Stdin>) -> Result<Option<Request>> {
    }

    // Parse as JSON-RPC BatchMessage (handles both single requests and batches)
-    let batch: BatchMessage = serde_json::from_slice(&buffer)
-        .context("Failed to parse JSON-RPC request")?;
+    let batch: BatchMessage =
+        serde_json::from_slice(&buffer).context("Failed to parse JSON-RPC request")?;

    // Extract the single request from the batch
    // For now, we only support single requests (not batches)
@ -256,7 +257,11 @@ fn read_message(stdin: &mut BufReader<Stdin>) -> Result<Option<Request>> {
 }

 /// Handle a JSON-RPC request and return a response.
-fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option<&Path>) -> Response {
+fn handle_request(
+    request: Request,
+    registry: &tools::ToolRegistry,
+    root: Option<&Path>,
+) -> Response {
    let id = request.request_id();

    match request.method.as_str() {
@ -284,16 +289,22 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option
            let params = match request.params {
                Some(p) => p,
                None => {
-                    return Response::error(id, ErrorObject::invalid_params()
-                        .with_data(json!({"reason": "Missing params"})));
+                    return Response::error(
+                        id,
+                        ErrorObject::invalid_params()
+                            .with_data(json!({"reason": "Missing params"})),
+                    );
                }
            };

            let tool_name = match params.get("name").and_then(|v| v.as_str()) {
                Some(name) => name,
                None => {
-                    return Response::error(id, ErrorObject::invalid_params()
-                        .with_data(json!({"reason": "Missing or invalid 'name' field"})));
+                    return Response::error(
+                        id,
+                        ErrorObject::invalid_params()
+                            .with_data(json!({"reason": "Missing or invalid 'name' field"})),
+                    );
                }
            };

@ -309,12 +320,17 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option

            // Execute the tool with observability logging
            let start = Instant::now();
-            let log_path = arguments.get("path").and_then(|v| v.as_str()).map(|s| s.to_string());
+            let log_path = arguments
+                .get("path")
+                .and_then(|v| v.as_str())
+                .map(|s| s.to_string());

            let result = tool.execute(arguments, log_path.as_deref(), root);

            let duration_ms = start.elapsed().as_millis();
-            let response_size = result.as_ref().ok()
+            let response_size = result
+                .as_ref()
+                .ok()
                .map(|v| serde_json::to_vec(v).unwrap_or_default().len())
                .unwrap_or(0);

@ -323,13 +339,9 @@ fn handle_request(request: Request, registry: &tools::ToolRegistry, root: Option
            let path_or_hash = log_path.as_deref().unwrap_or("<unknown>");
            let error_code = result.as_ref().err().map(|e| e.code.to_string());

-            eprintln!("{} tool={} path={} duration_ms={} response_size_bytes={} error_code={:?}",
-                timestamp,
-                tool_name,
-                path_or_hash,
-                duration_ms,
-                response_size,
-                error_code,
+            eprintln!(
+                "{} tool={} path={} duration_ms={} response_size_bytes={} error_code={:?}",
+                timestamp, tool_name, path_or_hash, duration_ms, response_size, error_code,
            );

            match result {
@ -388,7 +400,13 @@ pub fn run(root: Option<&Path>) -> Result<()> {
    eprintln!("pdftract MCP server (stdio mode) starting...");
    eprintln!("Version: {}", env!("CARGO_PKG_VERSION"));
    eprintln!("Protocol: JSON-RPC 2.0 over stdio");
-    eprintln!("Tools: {}", registry.tools_list()["tools"].as_array().map(|v| v.len()).unwrap_or(0));
+    eprintln!(
+        "Tools: {}",
+        registry.tools_list()["tools"]
+            .as_array()
+            .map(|v| v.len())
+            .unwrap_or(0)
+    );
    if root.is_some() {
        eprintln!("Path-traversal protection: enabled");
    } else {
@ -422,10 +440,7 @@ pub fn run(root: Option<&Path>) -> Result<()> {
                // Parse error - send error response and continue
                eprintln!("Parse error: {}", e);

-                let error_response = Response::error(
-                    Id::Null,
-                    ErrorObject::parse_error(),
-                );
+                let error_response = Response::error(Id::Null, ErrorObject::parse_error());

                if let Err(write_err) = write_response(&error_response) {
                    eprintln!("Failed to write error response: {}", write_err);
@ -444,7 +459,8 @@ pub fn run(root: Option<&Path>) -> Result<()> {

    // Flush stdout before exit
    if let Some(mut stdout) = STDOUT.lock().unwrap().take() {
-        stdout.flush()
+        stdout
+            .flush()
            .context("Failed to flush stdout on shutdown")?;
    }

@ -462,10 +478,7 @@ mod tests {
    fn test_write_response_framing() {
        init_stdout();

-        let response = Response::success(
-            Id::Number(1),
-            serde_json::json!({"result": "ok"}),
-        );
+        let response = Response::success(Id::Number(1), serde_json::json!({"result": "ok"}));

        // This should succeed (stdout is initialized)
        // We can't easily test the actual output without capturing stdout,
@ -481,11 +494,7 @@ mod tests {
    #[test]
    fn test_handle_unknown_method() {
        let registry = tools::all_tools();
-        let request = Request::new(
-            "unknown/method",
-            None,
-            Some(Id::Number(1)),
-        );
+        let request = Request::new("unknown/method", None, Some(Id::Number(1)));

        let response = handle_request(request, &registry, None);

@ -497,11 +506,7 @@ mod tests {
    #[test]
    fn test_handle_tools_list() {
        let registry = tools::all_tools();
-        let request = Request::new(
-            "tools/list",
-            None,
-            Some(Id::Number(1)),
-        );
+        let request = Request::new("tools/list", None, Some(Id::Number(1)));

        let response = handle_request(request, &registry, None);

@ -512,11 +517,7 @@ mod tests {
    /// Test that notifications (no id) return Id::Null.
    #[test]
    fn test_request_id_notification() {
-        let request = Request::new(
-            "notifications/message",
-            None,
-            None,
-        );
+        let request = Request::new("notifications/message", None, None);

        assert_eq!(request.request_id(), Id::Null);
    }
--- a/crates/pdftract-cli/src/mcp/tools/mod.rs
+++ b/crates/pdftract-cli/src/mcp/tools/mod.rs
@ -5,10 +5,10 @@
 //! argument schema (JSON Schema via schemars), structured error mapping, and
 //! per-invocation observability.

-mod registry;
 mod args;
+mod registry;

-pub use registry::{Tool, ToolRegistry, ToolResult, all_tools};
+pub use registry::{all_tools, Tool, ToolRegistry, ToolResult};

 // Error codes for pdftract-specific errors (-32099..-32000)
 pub const ERROR_NOT_YET_IMPLEMENTED: i64 = -32000;
--- a/crates/pdftract-cli/src/mcp/tools/registry.rs
+++ b/crates/pdftract-cli/src/mcp/tools/registry.rs
@ -5,14 +5,20 @@
 //! provides the tools/list response.

 use super::args::*;
-use super::{ERROR_NOT_YET_IMPLEMENTED, ERROR_IO_ERROR, ERROR_PATH_INVALID, CODE_IO_ERROR, CODE_PATH_INVALID};
+use super::{
+    CODE_IO_ERROR, CODE_PATH_INVALID, ERROR_IO_ERROR, ERROR_NOT_YET_IMPLEMENTED, ERROR_PATH_INVALID,
+};
 use crate::mcp::framing::ErrorObject;
 use crate::mcp::root::resolve_path;
 use pdftract_core::{
-    parser::{self, catalog, pages, stream::{MemorySource, PdfSource}, xref},
    diagnostics::DiagCode,
-    options::{ExtractionOptions, ReceiptsMode},
    extract::{extract_pdf, result_to_json},
+    options::{ExtractionOptions, ReceiptsMode},
+    parser::{
+        self, catalog, pages,
+        stream::{MemorySource, PdfSource},
+        xref,
+    },
 };
 use regex::Regex;
 use serde_json::{json, to_value, Value};
@ -153,19 +159,19 @@ fn find_startxref_offset(data: &[u8]) -> Result<u64, ErrorObject> {
            return Err(ErrorObject::server_error(
                super::ERROR_IO_ERROR,
                "Invalid startxref offset in PDF",
-            ).with_data(json!({"code": super::CODE_IO_ERROR})));
+            )
+            .with_data(json!({"code": super::CODE_IO_ERROR})));
        }

-        let offset_str = std::str::from_utf8(&data[offset_start..offset_end])
-            .map_err(|_| ErrorObject::server_error(
-                super::ERROR_IO_ERROR,
-                "Invalid UTF-8 in startxref offset",
-            ).with_data(json!({"code": super::CODE_IO_ERROR})))?;
+        let offset_str = std::str::from_utf8(&data[offset_start..offset_end]).map_err(|_| {
+            ErrorObject::server_error(super::ERROR_IO_ERROR, "Invalid UTF-8 in startxref offset")
+                .with_data(json!({"code": super::CODE_IO_ERROR}))
+        })?;

-        let offset: u64 = offset_str.parse().map_err(|_| ErrorObject::server_error(
-            super::ERROR_IO_ERROR,
-            "Failed to parse startxref offset",
-        ).with_data(json!({"code": super::CODE_IO_ERROR})))?;
+        let offset: u64 = offset_str.parse().map_err(|_| {
+            ErrorObject::server_error(super::ERROR_IO_ERROR, "Failed to parse startxref offset")
+                .with_data(json!({"code": super::CODE_IO_ERROR}))
+        })?;

        Ok(offset)
    } else {
@ -200,24 +206,26 @@ struct PdfContext {
 /// * `path` - The path argument (may be a URL or local path)
 /// * `password` - Optional PDF password
 /// * `root` - Optional root directory for path-traversal protection
-fn open_pdf(path: &str, password: Option<&str>, root: Option<&Path>) -> Result<PdfContext, ErrorObject> {
+fn open_pdf(
+    path: &str,
+    password: Option<&str>,
+    root: Option<&Path>,
+) -> Result<PdfContext, ErrorObject> {
    // Validate and resolve the path using the root if set
    let path_buf = resolve_path(path, root)?;

    // Check if it's a file (not a directory)
    if !path_buf.is_file() {
-        return Err(ErrorObject::server_error(
-            ERROR_PATH_INVALID,
-            format!("Not a file: {}", path),
-        ).with_data(json!({"code": CODE_PATH_INVALID, "path": path})));
+        return Err(
+            ErrorObject::server_error(ERROR_PATH_INVALID, format!("Not a file: {}", path))
+                .with_data(json!({"code": CODE_PATH_INVALID, "path": path})),
+        );
    }

    // Read the PDF file
    let buffer = fs::read(&path_buf).map_err(|e| {
-        ErrorObject::server_error(
-            ERROR_IO_ERROR,
-            format!("Failed to read PDF file: {}", e),
-        ).with_data(json!({"code": CODE_IO_ERROR, "path": path}))
+        ErrorObject::server_error(ERROR_IO_ERROR, format!("Failed to read PDF file: {}", e))
+            .with_data(json!({"code": CODE_IO_ERROR, "path": path}))
    })?;

    // Check for PDF magic number
@ -225,7 +233,8 @@ fn open_pdf(path: &str, password: Option<&str>, root: Option<&Path>) -> Result<P
        return Err(ErrorObject::server_error(
            ERROR_IO_ERROR,
            "Not a valid PDF file (missing %PDF- header)",
-        ).with_data(json!({"code": CODE_IO_ERROR, "path": path})));
+        )
+        .with_data(json!({"code": CODE_IO_ERROR, "path": path})));
    }

    // Create a MemorySource for parsing
@ -240,7 +249,8 @@ fn open_pdf(path: &str, password: Option<&str>, root: Option<&Path>) -> Result<P
            return Err(ErrorObject::server_error(
                super::ERROR_PDF_ENCRYPTED,
                "PDF is encrypted and no password was provided",
-            ).with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
+            )
+            .with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
        }
    }

@ -250,18 +260,19 @@ fn open_pdf(path: &str, password: Option<&str>, root: Option<&Path>) -> Result<P
            return Err(ErrorObject::server_error(
                super::ERROR_PDF_ENCRYPTED,
                "PDF is encrypted and no password was provided",
-            ).with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
+            )
+            .with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
        }
    }

    // Get the root reference from the trailer
-    let root_ref = xref_section.trailer.as_ref()
+    let root_ref = xref_section
+        .trailer
+        .as_ref()
        .and_then(|trailer| trailer.get("Root"))
-        .and_then(|obj| {
-            match obj {
-                pdftract_core::parser::object::PdfObject::Ref(obj_ref) => Some(obj_ref),
-                _ => None,
-            }
+        .and_then(|obj| match obj {
+            pdftract_core::parser::object::PdfObject::Ref(obj_ref) => Some(obj_ref),
+            _ => None,
        });

    let (catalog, page_count) = match root_ref {
@ -283,11 +294,15 @@ fn open_pdf(path: &str, password: Option<&str>, root: Option<&Path>) -> Result<P
                }
                Err(diags) => {
                    // Check for encryption errors
-                    if diags.iter().any(|d| d.code == DiagCode::EncryptionUnsupported) {
+                    if diags
+                        .iter()
+                        .any(|d| d.code == DiagCode::EncryptionUnsupported)
+                    {
                        return Err(ErrorObject::server_error(
                            super::ERROR_PDF_ENCRYPTED,
                            "PDF is encrypted and no password was provided",
-                        ).with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
+                        )
+                        .with_data(json!({"code": super::CODE_PDF_ENCRYPTED})));
                    }
                    // Catalog parsing failed - return partial context
                    (None, None)
@ -345,7 +360,10 @@ fn build_extraction_options(
 /// Create a stub response for tools that require Phase 6 extraction surface.
 fn stub_extraction_response(path: &str, tool_name: &str, page_count: Option<usize>) -> Value {
    let mut response = serde_json::Map::new();
-    response.insert("_note".to_string(), json!("This tool requires Phase 6 extraction surface"));
+    response.insert(
+        "_note".to_string(),
+        json!("This tool requires Phase 6 extraction surface"),
+    );
    response.insert("_tool".to_string(), json!(tool_name));
    response.insert("_path".to_string(), json!(path));

@ -396,8 +414,8 @@ impl Tool for ExtractTool {

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
        // Parse arguments
-        let tool_args: ExtractArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: ExtractArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        // Check if path is a URL
        if is_url(&tool_args.path) {
@ -414,14 +432,17 @@ impl Tool for ExtractTool {
        let path_buf = resolve_path(&tool_args.path, root)?;

        // Build extraction options
-        let options = build_extraction_options(&tool_args.pages, &tool_args.ocr, tool_args.receipts.as_deref());
+        let options = build_extraction_options(
+            &tool_args.pages,
+            &tool_args.ocr,
+            tool_args.receipts.as_deref(),
+        );

        // Perform the extraction
-        let result = extract_pdf(&path_buf, &options)
-            .map_err(|e| ErrorObject::server_error(
-                super::ERROR_IO_ERROR,
-                format!("Extraction failed: {}", e),
-            ).with_data(json!({"code": super::CODE_IO_ERROR})))?;
+        let result = extract_pdf(&path_buf, &options).map_err(|e| {
+            ErrorObject::server_error(super::ERROR_IO_ERROR, format!("Extraction failed: {}", e))
+                .with_data(json!({"code": super::CODE_IO_ERROR}))
+        })?;

        Ok(result_to_json(&result))
    }
@ -444,8 +465,8 @@ impl Tool for ExtractTextTool {
    }

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
-        let tool_args: ExtractTextArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: ExtractTextArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        if is_url(&tool_args.path) {
            return Ok(json!({
@ -460,17 +481,22 @@ impl Tool for ExtractTextTool {
        let path_buf = resolve_path(&tool_args.path, root)?;

        // Build extraction options
-        let options = build_extraction_options(&tool_args.pages, &tool_args.ocr, tool_args.receipts.as_deref());
+        let options = build_extraction_options(
+            &tool_args.pages,
+            &tool_args.ocr,
+            tool_args.receipts.as_deref(),
+        );

        // Perform the extraction
-        let result = extract_pdf(&path_buf, &options)
-            .map_err(|e| ErrorObject::server_error(
-                super::ERROR_IO_ERROR,
-                format!("Extraction failed: {}", e),
-            ).with_data(json!({"code": super::CODE_IO_ERROR})))?;
+        let result = extract_pdf(&path_buf, &options).map_err(|e| {
+            ErrorObject::server_error(super::ERROR_IO_ERROR, format!("Extraction failed: {}", e))
+                .with_data(json!({"code": super::CODE_IO_ERROR}))
+        })?;

        // Convert to plain text
-        let text = result.pages.iter()
+        let text = result
+            .pages
+            .iter()
            .flat_map(|page| page.spans.iter().map(|span| span.text.as_str()))
            .collect::<Vec<&str>>()
            .join("\n");
@ -496,8 +522,8 @@ impl Tool for ExtractMarkdownTool {
    }

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
-        let tool_args: ExtractMarkdownArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: ExtractMarkdownArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        if is_url(&tool_args.path) {
            return Ok(json!({
@ -512,19 +538,24 @@ impl Tool for ExtractMarkdownTool {
        let path_buf = resolve_path(&tool_args.path, root)?;

        // Build extraction options
-        let options = build_extraction_options(&tool_args.pages, &tool_args.ocr, tool_args.receipts.as_deref());
+        let options = build_extraction_options(
+            &tool_args.pages,
+            &tool_args.ocr,
+            tool_args.receipts.as_deref(),
+        );

        // Perform the extraction
-        let result = extract_pdf(&path_buf, &options)
-            .map_err(|e| ErrorObject::server_error(
-                super::ERROR_IO_ERROR,
-                format!("Extraction failed: {}", e),
-            ).with_data(json!({"code": super::CODE_IO_ERROR})))?;
+        let result = extract_pdf(&path_buf, &options).map_err(|e| {
+            ErrorObject::server_error(super::ERROR_IO_ERROR, format!("Extraction failed: {}", e))
+                .with_data(json!({"code": super::CODE_IO_ERROR}))
+        })?;

        // Convert to markdown
-        let markdown = result.pages.iter()
-            .flat_map(|page| page.blocks.iter().map(|block| {
-                match block.kind.as_str() {
+        let markdown = result
+            .pages
+            .iter()
+            .flat_map(|page| {
+                page.blocks.iter().map(|block| match block.kind.as_str() {
                    "heading" => {
                        let level = block.level.unwrap_or(1);
                        let prefix = "#".repeat(level as usize);
@ -532,8 +563,8 @@ impl Tool for ExtractMarkdownTool {
                    }
                    "paragraph" => format!("{}\n", block.text),
                    _ => format!("{}\n", block.text),
-                }
-            }))
+                })
+            })
            .collect::<Vec<String>>()
            .join("\n");

@ -558,8 +589,8 @@ impl Tool for SearchTool {
    }

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
-        let tool_args: SearchArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: SearchArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        // Validate the regex pattern
        let _regex = Regex::new(&tool_args.pattern).map_err(|e| {
@ -603,8 +634,8 @@ impl Tool for GetMetadataTool {
    }

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
-        let tool_args: GetMetadataArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: GetMetadataArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        // Check if path is a URL
        if is_url(&tool_args.path) {
@ -657,14 +688,18 @@ fn extract_metadata(path: &str, _password: Option<&str>, root: Option<&Path>) ->

        // Fingerprint - compute a simple one based on file size and page count
        // Full fingerprint computation would use the Phase 1.7 algorithm
-        let fingerprint = format!("pdftract-v1:{:064x}",
+        let fingerprint = format!(
+            "pdftract-v1:{:064x}",
            sha2::Sha256::digest(
-                format!("{}:{}:{}",
+                format!(
+                    "{}:{}:{}",
                    ctx.source.len().unwrap_or(0),
                    ctx.page_count.unwrap_or(0),
                    catalog.pages_ref.object
-                ).as_bytes()
-            ));
+                )
+                .as_bytes()
+            )
+        );

        Ok(json!({
            "metadata": metadata,
@ -673,13 +708,17 @@ fn extract_metadata(path: &str, _password: Option<&str>, root: Option<&Path>) ->
        }))
    } else {
        // Catalog not available, return partial metadata
-        let fingerprint = format!("pdftract-v1:{:064x}",
+        let fingerprint = format!(
+            "pdftract-v1:{:064x}",
            sha2::Sha256::digest(
-                format!("{}:{}",
+                format!(
+                    "{}:{}",
                    ctx.source.len().unwrap_or(0),
                    ctx.page_count.unwrap_or(0)
-                ).as_bytes()
-            ));
+                )
+                .as_bytes()
+            )
+        );

        Ok(json!({
            "metadata": metadata,
@ -706,8 +745,8 @@ impl Tool for HashTool {
    }

    fn execute(&self, args: Value, _log_path: Option<&str>, root: Option<&Path>) -> ToolResult {
-        let tool_args: HashArgs = serde_json::from_value(args)
-            .map_err(|_| ErrorObject::invalid_params())?;
+        let tool_args: HashArgs =
+            serde_json::from_value(args).map_err(|_| ErrorObject::invalid_params())?;

        // Check if path is a URL
        if is_url(&tool_args.path) {
@ -728,31 +767,43 @@ impl Tool for HashTool {
 }

 /// Compute the fingerprint of a PDF file.
-fn compute_fingerprint(path: &str, _password: Option<&str>, root: Option<&Path>) -> Result<String, ErrorObject> {
+fn compute_fingerprint(
+    path: &str,
+    _password: Option<&str>,
+    root: Option<&Path>,
+) -> Result<String, ErrorObject> {
    let ctx = open_pdf(path, _password, root)?;

    // Compute a simplified fingerprint for now
    // Full fingerprint computation would use the Phase 1.7 algorithm with
    // content stream hashing, resource dict hashing, etc.
    if let Some(catalog) = &ctx.catalog {
-        let fingerprint = format!("pdftract-v1:{:064x}",
+        let fingerprint = format!(
+            "pdftract-v1:{:064x}",
            sha2::Sha256::digest(
-                format!("{}:{}:{}:{}",
+                format!(
+                    "{}:{}:{}:{}",
                    ctx.source.len().unwrap_or(0),
                    ctx.page_count.unwrap_or(0),
                    catalog.pages_ref.object,
                    catalog.mark_info.is_tagged
-                ).as_bytes()
-            ));
+                )
+                .as_bytes()
+            )
+        );
        Ok(fingerprint)
    } else {
-        let fingerprint = format!("pdftract-v1:{:064x}",
+        let fingerprint = format!(
+            "pdftract-v1:{:064x}",
            sha2::Sha256::digest(
-                format!("{}:{}",
+                format!(
+                    "{}:{}",
                    ctx.source.len().unwrap_or(0),
                    ctx.page_count.unwrap_or(0)
-                ).as_bytes()
-            ));
+                )
+                .as_bytes()
+            )
+        );
        Ok(fingerprint)
    }
 }
@ -1006,7 +1057,11 @@ mod tests {

        // Test get_table
        let tool = registry.get("get_table").unwrap();
-        let result = tool.execute(json!({"path": "test.pdf", "page": 0, "table_index": 0}), None, None);
+        let result = tool.execute(
+            json!({"path": "test.pdf", "page": 0, "table_index": 0}),
+            None,
+            None,
+        );
        assert!(result.is_err());
        let err = result.unwrap_err();
        assert_eq!(err.code, ERROR_NOT_YET_IMPLEMENTED);
@ -1061,7 +1116,10 @@ mod tests {

        // Create a JSON Schema validator
        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "Extract tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "Extract tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1070,7 +1128,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "ExtractText tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "ExtractText tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1079,7 +1140,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "ExtractMarkdown tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "ExtractMarkdown tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1088,7 +1152,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "Search tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "Search tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1097,7 +1164,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "GetMetadata tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "GetMetadata tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1106,7 +1176,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "Hash tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "Hash tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1115,7 +1188,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "GetTable tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "GetTable tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1124,7 +1200,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "GetFormFields tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "GetFormFields tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1133,7 +1212,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "GetAttachments tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "GetAttachments tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1142,7 +1224,10 @@ mod tests {
        let schema = tool.input_schema();

        let compilation_result = jsonschema::JSONSchema::compile(&schema);
-        assert!(compilation_result.is_ok(), "Classify tool schema should be valid JSON Schema");
+        assert!(
+            compilation_result.is_ok(),
+            "Classify tool schema should be valid JSON Schema"
+        );
    }

    #[test]
@ -1152,10 +1237,12 @@ mod tests {
        for (_key, tool) in &registry.tools {
            let schema = tool.input_schema();
            let compilation_result = jsonschema::JSONSchema::compile(&schema);
-            assert!(compilation_result.is_ok(),
+            assert!(
+                compilation_result.is_ok(),
                "Tool '{}' schema should be valid JSON Schema: {:?}",
                tool.name(),
-                compilation_result.err());
+                compilation_result.err()
+            );
        }
    }

--- a/crates/pdftract-cli/src/password.rs
+++ b/crates/pdftract-cli/src/password.rs
@ -105,7 +105,9 @@ fn read_password_from_stdin() -> Result<Option<secrecy::SecretString>> {
        return Ok(None);
    }

-    Ok(Some(secrecy::SecretString::new(password.to_string().into_boxed_str())))
+    Ok(Some(secrecy::SecretString::new(
+        password.to_string().into_boxed_str(),
+    )))
 }

 #[cfg(test)]
@ -153,7 +155,10 @@ mod tests {
    fn test_resolve_password_empty_env_var() {
        std::env::set_var(ENV_PASSWORD, "");
        let result = resolve_password(false, None).unwrap();
-        assert!(result.is_none(), "Empty env var should be treated as no password");
+        assert!(
+            result.is_none(),
+            "Empty env var should be treated as no password"
+        );
        std::env::remove_var(ENV_PASSWORD);
    }

--- a/crates/pdftract-cli/src/serve.rs
+++ b/crates/pdftract-cli/src/serve.rs
@ -25,9 +25,9 @@ use axum::{
    routing::{get, post},
    Router,
 };
-use pdftract_core::options::{ExtractionOptions, ReceiptsMode};
-use pdftract_core::extract::{extract_pdf, result_to_json};
 use pdftract_core::cache;
+use pdftract_core::extract::{extract_pdf, result_to_json};
+use pdftract_core::options::{ExtractionOptions, ReceiptsMode};
 use serde::Deserialize;
 use std::path::{Path, PathBuf};
 use std::sync::Arc;
@ -145,17 +145,23 @@ pub async fn run(
        .layer(RequestBodyLimitLayer::new(max_body_bytes))
        .with_state(state);

-    let listener = tokio::net::TcpListener::bind(&bind_addr).await
+    let listener = tokio::net::TcpListener::bind(&bind_addr)
+        .await
        .context(format!("Failed to bind to {}", bind_addr))?;

    eprintln!("pdftract serve listening on http://{}", bind_addr);
    if let Some(dir) = cache_dir_for_logging {
-        eprintln!("Cache enabled: {} (max {} bytes)", dir.display(), cache_size_bytes);
+        eprintln!(
+            "Cache enabled: {} (max {} bytes)",
+            dir.display(),
+            cache_size_bytes
+        );
    } else {
        eprintln!("Cache disabled");
    }

-    axum::serve(listener, app).await
+    axum::serve(listener, app)
+        .await
        .context("HTTP server error")?;

    Ok(())
@ -199,8 +205,14 @@ async fn extract_handler(
    let pdf_file_clone = pdf_file.clone();
    let (result, cache_status, cache_age) = tokio::task::spawn_blocking(move || {
        let cache_dir_ref = cache_dir.as_deref();
-        cache::extract_with_cache(&pdf_file_clone, &options, cache_dir_ref, cache_disabled, Some(cache_size_bytes))
-            .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
+        cache::extract_with_cache(
+            &pdf_file_clone,
+            &options,
+            cache_dir_ref,
+            cache_disabled,
+            Some(cache_size_bytes),
+        )
+        .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
    })
    .await
    .map_err(|e| AxumError::Internal(format!("{:?}", e)))?
@ -216,7 +228,10 @@ async fn extract_handler(
    let response = AxumResponse::builder()
        .status(StatusCode::OK)
        .header("Content-Type", "application/json")
-        .header("X-Pdftract-Cache", CacheStatus::from_string(&cache_status).header_value())
+        .header(
+            "X-Pdftract-Cache",
+            CacheStatus::from_string(&cache_status).header_value(),
+        )
        .body(Body::from(serde_json::to_string(&json).unwrap()))
        .map_err(|e| AxumError::Internal(format!("{:?}", e)))?;

@ -240,8 +255,14 @@ async fn extract_text_handler(

    let (result, cache_status, _cache_age) = tokio::task::spawn_blocking(move || {
        let cache_dir_ref = cache_dir.as_deref();
-        cache::extract_with_cache(&pdf_file, &options, cache_dir_ref, cache_disabled, Some(cache_size_bytes))
-            .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
+        cache::extract_with_cache(
+            &pdf_file,
+            &options,
+            cache_dir_ref,
+            cache_disabled,
+            Some(cache_size_bytes),
+        )
+        .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
    })
    .await
    .map_err(|e| AxumError::Internal(format!("{:?}", e)))?
@ -257,7 +278,10 @@ async fn extract_text_handler(

    let response = AxumResponse::builder()
        .status(StatusCode::OK)
-        .header("X-Pdftract-Cache", CacheStatus::from_string(&cache_status).header_value())
+        .header(
+            "X-Pdftract-Cache",
+            CacheStatus::from_string(&cache_status).header_value(),
+        )
        .body(Body::from(text))
        .map_err(|e| AxumError::Internal(format!("{:?}", e)))?;

@ -281,8 +305,14 @@ async fn extract_stream_handler(

    let (result, _cache_status, _cache_age) = tokio::task::spawn_blocking(move || {
        let cache_dir_ref = cache_dir.as_deref();
-        cache::extract_with_cache(&pdf_file, &options, cache_dir_ref, cache_disabled, Some(cache_size_bytes))
-            .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
+        cache::extract_with_cache(
+            &pdf_file,
+            &options,
+            cache_dir_ref,
+            cache_disabled,
+            Some(cache_size_bytes),
+        )
+        .map_err(|e| AxumError::Extraction(format!("{:?}", e)))
    })
    .await
    .map_err(|e| AxumError::Internal(format!("{:?}", e)))?
@ -319,19 +349,24 @@ async fn receive_pdf(multipart: &mut Multipart) -> Result<(PathBuf, ExtractParam
        full_render: false,
    };

-    while let Some(field) = multipart.next_field().await
+    while let Some(field) = multipart
+        .next_field()
+        .await
        .map_err(|e| AxumError::Internal(format!("{:?}", e)))?
    {
        let name = field.name().unwrap_or("").to_string();

        if name == "file" || name == "pdf" {
-            let data = field.bytes().await
+            let data = field
+                .bytes()
+                .await
                .map_err(|e| AxumError::Internal(format!("{:?}", e)))?;

            // Create a temp file that will persist for the duration of the request
            let temp_dir = std::env::temp_dir();
            let temp_file = temp_dir.join(format!("pdftract-upload-{}.pdf", uuid::Uuid::new_v4()));
-            tokio::fs::write(&temp_file, &data).await
+            tokio::fs::write(&temp_file, &data)
+                .await
                .map_err(|e| AxumError::Internal(format!("{:?}", e)))?;
            pdf_path = Some(temp_file);
        } else if name == "receipts" {
@ -352,7 +387,8 @@ async fn receive_pdf(multipart: &mut Multipart) -> Result<(PathBuf, ExtractParam
        }
    }

-    let pdf_path = pdf_path.ok_or_else(|| AxumError::BadRequest("No PDF file uploaded".to_string()))?;
+    let pdf_path =
+        pdf_path.ok_or_else(|| AxumError::BadRequest("No PDF file uploaded".to_string()))?;

    Ok((pdf_path, params))
 }
@ -378,7 +414,8 @@ fn build_options(params: &ExtractParams) -> Result<ExtractionOptions, AxumError>
            if !has_full_render() {
                return Err(AxumError::BadRequest(
                    "full_render requested but PDFium is not available at runtime. \
-                    Ensure the PDFium native library is installed.".to_string()
+                    Ensure the PDFium native library is installed."
+                        .to_string(),
                ));
            }
        }
--- a/crates/pdftract-cli/src/verify_receipt.rs
+++ b/crates/pdftract-cli/src/verify_receipt.rs
@ -6,11 +6,11 @@
 use anyhow::{Context, Result};
 use clap::Args;
 use pdftract_core::document::{self, compute_pdf_fingerprint, extract_spans_from_page};
-use pdftract_core::receipts::Receipt;
 use pdftract_core::receipts::verifier::{exit_code, SpanData, VerificationResult};
+use pdftract_core::receipts::Receipt;
 use std::fs;
-use std::path::PathBuf;
 use std::io::{self, Read};
+use std::path::PathBuf;

 /// Verify a receipt against a PDF file.
 #[derive(Args)]
@ -96,7 +96,10 @@ pub fn run_verify_receipt(cmd: VerifyReceiptCommand) -> Result<()> {
        binary_version,
    ) {
        eprintln!("Error: {}", e);
-        eprintln!("Install pdftract v{} to verify this receipt", receipt.extraction_version);
+        eprintln!(
+            "Install pdftract v{} to verify this receipt",
+            receipt.extraction_version
+        );
        std::process::exit(exit_code::EXTRACTION_FAILED);
    }

@ -130,18 +133,18 @@ pub fn run_verify_receipt(cmd: VerifyReceiptCommand) -> Result<()> {
        Ok(spans) => spans,
        Err(e) => {
            if !cmd.json && !cmd.quiet {
-                eprintln!("Error: Failed to extract spans from page {}: {}", receipt.page_index, e);
+                eprintln!(
+                    "Error: Failed to extract spans from page {}: {}",
+                    receipt.page_index, e
+                );
            }
            std::process::exit(exit_code::EXTRACTION_FAILED);
        }
    };

    // Step 5: Run verification protocol
-    let result = pdftract_core::receipts::verifier::verify_receipt(
-        &receipt,
-        &spans,
-        &actual_fingerprint,
-    );
+    let result =
+        pdftract_core::receipts::verifier::verify_receipt(&receipt, &spans, &actual_fingerprint);

    // Step 6: Output result
    output_result(&result, &receipt, &actual_fingerprint, &cmd);
@ -156,7 +159,8 @@ fn load_receipt(cmd: &VerifyReceiptCommand) -> Result<Receipt> {
        inline.clone()
    } else if cmd.stdin || cmd.receipt_path.to_string_lossy() == "-" {
        let mut buffer = String::new();
-        io::stdin().read_to_string(&mut buffer)
+        io::stdin()
+            .read_to_string(&mut buffer)
            .context("Failed to read receipt from stdin")?;
        buffer
    } else {
@ -164,8 +168,8 @@ fn load_receipt(cmd: &VerifyReceiptCommand) -> Result<Receipt> {
            .with_context(|| format!("Failed to read receipt from {:?}", cmd.receipt_path))?
    };

-    let receipt: Receipt = serde_json::from_str(&receipt_json)
-        .context("Failed to parse receipt JSON")?;
+    let receipt: Receipt =
+        serde_json::from_str(&receipt_json).context("Failed to parse receipt JSON")?;
    Ok(receipt)
 }

@ -179,7 +183,10 @@ fn output_result(
    if cmd.json {
        // JSON output
        let output = match result {
-            VerificationResult::Ok { best_iou, actual_content_hash } => {
+            VerificationResult::Ok {
+                best_iou,
+                actual_content_hash,
+            } => {
                let expected_hash = receipt.content_hash.clone();
                VerificationJsonOutput {
                    status: "ok".to_string(),
@ -202,45 +209,47 @@ fn output_result(
                    error: Some(format!("Expected fingerprint {}, got {}", expected, actual)),
                }
            }
-            VerificationResult::BboxMismatch { best_iou, threshold } => {
-                VerificationJsonOutput {
-                    status: "bbox_mismatch".to_string(),
-                    pdf_fingerprint: actual_fingerprint.to_string(),
-                    page_index: receipt.page_index,
-                    best_iou: *best_iou,
-                    expected_content_hash: None,
-                    actual_content_hash: None,
-                    error: Some(format!(
-                        "No span meets IoU threshold {} (best IoU: {:.3})",
-                        threshold, best_iou
-                    )),
-                }
-            }
+            VerificationResult::BboxMismatch {
+                best_iou,
+                threshold,
+            } => VerificationJsonOutput {
+                status: "bbox_mismatch".to_string(),
+                pdf_fingerprint: actual_fingerprint.to_string(),
+                page_index: receipt.page_index,
+                best_iou: *best_iou,
+                expected_content_hash: None,
+                actual_content_hash: None,
+                error: Some(format!(
+                    "No span meets IoU threshold {} (best IoU: {:.3})",
+                    threshold, best_iou
+                )),
+            },
            VerificationResult::ContentMismatch {
                best_iou,
                expected_hash,
                actual_hash,
-            } => {
-                VerificationJsonOutput {
-                    status: "content_mismatch".to_string(),
-                    pdf_fingerprint: actual_fingerprint.to_string(),
-                    page_index: receipt.page_index,
-                    best_iou: *best_iou,
-                    expected_content_hash: Some(expected_hash.clone()),
-                    actual_content_hash: Some(actual_hash.clone()),
-                    error: Some(format!(
-                        "Content hash mismatch: expected {}, got {}",
-                        expected_hash, actual_hash
-                    )),
-                }
-            }
+            } => VerificationJsonOutput {
+                status: "content_mismatch".to_string(),
+                pdf_fingerprint: actual_fingerprint.to_string(),
+                page_index: receipt.page_index,
+                best_iou: *best_iou,
+                expected_content_hash: Some(expected_hash.clone()),
+                actual_content_hash: Some(actual_hash.clone()),
+                error: Some(format!(
+                    "Content hash mismatch: expected {}, got {}",
+                    expected_hash, actual_hash
+                )),
+            },
        };

        println!("{}", serde_json::to_string(&output).unwrap());
    } else if !cmd.quiet {
        // Human-readable output
        match result {
-            VerificationResult::Ok { best_iou, actual_content_hash } => {
+            VerificationResult::Ok {
+                best_iou,
+                actual_content_hash,
+            } => {
                println!(
                    "Receipt verified: {} page {} bbox [{}, {}, {}, {}]",
                    receipt.pdf_fingerprint,
@ -250,7 +259,10 @@ fn output_result(
                    receipt.bbox[2],
                    receipt.bbox[3]
                );
-                println!("Best-match span IoU: {:.3}, content_hash: {}", best_iou, actual_content_hash);
+                println!(
+                    "Best-match span IoU: {:.3}, content_hash: {}",
+                    best_iou, actual_content_hash
+                );
            }
            VerificationResult::FingerprintMismatch { expected, actual } => {
                eprintln!("Error: PDF fingerprint mismatch");
@ -259,14 +271,24 @@ fn output_result(
                eprintln!();
                eprintln!("The receipt was created for a different PDF file.");
            }
-            VerificationResult::BboxMismatch { best_iou, threshold } => {
-                eprintln!("Error: Bbox mismatch (no span meets {}% IoU threshold)", threshold * 100.0);
+            VerificationResult::BboxMismatch {
+                best_iou,
+                threshold,
+            } => {
+                eprintln!(
+                    "Error: Bbox mismatch (no span meets {}% IoU threshold)",
+                    threshold * 100.0
+                );
                eprintln!("  Best IoU: {:.3}%", best_iou * 100.0);
-                eprintln!("  Receipt bbox: [{}, {}, {}, {}]",
-                    receipt.bbox[0], receipt.bbox[1], receipt.bbox[2], receipt.bbox[3]);
+                eprintln!(
+                    "  Receipt bbox: [{}, {}, {}, {}]",
+                    receipt.bbox[0], receipt.bbox[1], receipt.bbox[2], receipt.bbox[3]
+                );
                eprintln!();
-                eprintln!("No text span on page {} matches the receipt's bounding box.",
-                    receipt.page_index);
+                eprintln!(
+                    "No text span on page {} matches the receipt's bounding box.",
+                    receipt.page_index
+                );
            }
            VerificationResult::ContentMismatch {
                best_iou,
@ -278,7 +300,9 @@ fn output_result(
                eprintln!("  Expected hash: {}", expected_hash);
                eprintln!("  Actual hash:   {}", actual_hash);
                eprintln!();
-                eprintln!("The text at the receipt's location has changed since the receipt was created.");
+                eprintln!(
+                    "The text at the receipt's location has changed since the receipt was created."
+                );
            }
        }
    }
--- a/crates/pdftract-cli/tests/conformance.rs
+++ b/crates/pdftract-cli/tests/conformance.rs
@ -19,14 +19,8 @@ const SDK_VERSION: &str = env!("CARGO_PKG_VERSION");

 /// Simple semver comparison - returns Less if v1 < v2
 fn compare_versions(v1: &str, v2: &str) -> std::cmp::Ordering {
-    let v1_parts: Vec<u32> = v1
-        .split('.')
-        .filter_map(|s| s.parse().ok())
-        .collect();
-    let v2_parts: Vec<u32> = v2
-        .split('.')
-        .filter_map(|s| s.parse().ok())
-        .collect();
+    let v1_parts: Vec<u32> = v1.split('.').filter_map(|s| s.parse().ok()).collect();
+    let v2_parts: Vec<u32> = v2.split('.').filter_map(|s| s.parse().ok()).collect();

    for (a, b) in v1_parts.iter().zip(v2_parts.iter()) {
        match a.cmp(b) {
@ -181,8 +175,8 @@ fn run_conformance(suite_path: &str, output_path: &str) -> Result<()> {
 }

 fn load_suite(path: &str) -> Result<Value> {
-    let suite_json = fs::read_to_string(path)
-        .context(format!("Failed to read suite from {}", path))?;
+    let suite_json =
+        fs::read_to_string(path).context(format!("Failed to read suite from {}", path))?;
    serde_json::from_str(&suite_json).context("Failed to parse suite as JSON")
 }

@ -212,8 +206,14 @@ fn run_test_case(case: &Value, schema_version: &str) -> Result<TestResult> {

    let fixture = case["fixture"].as_str().unwrap_or("");
    let method = case["method"].as_str().unwrap_or("extract");
-    let options = case.get("options").cloned().unwrap_or(Value::Object(Default::default()));
-    let expected = case.get("expected").cloned().unwrap_or(Value::Object(Default::default()));
+    let options = case
+        .get("options")
+        .cloned()
+        .unwrap_or(Value::Object(Default::default()));
+    let expected = case
+        .get("expected")
+        .cloned()
+        .unwrap_or(Value::Object(Default::default()));
    let tolerances = case.get("tolerances").cloned();

    let fixture_path = if fixture.starts_with("http://") || fixture.starts_with("https://") {
@ -283,10 +283,10 @@ fn execute_method(method: &str, fixture: &str, options: &Value) -> Result<Value>
            }))
        }
        "extract_text" => Ok(Value::String("Sample text content".to_string())),
-        "extract_markdown" => Ok(Value::String("# Sample Markdown\n\nContent here".to_string())),
-        "extract_stream" => {
-            Ok(serde_json::json!({"output_type": "iterator", "frame_count": 3}))
-        }
+        "extract_markdown" => Ok(Value::String(
+            "# Sample Markdown\n\nContent here".to_string(),
+        )),
+        "extract_stream" => Ok(serde_json::json!({"output_type": "iterator", "frame_count": 3})),
        "search" => Ok(serde_json::json!({
            "output_type": "iterator",
            "matches": [{"page": 0, "text": "found"}]
@ -346,7 +346,10 @@ fn compare_recursive(
            }
        }
        (Value::String(act), Value::Object(exp)) => {
-            if let Some(min_len) = exp.get("min_length").and_then(|v| v.as_u64().map(|v| v as usize)) {
+            if let Some(min_len) = exp
+                .get("min_length")
+                .and_then(|v| v.as_u64().map(|v| v as usize))
+            {
                if act.len() < min_len {
                    return Err(format!(
                        "[{}]: string length {} is less than minimum {}",
@ -428,14 +431,14 @@ fn compare_number(
    tolerance: Option<&Value>,
    path: &str,
 ) -> Result<(), String> {
-    let act_val = actual.as_f64().ok_or_else(|| {
-        format!("[{}]: actual number is not f64-representable", path)
-    })?;
+    let act_val = actual
+        .as_f64()
+        .ok_or_else(|| format!("[{}]: actual number is not f64-representable", path))?;

    let exp_val = match expected {
-        Value::Number(n) => n.as_f64().ok_or_else(|| {
-            format!("[{}]: expected number is not f64-representable", path)
-        })?,
+        Value::Number(n) => n
+            .as_f64()
+            .ok_or_else(|| format!("[{}]: expected number is not f64-representable", path))?,
        _ => {
            return Err(format!("[{}]: expected value is not a number", path));
        }
@ -532,13 +535,15 @@ fn write_report(report: &ConformanceReport, path: &str) -> Result<()> {
        obj.insert("id".to_string(), Value::String(r.id.clone()));
        obj.insert(
            "status".to_string(),
-            Value::String(match r.status {
-                TestStatus::Pass => "pass",
-                TestStatus::Fail => "fail",
-                TestStatus::Skip => "skip",
-                TestStatus::Error => "error",
-            }
-            .to_string()),
+            Value::String(
+                match r.status {
+                    TestStatus::Pass => "pass",
+                    TestStatus::Fail => "fail",
+                    TestStatus::Skip => "skip",
+                    TestStatus::Error => "error",
+                }
+                .to_string(),
+            ),
        );
        if let Some(actual) = &r.actual {
            obj.insert("actual".to_string(), actual.clone());
--- a/crates/pdftract-cli/tests/mcp-cli-args.rs
+++ b/crates/pdftract-cli/tests/mcp-cli-args.rs
@ -24,13 +24,27 @@ fn test_stdio_and_bind_mutually_exclusive() {
        .expect("Failed to execute pdftract mcp --stdio --bind");

    // Should fail with exit code 2 (clap's error exit code)
-    assert_eq!(output.status.code(), Some(2), "Expected exit code 2, got {:?}", output.status.code());
+    assert_eq!(
+        output.status.code(),
+        Some(2),
+        "Expected exit code 2, got {:?}",
+        output.status.code()
+    );

    // Error message should mention both flags
    let stderr = String::from_utf8_lossy(&output.stderr);
-    assert!(stderr.contains("--stdio"), "Error message should mention --stdio");
-    assert!(stderr.contains("--bind"), "Error message should mention --bind");
-    assert!(stderr.contains("cannot be used"), "Error message should mention conflict");
+    assert!(
+        stderr.contains("--stdio"),
+        "Error message should mention --stdio"
+    );
+    assert!(
+        stderr.contains("--bind"),
+        "Error message should mention --bind"
+    );
+    assert!(
+        stderr.contains("cannot be used"),
+        "Error message should mention conflict"
+    );
 }

 /// Test that `pdftract mcp` (no flags) parses successfully.
@ -45,12 +59,21 @@ fn test_default_to_stdio() {
        .expect("Failed to execute pdftract mcp --help");

    // Should succeed
-    assert!(output.status.success(), "pdftract mcp --help should succeed");
+    assert!(
+        output.status.success(),
+        "pdftract mcp --help should succeed"
+    );

    // Help text should mention the default behavior
    let stdout = String::from_utf8_lossy(&output.stdout);
-    assert!(stdout.contains("default"), "Help should mention default transport mode");
-    assert!(stdout.contains("stdio"), "Help should mention stdio transport");
+    assert!(
+        stdout.contains("default"),
+        "Help should mention default transport mode"
+    );
+    assert!(
+        stdout.contains("stdio"),
+        "Help should mention stdio transport"
+    );
 }

 /// Test that `pdftract mcp --stdio` parses successfully.
@ -67,7 +90,10 @@ fn test_stdio_flag_valid() {

    // Note: --help overrides the subcommand, so this succeeds
    // In actual use, --stdio would start the stdio server
-    assert!(output.status.success(), "pdftract mcp --stdio --help should succeed");
+    assert!(
+        output.status.success(),
+        "pdftract mcp --stdio --help should succeed"
+    );
 }

 /// Test that `pdftract mcp --bind ADDR` parses successfully.
@ -85,7 +111,10 @@ fn test_bind_flag_valid() {

    // Note: --help overrides the subcommand, so this succeeds
    // In actual use, --bind would start the HTTP server
-    assert!(output.status.success(), "pdftract mcp --bind ADDR --help should succeed");
+    assert!(
+        output.status.success(),
+        "pdftract mcp --bind ADDR --help should succeed"
+    );
 }

 /// Test that the help text mentions ADR-006 and the mutual exclusion rationale.
@ -99,10 +128,16 @@ fn test_help_mentions_adr_006() {
        .output()
        .expect("Failed to execute pdftract mcp --help");

-    assert!(output.status.success(), "pdftract mcp --help should succeed");
+    assert!(
+        output.status.success(),
+        "pdftract mcp --help should succeed"
+    );

    let stdout = String::from_utf8_lossy(&output.stdout);
    // Help text should mention ADR-006 and the rationale
    assert!(stdout.contains("ADR-006"), "Help should mention ADR-006");
-    assert!(stdout.contains("mutually exclusive"), "Help should mention mutual exclusion");
+    assert!(
+        stdout.contains("mutually exclusive"),
+        "Help should mention mutual exclusion"
+    );
 }
--- a/crates/pdftract-cli/tests/mcp-http.rs
+++ b/crates/pdftract-cli/tests/mcp-http.rs
@ -10,13 +10,13 @@
 //! - Batch request handling
 //! - Concurrent client handling (50 clients)

-use std::process::{Command, Stdio, Child};
-use std::thread;
-use std::time::Duration;
-use std::io::{BufRead, BufReader};
-use std::net::TcpListener;
 use reqwest::blocking::Client;
 use serde_json::Value;
+use std::io::{BufRead, BufReader};
+use std::net::TcpListener;
+use std::process::{Child, Command, Stdio};
+use std::thread;
+use std::time::Duration;

 /// Find an available port for testing.
 fn find_available_port() -> u16 {
@ -61,7 +61,8 @@ fn wait_for_server(port: u16, max_wait_ms: u64) -> bool {

    let start = std::time::Instant::now();
    while start.elapsed() < Duration::from_millis(max_wait_ms) {
-        if client.get(&format!("http://127.0.0.1:{}/health", port))
+        if client
+            .get(&format!("http://127.0.0.1:{}/health", port))
            .send()
            .map_or(false, |r| r.status().is_success())
        {
@ -79,7 +80,10 @@ fn test_post_tools_list() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let request_body = serde_json::json!({
@ -112,7 +116,10 @@ fn test_post_batch_request() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let request_body = serde_json::json!([
@ -153,7 +160,10 @@ fn test_post_single_request_returns_single_response() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let request_body = serde_json::json!({
@ -187,7 +197,10 @@ fn test_post_payload_too_large() {
    let mut child = spawn_mcp_http_with_limit(port, 1);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    // Create a payload larger than 1 MB
@ -209,7 +222,10 @@ fn test_post_payload_too_large() {

    let json: Value = response.json().expect("Response is not valid JSON");
    assert_eq!(json["error"]["code"], -32002);
-    assert!(json["error"]["message"].as_str().unwrap().contains("too large"));
+    assert!(json["error"]["message"]
+        .as_str()
+        .unwrap()
+        .contains("too large"));

    // Clean shutdown
    child.kill().ok();
@ -222,7 +238,10 @@ fn test_get_health() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let response = client
@ -247,7 +266,10 @@ fn test_get_sse_stream() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = reqwest::blocking::Client::builder()
        .timeout(None)
@ -260,8 +282,15 @@ fn test_get_sse_stream() {
        .expect("Failed to send request");

    assert_eq!(response.status(), reqwest::StatusCode::OK);
-    assert_eq!(response.headers().get("content-type").unwrap().to_str().unwrap(),
-               "text/event-stream");
+    assert_eq!(
+        response
+            .headers()
+            .get("content-type")
+            .unwrap()
+            .to_str()
+            .unwrap(),
+        "text/event-stream"
+    );

    // Read the initial connection message
    let reader = BufReader::new(response);
@ -269,7 +298,11 @@ fn test_get_sse_stream() {

    // First line should be a comment (connected)
    if let Some(Ok(line)) = lines.next() {
-        assert!(line.starts_with(": connected"), "Expected ': connected', got: {}", line);
+        assert!(
+            line.starts_with(": connected"),
+            "Expected ': connected', got: {}",
+            line
+        );
    }

    // Clean shutdown
@ -286,7 +319,10 @@ fn test_auth_required_for_non_loopback() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let request_body = serde_json::json!({
@ -316,7 +352,10 @@ fn test_unknown_method() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = Client::new();
    let request_body = serde_json::json!({
@ -351,7 +390,10 @@ fn test_50_concurrent_clients() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(5))
@ -372,10 +414,7 @@ fn test_50_concurrent_clients() {
            let url = format!("http://127.0.0.1:{}/", port);

            thread::spawn(move || {
-                let response = client
-                    .post(&url)
-                    .json(&request_body)
-                    .send();
+                let response = client.post(&url).json(&request_body).send();

                (i, response)
            })
@ -413,7 +452,11 @@ fn test_50_concurrent_clients() {
    // All 50 clients should succeed without 5xx errors
    assert_eq!(five_xx_count, 0, "Got {} 5xx errors", five_xx_count);
    assert_eq!(error_count, 0, "Got {} errors", error_count);
-    assert_eq!(success_count, 50, "Got {} successes, expected 50", success_count);
+    assert_eq!(
+        success_count, 50,
+        "Got {} successes, expected 50",
+        success_count
+    );

    // Clean shutdown
    child.kill().ok();
@ -426,7 +469,10 @@ fn test_health_during_load() {
    let mut child = spawn_mcp_http(port);

    // Wait for server to be ready
-    assert!(wait_for_server(port, 2000), "Server did not start within 2 seconds");
+    assert!(
+        wait_for_server(port, 2000),
+        "Server did not start within 2 seconds"
+    );

    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(5))
@ -446,9 +492,7 @@ fn test_health_during_load() {
            let request_body = request_body.clone();
            let url = format!("http://127.0.0.1:{}/", port);

-            thread::spawn(move || {
-                client.post(&url).json(&request_body).send()
-            })
+            thread::spawn(move || client.post(&url).json(&request_body).send())
        })
        .collect();

--- a/crates/pdftract-cli/tests/mcp-stdio.rs
+++ b/crates/pdftract-cli/tests/mcp-stdio.rs
@ -25,7 +25,10 @@ fn spawn_mcp_stdio() -> std::process::Child {
 }

 /// Helper to write a framed JSON-RPC message to stdin.
-fn write_framed_message(stdin: &mut std::process::ChildStdin, json_body: &str) -> std::io::Result<()> {
+fn write_framed_message(
+    stdin: &mut std::process::ChildStdin,
+    json_body: &str,
+) -> std::io::Result<()> {
    let header = format!("Content-Length: {}\r\n\r\n", json_body.len());
    stdin.write_all(header.as_bytes())?;
    stdin.write_all(json_body.as_bytes())?;
@ -52,13 +55,20 @@ fn read_framed_response<R: Read>(reader: &mut BufReader<R>) -> std::io::Result<O
        }

        if let Some(value) = line.strip_prefix("Content-Length:") {
-            content_length = Some(value.trim().parse::<usize>()
-                .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?);
+            content_length = Some(
+                value
+                    .trim()
+                    .parse::<usize>()
+                    .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?,
+            );
        }
    }

    let content_length = content_length.ok_or_else(|| {
-        std::io::Error::new(std::io::ErrorKind::InvalidData, "Missing Content-Length header")
+        std::io::Error::new(
+            std::io::ErrorKind::InvalidData,
+            "Missing Content-Length header",
+        )
    })?;

    let mut buffer = vec![0u8; content_length];
@ -98,8 +108,8 @@ fn test_tools_list_roundtrip() {
    assert!(response.contains(r#""result""#));

    // Verify it's valid JSON
-    let parsed: serde_json::Value = serde_json::from_str(&response)
-        .expect("Response is not valid JSON");
+    let parsed: serde_json::Value =
+        serde_json::from_str(&response).expect("Response is not valid JSON");

    assert_eq!(parsed["jsonrpc"], "2.0");
    assert_eq!(parsed["id"], 1);
@ -135,7 +145,11 @@ fn test_eof_clean_shutdown() {
        }
    };

-    assert!(status.success(), "Process did not exit cleanly: {:?}", status);
+    assert!(
+        status.success(),
+        "Process did not exit cleanly: {:?}",
+        status
+    );
 }

 /// Test that a parse error returns -32700 with id: null.
@ -186,8 +200,7 @@ fn test_parse_error_recovery() {
    {
        let stdout = child.stdout.as_mut().expect("Failed to open stdout");
        let mut reader = BufReader::new(stdout);
-        read_framed_response(&mut reader)
-            .expect("Failed to read error response");
+        read_framed_response(&mut reader).expect("Failed to read error response");
    }

    // Now send a valid request
@ -253,18 +266,24 @@ fn test_stdout_json_rpc_only() {
    child.kill().ok();

    // Verify stdout is valid framed JSON-RPC
-    assert!(response.contains(r#"{"jsonrpc":"2.0""#), "Missing JSON-RPC response");
+    assert!(
+        response.contains(r#"{"jsonrpc":"2.0""#),
+        "Missing JSON-RPC response"
+    );
    assert!(response.contains(r#""result""#), "Missing result field");

    // Verify stderr contains logs (logs go to stderr, not stdout)
    // The startup banner or other logs should be in stderr
-    let stderr_has_logs = !stderr_output.is_empty() ||
-        stderr_output.contains("pdftract") ||
-        stderr_output.contains("stdio") ||
-        stderr_output.contains("MCP") ||
-        stderr_output.contains("Signal");
-    assert!(stderr_has_logs || stderr_output.is_empty(),
-            "Stderr should contain logs, got: {}", stderr_output);
+    let stderr_has_logs = !stderr_output.is_empty()
+        || stderr_output.contains("pdftract")
+        || stderr_output.contains("stdio")
+        || stderr_output.contains("MCP")
+        || stderr_output.contains("Signal");
+    assert!(
+        stderr_has_logs || stderr_output.is_empty(),
+        "Stderr should contain logs, got: {}",
+        stderr_output
+    );
 }

 /// Test timing: request-response should complete within 100ms.
@ -291,8 +310,11 @@ fn test_request_response_timing() {
    }
    let elapsed = start.elapsed();

-    assert!(elapsed < Duration::from_millis(100),
-            "Request-response took {:?}, expected < 50ms", elapsed);
+    assert!(
+        elapsed < Duration::from_millis(100),
+        "Request-response took {:?}, expected < 100ms",
+        elapsed
+    );

    // Clean shutdown
    drop(child.stdin.take());
@ -362,7 +384,10 @@ fn test_notification_no_response() {
    // Notifications don't get responses, so we shouldn't see data immediately
    // (unless there's buffering from a previous request)
    // For this test, we just verify the process is still alive
-    assert!(child.try_wait().unwrap().is_none(), "Process died unexpectedly");
+    assert!(
+        child.try_wait().unwrap().is_none(),
+        "Process died unexpectedly"
+    );

    // Clean shutdown
    drop(child.stdin.take());
--- a/crates/pdftract-cli/tests/mcp-tools-integration.rs
+++ b/crates/pdftract-cli/tests/mcp-tools-integration.rs
@ -105,7 +105,10 @@ fn test_phase_7_stub_tools_return_not_implemented() {
    let registry = tools::all_tools();

    let stub_tools = [
-        ("get_table", serde_json::json!({"path": "test.pdf", "page": 0, "table_index": 0})),
+        (
+            "get_table",
+            serde_json::json!({"path": "test.pdf", "page": 0, "table_index": 0}),
+        ),
        ("get_form_fields", serde_json::json!({"path": "test.pdf"})),
        ("get_attachments", serde_json::json!({"path": "test.pdf"})),
        ("classify", serde_json::json!({"path": "test.pdf"})),
@ -161,7 +164,10 @@ fn test_extract_tool_with_real_pdf() {

    let result = tool.execute(args, None, None);
    if let Err(ref e) = result {
-        eprintln!("Error from tool: code={}, message={}, data={:?}", e.code, e.message, e.data);
+        eprintln!(
+            "Error from tool: code={}, message={}, data={:?}",
+            e.code, e.message, e.data
+        );
    }
    assert!(result.is_ok(), "Tool should succeed: {:?}", result);

@ -210,7 +216,10 @@ fn test_path_resolution() {

    // Also check using CARGO_MANIFEST_DIR
    if let Ok(manifest_dir) = std::env::var("CARGO_MANIFEST_DIR") {
-        let abs_path = format!("{}/{}", manifest_dir, "../../tests/sdk-conformance/fixtures/large/100pages.pdf");
+        let abs_path = format!(
+            "{}/{}",
+            manifest_dir, "../../tests/sdk-conformance/fixtures/large/100pages.pdf"
+        );
        let exists = std::path::Path::new(&abs_path).exists();
        println!("Absolute path '{}' exists: {}", abs_path, exists);
    }
@ -252,7 +261,10 @@ fn test_encrypted_pdf_returns_pdf_encrypted_error() {

    // Debug: print the result if it succeeds unexpectedly
    if let Ok(ref response) = result {
-        eprintln!("Unexpected success on encrypted PDF: {}", serde_json::to_string_pretty(response).unwrap());
+        eprintln!(
+            "Unexpected success on encrypted PDF: {}",
+            serde_json::to_string_pretty(response).unwrap()
+        );
    }

    assert!(result.is_err(), "Encrypted PDF should return error");
--- a/crates/pdftract-cli/tests/root-path-protection.rs
+++ b/crates/pdftract-cli/tests/root-path-protection.rs
@ -25,7 +25,10 @@ fn test_acceptance_criteria_path_traversal_rejected() {
    let result = resolve_path("../../../etc/passwd", Some(root));
    assert!(result.is_err());
    let err = result.unwrap_err();
-    assert_eq!(err.code, -32602, "Should return -32602 (Invalid params) for path traversal");
+    assert_eq!(
+        err.code, -32602,
+        "Should return -32602 (Invalid params) for path traversal"
+    );
    assert!(err.message.contains("escapes root"));
 }

@ -67,7 +70,10 @@ fn test_acceptance_criteria_https_url_bypasses_check() {

    let result = resolve_path("https://example.com/file.pdf", Some(root));
    assert!(result.is_ok());
-    assert_eq!(result.unwrap(), std::path::PathBuf::from("https://example.com/file.pdf"));
+    assert_eq!(
+        result.unwrap(),
+        std::path::PathBuf::from("https://example.com/file.pdf")
+    );
 }

 #[test]
@ -75,7 +81,10 @@ fn test_acceptance_criteria_no_root_trust_the_caller() {
    // Without --root, paths should be returned as-is (trust-the-caller mode)
    let result = resolve_path("../../../etc/passwd", None);
    assert!(result.is_ok());
-    assert_eq!(result.unwrap(), std::path::PathBuf::from("../../../etc/passwd"));
+    assert_eq!(
+        result.unwrap(),
+        std::path::PathBuf::from("../../../etc/passwd")
+    );
 }

 #[test]
@ -92,10 +101,8 @@ fn test_acceptance_criteria_symlink_escape_rejected() {

    #[cfg(windows)]
    {
-        std::os::windows::fs::symlink_file(
-            r"C:\Windows\System32\drivers\etc\hosts",
-            &symlink_path
-        ).unwrap();
+        std::os::windows::fs::symlink_file(r"C:\Windows\System32\drivers\etc\hosts", &symlink_path)
+            .unwrap();
    }

    // Try to access the symlink
@ -134,7 +141,10 @@ fn test_plan_critical_test_path_traversal_with_root() {
    let result = resolve_path("../../etc/passwd", Some(root));
    assert!(result.is_err());
    let err = result.unwrap_err();
-    assert_eq!(err.code, -32602, "Critical test: path traversal must return -32602");
+    assert_eq!(
+        err.code, -32602,
+        "Critical test: path traversal must return -32602"
+    );
    assert!(err.message.contains("escapes root"));

    // Verify the error data contains the expected code
@ -152,7 +162,10 @@ fn test_http_url_bypasses_check() {

    let result = resolve_path("http://example.com/file.pdf", Some(root));
    assert!(result.is_ok());
-    assert_eq!(result.unwrap(), std::path::PathBuf::from("http://example.com/file.pdf"));
+    assert_eq!(
+        result.unwrap(),
+        std::path::PathBuf::from("http://example.com/file.pdf")
+    );
 }

 #[test]
@ -205,6 +218,10 @@ fn test_complex_path_traversal_patterns() {
        let result = resolve_path(pattern, Some(root));
        assert!(result.is_err(), "Pattern '{}' should be rejected", pattern);
        let err = result.unwrap_err();
-        assert_eq!(err.code, -32602, "Pattern '{}' should return -32602", pattern);
+        assert_eq!(
+            err.code, -32602,
+            "Pattern '{}' should return -32602",
+            pattern
+        );
    }
 }
--- a/crates/pdftract-core/benches/table_detection.rs
+++ b/crates/pdftract-core/benches/table_detection.rs
@ -3,12 +3,12 @@
 // Tests the performance of line-based and borderless table detection
 // on pages with varying numbers of path segments and text positions.

-use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
-use pdftract_core::table::{TableDetector, PageContext};
-use pdftract_core::parser::pages::PageDict;
-use std::sync::Arc;
+use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion};
 use pdftract_core::parser::object::ObjRef;
+use pdftract_core::parser::pages::PageDict;
 use pdftract_core::parser::resources::ResourceDict;
+use pdftract_core::table::{PageContext, TableDetector};
+use std::sync::Arc;

 fn make_page() -> PageDict {
    PageDict {
@ -99,9 +99,7 @@ fn bench_table_detection(c: &mut Criterion) {
                let content = generate_grid_content(num_horiz, num_vert);
                let ctx = PageContext::new(&page, &content);

-                b.iter(|| {
-                    black_box(detector.detect_line_based(black_box(&ctx)))
-                });
+                b.iter(|| black_box(detector.detect_line_based(black_box(&ctx))));
            },
        );
    }
@ -111,9 +109,7 @@ fn bench_table_detection(c: &mut Criterion) {
        let content = generate_grid_content(500, 500);
        let ctx = PageContext::new(&page, &content);

-        b.iter(|| {
-            black_box(detector.detect_line_based(black_box(&ctx)))
-        });
+        b.iter(|| black_box(detector.detect_line_based(black_box(&ctx))));
    });

    group.finish();
@ -135,9 +131,7 @@ fn bench_borderless_detection(c: &mut Criterion) {
                let content = generate_borderless_content(num_rows, num_cols);
                let ctx = PageContext::new(&page, &content);

-                b.iter(|| {
-                    black_box(detector.detect_borderless(black_box(&ctx)))
-                });
+                b.iter(|| black_box(detector.detect_borderless(black_box(&ctx))));
            },
        );
    }
--- a/crates/pdftract-core/build.rs
+++ b/crates/pdftract-core/build.rs
@ -33,37 +33,42 @@ fn main() {
 }

 fn generate_std14_metrics(out_dir: &Path, metrics_path: &Path) {
+    let json_content = fs::read_to_string(metrics_path).expect("Failed to read std14-metrics.json");

-    let json_content = fs::read_to_string(metrics_path)
-        .expect("Failed to read std14-metrics.json");
+    let data: serde_json::Value =
+        serde_json::from_str(&json_content).expect("Failed to parse std14-metrics.json");

-    let data: serde_json::Value = serde_json::from_str(&json_content)
-        .expect("Failed to parse std14-metrics.json");
-
-    let fonts = data["fonts"].as_object()
-        .expect("fonts object missing");
+    let fonts = data["fonts"].as_object().expect("fonts object missing");

    let mut metrics_structs = String::new();

    for (font_name, font_data) in fonts {
        let font_ident = font_name.replace("-", "_");
-        let weights = font_data["weights"].as_array()
+        let weights = font_data["weights"]
+            .as_array()
            .expect("weights array missing");

-        let weights_array: Vec<String> = weights.iter()
+        let weights_array: Vec<String> = weights
+            .iter()
            .map(|v| v.as_u64().unwrap_or(0).to_string())
            .collect();

-        let font_bbox = font_data["font_bbox"].as_array()
+        let font_bbox = font_data["font_bbox"]
+            .as_array()
            .expect("font_bbox array missing");
-        let font_bbox: Vec<String> = font_bbox.iter()
+        let font_bbox: Vec<String> = font_bbox
+            .iter()
            .map(|v| v.as_i64().unwrap_or(0).to_string())
            .collect();

        let ascent = font_data["ascent"].as_i64().expect("ascent missing");
        let descent = font_data["descent"].as_i64().expect("descent missing");
-        let italic_angle = font_data["italic_angle"].as_f64().expect("italic_angle missing");
-        let cap_height = font_data["cap_height"].as_i64().expect("cap_height missing");
+        let italic_angle = font_data["italic_angle"]
+            .as_f64()
+            .expect("italic_angle missing");
+        let cap_height = font_data["cap_height"]
+            .as_i64()
+            .expect("cap_height missing");
        let stem_v = font_data["stem_v"].as_i64().expect("stem_v missing");

        let encoding_str = font_data["encoding"].as_str().expect("encoding missing");
@ -74,7 +79,8 @@ fn generate_std14_metrics(out_dir: &Path, metrics_path: &Path) {
            _ => "NamedEncoding::Standard",
        };

-        metrics_structs.push_str(&format!(r#"
+        metrics_structs.push_str(&format!(
+            r#"
 static {}_WIDTHS: &[u16; 256] = &[{}];
 static {}_METRICS: Std14Metrics = Std14Metrics {{
    widths: &{}_WIDTHS,
@ -106,10 +112,14 @@ static {}_METRICS: Std14Metrics = Std14Metrics {{

    for font_name in fonts.keys() {
        let ident = font_name.replace("-", "_");
-        map_builder.entry(font_name.as_str(), &format!("&{}_METRICS", ident.to_uppercase()));
+        map_builder.entry(
+            font_name.as_str(),
+            &format!("&{}_METRICS", ident.to_uppercase()),
+        );
    }

-    let rust_code = format!(r#"
+    let rust_code = format!(
+        r#"
 // Auto-generated Standard 14 font metrics.
 // Do not edit manually.

@ -129,14 +139,13 @@ pub fn get_std14_metrics(name: &str) -> Option<&'static Std14Metrics> {{
 }

 fn generate_named_encodings(out_dir: &Path, encodings_path: &Path) {
-    let json_content = fs::read_to_string(encodings_path)
-        .expect("Failed to read named-encodings.json");
+    let json_content =
+        fs::read_to_string(encodings_path).expect("Failed to read named-encodings.json");

-    let data: serde_json::Value = serde_json::from_str(&json_content)
-        .expect("Failed to parse named-encodings.json");
+    let data: serde_json::Value =
+        serde_json::from_str(&json_content).expect("Failed to parse named-encodings.json");

-    let encodings = data.as_object()
-        .expect("encodings object missing");
+    let encodings = data.as_object().expect("encodings object missing");

    let mut encoding_arrays = String::new();

@ -151,7 +160,8 @@ fn generate_named_encodings(out_dir: &Path, encodings_path: &Path) {
            _ => continue,
        };

-        let entries = encoding_data.as_object()
+        let entries = encoding_data
+            .as_object()
            .expect("encoding data is not an object");

        let mut array_values = Vec::new();
@ -165,7 +175,8 @@ fn generate_named_encodings(out_dir: &Path, encodings_path: &Path) {
            array_values.push(rust_value);
        }

-        encoding_arrays.push_str(&format!(r#"
+        encoding_arrays.push_str(&format!(
+            r#"
 pub static {}: [Option<&'static str>; 256] = [
 {}];
 "#,
@ -174,7 +185,8 @@ pub static {}: [Option<&'static str>; 256] = [
        ));
    }

-    let rust_code = format!(r#"
+    let rust_code = format!(
+        r#"
 // Auto-generated named encoding tables.
 // Do not edit manually.
 // Source: ISO 32000-1 Annex D
@ -200,39 +212,39 @@ pub fn get_named_encoding_table(encoding: NamedEncoding) -> &'static [Option<&'s
 }

 fn generate_agl_maps(out_dir: &Path, agl_path: &Path) {
-    let json_content = fs::read_to_string(agl_path)
-        .expect("Failed to read agl.json");
+    let json_content = fs::read_to_string(agl_path).expect("Failed to read agl.json");

-    let data: serde_json::Value = serde_json::from_str(&json_content)
-        .expect("Failed to parse agl.json");
+    let data: serde_json::Value =
+        serde_json::from_str(&json_content).expect("Failed to parse agl.json");

    // Single-codepoint map
-    let single = data["merged_single"].as_object()
+    let single = data["merged_single"]
+        .as_object()
        .expect("merged_single object missing");

    let mut single_map_builder = phf_codegen::Map::new();

    for (name, uvalue) in single {
-        let uvalue_str = uvalue.as_str()
-            .expect("unicode value is not a string");
+        let uvalue_str = uvalue.as_str().expect("unicode value is not a string");
        // Parse the JSON unicode escape like "\u0041" into a Rust char literal
        let unicode_char = decode_json_unicode(uvalue_str);
        single_map_builder.entry(name.as_str(), &format!("'\\u{{{}}}'", unicode_char));
    }

    // Multi-codepoint map
-    let multi = data["merged_multi"].as_object()
+    let multi = data["merged_multi"]
+        .as_object()
        .expect("merged_multi object missing");

    let mut multi_arrays = String::new();
    let mut multi_map_builder = phf_codegen::Map::new();

    for (name, uvalues) in multi {
-        let uvalues_arr = uvalues.as_array()
-            .expect("multi value is not an array");
+        let uvalues_arr = uvalues.as_array().expect("multi value is not an array");
        let ident = name.to_uppercase().replace("-", "_").replace(".", "_");

-        let chars: Vec<String> = uvalues_arr.iter()
+        let chars: Vec<String> = uvalues_arr
+            .iter()
            .map(|v| {
                let uvalue_str = v.as_str().expect("unicode value is not a string");
                let unicode_char = decode_json_unicode(uvalue_str);
@ -240,7 +252,8 @@ fn generate_agl_maps(out_dir: &Path, agl_path: &Path) {
            })
            .collect();

-        multi_arrays.push_str(&format!(r#"
+        multi_arrays.push_str(&format!(
+            r#"
 static {}: &[char] = &[{}];
 "#,
            ident,
@ -250,7 +263,8 @@ static {}: &[char] = &[{}];
        multi_map_builder.entry(name.as_str(), &format!("&{}", ident));
    }

-    let rust_code = format!(r#"
+    let rust_code = format!(
+        r#"
 // Auto-generated Adobe Glyph List (AGL) phf maps.
 // Do not edit manually.
 // Source: Adobe Glyph List 1.4 + AGLFN 1.7
@ -271,8 +285,7 @@ pub static AGL_MULTI: phf::Map<&'static str, &[char]> = {};
        multi_map_builder.build()
    );

-    fs::write(Path::new(out_dir).join("agl.rs"), rust_code)
-        .expect("Failed to write agl.rs");
+    fs::write(Path::new(out_dir).join("agl.rs"), rust_code).expect("Failed to write agl.rs");
 }

 /// Decode a JSON unicode escape string like "\\u0041" to "0041".
@ -302,14 +315,13 @@ fn decode_json_unicode(s: &str) -> String {
 /// Each entry maps a glyph ID to a Unicode codepoint for a specific font
 /// identified by its SHA-256 hash.
 fn generate_font_fingerprints(out_dir: &Path, fingerprints_path: &Path) {
-    let json_content = fs::read_to_string(fingerprints_path)
-        .expect("Failed to read font-fingerprints.json");
+    let json_content =
+        fs::read_to_string(fingerprints_path).expect("Failed to read font-fingerprints.json");

-    let data: serde_json::Value = serde_json::from_str(&json_content)
-        .expect("Failed to parse font-fingerprints.json");
+    let data: serde_json::Value =
+        serde_json::from_str(&json_content).expect("Failed to parse font-fingerprints.json");

-    let fonts = data.as_array()
-        .expect("font-fingerprints must be an array");
+    let fonts = data.as_array().expect("font-fingerprints must be an array");

    let mut entries_arrays = String::new();
    let mut map_builder = phf_codegen::Map::new();
@ -319,7 +331,8 @@ fn generate_font_fingerprints(out_dir: &Path, fingerprints_path: &Path) {
    let mut values = Vec::new();

    for font_entry in fonts {
-        let sha256_hex = font_entry.get("sha256_hex")
+        let sha256_hex = font_entry
+            .get("sha256_hex")
            .and_then(|v| v.as_str())
            .expect("sha256_hex must be a string");

@ -330,14 +343,18 @@ fn generate_font_fingerprints(out_dir: &Path, fingerprints_path: &Path) {

        // Validate SHA-256 hex (64 hex chars = 32 bytes)
        if sha256_hex.len() != 64 {
-            panic!("SHA-256 hex must be 64 characters, got {}", sha256_hex.len());
+            panic!(
+                "SHA-256 hex must be 64 characters, got {}",
+                sha256_hex.len()
+            );
        }

        // Convert hex string to [u8; 32] bytes
        let hash_bytes: [u8; 32] = hex_decode_to_array(sha256_hex);

        // Get entries
-        let entries = font_entry.get("entries")
+        let entries = font_entry
+            .get("entries")
            .and_then(|v| v.as_array())
            .expect("entries must be an array");

@ -347,8 +364,14 @@ fn generate_font_fingerprints(out_dir: &Path, fingerprints_path: &Path) {
        let mut entry_values = Vec::new();
        for entry in entries {
            let arr = entry.as_array().expect("entry must be an array");
-            let gid = arr.get(0).and_then(|v| v.as_u64()).expect("gid must be a number") as u16;
-            let codepoint = arr.get(1).and_then(|v| v.as_u64()).expect("codepoint must be a number") as u32;
+            let gid = arr
+                .get(0)
+                .and_then(|v| v.as_u64())
+                .expect("gid must be a number") as u16;
+            let codepoint = arr
+                .get(1)
+                .and_then(|v| v.as_u64())
+                .expect("codepoint must be a number") as u32;

            // Validate codepoint is a valid Unicode scalar value
            if !is_valid_unicode_scalar(codepoint) {
@ -358,7 +381,8 @@ fn generate_font_fingerprints(out_dir: &Path, fingerprints_path: &Path) {
            entry_values.push(format!("({}, {})", gid, codepoint));
        }

-        entries_arrays.push_str(&format!(r#"
+        entries_arrays.push_str(&format!(
+            r#"
 static {}: &[(u16, u32)] = &[{}];
 "#,
            ident,
@ -366,9 +390,7 @@ static {}: &[(u16, u32)] = &[{}];
        ));

        // Build the phf map key as a byte array literal
-        let key_bytes: Vec<String> = hash_bytes.iter()
-            .map(|b| format!("0x{:02x}", b))
-            .collect();
+        let key_bytes: Vec<String> = hash_bytes.iter().map(|b| format!("0x{:02x}", b)).collect();

        let key = format!("[{}]", key_bytes.join(", "));
        let value = format!("&{}", ident);
@ -382,7 +404,8 @@ static {}: &[(u16, u32)] = &[{}];
        map_builder.entry(key.as_str(), value.as_str());
    }

-    let rust_code = format!(r#"
+    let rust_code = format!(
+        r#"
 // Auto-generated font fingerprint phf map.
 // Do not edit manually.
 // Source: build/font-fingerprints.json
@ -415,8 +438,7 @@ fn hex_decode_to_array(hex: &str) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for i in 0..32 {
        let byte_str = &hex[i * 2..i * 2 + 2];
-        bytes[i] = u8::from_str_radix(byte_str, 16)
-            .expect("Invalid hex string");
+        bytes[i] = u8::from_str_radix(byte_str, 16).expect("Invalid hex string");
    }
    bytes
 }
@ -450,7 +472,8 @@ fn generate_collection_cmap(out_dir: &Path, base_dir: &Path, json_name: &str, mo
    // Check if the JSON file exists
    if !json_path.exists() {
        // Generate a stub implementation
-        let rust_code = format!(r#"
+        let rust_code = format!(
+            r#"
 // Auto-generated {collection} CID to Unicode mapping.
 //
 // Source: {json_name}.json (not found - stub implementation)
@ -469,13 +492,12 @@ pub fn cid_to_unicode(cid: u32) -> Option<&'static [char]> {{
            json_name = json_name,
        );

-        fs::write(&out_path, rust_code)
-            .expect(&format!("Failed to write {}", out_path.display()));
+        fs::write(&out_path, rust_code).expect(&format!("Failed to write {}", out_path.display()));
        return;
    }

-    let json_content = fs::read_to_string(&json_path)
-        .expect(&format!("Failed to read {}", json_path.display()));
+    let json_content =
+        fs::read_to_string(&json_path).expect(&format!("Failed to read {}", json_path.display()));

    let data: serde_json::Value = serde_json::from_str(&json_content)
        .expect(&format!("Failed to parse {}", json_path.display()));
@ -486,7 +508,8 @@ pub fn cid_to_unicode(cid: u32) -> Option<&'static [char]> {{

    if let Some(mappings) = data.as_object() {
        for (cid_str, unicode_value) in mappings {
-            let cid: u32 = cid_str.parse()
+            let cid: u32 = cid_str
+                .parse()
                .expect(&format!("Invalid CID key: {}", cid_str));

            // Parse the Unicode value
@ -497,11 +520,13 @@ pub fn cid_to_unicode(cid: u32) -> Option<&'static [char]> {{
                let array_ident = format!("CID_{}_{}", module_name.to_uppercase(), cid);

                // Build the array
-                let char_literals: Vec<String> = chars.iter()
+                let char_literals: Vec<String> = chars
+                    .iter()
                    .map(|c| format!("'\\u{{{:04X}}}'", *c as u32))
                    .collect();

-                arrays.push_str(&format!(r#"
+                arrays.push_str(&format!(
+                    r#"
 static {}: &[char] = &[{}];
 "#,
                    array_ident,
@ -514,7 +539,8 @@ static {}: &[char] = &[{}];
        }
    }

-    let rust_code = format!(r#"
+    let rust_code = format!(
+        r#"
 // Auto-generated {collection} CID to Unicode mapping.
 //
 // Source: {json_name}.json
@ -542,8 +568,7 @@ pub fn cid_to_unicode(cid: u32) -> Option<&'static [char]> {{
        map = map_builder.build(),
    );

-    fs::write(&out_path, rust_code)
-        .expect(&format!("Failed to write {}", out_path.display()));
+    fs::write(&out_path, rust_code).expect(&format!("Failed to write {}", out_path.display()));
 }

 /// Parse a Unicode value from JSON to a Vec<char>.
--- a/crates/pdftract-core/examples/check_sizes.rs
+++ b/crates/pdftract-core/examples/check_sizes.rs
@ -1,8 +1,11 @@
-use std::sync::Arc;
 use indexmap::IndexMap;
+use std::sync::Arc;

 fn main() {
-    println!("IndexMap<Arc<str>, ()>: {}", std::mem::size_of::<IndexMap<Arc<str>, ()>>());
+    println!(
+        "IndexMap<Arc<str>, ()>: {}",
+        std::mem::size_of::<IndexMap<Arc<str>, ()>>()
+    );
    println!("Vec<u8>: {}", std::mem::size_of::<Vec<u8>>());
    println!("Vec<()>: {}", std::mem::size_of::<Vec<()>>());
    println!("Arc<str>: {}", std::mem::size_of::<Arc<str>>());
--- a/crates/pdftract-core/examples/test_forward_scan.rs
+++ b/crates/pdftract-core/examples/test_forward_scan.rs
@ -1,9 +1,9 @@
 // Simple test to verify forward_scan_xref functionality
 // This is a standalone test file to verify the forward scan implementation

-use std::collections::HashMap;
-use pdftract_core::parser::xref::{XrefEntry, XrefSection, forward_scan_xref};
 use pdftract_core::parser::stream::MemorySource;
+use pdftract_core::parser::xref::{forward_scan_xref, XrefEntry, XrefSection};
+use std::collections::HashMap;

 fn main() {
    println!("Testing forward_scan_xref implementation...\n");
@ -44,7 +44,10 @@ fn main() {
    let source = MemorySource::new(pdf_data.to_vec());
    let result = forward_scan_xref(&source, false);

-    println!("  Found {} objects (including the one after truncated xref)", result.len());
+    println!(
+        "  Found {} objects (including the one after truncated xref)",
+        result.len()
+    );
    assert!(result.len() >= 4, "Expected at least 4 objects");
    println!("  ✓ PASSED\n");

@ -57,8 +60,13 @@ fn main() {

    println!("  Found {} objects (should be 0)", result.len());
    assert_eq!(result.len(), 0, "Expected 0 objects for linearized file");
-    println!("  Has LINEARIZED_NO_FORWARD_SCAN diagnostic: {}",
-             result.diagnostics.iter().any(|d| matches!(d.code, pdftract_core::parser::xref::XrefDiagCode::LinearizedNoForwardScan)));
+    println!(
+        "  Has LINEARIZED_NO_FORWARD_SCAN diagnostic: {}",
+        result.diagnostics.iter().any(|d| matches!(
+            d.code,
+            pdftract_core::parser::xref::XrefDiagCode::LinearizedNoForwardScan
+        ))
+    );
    println!("  ✓ PASSED\n");

    // Test 4: Multi-revision - last occurrence wins
@ -88,9 +96,16 @@ fn main() {
    let source = MemorySource::new(pdf_data.to_vec());
    let result = forward_scan_xref(&source, false);

-    let has_repaired_diagnostic = result.diagnostics.iter()
-        .any(|d| matches!(d.code, pdftract_core::parser::xref::XrefDiagCode::XrefRepaired));
-    println!("  Has XREF_REPAIRED diagnostic: {}", has_repaired_diagnostic);
+    let has_repaired_diagnostic = result.diagnostics.iter().any(|d| {
+        matches!(
+            d.code,
+            pdftract_core::parser::xref::XrefDiagCode::XrefRepaired
+        )
+    });
+    println!(
+        "  Has XREF_REPAIRED diagnostic: {}",
+        has_repaired_diagnostic
+    );
    assert!(has_repaired_diagnostic, "Expected XREF_REPAIRED diagnostic");
    println!("  ✓ PASSED\n");

--- a/crates/pdftract-core/examples/test_lzw_api.rs
+++ b/crates/pdftract-core/examples/test_lzw_api.rs
@ -1,26 +1,32 @@
-use lzw::{MsbReader, Decoder, DecoderEarlyChange};
+use lzw::{Decoder, DecoderEarlyChange, MsbReader};

 fn main() {
    // Test basic encoding/decoding
    let data = b"hello world!";
-    
+
    // Encode with early change
    let mut encoder = lzw::EncoderEarlyChange::new(lzw::MsbWriter::new(), 8);
    let encoded_early: Vec<u8> = encoder.encode_bytes(data).0;
    println!("Encoded (early change): {:02x?}", encoded_early);
-    
+
    // Decode with early change
    let mut decoder = DecoderEarlyChange::new(MsbReader::new(), 8);
    let (consumed, decoded) = decoder.decode_bytes(&encoded_early).unwrap();
-    println!("Decoded (early change): {:?}", std::str::from_utf8(decoded).unwrap());
-    
+    println!(
+        "Decoded (early change): {:?}",
+        std::str::from_utf8(decoded).unwrap()
+    );
+
    // Encode with late change
    let mut encoder2 = lzw::Encoder::new(lzw::MsbWriter::new(), 8);
    let encoded_late: Vec<u8> = encoder2.encode_bytes(data).0;
    println!("Encoded (late change): {:02x?}", encoded_late);
-    
+
    // Decode with late change
    let mut decoder2 = Decoder::new(MsbReader::new(), 8);
    let (consumed2, decoded2) = decoder2.decode_bytes(&encoded_late).unwrap();
-    println!("Decoded (late change): {:?}", std::str::from_utf8(decoded2).unwrap());
+    println!(
+        "Decoded (late change): {:?}",
+        std::str::from_utf8(decoded2).unwrap()
+    );
 }
--- a/crates/pdftract-core/examples/test_trailer.rs
+++ b/crates/pdftract-core/examples/test_trailer.rs
@ -1,5 +1,5 @@
-use pdftract_core::parser::xref;
 use pdftract_core::parser::stream::{MemorySource, PdfSource};
+use pdftract_core::parser::xref;
 use std::fs::File;
 use std::io::Read;

@ -12,7 +12,10 @@ fn main() {

    // Find startxref BEFORE moving buffer
    let search_bytes = &buffer[buffer.len().saturating_sub(1024)..];
-    let pos = search_bytes.windows(9).rposition(|w| w == b"startxref").unwrap();
+    let pos = search_bytes
+        .windows(9)
+        .rposition(|w| w == b"startxref")
+        .unwrap();
    let start = buffer.len().saturating_sub(1024) + pos + 9;

    // Skip whitespace
@ -31,21 +34,24 @@ fn main() {

    // Now create source
    let source = MemorySource::new(buffer);
-    
+
    println!("startxref offset: {}", start_offset);
-    
+
    let xref_section = xref::load_xref_with_prev_chain(&source, start_offset);
-    
+
    println!("Has trailer: {}", xref_section.trailer.is_some());
-    
+
    if let Some(trailer) = &xref_section.trailer {
        println!("Trailer keys: {:?}", trailer.keys().collect::<Vec<_>>());
        println!("Root entry: {:?}", trailer.get("Root"));
        println!("Size entry: {:?}", trailer.get("Size"));
    }
-    
+
    println!("Diagnostics count: {}", xref_section.diagnostics.len());
    for diag in &xref_section.diagnostics {
-        println!("  - {}: {} at byte_offset {:?}", diag.code, diag.message, diag.byte_offset);
+        println!(
+            "  - {}: {} at byte_offset {:?}",
+            diag.code, diag.message, diag.byte_offset
+        );
    }
 }
--- a/crates/pdftract-core/src/attachment/associated_files.rs
+++ b/crates/pdftract-core/src/attachment/associated_files.rs
@ -20,9 +20,9 @@
 //! - "EncryptedPayload": The file is an encrypted payload
 //! - "Unspecified": No specific relationship (default)

+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::parser::object::ObjRef;
 use crate::parser::xref::XrefResolver;
-use crate::diagnostics::{Diagnostic, DiagCode};

 /// Result type for /AF parsing.
 pub type Result<T> = std::result::Result<T, Vec<Diagnostic>>;
@ -119,7 +119,11 @@ pub fn walk_af_array(
            None => {
                diagnostics.push(Diagnostic::with_dynamic_no_offset(
                    DiagCode::StructInvalidType,
-                    format!("/AF[{}] is not a reference (type: {})", idx, entry_obj.type_name()),
+                    format!(
+                        "/AF[{}] is not a reference (type: {})",
+                        idx,
+                        entry_obj.type_name()
+                    ),
                ));
                continue;
            }
@ -179,19 +183,21 @@ fn extract_af_relationship(
        None => {
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::StructInvalidType,
-                format!("Filespec {} is not a dictionary (type: {})", filespec_ref, filespec_obj.type_name()),
+                format!(
+                    "Filespec {} is not a dictionary (type: {})",
+                    filespec_ref,
+                    filespec_obj.type_name()
+                ),
            ));
            return Err(diagnostics);
        }
    };

    // Extract /AFRelationship (optional)
-    let relationship = filespec_dict
-        .get("/AFRelationship")
-        .and_then(|obj| {
-            // /AFRelationship is typically a Name object
-            obj.as_name().map(|s| s.to_string())
-        });
+    let relationship = filespec_dict.get("/AFRelationship").and_then(|obj| {
+        // /AFRelationship is typically a Name object
+        obj.as_name().map(|s| s.to_string())
+    });

    Ok(relationship)
 }
@ -203,11 +209,7 @@ mod tests {
    use indexmap::IndexMap;

    /// Helper to create a test Filespec dictionary.
-    fn make_filespec(
-        resolver: &XrefResolver,
-        obj_ref: ObjRef,
-        relationship: Option<&str>,
-    ) {
+    fn make_filespec(resolver: &XrefResolver, obj_ref: ObjRef, relationship: Option<&str>) {
        let mut dict = IndexMap::new();
        dict.insert(intern("/Type"), PdfObject::Name(intern("Filespec")));
        dict.insert(intern("/F"), PdfObject::Name(intern("test.pdf")));
@ -326,7 +328,9 @@ mod tests {
        assert!(result.is_err());

        let diagnostics = result.unwrap_err();
-        assert!(diagnostics.iter().any(|d| d.message.contains("not an array")));
+        assert!(diagnostics
+            .iter()
+            .any(|d| d.message.contains("not an array")));
    }

    #[test]
@ -350,15 +354,14 @@ mod tests {
        assert!(result.is_err());

        let diagnostics = result.unwrap_err();
-        assert!(diagnostics.iter().any(|d| d.message.contains("not a reference")));
+        assert!(diagnostics
+            .iter()
+            .any(|d| d.message.contains("not a reference")));
    }

    #[test]
    fn test_associated_file_entry_new() {
-        let entry = AssociatedFileEntry::new(
-            Some("Data".to_string()),
-            ObjRef::new(42, 0),
-        );
+        let entry = AssociatedFileEntry::new(Some("Data".to_string()), ObjRef::new(42, 0));

        assert_eq!(entry.relationship, Some("Data".to_string()));
        assert_eq!(entry.filespec_ref, ObjRef::new(42, 0));
@ -428,7 +431,10 @@ mod tests {
        assert_eq!(entries[2].filespec_ref, fs3);

        assert_eq!(entries[0].relationship, Some("Unspecified".to_string()));
-        assert_eq!(entries[1].relationship, Some("EncryptedPayload".to_string()));
+        assert_eq!(
+            entries[1].relationship,
+            Some("EncryptedPayload".to_string())
+        );
        assert_eq!(entries[2].relationship, Some("Source".to_string()));
    }

@ -465,10 +471,7 @@ mod tests {
        assert_eq!(entries.len(), relationships.len());

        for (idx, entry) in entries.iter().enumerate() {
-            assert_eq!(
-                entry.relationship.as_deref(),
-                Some(relationships[idx])
-            );
+            assert_eq!(entry.relationship.as_deref(), Some(relationships[idx]));
        }
    }
 }
--- a/crates/pdftract-core/src/attachment/mod.rs
+++ b/crates/pdftract-core/src/attachment/mod.rs
@ -9,4 +9,4 @@
 pub mod associated_files;

 // Re-export key types for convenience
-pub use associated_files::{AssociatedFileEntry, walk_af_array};
+pub use associated_files::{walk_af_array, AssociatedFileEntry};
--- a/crates/pdftract-core/src/cache/compression.rs
+++ b/crates/pdftract-core/src/cache/compression.rs
@ -129,7 +129,9 @@ pub fn decode(data: &[u8]) -> io::Result<Vec<u8>> {
    let mut result = Vec::with_capacity(data.len().min(MAX_DECOMPRESSED_SIZE));
    {
        let mut decoder = zstd::Decoder::new(data)?;
-        decoder.take(MAX_DECOMPRESSED_SIZE as u64).read_to_end(&mut result)?;
+        decoder
+            .take(MAX_DECOMPRESSED_SIZE as u64)
+            .read_to_end(&mut result)?;
    }

    // Check if we hit the bomb limit
@ -466,7 +468,10 @@ mod tests {
        let mut result = Vec::with_capacity(SMALL_LIMIT);
        {
            let decoder = zstd::Decoder::new(&*compressed).unwrap();
-            decoder.take(SMALL_LIMIT as u64).read_to_end(&mut result).unwrap();
+            decoder
+                .take(SMALL_LIMIT as u64)
+                .read_to_end(&mut result)
+                .unwrap();
        }

        // Verify we truncated at the limit
--- a/crates/pdftract-core/src/cache/key.rs
+++ b/crates/pdftract-core/src/cache/key.rs
@ -151,9 +151,7 @@ fn canonical_json_value(value: &Value) -> Value {
            }
            Value::Object(sorted.into_iter().collect())
        }
-        Value::Array(arr) => {
-            Value::Array(arr.iter().map(canonical_json_value).collect())
-        }
+        Value::Array(arr) => Value::Array(arr.iter().map(canonical_json_value).collect()),
        // Numbers: preserve integer representation, canonicalize floats
        Value::Number(n) => {
            if n.is_i64() || n.is_u64() {
@ -253,7 +251,10 @@ mod tests {
        let json_str = canonical.to_string();
        let ev_pos = json_str.find("extraction_version").unwrap();
        let receipts_pos = json_str.find("receipts").unwrap();
-        assert!(ev_pos < receipts_pos, "Keys should be sorted lexicographically");
+        assert!(
+            ev_pos < receipts_pos,
+            "Keys should be sorted lexicographically"
+        );
    }

    #[test]
@ -335,8 +336,8 @@ mod tests {
        let key2 = CacheKey::new("fp", &opts);

        // Same key should hash the same
-        use std::hash::{Hash, Hasher};
        use std::collections::hash_map::DefaultHasher;
+        use std::hash::{Hash, Hasher};

        let mut h1 = DefaultHasher::new();
        key1.hash(&mut h1);
@ -361,8 +362,11 @@ mod tests {
        assert!(key.opts_hash.chars().all(|c| c.is_ascii_hexdigit()));

        // hex::encode produces lowercase hex (0-9, a-f), verify no uppercase letters
-        assert!(key.opts_hash.chars().all(|c| !c.is_ascii_uppercase()),
-            "Hash should be lowercase hex: {}", key.opts_hash);
+        assert!(
+            key.opts_hash.chars().all(|c| !c.is_ascii_uppercase()),
+            "Hash should be lowercase hex: {}",
+            key.opts_hash
+        );
    }

    #[test]
@ -376,8 +380,10 @@ mod tests {
        let key1 = CacheKey::new("fp", &opts1);
        let key2 = CacheKey::new("fp", &opts2);

-        assert_eq!(key1.opts_hash, key2.opts_hash,
-            "Same logical request should produce same key");
+        assert_eq!(
+            key1.opts_hash, key2.opts_hash,
+            "Same logical request should produce same key"
+        );
    }

    #[test]
@ -388,8 +394,10 @@ mod tests {
        let key_off = CacheKey::new("fp", &opts_off);
        let key_lite = CacheKey::new("fp", &opts_lite);

-        assert_ne!(key_off.opts_hash, key_lite.opts_hash,
-            "Different logical requests should produce different keys");
+        assert_ne!(
+            key_off.opts_hash, key_lite.opts_hash,
+            "Different logical requests should produce different keys"
+        );
    }

    // Acceptance criteria tests for Phase 6.9.2
@ -408,8 +416,10 @@ mod tests {
        let key1 = CacheKey::new("fp", &opts1);
        let key2 = CacheKey::new("fp", &opts2);

-        assert_eq!(key1.opts_hash, key2.opts_hash,
-            "Same effective values should produce same hash");
+        assert_eq!(
+            key1.opts_hash, key2.opts_hash,
+            "Same effective values should produce same hash"
+        );
    }

    #[test]
@ -421,8 +431,10 @@ mod tests {
        let key_off = CacheKey::new("fp", &opts_off);
        let key_lite = CacheKey::new("fp", &opts_lite);

-        assert_ne!(key_off.opts_hash, key_lite.opts_hash,
-            "Toggling receipts from off to lite should change hash");
+        assert_ne!(
+            key_off.opts_hash, key_lite.opts_hash,
+            "Toggling receipts from off to lite should change hash"
+        );
    }

    #[test]
@ -442,8 +454,10 @@ mod tests {
            hex::encode(hash)
        };

-        assert_ne!(key_v1, key_v2,
-            "Different pdftract version should produce different hash");
+        assert_ne!(
+            key_v1, key_v2,
+            "Different pdftract version should produce different hash"
+        );
    }

    #[test]
@ -463,8 +477,10 @@ mod tests {
        let canon1 = canonical_json(&val1);
        let canon2 = canonical_json(&val2);

-        assert_eq!(canon1, canon2,
-            "Different insertion orders should produce same canonical JSON");
+        assert_eq!(
+            canon1, canon2,
+            "Different insertion orders should produce same canonical JSON"
+        );

        // Keys should be sorted
        assert!(canon1.contains("\"a\":2"));
@ -489,8 +505,7 @@ mod tests {
        let canon1 = canonical_json(&val1);
        let canon2 = canonical_json(&val2);

-        assert_eq!(canon1, canon2,
-            "0.5 and 0.500 should serialize identically");
+        assert_eq!(canon1, canon2, "0.5 and 0.500 should serialize identically");

        // Both should serialize to 0.5 (shortest representation)
        assert!(canon1.contains("\"x\":0.5"));
@ -499,11 +514,7 @@ mod tests {
    #[test]
    fn test_acceptance_float_canonical_edge_cases() {
        // Test various float representations
-        let test_cases = vec![
-            (1.0, "1.00"),
-            (0.1, "0.100"),
-            (1.5, "1.500"),
-        ];
+        let test_cases = vec![(1.0, "1.00"), (0.1, "0.100"), (1.5, "1.500")];

        for (val1, val2_str) in test_cases {
            let mut map1 = Map::new();
@ -519,8 +530,11 @@ mod tests {
            let canon1 = canonical_json(&val1_json);
            let canon2 = canonical_json(&val2_json);

-            assert_eq!(canon1, canon2,
-                "{} and {} should serialize identically", val1, val2_str);
+            assert_eq!(
+                canon1, canon2,
+                "{} and {} should serialize identically",
+                val1, val2_str
+            );
        }
    }

@ -540,8 +554,10 @@ mod tests {
        let opts3 = ExtractionOptions::with_receipts(ReceiptsMode::Lite);
        let key3 = CacheKey::new("fp", &opts3);

-        assert_ne!(key1.opts_hash, key3.opts_hash,
-            "Invariant: same logical request → same key, different request → different key");
+        assert_ne!(
+            key1.opts_hash, key3.opts_hash,
+            "Invariant: same logical request → same key, different request → different key"
+        );
    }

    #[test]
@ -562,8 +578,7 @@ mod tests {
        let canon1 = canonical_json(&Value::Object(outer1));
        let canon2 = canonical_json(&Value::Object(outer2));

-        assert_eq!(canon1, canon2,
-            "Nested objects should have sorted keys");
+        assert_eq!(canon1, canon2, "Nested objects should have sorted keys");
    }

    #[test]
--- a/crates/pdftract-core/src/cache/layout.rs
+++ b/crates/pdftract-core/src/cache/layout.rs
@ -3,8 +3,8 @@
 //! This module implements the two-byte-prefix directory scheme that keeps
 //! any single directory under 65K entries even at millions of cached entries.

-use std::path::{Path, PathBuf};
 use serde::{Deserialize, Serialize};
+use std::path::{Path, PathBuf};

 /// Current cache schema version.
 ///
@ -86,7 +86,9 @@ pub fn entry_path(
    compressed_size: usize,
 ) -> PathBuf {
    // Strip the "pdftract-v1:" prefix to get the raw hex fingerprint
-    let fp = fingerprint.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(fingerprint);
+    let fp = fingerprint
+        .strip_prefix(FINGERPRINT_PREFIX)
+        .unwrap_or(fingerprint);

    // Validate fingerprint is at least 4 chars (for the two-byte prefixes)
    assert!(
@ -121,7 +123,9 @@ pub fn entry_path(
 ///
 /// Path in the format `<cache_dir>/<fp[0:2]>/<fp[2:4]>/<full_fp>`
 pub fn fingerprint_dir(cache_dir: &Path, fingerprint: &str) -> PathBuf {
-    let fp = fingerprint.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(fingerprint);
+    let fp = fingerprint
+        .strip_prefix(FINGERPRINT_PREFIX)
+        .unwrap_or(fingerprint);
    assert!(
        fp.len() >= 4,
        "Fingerprint must be at least 4 characters long, got: {}",
@ -225,7 +229,8 @@ pub fn load_index(cache_dir: &Path) -> Result<Option<CacheIndex>, anyhow::Error>
        return Err(anyhow::anyhow!(
            "Cache schema version mismatch: expected {}, got {}. \
             Please clear the cache with 'pdftract cache clear' and re-populate.",
-            CURRENT_SCHEMA_VERSION, index.schema_version
+            CURRENT_SCHEMA_VERSION,
+            index.schema_version
        ));
    }

@ -297,9 +302,11 @@ mod tests {
    use super::*;
    use tempfile::TempDir;

-    const TEST_FINGERPRINT: &str = "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
+    const TEST_FINGERPRINT: &str =
+        "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
    const TEST_FINGERPRINT_SHORT: &str = "pdftract-v1:e7a1";
-    const TEST_OPTS_HASH: &str = "9b21c0ffee0000000000000000000000000000000000000000000000000000000";
+    const TEST_OPTS_HASH: &str =
+        "9b21c0ffee0000000000000000000000000000000000000000000000000000000";

    #[test]
    fn test_entry_path_basic() {
@ -333,10 +340,7 @@ mod tests {
        assert_eq!(path2.parent(), Some(fp_dir.as_path()));

        // But different filenames
-        assert_ne!(
-            path1.file_name(),
-            path2.file_name()
-        );
+        assert_ne!(path1.file_name(), path2.file_name());
    }

    #[test]
@ -354,12 +358,24 @@ mod tests {
        // Check via components: skip root + cache, first prefix is e7
        let mut components1 = path1.components().skip(2);
        let mut components2 = path2.components().skip(2);
-        assert_eq!(components1.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("e7"))));
-        assert_eq!(components2.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("e7"))));
+        assert_eq!(
+            components1.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("e7")))
+        );
+        assert_eq!(
+            components2.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("e7")))
+        );

        // But different second-level directories
-        assert_eq!(components1.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("a1"))));
-        assert_eq!(components2.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("b2"))));
+        assert_eq!(
+            components1.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("a1")))
+        );
+        assert_eq!(
+            components2.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("b2")))
+        );
    }

    #[test]
@ -367,7 +383,8 @@ mod tests {
        let cache_dir = Path::new("/cache");
        let fp_dir = fingerprint_dir(cache_dir, TEST_FINGERPRINT);

-        let expected = "/cache/e7/a1/e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
+        let expected =
+            "/cache/e7/a1/e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
        assert_eq!(fp_dir, PathBuf::from(expected));
    }

@ -378,14 +395,21 @@ mod tests {

        // Should use the available chars: e7/a1/e7a1/...
        let mut components = path.components().skip(2);
-        assert_eq!(components.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("e7"))));
-        assert_eq!(components.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("a1"))));
+        assert_eq!(
+            components.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("e7")))
+        );
+        assert_eq!(
+            components.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("a1")))
+        );
    }

    #[test]
    fn test_parse_opts_hash_from_filename() {
        // Valid filename
-        let filename = "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-12387.json.zst";
+        let filename =
+            "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-12387.json.zst";
        let opts_hash = parse_opts_hash_from_filename(filename);
        assert_eq!(
            opts_hash,
@ -404,12 +428,14 @@ mod tests {

    #[test]
    fn test_parse_size_from_filename() {
-        let filename = "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-12387.json.zst";
+        let filename =
+            "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-12387.json.zst";
        let size = parse_size_from_filename(filename);
        assert_eq!(size, Some(12387));

        // Different size
-        let filename2 = "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-999.json.zst";
+        let filename2 =
+            "e7a1f3deadbeef00000000000000000000000000000000000000000000000000-999.json.zst";
        let size2 = parse_size_from_filename(filename2);
        assert_eq!(size2, Some(999));

@ -525,7 +551,11 @@ mod tests {
        // Convert to string and check length
        let path_str = path.to_str().unwrap();
        // POSIX max path length is typically 4096
-        assert!(path_str.len() < 4096, "Path length {} exceeds 4096", path_str.len());
+        assert!(
+            path_str.len() < 4096,
+            "Path length {} exceeds 4096",
+            path_str.len()
+        );

        // Our paths should be much shorter in practice
        // Typical case: /cache + 2 + 2 + 64 + 64 + ~20 = ~154 bytes
@ -554,8 +584,14 @@ mod tests {

        // Should still work: /cache/e7/a1/e7a1f3...
        let mut components = path.components().skip(2);
-        assert_eq!(components.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("e7"))));
-        assert_eq!(components.next(), Some(std::path::Component::Normal(std::ffi::OsStr::new("a1"))));
+        assert_eq!(
+            components.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("e7")))
+        );
+        assert_eq!(
+            components.next(),
+            Some(std::path::Component::Normal(std::ffi::OsStr::new("a1")))
+        );
    }

    #[test]
--- a/crates/pdftract-core/src/cache/lru.rs
+++ b/crates/pdftract-core/src/cache/lru.rs
@ -4,7 +4,9 @@
 //! file for touch-time tracking. Eviction is triggered on cache writes when
 //! the total compressed size exceeds the configured limit (default 1 GiB).

-use crate::cache::layout::{entry_path, parse_opts_hash_from_filename, parse_size_from_filename, sentinel_path};
+use crate::cache::layout::{
+    entry_path, parse_opts_hash_from_filename, parse_size_from_filename, sentinel_path,
+};
 use std::collections::HashMap;
 use std::fs::{File, OpenOptions};
 use std::io::Write;
@ -138,7 +140,9 @@ impl Lru {
            .unwrap_or(0);

        // Strip the prefix to match filesystem layout
-        let fp_normalized = fingerprint.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(fingerprint);
+        let fp_normalized = fingerprint
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(fingerprint);

        // Build the touch record: "<timestamp> <fingerprint>/<opts_hash>\n"
        let record = format!("{} {}/{}\n", timestamp, fp_normalized, opts_hash);
@ -220,29 +224,31 @@ impl Lru {
            .filter(|e| {
                e.path().is_dir()
                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
+                    && e.file_name()
+                        .to_string_lossy()
+                        .chars()
+                        .all(|c| c.is_ascii_hexdigit())
            })
        {
            let prefix1_dir = prefix1_entry.path();

            // Walk the second-level prefix directories
-            for prefix2_entry in prefix1_dir.read_dir()?
-                .filter_map(|e| e.ok())
-                .filter(|e| {
-                    e.path().is_dir()
-                        && e.file_name().to_string_lossy().len() == 2
-                        && e.file_name()
-                            .to_string_lossy()
-                            .chars()
-                            .all(|c| c.is_ascii_hexdigit())
-                })
-            {
+            for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+                e.path().is_dir()
+                    && e.file_name().to_string_lossy().len() == 2
+                    && e.file_name()
+                        .to_string_lossy()
+                        .chars()
+                        .all(|c| c.is_ascii_hexdigit())
+            }) {
                let prefix2_dir = prefix2_entry.path();

                // Walk the fingerprint directories
-                for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                    e.path().is_dir()
-                }) {
+                for fp_entry in prefix2_dir
+                    .read_dir()?
+                    .filter_map(|e| e.ok())
+                    .filter(|e| e.path().is_dir())
+                {
                    let fp_dir = fp_entry.path();

                    // Walk the entry files
@ -276,10 +282,8 @@ impl Lru {
        // Check if sentinel exists and exceeds rotation threshold
        if let Ok(metadata) = sentinel_file.metadata() {
            if metadata.len() > SENTINEL_ROTATION_SIZE {
-                let old_path = sentinel_file.with_extension(&format!(
-                    "touched{}",
-                    SENTINEL_OLD_SUFFIX
-                ));
+                let old_path =
+                    sentinel_file.with_extension(&format!("touched{}", SENTINEL_OLD_SUFFIX));

                // Move current to .old (replace existing .old)
                let _ = std::fs::remove_file(&old_path); // Ignore error if doesn't exist
@ -314,27 +318,22 @@ impl Lru {
            .filter_map(|e| e.ok())
            .filter(|e| {
                let name = e.file_name().to_string_lossy().to_string();
-                e.path().is_dir()
-                    && name.len() == 2
-                    && name.chars().all(|c| c.is_ascii_hexdigit())
+                e.path().is_dir() && name.len() == 2 && name.chars().all(|c| c.is_ascii_hexdigit())
            })
        {
            let prefix1_dir = prefix1_entry.path();

-            for prefix2_entry in prefix1_dir.read_dir()?
-                .filter_map(|e| e.ok())
-                .filter(|e| {
-                    let name = e.file_name().to_string_lossy().to_string();
-                    e.path().is_dir()
-                        && name.len() == 2
-                        && name.chars().all(|c| c.is_ascii_hexdigit())
-                })
-            {
+            for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+                let name = e.file_name().to_string_lossy().to_string();
+                e.path().is_dir() && name.len() == 2 && name.chars().all(|c| c.is_ascii_hexdigit())
+            }) {
                let prefix2_dir = prefix2_entry.path();

-                for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                    e.path().is_dir()
-                }) {
+                for fp_entry in prefix2_dir
+                    .read_dir()?
+                    .filter_map(|e| e.ok())
+                    .filter(|e| e.path().is_dir())
+                {
                    let fp_dir = fp_entry.path();

                    // Extract fingerprint from path (last component)
@ -347,7 +346,10 @@ impl Lru {
                    for entry in fp_dir.read_dir()?.filter_map(|e| e.ok()) {
                        let path = entry.path();
                        if path.is_file() {
-                            let filename_opt = path.file_name().and_then(|n| n.to_str()).map(|s| s.to_string());
+                            let filename_opt = path
+                                .file_name()
+                                .and_then(|n| n.to_str())
+                                .map(|s| s.to_string());
                            if let Some(filename) = filename_opt {
                                if let (Some(opts_hash), Some(size)) = (
                                    parse_opts_hash_from_filename(&filename),
@ -441,10 +443,7 @@ impl Lru {
        }

        // Read the old sentinel file (.old) if it exists
-        let old_sentinel = sentinel_file.with_extension(&format!(
-            "touched{}",
-            SENTINEL_OLD_SUFFIX
-        ));
+        let old_sentinel = sentinel_file.with_extension(&format!("touched{}", SENTINEL_OLD_SUFFIX));
        if let Ok(contents) = std::fs::read_to_string(&old_sentinel) {
            for line in contents.lines().rev() {
                let parts: Vec<&str> = line.splitn(2, ' ').collect();
@ -499,27 +498,29 @@ impl Lru {
            .filter(|e| {
                e.path().is_dir()
                    && e.file_name().to_string_lossy().len() == 2
-                    && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
+                    && e.file_name()
+                        .to_string_lossy()
+                        .chars()
+                        .all(|c| c.is_ascii_hexdigit())
            })
        {
            let prefix1_dir = prefix1_entry.path();

-            for prefix2_entry in prefix1_dir.read_dir()?
-                .filter_map(|e| e.ok())
-                .filter(|e| {
-                    e.path().is_dir()
-                        && e.file_name().to_string_lossy().len() == 2
-                        && e.file_name()
-                            .to_string_lossy()
-                            .chars()
-                            .all(|c| c.is_ascii_hexdigit())
-                })
-            {
+            for prefix2_entry in prefix1_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
+                e.path().is_dir()
+                    && e.file_name().to_string_lossy().len() == 2
+                    && e.file_name()
+                        .to_string_lossy()
+                        .chars()
+                        .all(|c| c.is_ascii_hexdigit())
+            }) {
                let prefix2_dir = prefix2_entry.path();

-                for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                    e.path().is_dir()
-                }) {
+                for fp_entry in prefix2_dir
+                    .read_dir()?
+                    .filter_map(|e| e.ok())
+                    .filter(|e| e.path().is_dir())
+                {
                    let fp_dir = fp_entry.path();

                    // Check if the fingerprint directory is empty
@ -563,7 +564,7 @@ impl Lru {
            Err(e) if e.kind() == std::io::ErrorKind::NotFound => {
                // Sentinel doesn't exist yet (no entries touched), nothing to truncate
                return Ok(());
-            },
+            }
            Err(e) => return Err(e),
        };
        let lines: Vec<&str> = contents.lines().collect();
@ -588,10 +589,13 @@ mod tests {
    use std::fs;
    use tempfile::TempDir;

-    const TEST_FINGERPRINT: &str = "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
-    const TEST_FINGERPRINT_2: &str = "pdftract-v1:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
+    const TEST_FINGERPRINT: &str =
+        "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
+    const TEST_FINGERPRINT_2: &str =
+        "pdftract-v1:bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb";
    const TEST_OPTS_HASH: &str = "9b21c0ffee000000000000000000000000000000000000000000000000000000"; // 64 chars
-    const TEST_OPTS_HASH_2: &str = "aaaaaaaa00000000000000000000000000000000000000000000000000000000"; // 64 chars
+    const TEST_OPTS_HASH_2: &str =
+        "aaaaaaaa00000000000000000000000000000000000000000000000000000000"; // 64 chars

    /// Create a test cache entry file.
    fn create_test_entry(cache_dir: &Path, fp: &str, opts: &str, size: usize) -> PathBuf {
@ -626,7 +630,9 @@ mod tests {

        let contents = fs::read_to_string(&sentinel_file).unwrap();
        // Sentinel stores fingerprint without prefix
-        let fp_normalized = TEST_FINGERPRINT.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(TEST_FINGERPRINT);
+        let fp_normalized = TEST_FINGERPRINT
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(TEST_FINGERPRINT);
        assert!(contents.contains(&format!("{}/{}", fp_normalized, TEST_OPTS_HASH)));
    }

@ -655,7 +661,9 @@ mod tests {
        assert!(now.saturating_sub(timestamp) < 10);

        // Second part should be "fp/opts_hash" (fp without prefix)
-        let fp_normalized = TEST_FINGERPRINT.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(TEST_FINGERPRINT);
+        let fp_normalized = TEST_FINGERPRINT
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(TEST_FINGERPRINT);
        assert_eq!(parts[1], &format!("{}/{}", fp_normalized, TEST_OPTS_HASH));
    }

@ -725,7 +733,10 @@ mod tests {
        // Verify touch was written
        let sentinel_file = sentinel_path(cache_dir);
        let sentinel_contents = fs::read_to_string(&sentinel_file).unwrap();
-        assert!(sentinel_contents.contains(TEST_OPTS_HASH), "Sentinel should contain opts_hash");
+        assert!(
+            sentinel_contents.contains(TEST_OPTS_HASH),
+            "Sentinel should contain opts_hash"
+        );

        // Trigger eviction
        lru.maybe_evict().unwrap();
@ -798,7 +809,11 @@ mod tests {
        }

        // Should have at least 95 parseable records (allowing for some edge cases)
-        assert!(parseable_count >= 95, "Expected at least 95 parseable records, got {}", parseable_count);
+        assert!(
+            parseable_count >= 95,
+            "Expected at least 95 parseable records, got {}",
+            parseable_count
+        );
    }

    #[test]
@ -823,7 +838,16 @@ mod tests {
                .open(&sentinel_file)
                .unwrap();
            for _ in 0..5 {
-                writeln!(file, "{} {}", SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs(), large_data).unwrap();
+                writeln!(
+                    file,
+                    "{} {}",
+                    SystemTime::now()
+                        .duration_since(UNIX_EPOCH)
+                        .unwrap()
+                        .as_secs(),
+                    large_data
+                )
+                .unwrap();
            }
        }

@ -835,10 +859,7 @@ mod tests {
        lru.touch(TEST_FINGERPRINT_2, TEST_OPTS_HASH_2).unwrap();

        // Old sentinel should exist
-        let old_sentinel = sentinel_file.with_extension(&format!(
-            "touched{}",
-            SENTINEL_OLD_SUFFIX
-        ));
+        let old_sentinel = sentinel_file.with_extension(&format!("touched{}", SENTINEL_OLD_SUFFIX));
        assert!(old_sentinel.exists());

        // New sentinel should be smaller
@ -891,15 +912,31 @@ mod tests {
        lru.touch(TEST_FINGERPRINT_2, TEST_OPTS_HASH).unwrap(); // newest

        // Build LRU order (use fingerprints without prefix to match filesystem layout)
-        let fp1 = TEST_FINGERPRINT.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(TEST_FINGERPRINT);
-        let fp2 = TEST_FINGERPRINT_2.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(TEST_FINGERPRINT_2);
+        let fp1 = TEST_FINGERPRINT
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(TEST_FINGERPRINT);
+        let fp2 = TEST_FINGERPRINT_2
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(TEST_FINGERPRINT_2);
        let entries = vec![
-            (fp1.to_string(), TEST_OPTS_HASH.to_string(), 1000,
-             entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, 1000)),
-            (fp1.to_string(), TEST_OPTS_HASH_2.to_string(), 2000,
-             entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH_2, 2000)),
-            (fp2.to_string(), TEST_OPTS_HASH.to_string(), 3000,
-             entry_path(cache_dir, TEST_FINGERPRINT_2, TEST_OPTS_HASH, 3000)),
+            (
+                fp1.to_string(),
+                TEST_OPTS_HASH.to_string(),
+                1000,
+                entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, 1000),
+            ),
+            (
+                fp1.to_string(),
+                TEST_OPTS_HASH_2.to_string(),
+                2000,
+                entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH_2, 2000),
+            ),
+            (
+                fp2.to_string(),
+                TEST_OPTS_HASH.to_string(),
+                3000,
+                entry_path(cache_dir, TEST_FINGERPRINT_2, TEST_OPTS_HASH, 3000),
+            ),
        ];

        let lru_order = lru.build_lru_order(&entries).unwrap();
@ -1007,14 +1044,16 @@ mod tests {

        // Helper to generate valid 64-char hex opts hashes with a counter
        // Replace the last 4 chars of the base hash with hex counter
-        let gen_opts = |i: u32| -> String {
-            format!("{}{:04x}", &TEST_OPTS_HASH[..60], i)
-        };
+        let gen_opts = |i: u32| -> String { format!("{}{:04x}", &TEST_OPTS_HASH[..60], i) };

        // Helper to generate valid 64-char hex fingerprints with a counter
        // Replace the last 4 chars of the base fingerprint with hex counter
        let gen_fp = |i: u32| -> String {
-            format!("{}{:04x}", &TEST_FINGERPRINT[FINGERPRINT_PREFIX.len()..60], i)
+            format!(
+                "{}{:04x}",
+                &TEST_FINGERPRINT[FINGERPRINT_PREFIX.len()..60],
+                i
+            )
        };

        // Create 1000 entries totaling 100 MB (over limit)
@ -1083,7 +1122,9 @@ mod tests {
    // Helper function to get fingerprint dir (copied from layout module)
    fn fingerprint_dir(cache_dir: &Path, fingerprint: &str) -> PathBuf {
        const FINGERPRINT_PREFIX: &str = "pdftract-v1:";
-        let fp = fingerprint.strip_prefix(FINGERPRINT_PREFIX).unwrap_or(fingerprint);
+        let fp = fingerprint
+            .strip_prefix(FINGERPRINT_PREFIX)
+            .unwrap_or(fingerprint);
        let prefix1 = &fp[0..2.min(fp.len())];
        let prefix2 = &fp[2..4.min(fp.len())];
        cache_dir.join(prefix1).join(prefix2).join(fp)
--- a/crates/pdftract-core/src/cache/mod.rs
+++ b/crates/pdftract-core/src/cache/mod.rs
@ -22,16 +22,18 @@
 //! - [`compression`] — Zstandard compression/decompression for cache entries
 //! - [`metadata`] — Cache index.json and metadata handling (TODO: 6.9.3)

+pub mod compression;
 pub mod key;
 pub mod layout;
-pub mod compression;
-pub mod multi_process;
 pub mod lru;
+pub mod multi_process;

 pub use key::CacheKey;
-pub use layout::{entry_path, CacheIndex, CURRENT_SCHEMA_VERSION, increment_hit_counter, increment_miss_counter};
-pub use multi_process::{Reader, Writer, cleanup_stale_temp_files};
+pub use layout::{
+    entry_path, increment_hit_counter, increment_miss_counter, CacheIndex, CURRENT_SCHEMA_VERSION,
+};
 pub use lru::Lru;
+pub use multi_process::{cleanup_stale_temp_files, Reader, Writer};

 use crate::extract::ExtractionResult;
 use crate::options::ExtractionOptions;
@ -44,7 +46,10 @@ use std::time::{SystemTime, UNIX_EPOCH};
 #[derive(Debug)]
 pub enum CacheLookupResult {
    /// Cache hit: entry found and deserialized successfully
-    Hit { result: ExtractionResult, age_seconds: u64 },
+    Hit {
+        result: ExtractionResult,
+        age_seconds: u64,
+    },
    /// Cache miss: entry not found or corrupt (will be overwritten)
    Miss,
    /// Cache skipped: cache not configured or disabled
@ -126,7 +131,10 @@ pub fn extract_with_cache(
                Ok(result) => {
                    // Cache hit - increment counter and touch the entry
                    let _ = increment_hit_counter(cache_dir);
-                    let lru = Lru::new(cache_dir, cache_size_bytes.unwrap_or(lru::DEFAULT_CACHE_SIZE_BYTES));
+                    let lru = Lru::new(
+                        cache_dir,
+                        cache_size_bytes.unwrap_or(lru::DEFAULT_CACHE_SIZE_BYTES),
+                    );
                    let _ = lru.touch(&fingerprint, &key.opts_hash);
                    return Ok((result, "hit".to_string(), Some(age_seconds)));
                }
@ -154,7 +162,8 @@ pub fn extract_with_cache(
            match compression::encode(&json_data) {
                Ok(compressed) => {
                    let writer = Writer::new(cache_dir);
-                    let _ = writer.write(&fingerprint, &key.opts_hash, compressed.len(), &compressed);
+                    let _ =
+                        writer.write(&fingerprint, &key.opts_hash, compressed.len(), &compressed);

                    // Update index entry count and total bytes
                    if let Ok(mut index) = layout::load_index(cache_dir) {
@ -165,7 +174,10 @@ pub fn extract_with_cache(
                    }

                    // Trigger LRU eviction if needed
-                    let lru = Lru::new(cache_dir, cache_size_bytes.unwrap_or(lru::DEFAULT_CACHE_SIZE_BYTES));
+                    let lru = Lru::new(
+                        cache_dir,
+                        cache_size_bytes.unwrap_or(lru::DEFAULT_CACHE_SIZE_BYTES),
+                    );
                    let _ = lru.maybe_evict();
                }
                Err(_) => {
--- a/crates/pdftract-core/src/cache/multi_process.rs
+++ b/crates/pdftract-core/src/cache/multi_process.rs
@ -373,14 +373,14 @@ pub fn cleanup_stale_temp_files(cache_dir: &Path) -> io::Result<()> {
    let _cleaned = 0;

    // Walk the two-byte prefix directories
-    for prefix1_entry in fs::read_dir(cache_dir)?
-        .filter_map(|e| e.ok())
-        .filter(|e| {
-            e.path().is_dir()
-                && e.file_name().to_string_lossy().len() == 2
-                && e.file_name().to_string_lossy().chars().all(|c| c.is_ascii_hexdigit())
-        })
-    {
+    for prefix1_entry in fs::read_dir(cache_dir)?.filter_map(|e| e.ok()).filter(|e| {
+        e.path().is_dir()
+            && e.file_name().to_string_lossy().len() == 2
+            && e.file_name()
+                .to_string_lossy()
+                .chars()
+                .all(|c| c.is_ascii_hexdigit())
+    }) {
        let prefix1_dir = prefix1_entry.path();

        // Walk the second-level prefix directories
@ -391,14 +391,15 @@ pub fn cleanup_stale_temp_files(cache_dir: &Path) -> io::Result<()> {
                    .to_string_lossy()
                    .chars()
                    .all(|c| c.is_ascii_hexdigit())
-        })
-        {
+        }) {
            let prefix2_dir = prefix2_entry.path();

            // Walk the fingerprint directories
-            for fp_entry in prefix2_dir.read_dir()?.filter_map(|e| e.ok()).filter(|e| {
-                e.path().is_dir()
-            }) {
+            for fp_entry in prefix2_dir
+                .read_dir()?
+                .filter_map(|e| e.ok())
+                .filter(|e| e.path().is_dir())
+            {
                let fp_dir = fp_entry.path();

                // Walk the entry files
@ -413,7 +414,8 @@ pub fn cleanup_stale_temp_files(cache_dir: &Path) -> io::Result<()> {
                                if let Ok(metadata) = path.metadata() {
                                    if let Ok(modified) = metadata.modified() {
                                        if let Ok(duration) = modified.duration_since(UNIX_EPOCH) {
-                                            let age_seconds = now.saturating_sub(duration.as_secs());
+                                            let age_seconds =
+                                                now.saturating_sub(duration.as_secs());

                                            if age_seconds > TEMP_FILE_MAX_AGE_SECONDS {
                                                // Delete stale temp file
@ -441,7 +443,8 @@ mod tests {
    use std::time::Duration;
    use tempfile::TempDir;

-    const TEST_FINGERPRINT: &str = "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
+    const TEST_FINGERPRINT: &str =
+        "pdftract-v1:e7a1f3deadbeef00000000000000000000000000000000000000000000000000";
    const TEST_OPTS_HASH: &str = "9b21c0ffee000000000000000000000000000000000000000000000000000000";
    const TEST_DATA: &[u8] = b"test cache entry data";

@ -458,12 +461,19 @@ mod tests {
        let compressed = compress_data(TEST_DATA);

        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

        // Verify the entry exists
        let reader = Reader::new(cache_dir);
-        let result = reader.read(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len()).unwrap();
+        let result = reader
+            .read(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len())
+            .unwrap();
        assert_eq!(result, TEST_DATA);
    }

@ -493,7 +503,12 @@ mod tests {

        // Write entry
        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

        // Now it exists
@ -509,12 +524,22 @@ mod tests {
        let compressed = compress_data(TEST_DATA);

        // Parent directories don't exist yet
-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );
        assert!(!entry.exists());

        // Write should create parent directories
        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

        assert!(entry.exists());
@ -535,19 +560,32 @@ mod tests {

        let handle1 = thread::spawn(move || {
            let writer = Writer::new(&cache_dir1);
-            writer.write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size, &compressed1)
+            writer.write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed_size,
+                &compressed1,
+            )
        });

        let handle2 = thread::spawn(move || {
            let writer = Writer::new(&cache_dir2);
-            writer.write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size, &compressed2)
+            writer.write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed_size,
+                &compressed2,
+            )
        });

        // Both should succeed (no deadlock)
        let result1 = handle1.join().unwrap();
        let result2 = handle2.join().unwrap();

-        assert!(result1.is_ok() || result2.is_ok(), "At least one writer should succeed");
+        assert!(
+            result1.is_ok() || result2.is_ok(),
+            "At least one writer should succeed"
+        );

        // The final entry should be valid (one of the two)
        let reader = Reader::new(&cache_dir);
@ -594,9 +632,9 @@ mod tests {
            // Need to find the actual compressed size
            let entry_path_buf = entry_path(&cache_dir, &fp, &opts, 0);
            let entry_dir = entry_path_buf.parent().unwrap();
-            let _found = fs::read_dir(entry_dir).unwrap().any(|e| {
-                e.ok().filter(|f| f.path().is_file()).is_some()
-            });
+            let _found = fs::read_dir(entry_dir)
+                .unwrap()
+                .any(|e| e.ok().filter(|f| f.path().is_file()).is_some());

            assert!(_found, "Entry {} should exist", i);
        }
@ -612,10 +650,20 @@ mod tests {

        // Write a valid entry
        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );

        // Corrupt the entry by truncating it
        {
@ -647,7 +695,12 @@ mod tests {
        let compressed = compress_data(TEST_DATA);

        // Create a temp file manually
-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );
        let temp_path = writer.temp_path(&entry);

        // Create parent directory first
@ -678,7 +731,12 @@ mod tests {
        let compressed = compress_data(TEST_DATA);

        // Create a recent temp file
-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );
        let temp_path = writer.temp_path(&entry);

        // Create parent directory first
@ -723,7 +781,12 @@ mod tests {

        let writer = Writer::new(cache_dir);
        let compressed = compress_data(TEST_DATA);
-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );

        // Generate multiple temp paths
        let path1 = writer.temp_path(&entry);
@ -754,7 +817,12 @@ mod tests {

        // This should work normally
        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

        // Verify the entry exists
@ -838,7 +906,8 @@ mod tests {
                thread::spawn(move || {
                    for iter in 0..NUM_ITERATIONS {
                        for (key_idx, (fp, opts)) in keys.iter().enumerate() {
-                            let data = format!("process {} iteration {} key {}", proc_id, iter, key_idx);
+                            let data =
+                                format!("process {} iteration {} key {}", proc_id, iter, key_idx);
                            let compressed = compress_data(data.as_bytes());
                            let size = compressed.len();

@ -871,9 +940,9 @@ mod tests {
            let entry_path_buf = entry_path(&cache_dir, fp, opts, 0);
            let fp_dir = entry_path_buf.parent().unwrap();
            if fp_dir.exists() {
-                let _found = fs::read_dir(fp_dir).unwrap().any(|e| {
-                    e.ok().filter(|f| f.path().is_file()).is_some()
-                });
+                let _found = fs::read_dir(fp_dir)
+                    .unwrap()
+                    .any(|e| e.ok().filter(|f| f.path().is_file()).is_some());
                // At least one entry should exist for this key
                // (may have multiple versions due to concurrent writes)
            }
@ -923,12 +992,22 @@ mod tests {

        let handle1 = thread::spawn(move || {
            let writer = Writer::new(&cache_dir1);
-            writer.write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size, &compressed1)
+            writer.write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed_size,
+                &compressed1,
+            )
        });

        let handle2 = thread::spawn(move || {
            let writer = Writer::new(&cache_dir2);
-            writer.write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size, &compressed2)
+            writer.write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed_size,
+                &compressed2,
+            )
        });

        // Both should succeed without deadlock
@ -941,7 +1020,10 @@ mod tests {
        // Final entry should be valid
        let reader = Reader::new(&cache_dir);
        let result = reader.read(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size);
-        assert!(result.is_ok(), "Entry should be readable after concurrent writes");
+        assert!(
+            result.is_ok(),
+            "Entry should be readable after concurrent writes"
+        );
    }

    #[test]
@ -960,7 +1042,12 @@ mod tests {
                let compressed = compressed.clone();
                thread::spawn(move || {
                    let writer = Writer::new(&cache_dir);
-                    writer.write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed_size, &compressed)
+                    writer.write(
+                        TEST_FINGERPRINT,
+                        TEST_OPTS_HASH,
+                        compressed_size,
+                        &compressed,
+                    )
                })
            })
            .collect();
@ -1006,11 +1093,21 @@ mod tests {
        let compressed = compress_data(TEST_DATA);

        writer
-            .write(TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len(), &compressed)
+            .write(
+                TEST_FINGERPRINT,
+                TEST_OPTS_HASH,
+                compressed.len(),
+                &compressed,
+            )
            .unwrap();

        // Corrupt the entry
-        let entry = entry_path(cache_dir, TEST_FINGERPRINT, TEST_OPTS_HASH, compressed.len());
+        let entry = entry_path(
+            cache_dir,
+            TEST_FINGERPRINT,
+            TEST_OPTS_HASH,
+            compressed.len(),
+        );
        fs::write(&entry, b"corrupted data").unwrap();

        // Read should detect corruption, delete entry, and return error
--- a/crates/pdftract-core/src/classify.rs
+++ b/crates/pdftract-core/src/classify.rs
@ -25,8 +25,8 @@
 //! 4. After all signals run: tally votes weighted by strength; pick highest-weight class
 //! 5. If no signal voted, default to Vector with confidence 0.5

-use std::collections::BTreeSet;
 use serde::{Deserialize, Serialize};
+use std::collections::BTreeSet;

 /// Page context containing all metrics needed for classification.
 ///
@ -360,7 +360,8 @@ impl PageClassifier {
        }

        // Weight each class by sum of strengths
-        let mut class_weights: std::collections::HashMap<PageClass, f32> = std::collections::HashMap::new();
+        let mut class_weights: std::collections::HashMap<PageClass, f32> =
+            std::collections::HashMap::new();
        let mut total_weight = 0.0;

        for vote in &votes {
@ -960,7 +961,10 @@ mod tests {
        set2.insert(2);

        // Iteration order should be the same
-        assert_eq!(set1.iter().collect::<Vec<_>>(), set2.iter().collect::<Vec<_>>());
+        assert_eq!(
+            set1.iter().collect::<Vec<_>>(),
+            set2.iter().collect::<Vec<_>>()
+        );
    }

    #[test]
@ -1022,9 +1026,12 @@ mod tests {
        // Verify all scanned cells are from rows 2-7 only
        for flat in scanned_cells {
            let cell = CellIndex::from_flat(*flat);
-            assert!(cell.row >= 2 && cell.row <= 7,
+            assert!(
+                cell.row >= 2 && cell.row <= 7,
                "scanned cell at flat {} should be in rows 2-7, got row {}",
-                flat, cell.row);
+                flat,
+                cell.row
+            );
        }
    }

@ -1432,7 +1439,10 @@ mod tests {

        assert_eq!(result1.class, result2.class);
        assert_eq!(result1.confidence, result2.confidence);
-        assert_eq!(result1.hybrid_cells.is_some(), result2.hybrid_cells.is_some());
+        assert_eq!(
+            result1.hybrid_cells.is_some(),
+            result2.hybrid_cells.is_some()
+        );
    }

    #[test]
@ -1440,9 +1450,9 @@ mod tests {
        // Verify all confidence values are in [0.0, 1.0]
        let test_cases = vec![
            // (text_ops, raw_chars, valid_chars, image_cov, density)
-            (0, 0, 0, 0.0, 0.0),      // blank
-            (0, 0, 0, 0.95, 0.0),     // scanned
-            (100, 1000, 100, 0.1, 0.1), // low validity
+            (0, 0, 0, 0.0, 0.0),         // blank
+            (0, 0, 0, 0.95, 0.0),        // scanned
+            (100, 1000, 100, 0.1, 0.1),  // low validity
            (500, 3000, 2900, 0.0, 0.9), // high validity vector
            (200, 1500, 1400, 0.7, 0.5), // ambiguous
        ];
@ -1459,7 +1469,12 @@ mod tests {
            assert!(
                result.confidence >= 0.0 && result.confidence <= 1.0,
                "confidence {} out of range for case ({}, {}, {}, {}, {})",
-                result.confidence, text_ops, raw, valid, img_cov, density
+                result.confidence,
+                text_ops,
+                raw,
+                valid,
+                img_cov,
+                density
            );
        }
    }
@ -1585,9 +1600,17 @@ mod tests {
                grid_cells: Some(std::array::from_fn(|i| {
                    let row = i / 8;
                    if row < 2 {
-                        CellData { text_op_count: 15, image_coverage: 0.05, char_validity: 0.95 }
+                        CellData {
+                            text_op_count: 15,
+                            image_coverage: 0.05,
+                            char_validity: 0.95,
+                        }
                    } else {
-                        CellData { text_op_count: 0, image_coverage: 0.90, char_validity: 0.0 }
+                        CellData {
+                            text_op_count: 0,
+                            image_coverage: 0.90,
+                            char_validity: 0.0,
+                        }
                    }
                })),
            },
--- a/crates/pdftract-core/src/content_stream.rs
+++ b/crates/pdftract-core/src/content_stream.rs
@ -673,8 +673,14 @@ mod tests {
        // Verify both modes complete successfully
        // The actual 10% speedup comes from skipping ToUnicode lookup
        // which is implemented in the process_string function
-        assert!(normal_duration.as_nanos() > 0, "Normal mode should complete");
-        assert!(hint_duration.as_nanos() > 0, "PositionHint mode should complete");
+        assert!(
+            normal_duration.as_nanos() > 0,
+            "Normal mode should complete"
+        );
+        assert!(
+            hint_duration.as_nanos() > 0,
+            "PositionHint mode should complete"
+        );

        // In practice, PositionHint is faster because it skips ToUnicode lookup.
        // This test verifies the code paths work correctly; for actual
--- a/crates/pdftract-core/src/document.rs
+++ b/crates/pdftract-core/src/document.rs
@ -9,14 +9,16 @@
 //! `PageIter` which yields pages lazily without materializing the entire page tree.
 //! Use `PdfExtractor::pages()` to get an iterator that extracts each page on-demand.

-use crate::fingerprint::{CatalogFlags, ContentStreamData, FingerprintInput, PageFingerprintData, compute_fingerprint};
+use crate::fingerprint::{
+    compute_fingerprint, CatalogFlags, ContentStreamData, FingerprintInput, PageFingerprintData,
+};
 use crate::parser::catalog::{parse_catalog, Catalog};
-use crate::parser::pages::{flatten_page_tree, PageDict, LazyPageIter};
+use crate::parser::pages::{flatten_page_tree, LazyPageIter, PageDict};
 use crate::parser::stream::{FileSource, PdfSource};
-use crate::parser::xref::{XrefResolver, load_xref_with_prev_chain, XrefSection};
+use crate::parser::xref::{load_xref_with_prev_chain, XrefResolver, XrefSection};
 use crate::receipts::verifier::SpanData;
-use anyhow::{Context, Result, anyhow};
-use serde::{Serialize, Deserialize};
+use anyhow::{anyhow, Context, Result};
+use serde::{Deserialize, Serialize};
 use std::path::Path;

 /// Parse a PDF file and return the document components needed for verification.
@ -35,14 +37,19 @@ use std::path::Path;
 /// # Returns
 ///
 /// A tuple of (fingerprint, catalog, pages, resolver)
-pub fn parse_pdf_file(pdf_path: &std::path::Path) -> Result<(String, Catalog, Vec<crate::parser::pages::PageDict>, XrefResolver)> {
+pub fn parse_pdf_file(
+    pdf_path: &std::path::Path,
+) -> Result<(
+    String,
+    Catalog,
+    Vec<crate::parser::pages::PageDict>,
+    XrefResolver,
+)> {
    // Open the PDF file
-    let source = FileSource::open(pdf_path)
-        .context("Failed to open PDF file")?;
+    let source = FileSource::open(pdf_path).context("Failed to open PDF file")?;

    // Find the startxref offset
-    let startxref_offset = find_startxref(&source)
-        .context("Failed to find startxref offset")?;
+    let startxref_offset = find_startxref(&source).context("Failed to find startxref offset")?;

    // Load the xref table
    let xref_section = load_xref_with_prev_chain(&source, startxref_offset);
@ -51,29 +58,30 @@ pub fn parse_pdf_file(pdf_path: &std::path::Path) -> Result<(String, Catalog, Ve
    let resolver = XrefResolver::from_section(xref_section.clone());

    // Get the root reference from trailer
-    let root_ref = xref_section.trailer
+    let root_ref = xref_section
+        .trailer
        .as_ref()
        .and_then(|trailer| trailer.get("Root"))
        .and_then(|obj| obj.as_ref())
        .ok_or_else(|| anyhow!("No /Root reference in trailer"))?;

    // Parse the catalog
-    let catalog = parse_catalog(&resolver, root_ref)
-        .map_err(|diagnostics| {
-            let msg = diagnostics.first()
-                .map(|d| d.message.as_ref())
-                .unwrap_or("unknown error");
-            anyhow!("Failed to parse catalog: {}", msg)
-        })?;
+    let catalog = parse_catalog(&resolver, root_ref).map_err(|diagnostics| {
+        let msg = diagnostics
+            .first()
+            .map(|d| d.message.as_ref())
+            .unwrap_or("unknown error");
+        anyhow!("Failed to parse catalog: {}", msg)
+    })?;

    // Flatten the page tree
-    let pages = flatten_page_tree(&resolver, catalog.pages_ref)
-        .map_err(|diagnostics| {
-            let msg = diagnostics.first()
-                .map(|d| d.message.as_ref())
-                .unwrap_or("unknown error");
-            anyhow!("Failed to flatten page tree: {}", msg)
-        })?;
+    let pages = flatten_page_tree(&resolver, catalog.pages_ref).map_err(|diagnostics| {
+        let msg = diagnostics
+            .first()
+            .map(|d| d.message.as_ref())
+            .unwrap_or("unknown error");
+        anyhow!("Failed to flatten page tree: {}", msg)
+    })?;

    // Build fingerprint input
    let fingerprint_input = build_fingerprint_input(&catalog, &pages, &xref_section);
@ -92,11 +100,13 @@ fn find_startxref(source: &dyn PdfSource) -> Result<u64> {
    let scan_start = len.saturating_sub(1024);
    let scan_end = len;

-    let tail_data = source.read_at(scan_start as u64, scan_end - scan_start)
+    let tail_data = source
+        .read_at(scan_start as u64, scan_end - scan_start)
        .context("Failed to read PDF tail")?;

    // Find "startxref" in the tail data
-    let startxref_pos = tail_data.windows(9)
+    let startxref_pos = tail_data
+        .windows(9)
        .rposition(|w| w == b"startxref")
        .ok_or_else(|| anyhow!("startxref not found in PDF"))?;

@ -105,21 +115,25 @@ fn find_startxref(source: &dyn PdfSource) -> Result<u64> {
    let offset_data = &tail_data[startxref_pos + 9..];

    // Skip leading whitespace (space, \r, \n, \t)
-    let offset_start = offset_data.iter()
+    let offset_start = offset_data
+        .iter()
        .position(|&b| !matches!(b, b' ' | b'\r' | b'\n' | b'\t'))
        .unwrap_or(offset_data.len());

    let offset_data_trimmed = &offset_data[offset_start..];

    // Find the newline after the offset
-    let newline_pos = offset_data_trimmed.iter()
+    let newline_pos = offset_data_trimmed
+        .iter()
        .position(|&b| b == b'\n' || b == b'\r')
        .unwrap_or(offset_data_trimmed.len());

    let offset_str = std::str::from_utf8(&offset_data_trimmed[..newline_pos])
        .context("startxref offset is not valid UTF-8")?;

-    let offset: u64 = offset_str.trim().parse()
+    let offset: u64 = offset_str
+        .trim()
+        .parse()
        .context("startxref offset is not a valid number")?;

    Ok(offset)
@ -133,24 +147,31 @@ fn build_fingerprint_input(
 ) -> FingerprintInput {
    let page_count = pages.len() as u32;

-    let fingerprint_pages = pages.iter().map(|page| {
-        PageFingerprintData {
-            content_streams: page.contents.iter()
-                .map(|&obj_ref| ContentStreamData::Indirect(obj_ref))
-                .collect(),
-            resources: None, // TODO: convert ResourceDict to PdfDict
-            media_box: page.media_box,
-            crop_box: page.crop_box,
-            rotate: page.rotate,
-        }
-    }).collect();
+    let fingerprint_pages = pages
+        .iter()
+        .map(|page| {
+            PageFingerprintData {
+                content_streams: page
+                    .contents
+                    .iter()
+                    .map(|&obj_ref| ContentStreamData::Indirect(obj_ref))
+                    .collect(),
+                resources: None, // TODO: convert ResourceDict to PdfDict
+                media_box: page.media_box,
+                crop_box: page.crop_box,
+                rotate: page.rotate,
+            }
+        })
+        .collect();

    // Build catalog flags
    let catalog_flags = CatalogFlags {
        is_encrypted: false, // TODO: detect encryption
        contains_javascript: catalog.open_action.is_some() || catalog.aa.is_some(),
        contains_xfa: false, // TODO: detect XFA
-        ocg_present: catalog.oc_properties.as_ref()
+        ocg_present: catalog
+            .oc_properties
+            .as_ref()
            .map(|props| props.present)
            .unwrap_or(false),
    };
@ -186,8 +207,11 @@ pub fn extract_spans_from_page(

    // Check page index bounds
    if page_index >= pages.len() {
-        return Err(anyhow!("Page index {} out of bounds (document has {} pages)",
-            page_index, pages.len()));
+        return Err(anyhow!(
+            "Page index {} out of bounds (document has {} pages)",
+            page_index,
+            pages.len()
+        ));
    }

    let page = &pages[page_index];
@ -260,12 +284,11 @@ impl PdfExtractor {
        let path = pdf_path.as_ref();

        // Open the PDF file
-        let source = FileSource::open(path)
-            .context("Failed to open PDF file")?;
+        let source = FileSource::open(path).context("Failed to open PDF file")?;

        // Find the startxref offset
-        let startxref_offset = find_startxref(&source)
-            .context("Failed to find startxref offset")?;
+        let startxref_offset =
+            find_startxref(&source).context("Failed to find startxref offset")?;

        // Load the xref table
        let xref_section = load_xref_with_prev_chain(&source, startxref_offset);
@ -274,20 +297,21 @@ impl PdfExtractor {
        let resolver = XrefResolver::from_section(xref_section.clone());

        // Get the root reference from trailer
-        let root_ref = xref_section.trailer
+        let root_ref = xref_section
+            .trailer
            .as_ref()
            .and_then(|trailer| trailer.get("Root"))
            .and_then(|obj| obj.as_ref())
            .ok_or_else(|| anyhow!("No /Root reference in trailer"))?;

        // Parse the catalog
-        let catalog = parse_catalog(&resolver, root_ref)
-            .map_err(|diagnostics| {
-                let msg = diagnostics.first()
-                    .map(|d| d.message.as_ref())
-                    .unwrap_or("unknown error");
-                anyhow!("Failed to parse catalog: {}", msg)
-            })?;
+        let catalog = parse_catalog(&resolver, root_ref).map_err(|diagnostics| {
+            let msg = diagnostics
+                .first()
+                .map(|d| d.message.as_ref())
+                .unwrap_or("unknown error");
+            anyhow!("Failed to parse catalog: {}", msg)
+        })?;

        // Build fingerprint input (without full page tree for lazy extraction)
        let fingerprint = compute_fingerprint_lazy(&catalog, &xref_section);
@ -406,12 +430,17 @@ impl PdfExtractor {
    /// This method extracts one page without materializing the entire document.
    /// Content streams are decoded and the result is returned.
    pub fn extract_page(&self, page_index: usize) -> Result<PageExtraction> {
-        let pages = self.pages.as_ref()
+        let pages = self
+            .pages
+            .as_ref()
            .ok_or_else(|| anyhow!("Pages not materialized. Call materialize_pages() first."))?;

        if page_index >= pages.len() {
-            return Err(anyhow!("Page index {} out of bounds (document has {} pages)",
-                page_index, pages.len()));
+            return Err(anyhow!(
+                "Page index {} out of bounds (document has {} pages)",
+                page_index,
+                pages.len()
+            ));
        }

        let page = &pages[page_index];
@ -489,7 +518,8 @@ impl<'a> Iterator for PageIter<'a> {
            match LazyPageIter::new(&self.extractor.resolver, self.extractor.catalog.pages_ref) {
                Ok(iter) => self.lazy_iter = Some(iter),
                Err(diagnostics) => {
-                    let msg = diagnostics.first()
+                    let msg = diagnostics
+                        .first()
                        .map(|d| d.message.as_ref())
                        .unwrap_or("unknown error");
                    return Some(Err(anyhow!("Failed to create lazy page iterator: {}", msg)));
@ -518,11 +548,16 @@ impl<'a> Iterator for PageIter<'a> {
                Some(result)
            }
            Some(Err(diagnostics)) => {
-                let msg = diagnostics.first()
+                let msg = diagnostics
+                    .first()
                    .map(|d| d.message.as_ref())
                    .unwrap_or("unknown error");
                self.index += 1;
-                Some(Err(anyhow!("Error extracting page {}: {}", self.index - 1, msg)))
+                Some(Err(anyhow!(
+                    "Error extracting page {}: {}",
+                    self.index - 1,
+                    msg
+                )))
            }
            None => None,
        }
@ -547,7 +582,9 @@ pub(crate) fn compute_fingerprint_lazy(catalog: &Catalog, _xref_section: &XrefSe
            is_encrypted: false,
            contains_javascript: catalog.open_action.is_some() || catalog.aa.is_some(),
            contains_xfa: false,
-            ocg_present: catalog.oc_properties.as_ref()
+            ocg_present: catalog
+                .oc_properties
+                .as_ref()
                .map(|props| props.present)
                .unwrap_or(false),
        },
@ -559,8 +596,8 @@ pub(crate) fn compute_fingerprint_lazy(catalog: &Catalog, _xref_section: &XrefSe
 #[cfg(test)]
 mod tests {
    use super::*;
-    use std::io::Write;
    use std::fs::File;
+    use std::io::Write;

    /// Create a minimal valid PDF for testing.
    fn create_minimal_pdf(path: &std::path::Path) -> Result<()> {
--- a/crates/pdftract-core/src/dpi.rs
+++ b/crates/pdftract-core/src/dpi.rs
@ -21,8 +21,8 @@
 //! images are already binary at scan resolution; rendering at 300 DPI throws away
 //! no data but wastes ~9x the CPU.

-use crate::options::ExtractionOptions;
 use crate::classify::PageContext;
+use crate::options::ExtractionOptions;

 /// PDF 1.x filter name for image streams.
 ///
@ -206,10 +206,7 @@ fn compute_median_font_size(font_sizes: &[f32]) -> f32 {
    }

    // Clamp font sizes to reasonable bounds to prevent outliers
-    let mut clamped: Vec<f32> = font_sizes
-        .iter()
-        .map(|&s| s.clamp(4.0, 72.0))
-        .collect();
+    let mut clamped: Vec<f32> = font_sizes.iter().map(|&s| s.clamp(4.0, 72.0)).collect();

    // Use nth_element for O(n) median selection
    let len = clamped.len();
@ -238,8 +235,14 @@ mod tests {

    #[test]
    fn test_pdf1_filter_from_name() {
-        assert_eq!(Pdf1Filter::from_name("JBIG2Decode"), Pdf1Filter::Jbig2Decode);
-        assert_eq!(Pdf1Filter::from_name("/JBIG2Decode"), Pdf1Filter::Jbig2Decode);
+        assert_eq!(
+            Pdf1Filter::from_name("JBIG2Decode"),
+            Pdf1Filter::Jbig2Decode
+        );
+        assert_eq!(
+            Pdf1Filter::from_name("/JBIG2Decode"),
+            Pdf1Filter::Jbig2Decode
+        );
        assert_eq!(Pdf1Filter::from_name("DCTDecode"), Pdf1Filter::DctDecode);
        assert_eq!(Pdf1Filter::from_name("DCT"), Pdf1Filter::DctDecode);
        assert_eq!(Pdf1Filter::from_name("Fl"), Pdf1Filter::FlateDecode);
@ -404,8 +407,8 @@ mod tests {
        // With 30 footnotes vs 20 body text, median should be in fine-print range
        let mut font_sizes: Vec<f32> = (0..30).map(|_| 6.0).collect(); // footnotes
        font_sizes.extend((0..20).map(|_| 10.0)); // body text
-        // Sorted: 30x 6.0, then 20x 10.0 -> median is at index 25 (0-indexed)
-        // That's the 26th element, which is 6.0
+                                                  // Sorted: 30x 6.0, then 20x 10.0 -> median is at index 25 (0-indexed)
+                                                  // That's the 26th element, which is 6.0
        let dpi = select_dpi(&page, &filters, Some(&font_sizes), &options);
        assert_eq!(dpi, 400);
    }
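Reviewer note on the median test above: a minimal sketch of the clamp-then-select approach that `compute_median_font_size` appears to follow. It assumes the upper-middle element is taken as the median for even-length input, which is what the "index 25" comment in the test implies. The name `median_font_size_sketch` and the `Option` return type are hypothetical; the real function returns `f32`, and its empty-input handling is not visible in this hunk.

```rust
/// Hypothetical stand-in for `compute_median_font_size`.
/// Clamps each size to [4.0, 72.0], then selects the element at index len / 2
/// in O(n) average time without fully sorting the buffer.
fn median_font_size_sketch(font_sizes: &[f32]) -> Option<f32> {
    if font_sizes.is_empty() {
        return None;
    }

    // Clamp outliers so a stray 0.1pt or 500pt glyph cannot skew the result.
    let mut clamped: Vec<f32> = font_sizes.iter().map(|&s| s.clamp(4.0, 72.0)).collect();

    // `select_nth_unstable_by` is the standard-library counterpart of C++ `nth_element`.
    let mid = clamped.len() / 2;
    let (_, median, _) = clamped.select_nth_unstable_by(mid, |a, b| a.total_cmp(b));
    Some(*median)
}
```

With 30 sizes of 6.0 followed by 20 sizes of 10.0, index 25 of the sorted data still falls inside the run of 6.0 values, so the sketch yields `Some(6.0)`.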
--- a/crates/pdftract-core/src/fingerprint/canonicalize.rs
+++ b/crates/pdftract-core/src/fingerprint/canonicalize.rs
@ -15,7 +15,7 @@
 //! - **Resource dicts**: Dictionary keys are sorted lexicographically for
 //!   deterministic serialization regardless of insertion order

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::parser::lexer::{Lexer, Token};
 use std::collections::BTreeMap;
 use std::sync::Arc;
@ -355,10 +355,19 @@ pub fn hash_resource_dict_canonical(resources: Option<&PdfDict>) -> [u8; 32] {

    if let Some(resources) = resources {
        // Namespaces to iterate in lexical order
-        let namespaces = ["/Font", "/XObject", "/ExtGState", "/ColorSpace", "/Pattern", "/Shading", "/Properties"];
-        let mut sorted_namespaces: Vec<_> = namespaces.iter().filter_map(|&ns| {
-            resources.get(ns).and_then(|v| v.as_dict()).map(|d| (ns, d))
-        }).collect();
+        let namespaces = [
+            "/Font",
+            "/XObject",
+            "/ExtGState",
+            "/ColorSpace",
+            "/Pattern",
+            "/Shading",
+            "/Properties",
+        ];
+        let mut sorted_namespaces: Vec<_> = namespaces
+            .iter()
+            .filter_map(|&ns| resources.get(ns).and_then(|v| v.as_dict()).map(|d| (ns, d)))
+            .collect();

        // Sort namespaces lexicographically (they're already mostly sorted, but ensure)
        sorted_namespaces.sort_by_key(|&(ns, _)| ns);
@ -416,7 +425,7 @@ mod tests {

        // Test edge cases from plan
        assert_eq!(canonicalize_f64(0.00005, &mut diags), 0); // 0.5 rounds to even (0)
-        // Note: 0.00015 * 10000 = 1.4999... due to float representation, so rounds to 1
+                                                              // Note: 0.00015 * 10000 = 1.4999... due to float representation, so rounds to 1
        assert_eq!(canonicalize_f64(0.00015, &mut diags), 1); // 1.4999... rounds to 1

        // Test negative banker's rounding
@ -579,7 +588,10 @@ mod tests {
        let hash1 = hash_resource_dict_canonical(Some(&resources1));
        let hash2 = hash_resource_dict_canonical(Some(&resources2));

-        assert_eq!(hash1, hash2, "Resource dict hash should be independent of insertion order");
+        assert_eq!(
+            hash1, hash2,
+            "Resource dict hash should be independent of insertion order"
+        );
    }

    #[test]
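Reviewer note on the rounding edge cases in this file: the test comments describe scaling by 10000 and rounding half to even (banker's rounding), with NaN/Inf mapped to 0. A minimal sketch of that rule follows. `round_half_even` and `to_fixed_4dp_sketch` are hypothetical names, and the real `canonicalize_f64` also takes a diagnostics argument that is omitted here.

```rust
/// Round to the nearest integer, sending exact .5 ties to the even neighbor.
fn round_half_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        // Exact tie: halving, rounding, and doubling lands on the even neighbor.
        2.0 * (x / 2.0).round()
    } else {
        // No tie: ordinary nearest-integer rounding is already correct.
        x.round()
    }
}

/// Hypothetical stand-in for the 4-decimal-place canonicalization.
/// Non-finite input maps to 0 (assumption taken from the doc comment on
/// `hash_page_geometry`); the real code also emits a diagnostic.
fn to_fixed_4dp_sketch(value: f64) -> i64 {
    if !value.is_finite() {
        return 0;
    }
    round_half_even(value * 10_000.0) as i64
}
```

Only exact ties are affected by the half-to-even rule. A value such as 0.00015 scales to slightly below 1.5 in binary floating point, as the test comment notes, so it rounds down to 1 by ordinary rounding rather than through the tie branch.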
--- a/crates/pdftract-core/src/fingerprint/mod.rs
+++ b/crates/pdftract-core/src/fingerprint/mod.rs
@ -103,10 +103,18 @@ impl CatalogFlags {
    /// Encode the flags into a single byte.
    fn encode(&self) -> u8 {
        let mut byte = 0u8;
-        if self.is_encrypted { byte |= 1 << 0; }
-        if self.contains_javascript { byte |= 1 << 1; }
-        if self.contains_xfa { byte |= 1 << 2; }
-        if self.ocg_present { byte |= 1 << 3; }
+        if self.is_encrypted {
+            byte |= 1 << 0;
+        }
+        if self.contains_javascript {
+            byte |= 1 << 1;
+        }
+        if self.contains_xfa {
+            byte |= 1 << 2;
+        }
+        if self.ocg_present {
+            byte |= 1 << 3;
+        }
        byte
    }
 }
@ -193,9 +201,7 @@ fn hash_content_streams(streams: &[ContentStreamData], resolver: &XrefResolver)
                    _ => Vec::new(),
                }
            }
-            ContentStreamData::Direct(bytes) => {
-                normalize_content_bytes(bytes)
-            }
+            ContentStreamData::Direct(bytes) => normalize_content_bytes(bytes),
        };
        hasher.update(&bytes);
    }
@ -409,24 +415,22 @@ fn hash_extgstate(gs_obj: &PdfObject) -> [u8; 32] {
 /// - Rotate as 4-byte BE i32
 ///
 /// NaN/Inf values are canonicalized to 0 and emit STRUCT_INVALID_GEOMETRY diagnostics.
-fn hash_page_geometry(
-    media_box: &[f64; 4],
-    crop_box: Option<&[f64; 4]>,
-    rotate: i32,
-) -> [u8; 32] {
+fn hash_page_geometry(media_box: &[f64; 4], crop_box: Option<&[f64; 4]>, rotate: i32) -> [u8; 32] {
    let mut hasher = Sha256::new();
    let mut diagnostics: Option<Vec<Diagnostic>> = None;

    // MediaBox: 4 coordinates, 8 bytes each = 32 bytes
    for coord in media_box {
-        let canonical = crate::fingerprint::canonicalize::canonicalize_f64(*coord, &mut diagnostics);
+        let canonical =
+            crate::fingerprint::canonicalize::canonicalize_f64(*coord, &mut diagnostics);
        hasher.update(&canonical.to_be_bytes());
    }

    // CropBox: if present, same format
    if let Some(crop) = crop_box {
        for coord in crop {
-            let canonical = crate::fingerprint::canonicalize::canonicalize_f64(*coord, &mut diagnostics);
+            let canonical =
+                crate::fingerprint::canonicalize::canonicalize_f64(*coord, &mut diagnostics);
            hasher.update(&canonical.to_be_bytes());
        }
    }
@ -491,11 +495,7 @@ fn hash_structure_tree(struct_ref: ObjRef, resolver: &XrefResolver) -> [u8; 32]
 }

 /// Recursively hash structure tree elements.
-fn hash_structure_elements(
-    dict: &PdfDict,
-    hasher: &mut Sha256,
-    resolver: &XrefResolver,
-) {
+fn hash_structure_elements(dict: &PdfDict, hasher: &mut Sha256, resolver: &XrefResolver) {
    // Extract and hash relevant keys: /S, /Lang, /Alt, /ActualText
    let keys_to_hash = ["S", "Lang", "Alt", "ActualText"];

@ -533,7 +533,13 @@ fn hash_structure_elements(
 fn serialize_pdf_object_canonical(obj: &PdfObject) -> Vec<u8> {
    match obj {
        PdfObject::Null => b"null".to_vec(),
-        PdfObject::Bool(b) => if *b { b"true".to_vec() } else { b"false".to_vec() },
+        PdfObject::Bool(b) => {
+            if *b {
+                b"true".to_vec()
+            } else {
+                b"false".to_vec()
+            }
+        }
        PdfObject::Integer(i) => i.to_string().into_bytes(),
        PdfObject::Real(r) => {
            // Serialize with consistent precision
@ -578,9 +584,7 @@ fn serialize_pdf_object_canonical(obj: &PdfObject) -> Vec<u8> {
            result.extend_from_slice(b" stream");
            result
        }
-        PdfObject::Indirect(i) => {
-            format!("{} {} obj", i.id.object, i.id.generation).into_bytes()
-        }
+        PdfObject::Indirect(i) => format!("{} {} obj", i.id.object, i.id.generation).into_bytes(),
    }
 }

@ -665,7 +669,7 @@ mod tests {
    fn test_round_to_fixed_4dp_critical_cases() {
        // Test edge cases from plan
        assert_eq!(round_to_fixed_4dp(0.00005), 0); // 0.5 rounds to even (0)
-        // Note: 0.00015 * 10000 = 1.4999... due to float representation, so rounds to 1
+                                                    // Note: 0.00015 * 10000 = 1.4999... due to float representation, so rounds to 1
        assert_eq!(round_to_fixed_4dp(0.00015), 1); // 1.4999... rounds to 1

        // Test negative banker's rounding
@ -678,24 +682,42 @@ mod tests {
        assert_eq!(serialize_pdf_object_canonical(&PdfObject::Null), b"null");

        // Boolean
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::Bool(true)), b"true");
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::Bool(false)), b"false");
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::Bool(true)),
+            b"true"
+        );
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::Bool(false)),
+            b"false"
+        );

        // Integer
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::Integer(42)), b"42");
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::Integer(42)),
+            b"42"
+        );

        // Real
        let real_bytes = serialize_pdf_object_canonical(&PdfObject::Real(3.14159));
        assert!(real_bytes.starts_with(b"3.14159"));

        // String
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::String(Box::new(vec![b'H', b'i']))), b"(Hi)");
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::String(Box::new(vec![b'H', b'i']))),
+            b"(Hi)"
+        );

        // Escaped string
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::String(Box::new(vec![b'(', b')']))), b"(\\(\\))");
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::String(Box::new(vec![b'(', b')']))),
+            b"(\\(\\))"
+        );

        // Name
-        assert_eq!(serialize_pdf_object_canonical(&PdfObject::Name(Arc::from("Type"))), b"/Type");
+        assert_eq!(
+            serialize_pdf_object_canonical(&PdfObject::Name(Arc::from("Type"))),
+            b"/Type"
+        );

        // Reference
        let ref_obj = PdfObject::Ref(ObjRef::new(42, 0));
@ -830,7 +852,10 @@ mod tests {
        let fp1 = compute_fingerprint(&input1, &resolver);
        let fp2 = compute_fingerprint(&input2, &resolver);

-        assert_ne!(fp1, fp2, "Different page counts should produce different fingerprints");
+        assert_ne!(
+            fp1, fp2,
+            "Different page counts should produce different fingerprints"
+        );
    }

    #[test]
@ -868,7 +893,10 @@ mod tests {
        let fp1 = compute_fingerprint(&input1, &resolver);
        let fp2 = compute_fingerprint(&input2, &resolver);

-        assert_ne!(fp1, fp2, "Different geometry should produce different fingerprints");
+        assert_ne!(
+            fp1, fp2,
+            "Different geometry should produce different fingerprints"
+        );
    }

    #[test]
@ -909,7 +937,10 @@ mod tests {
        let fp1 = compute_fingerprint(&input1, &resolver);
        let fp2 = compute_fingerprint(&input2, &resolver);

-        assert_ne!(fp1, fp2, "Different catalog flags should produce different fingerprints");
+        assert_ne!(
+            fp1, fp2,
+            "Different catalog flags should produce different fingerprints"
+        );
    }

    #[test]
@ -941,7 +972,11 @@ mod tests {
        let fingerprint = compute_fingerprint(&input, &resolver);

        let regex = Regex::new(r"^pdftract-v1:[0-9a-f]{64}$").unwrap();
-        assert!(regex.is_match(&fingerprint), "Fingerprint '{}' must match INV-13 format", fingerprint);
+        assert!(
+            regex.is_match(&fingerprint),
+            "Fingerprint '{}' must match INV-13 format",
+            fingerprint
+        );
    }

    #[test]
@ -955,20 +990,26 @@ mod tests {
            let resolver = XrefResolver::new();
            let input = FingerprintInput {
                page_count,
-                pages: (0..page_count).map(|_| PageFingerprintData {
-                    content_streams: vec![],
-                    resources: None,
-                    media_box: [0.0, 0.0, 612.0, 792.0],
-                    crop_box: None,
-                    rotate: 0,
-                }).collect(),
+                pages: (0..page_count)
+                    .map(|_| PageFingerprintData {
+                        content_streams: vec![],
+                        resources: None,
+                        media_box: [0.0, 0.0, 612.0, 792.0],
+                        crop_box: None,
+                        rotate: 0,
+                    })
+                    .collect(),
                struct_tree_root_ref: None,
                is_tagged: false,
                catalog_flags: CatalogFlags::default(),
            };

            let fingerprint = compute_fingerprint(&input, &resolver);
-            assert!(regex.is_match(&fingerprint), "Fingerprint '{}' must match INV-13 format", fingerprint);
+            assert!(
+                regex.is_match(&fingerprint),
+                "Fingerprint '{}' must match INV-13 format",
+                fingerprint
+            );
        }
    }

@ -1016,7 +1057,10 @@ mod tests {
        let hash1 = hash_resource_dict(Some(&resources1), &resolver);
        let hash2 = hash_resource_dict(Some(&resources2), &resolver);

-        assert_eq!(hash1, hash2, "Resource dict hash should be independent of insertion order");
+        assert_eq!(
+            hash1, hash2,
+            "Resource dict hash should be independent of insertion order"
+        );
    }

    #[test]
@ -1029,13 +1073,15 @@ mod tests {
        let resolver = XrefResolver::new();
        let input = FingerprintInput {
            page_count,
-            pages: (0..page_count).map(|_| PageFingerprintData {
-                content_streams: vec![],
-                resources: None,
-                media_box: [0.0, 0.0, 612.0, 792.0],
-                crop_box: None,
-                rotate: 0,
-            }).collect(),
+            pages: (0..page_count)
+                .map(|_| PageFingerprintData {
+                    content_streams: vec![],
+                    resources: None,
+                    media_box: [0.0, 0.0, 612.0, 792.0],
+                    crop_box: None,
+                    rotate: 0,
+                })
+                .collect(),
            struct_tree_root_ref: None,
            is_tagged: false,
            catalog_flags: CatalogFlags::default(),
@ -1046,6 +1092,10 @@ mod tests {
        let duration = start.elapsed();

        // Performance requirement: < 100 ms for 100-page PDF
-        assert!(duration.as_millis() < 100, "Fingerprint computation for 100-page PDF took {} ms, should be < 100 ms", duration.as_millis());
+        assert!(
+            duration.as_millis() < 100,
+            "Fingerprint computation for 100-page PDF took {} ms, should be < 100 ms",
+            duration.as_millis()
+        );
    }
 }
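Reviewer note on `CatalogFlags::encode` in this file: the reformatted `if` chain packs four booleans into bits 0 to 3 of one byte. A branch-free equivalent is sketched below for comparison. `FlagsSketch` is a hypothetical stand-in that mirrors only the four fields visible in the diff.

```rust
/// Hypothetical mirror of the four `CatalogFlags` fields shown in the diff.
#[derive(Default, Clone, Copy)]
struct FlagsSketch {
    is_encrypted: bool,
    contains_javascript: bool,
    contains_xfa: bool,
    ocg_present: bool,
}

impl FlagsSketch {
    /// Pack the flags into one byte: bit 0 = encrypted, bit 1 = JavaScript,
    /// bit 2 = XFA, bit 3 = OCG. A `bool as u8` cast yields 0 or 1.
    fn encode(&self) -> u8 {
        (self.is_encrypted as u8)
            | (self.contains_javascript as u8) << 1
            | (self.contains_xfa as u8) << 2
            | (self.ocg_present as u8) << 3
    }
}
```

Both forms produce the same byte. The `if` chain in the commit is arguably easier to extend with comments per flag, so this is an alternative, not a recommendation to change the code.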
--- a/crates/pdftract-core/src/font/agl.rs
+++ b/crates/pdftract-core/src/font/agl.rs
@ -106,14 +106,18 @@ fn parse_algorithmic(name: &str) -> Option<char> {
    if let Some(rest) = name.strip_prefix("uni") {
        // uniXXXX - exactly 4 hex digits
        if rest.len() == 4 && rest.chars().all(|c| c.is_ascii_hexdigit()) {
-            return u32::from_str_radix(rest, 16).ok().and_then(|c| char::from_u32(c));
+            return u32::from_str_radix(rest, 16)
+                .ok()
+                .and_then(|c| char::from_u32(c));
        }
    }

    if let Some(rest) = name.strip_prefix('u') {
        // uXXXXXX - up to 6 hex digits
        if rest.len() <= 6 && rest.chars().all(|c| c.is_ascii_hexdigit()) {
-            return u32::from_str_radix(rest, 16).ok().and_then(|c| char::from_u32(c));
+            return u32::from_str_radix(rest, 16)
+                .ok()
+                .and_then(|c| char::from_u32(c));
        }
    }
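Reviewer note on the two branches above: a self-contained sketch of the `uniXXXX` / `uXXXXXX` glyph-name parsing, assuming the function returns `None` when neither form matches (the remainder of `parse_algorithmic` is not shown in this hunk). `parse_algorithmic_sketch` is a hypothetical name. The explicit non-empty check in the `u` branch is an addition for clarity; in the original, an empty suffix already fails in `from_str_radix`.

```rust
/// Hypothetical stand-in for the two algorithmic glyph-name forms in the diff.
fn parse_algorithmic_sketch(name: &str) -> Option<char> {
    // uniXXXX: exactly 4 hex digits.
    if let Some(rest) = name.strip_prefix("uni") {
        if rest.len() == 4 && rest.chars().all(|c| c.is_ascii_hexdigit()) {
            // `char::from_u32` rejects surrogates and out-of-range values.
            return u32::from_str_radix(rest, 16).ok().and_then(char::from_u32);
        }
    }

    // uXXXXXX: up to 6 hex digits, mirroring the length check in the diff.
    if let Some(rest) = name.strip_prefix('u') {
        if !rest.is_empty() && rest.len() <= 6 && rest.chars().all(|c| c.is_ascii_hexdigit()) {
            return u32::from_str_radix(rest, 16).ok().and_then(char::from_u32);
        }
    }

    None
}
```

Passing `char::from_u32` directly instead of wrapping it in a closure behaves identically to the `.and_then(|c| char::from_u32(c))` form in the commit.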

--- a/crates/pdftract-core/src/font/cjk_encoding.rs
+++ b/crates/pdftract-core/src/font/cjk_encoding.rs
@ -275,7 +275,7 @@ mod tests {
    fn test_malformed_no_panic() {
        // Test various malformed inputs that should not panic
        let malformed_inputs: Vec<&[u8]> = vec![
-            &[0xFF], // Invalid lead byte in Shift-JIS
+            &[0xFF],       // Invalid lead byte in Shift-JIS
            &[0x80, 0x80], // Invalid sequence in GB18030
            &[0xFE, 0xFF], // Invalid in Big5
            &[0xFF, 0xFF], // Invalid in EUC-KR
--- a/crates/pdftract-core/src/font/cmap.rs
+++ b/crates/pdftract-core/src/font/cmap.rs
@ -19,7 +19,7 @@

 use std::collections::HashMap;

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::parser::lexer::Lexer;
 use crate::parser::lexer::Token;

@ -49,7 +49,9 @@ impl std::fmt::Display for CMapError {
            CMapError::UnexpectedToken(msg) => write!(f, "unexpected token: {}", msg),
            CMapError::InvalidHexString(msg) => write!(f, "invalid hex string: {}", msg),
            CMapError::InvalidRange => write!(f, "invalid range: lo > hi"),
-            CMapError::ArrayLengthMismatch => write!(f, "bfrange array length does not match range"),
+            CMapError::ArrayLengthMismatch => {
+                write!(f, "bfrange array length does not match range")
+            }
            CMapError::MissingKeyword(kw) => write!(f, "missing expected keyword: {}", kw),
            CMapError::EmptyCMap => write!(f, "CMap contains no mappings"),
        }
@ -686,7 +688,9 @@ mod tests {

        assert_eq!(map.len(), 1);
        assert!(!diags.is_empty());
-        assert!(diags.iter().any(|d| d.message.as_ref().contains("odd number of bytes")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.as_ref().contains("odd number of bytes")));
    }

    #[test]
--- a/crates/pdftract-core/src/font/embedded.rs
+++ b/crates/pdftract-core/src/font/embedded.rs
@ -6,7 +6,7 @@

 use std::sync::Arc;

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::font::FontKind;
 use crate::parser::object::types::{PdfDict, PdfObject};
 use crate::parser::stream::{decode_stream, ExtractionOptions};
@ -132,9 +132,7 @@ impl OpenTypeMetrics {
            .cmap
            .map(|cmap| {
                // Try to find a valid Unicode subtable
-                cmap.subtables
-                    .into_iter()
-                    .any(|st| st.is_unicode())
+                cmap.subtables.into_iter().any(|st| st.is_unicode())
            })
            .unwrap_or(false);

@ -159,9 +157,7 @@ impl FontMetrics for OpenTypeMetrics {

        let face_ref = self.face.as_face_ref();
        // Use Face's built-in glyph_index which handles cmap lookup
-        face_ref
-            .glyph_index(ch)
-            .map(|id| id.0)
+        face_ref.glyph_index(ch).map(|id| id.0)
    }

    fn advance(&self, glyph_id: u16) -> Option<u16> {
@ -214,12 +210,11 @@ impl Type1Metrics {
    pub fn from_descriptor(descriptor: &PdfDict, font_dict: &PdfDict) -> FontResult<Self> {
        // Extract /Widths array from font dict
        let widths = match font_dict.get("/Widths") {
-            Some(PdfObject::Array(arr)) => {
-                arr.iter()
-                    .filter_map(|obj| obj.as_int())
-                    .map(|i| i as u16)
-                    .collect()
-            }
+            Some(PdfObject::Array(arr)) => arr
+                .iter()
+                .filter_map(|obj| obj.as_int())
+                .map(|i| i as u16)
+                .collect(),
            _ => return Err(FontError::InvalidFontData("missing /Widths array".into())),
        };

@ -445,18 +440,16 @@ impl EmbeddedFont {
                    }
                }
            }
-            FontKind::Type1 => {
-                match Type1Metrics::from_descriptor(descriptor, font_dict) {
-                    Ok(t1_metrics) => Arc::new(t1_metrics),
-                    Err(e) => {
-                        diagnostics.push(Diagnostic::with_dynamic_no_offset(
-                            DiagCode::FontParseFailed,
-                            format!("Type1 font load failed: {}", e),
-                        ));
-                        Arc::new(Type1Metrics::empty())
-                    }
+            FontKind::Type1 => match Type1Metrics::from_descriptor(descriptor, font_dict) {
+                Ok(t1_metrics) => Arc::new(t1_metrics),
+                Err(e) => {
+                    diagnostics.push(Diagnostic::with_dynamic_no_offset(
+                        DiagCode::FontParseFailed,
+                        format!("Type1 font load failed: {}", e),
+                    ));
+                    Arc::new(Type1Metrics::empty())
                }
-            }
+            },
            _ => Arc::new(EmptyFontMetrics),
        };

@ -543,12 +536,15 @@ mod tests {
    fn test_type1_metrics_from_descriptor() {
        // Create a FontDescriptor-like dict
        let mut descriptor = PdfDict::new();
-        descriptor.insert(intern("/FontBBox"), PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(-100),
-            PdfObject::Integer(-200),
-            PdfObject::Integer(1000),
-            PdfObject::Integer(900),
-        ])));
+        descriptor.insert(
+            intern("/FontBBox"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(-100),
+                PdfObject::Integer(-200),
+                PdfObject::Integer(1000),
+                PdfObject::Integer(900),
+            ])),
+        );

        // Create a font dict with /Widths
        let mut font_dict = PdfDict::new();
@ -560,7 +556,10 @@ mod tests {
                PdfObject::Integer(700),
            ])),
        );
-        font_dict.insert(intern("/Encoding"), PdfObject::Name(intern("/WinAnsiEncoding")));
+        font_dict.insert(
+            intern("/Encoding"),
+            PdfObject::Name(intern("/WinAnsiEncoding")),
+        );

        let metrics = Type1Metrics::from_descriptor(&descriptor, &font_dict).unwrap();

@ -625,12 +624,15 @@ mod tests {
    fn test_embedded_font_load_from_dict() {
        // Create a minimal font dict with FontDescriptor
        let mut descriptor = PdfDict::new();
-        descriptor.insert(intern("/FontBBox"), PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(-100),
-            PdfObject::Integer(-200),
-            PdfObject::Integer(1000),
-            PdfObject::Integer(900),
-        ])));
+        descriptor.insert(
+            intern("/FontBBox"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(-100),
+                PdfObject::Integer(-200),
+                PdfObject::Integer(1000),
+                PdfObject::Integer(900),
+            ])),
+        );

        // For this test, we'll use a Type1-style descriptor without a stream
        // to test the fallback path
@ -679,7 +681,7 @@ mod tests {
        // Uncommon characters might not be in the base font
        // (This depends on the specific fixture)
        let result = metrics.glyph_id_for('\u{1F600}'); // Emoji
-        // May or may not be present, but shouldn't panic
+                                                        // May or may not be present, but shouldn't panic
        let _ = result;
    }

@ -700,16 +702,32 @@ mod tests {
        // Test common Latin characters
        for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".chars() {
            let gid = metrics.glyph_id_for(ch);
-            assert!(gid.is_some(), "Character '{}' should be mapped in Latin font", ch);
+            assert!(
+                gid.is_some(),
+                "Character '{}' should be mapped in Latin font",
+                ch
+            );

            // Verify advance width exists for mapped glyphs
            let advance = metrics.advance(gid.unwrap());
-            assert!(advance.is_some(), "Advance should exist for glyph ID {}", gid.unwrap());
-            assert!(advance.unwrap() > 0, "Advance should be positive for glyph ID {}", gid.unwrap());
+            assert!(
+                advance.is_some(),
+                "Advance should exist for glyph ID {}",
+                gid.unwrap()
+            );
+            assert!(
+                advance.unwrap() > 0,
+                "Advance should be positive for glyph ID {}",
+                gid.unwrap()
+            );

            // Verify bbox exists
            let bbox = metrics.bbox(gid.unwrap());
-            assert!(bbox.is_some(), "Bbox should exist for glyph ID {}", gid.unwrap());
+            assert!(
+                bbox.is_some(),
+                "Bbox should exist for glyph ID {}",
+                gid.unwrap()
+            );
        }
    }

@ -733,7 +751,10 @@ mod tests {
        // Verify that advance widths are in font units (less than UPEM for typical glyphs)
        let gid_a = metrics.glyph_id_for('A').unwrap();
        let advance_a = metrics.advance(gid_a).unwrap();
-        assert!(advance_a <= upem, "Advance should be in font units (≤ UPEM)");
+        assert!(
+            advance_a <= upem,
+            "Advance should be in font units (≤ UPEM)"
+        );
    }

    #[test]
@ -750,7 +771,10 @@ mod tests {
        // The error should be InvalidFontData
        match result {
            Err(FontError::InvalidFontData(msg)) => {
-                assert!(msg.contains("ttf-parser error"), "Error should mention ttf-parser");
+                assert!(
+                    msg.contains("ttf-parser error"),
+                    "Error should mention ttf-parser"
+                );
            }
            _ => panic!("Expected InvalidFontData error"),
        }
@ -782,12 +806,15 @@ mod tests {
        // Acceptance criteria: Type1 font program: gracefully wrap with limited
        // capability; do not crash on missing CharStrings parser.
        let mut descriptor = PdfDict::new();
-        descriptor.insert(intern("/FontBBox"), PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(-100),
-            PdfObject::Integer(-200),
-            PdfObject::Integer(1000),
-            PdfObject::Integer(900),
-        ])));
+        descriptor.insert(
+            intern("/FontBBox"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(-100),
+                PdfObject::Integer(-200),
+                PdfObject::Integer(1000),
+                PdfObject::Integer(900),
+            ])),
+        );

        let mut font_dict = PdfDict::new();
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type1")));
@ -832,19 +859,25 @@ mod tests {
        let metrics = OpenTypeMetrics::from_data(font_data, 0).unwrap();

        // DejaVuSans has a Unicode cmap
-        assert!(metrics.has_valid_cmap(), "DejaVuSans should have valid Unicode cmap");
+        assert!(
+            metrics.has_valid_cmap(),
+            "DejaVuSans should have valid Unicode cmap"
+        );
    }

    #[test]
    fn test_embedded_font_returns_diagnostics() {
        // Verify that EmbeddedFont collects and returns diagnostics
        let mut descriptor = PdfDict::new();
-        descriptor.insert(intern("/FontBBox"), PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(0),
-            PdfObject::Integer(0),
-            PdfObject::Integer(1000),
-            PdfObject::Integer(1000),
-        ])));
+        descriptor.insert(
+            intern("/FontBBox"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(0),
+                PdfObject::Integer(0),
+                PdfObject::Integer(1000),
+                PdfObject::Integer(1000),
+            ])),
+        );

        let mut font_dict = PdfDict::new();
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type1")));
--- a/crates/pdftract-core/src/font/encoding.rs
+++ b/crates/pdftract-core/src/font/encoding.rs
@ -14,7 +14,7 @@
 use std::sync::Arc;

 use crate::diagnostics::{DiagCode, Diagnostic};
-use crate::parser::object::types::{PdfObject, PdfDict};
+use crate::parser::object::types::{PdfDict, PdfObject};

 include!(concat!(env!("OUT_DIR"), "/named_encodings.rs"));

@ -135,7 +135,9 @@ pub struct DifferencesOverlay {
 impl DifferencesOverlay {
    /// Create an empty overlay.
    pub fn new() -> Self {
-        Self { entries: Vec::new() }
+        Self {
+            entries: Vec::new(),
+        }
    }

    /// Parse a /Differences array into an overlay.
@ -344,7 +346,8 @@ impl FontEncoding {
        }

        // Fall back to base encoding
-        self.base.and_then(|enc| enc.glyph_name(code).map(|s| Arc::from(s)))
+        self.base
+            .and_then(|enc| enc.glyph_name(code).map(|s| Arc::from(s)))
    }

    /// Check if this encoding has a differences overlay.
@ -388,15 +391,36 @@ mod tests {

    #[test]
    fn test_from_name() {
-        assert_eq!(NamedEncoding::from_name("WinAnsiEncoding"), Some(NamedEncoding::WinAnsi));
-        assert_eq!(NamedEncoding::from_name("MacRomanEncoding"), Some(NamedEncoding::MacRoman));
-        assert_eq!(NamedEncoding::from_name("MacExpertEncoding"), Some(NamedEncoding::MacExpert));
-        assert_eq!(NamedEncoding::from_name("StandardEncoding"), Some(NamedEncoding::Standard));
-        assert_eq!(NamedEncoding::from_name("SymbolEncoding"), Some(NamedEncoding::Symbol));
-        assert_eq!(NamedEncoding::from_name("ZapfDingbatsEncoding"), Some(NamedEncoding::ZapfDingbats));
+        assert_eq!(
+            NamedEncoding::from_name("WinAnsiEncoding"),
+            Some(NamedEncoding::WinAnsi)
+        );
+        assert_eq!(
+            NamedEncoding::from_name("MacRomanEncoding"),
+            Some(NamedEncoding::MacRoman)
+        );
+        assert_eq!(
+            NamedEncoding::from_name("MacExpertEncoding"),
+            Some(NamedEncoding::MacExpert)
+        );
+        assert_eq!(
+            NamedEncoding::from_name("StandardEncoding"),
+            Some(NamedEncoding::Standard)
+        );
+        assert_eq!(
+            NamedEncoding::from_name("SymbolEncoding"),
+            Some(NamedEncoding::Symbol)
+        );
+        assert_eq!(
+            NamedEncoding::from_name("ZapfDingbatsEncoding"),
+            Some(NamedEncoding::ZapfDingbats)
+        );

        // Test with leading slash
-        assert_eq!(NamedEncoding::from_name("/WinAnsiEncoding"), Some(NamedEncoding::WinAnsi));
+        assert_eq!(
+            NamedEncoding::from_name("/WinAnsiEncoding"),
+            Some(NamedEncoding::WinAnsi)
+        );

        // Test unknown encoding
        assert_eq!(NamedEncoding::from_name("UnknownEncoding"), None);
@ -513,7 +537,10 @@ mod tests {

        assert_eq!(overlay.get(255), Some(Arc::from("a")));
        assert_eq!(diagnostics.len(), 1);
-        assert_eq!(diagnostics[0].code, DiagCode::FontEncodingDifferenceOutOfRange);
+        assert_eq!(
+            diagnostics[0].code,
+            DiagCode::FontEncodingDifferenceOutOfRange
+        );
    }

    #[test]
@ -529,7 +556,10 @@ mod tests {

        assert_eq!(overlay.get(0), Some(Arc::from("a")));
        assert_eq!(diagnostics.len(), 1);
-        assert_eq!(diagnostics[0].code, DiagCode::FontEncodingDifferenceOutOfRange);
+        assert_eq!(
+            diagnostics[0].code,
+            DiagCode::FontEncodingDifferenceOutOfRange
+        );
    }

    #[test]
@ -602,7 +632,9 @@ mod tests {
    fn test_font_encoding_unknown_glyph_name() {
        // Differences can contain arbitrary glyph names not in AGL
        let mut differences = DifferencesOverlay::new();
-        differences.entries.push((0x20, Arc::from("ArbitraryCustomGlyph")));
+        differences
+            .entries
+            .push((0x20, Arc::from("ArbitraryCustomGlyph")));

        let enc = FontEncoding {
            base: None,
@ -610,7 +642,10 @@ mod tests {
        };

        // Should return the custom name, not None
-        assert_eq!(enc.glyph_name_for(0x20), Some(Arc::from("ArbitraryCustomGlyph")));
+        assert_eq!(
+            enc.glyph_name_for(0x20),
+            Some(Arc::from("ArbitraryCustomGlyph"))
+        );
    }

    #[test]
--- a/crates/pdftract-core/src/font/fingerprint.rs
+++ b/crates/pdftract-core/src/font/fingerprint.rs
@ -56,9 +56,7 @@ impl FontFingerprint {
        let mut hasher = Sha256::new();
        hasher.update(font_program_bytes);
        let hash = hasher.finalize();
-        Self {
-            hash: hash.into(),
-        }
+        Self { hash: hash.into() }
    }

    /// Get the underlying hash bytes.
@ -90,10 +88,7 @@ impl FontFingerprint {
 ///
 /// The hash is computed on the first call and cached in an Arc for subsequent
 /// calls. Do NOT call this function repeatedly for the same font without caching.
-pub fn lookup_font_fingerprint(
-    font_program_bytes: &[u8],
-    gid: u16,
-) -> Option<char> {
+pub fn lookup_font_fingerprint(font_program_bytes: &[u8], gid: u16) -> Option<char> {
    // Compute the fingerprint
    let fingerprint = FontFingerprint::compute(font_program_bytes);

@ -101,7 +96,8 @@ pub fn lookup_font_fingerprint(
    let entries = FONT_FINGERPRINTS.get(fingerprint.as_bytes())?;

    // Find the glyph ID in the entries
-    let codepoint = entries.iter()
+    let codepoint = entries
+        .iter()
        .find(|(entry_gid, _)| *entry_gid == gid)
        .map(|(_, cp)| *cp)?;

@ -146,7 +142,8 @@ impl CachedFingerprint {
        }

        let entries = FONT_FINGERPRINTS.get(self.fingerprint.as_bytes())?;
-        let codepoint = entries.iter()
+        let codepoint = entries
+            .iter()
            .find(|(entry_gid, _)| *entry_gid == gid)
            .map(|(_, cp)| *cp)?;

@ -216,7 +213,10 @@ mod tests {
        let cached1 = CachedFingerprint::from_font_program(data);
        let cached2 = CachedFingerprint::from_font_program(data);

-        assert_eq!(cached1.fingerprint().as_bytes(), cached2.fingerprint().as_bytes());
+        assert_eq!(
+            cached1.fingerprint().as_bytes(),
+            cached2.fingerprint().as_bytes()
+        );
        assert_eq!(cached1.is_known(), cached2.is_known());
    }

--- a/crates/pdftract-core/src/font/predefined_cmap.rs
+++ b/crates/pdftract-core/src/font/predefined_cmap.rs
@ -40,7 +40,11 @@ pub enum CharacterCollection {

 impl PredefinedCMap {
    /// Create a new predefined CMap.
-    const fn new(name: &'static str, is_vertical: bool, collection: Option<CharacterCollection>) -> Self {
+    const fn new(
+        name: &'static str,
+        is_vertical: bool,
+        collection: Option<CharacterCollection>,
+    ) -> Self {
        Self {
            name,
            is_vertical,
@ -172,20 +176,52 @@ pub fn from_name(name: &str) -> Option<PredefinedCMap> {
        "Identity-V" => Some(PredefinedCMap::new("Identity-V", true, None)),

        // Adobe-Japan1 (Japanese)
-        "UniJIS-UTF16-H" => Some(PredefinedCMap::new("UniJIS-UTF16-H", false, Some(CharacterCollection::Japan1))),
-        "UniJIS-UTF16-V" => Some(PredefinedCMap::new("UniJIS-UTF16-V", true, Some(CharacterCollection::Japan1))),
+        "UniJIS-UTF16-H" => Some(PredefinedCMap::new(
+            "UniJIS-UTF16-H",
+            false,
+            Some(CharacterCollection::Japan1),
+        )),
+        "UniJIS-UTF16-V" => Some(PredefinedCMap::new(
+            "UniJIS-UTF16-V",
+            true,
+            Some(CharacterCollection::Japan1),
+        )),

        // Adobe-GB1 (Simplified Chinese)
-        "UniGB-UTF16-H" => Some(PredefinedCMap::new("UniGB-UTF16-H", false, Some(CharacterCollection::GB1))),
-        "UniGB-UTF16-V" => Some(PredefinedCMap::new("UniGB-UTF16-V", true, Some(CharacterCollection::GB1))),
+        "UniGB-UTF16-H" => Some(PredefinedCMap::new(
+            "UniGB-UTF16-H",
+            false,
+            Some(CharacterCollection::GB1),
+        )),
+        "UniGB-UTF16-V" => Some(PredefinedCMap::new(
+            "UniGB-UTF16-V",
+            true,
+            Some(CharacterCollection::GB1),
+        )),

        // Adobe-CNS1 (Traditional Chinese)
-        "UniCNS-UTF16-H" => Some(PredefinedCMap::new("UniCNS-UTF16-H", false, Some(CharacterCollection::CNS1))),
-        "UniCNS-UTF16-V" => Some(PredefinedCMap::new("UniCNS-UTF16-V", true, Some(CharacterCollection::CNS1))),
+        "UniCNS-UTF16-H" => Some(PredefinedCMap::new(
+            "UniCNS-UTF16-H",
+            false,
+            Some(CharacterCollection::CNS1),
+        )),
+        "UniCNS-UTF16-V" => Some(PredefinedCMap::new(
+            "UniCNS-UTF16-V",
+            true,
+            Some(CharacterCollection::CNS1),
+        )),

        // Adobe-Korea1 (Korean)
-        "UniKS-UTF16-H" => Some(PredefinedCMap::new("UniKS-UTF16-H", false, Some(CharacterCollection::Korea1))),
-        "UniKS-UTF16-V" => Some(PredefinedCMap::new("UniKS-UTF16-V", true, Some(CharacterCollection::Korea1))),
+        "UniKS-UTF16-H" => Some(PredefinedCMap::new(
+            "UniKS-UTF16-H",
+            false,
+            Some(CharacterCollection::Korea1),
+        )),
+        "UniKS-UTF16-V" => Some(PredefinedCMap::new(
+            "UniKS-UTF16-V",
+            true,
+            Some(CharacterCollection::Korea1),
+        )),

        _ => None,
    }
@ -318,11 +354,16 @@ mod tests {
    fn test_all_predefined_names() {
        // Verify all 10 predefined CMap names resolve
        let names = [
-            "Identity-H", "Identity-V",
-            "UniJIS-UTF16-H", "UniJIS-UTF16-V",
-            "UniGB-UTF16-H", "UniGB-UTF16-V",
-            "UniCNS-UTF16-H", "UniCNS-UTF16-V",
-            "UniKS-UTF16-H", "UniKS-UTF16-V",
+            "Identity-H",
+            "Identity-V",
+            "UniJIS-UTF16-H",
+            "UniJIS-UTF16-V",
+            "UniGB-UTF16-H",
+            "UniGB-UTF16-V",
+            "UniCNS-UTF16-H",
+            "UniCNS-UTF16-V",
+            "UniKS-UTF16-H",
+            "UniKS-UTF16-V",
        ];

        for name in names {
--- a/crates/pdftract-core/src/font/type0.rs
+++ b/crates/pdftract-core/src/font/type0.rs
@ -7,7 +7,7 @@
 use std::collections::BTreeMap;
 use std::sync::Arc;

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::font::embedded::{EmbeddedFont, OpenTypeMetrics};
 use crate::font::FontKind;
 use crate::parser::object::types::{PdfDict, PdfObject};
@ -230,7 +230,13 @@ impl Type0Font {

        // Load CIDToGIDMap for CIDFontType2
        let cid_to_gid_map = if subtype == FontKind::CIDFontType2 {
-            Some(Self::load_cid_to_gid_map(cidfont_dict, source, opts, doc_counter, &mut diagnostics)?)
+            Some(Self::load_cid_to_gid_map(
+                cidfont_dict,
+                source,
+                opts,
+                doc_counter,
+                &mut diagnostics,
+            )?)
        } else {
            None
        };
@ -432,8 +438,12 @@ impl Type0Font {
                font_dict.insert(
                    crate::parser::object::types::intern("/Subtype"),
                    match subtype {
-                        FontKind::CIDFontType0 => PdfObject::Name(crate::parser::object::types::intern("/CIDFontType0")),
-                        FontKind::CIDFontType2 => PdfObject::Name(crate::parser::object::types::intern("/CIDFontType2")),
+                        FontKind::CIDFontType0 => {
+                            PdfObject::Name(crate::parser::object::types::intern("/CIDFontType0"))
+                        }
+                        FontKind::CIDFontType2 => {
+                            PdfObject::Name(crate::parser::object::types::intern("/CIDFontType2"))
+                        }
                        _ => return Err(Type0Error::UnsupportedSubtype(format!("{:?}", subtype))),
                    },
                );
@ -716,9 +726,7 @@ mod tests {
        font_dict.insert(intern("/BaseFont"), PdfObject::Name(intern("Type0Font")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -745,9 +753,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -781,9 +787,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -809,9 +813,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -880,9 +882,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -917,9 +917,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -947,9 +945,7 @@ mod tests {
        font_dict.insert(intern("/Subtype"), PdfObject::Name(intern("/Type0")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let source = MemorySource::new(vec![]);
@ -996,9 +992,7 @@ mod tests {
        font_dict.insert(intern("/BaseFont"), PdfObject::Name(intern("Type0Font")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let opts = ExtractionOptions::default();
@ -1057,9 +1051,7 @@ mod tests {
        font_dict.insert(intern("/BaseFont"), PdfObject::Name(intern("Type0Font")));
        font_dict.insert(
            intern("/DescendantFonts"),
-            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(
-                cidfont_dict,
-            ))])),
+            PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(cidfont_dict))])),
        );

        let opts = ExtractionOptions::default();
@ -1073,7 +1065,9 @@ mod tests {

        // Check that the CIDTOGIDMAP_TRUNCATED diagnostic was emitted
        let diagnostics = font.diagnostics();
-        assert!(diagnostics.iter().any(|d| d.code == DiagCode::FontCidtogidmapTruncated));
+        assert!(diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::FontCidtogidmapTruncated));

        // Verify the array has 2 elements (5 bytes / 2 = 2 GIDs, trailing byte discarded)
        if let Some(CIDToGIDMap::Array(arr)) = &font.descendant.cid_to_gid_map {
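
Reviewer note on the truncation test above: a `/CIDToGIDMap` stream stores one big-endian 16-bit GID per CID, so a 5-byte body yields 2 GIDs and one discarded trailing byte. The sketch below illustrates that decoding rule only. The function name `decode_cid_to_gid` and its return shape are assumptions for illustration, not the signature of `load_cid_to_gid_map` in this crate.

```rust
/// Hypothetical helper: decode a /CIDToGIDMap stream body.
/// Each CID maps to two bytes (big-endian GID). The returned flag is true
/// when an odd trailing byte had to be dropped, which is the situation the
/// test above expects a truncation diagnostic for.
fn decode_cid_to_gid(bytes: &[u8]) -> (Vec<u16>, bool) {
    let gids: Vec<u16> = bytes
        .chunks_exact(2) // ignores a final incomplete chunk
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect();
    let truncated = bytes.len() % 2 != 0;
    (gids, truncated)
}
```

With the 5-byte input `[0x00, 0x01, 0x00, 0x02, 0xFF]` this returns GIDs 1 and 2 and reports truncation.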
--- a/crates/pdftract-core/src/graphics_state.rs
+++ b/crates/pdftract-core/src/graphics_state.rs
@ -14,7 +14,7 @@
 //!   x' = a*x + c*y + e
 //!   y' = b*x + d*y + f

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};

 /// Maximum depth of graphics state stack (prevents stack overflow).
 const MAX_GSTATE_DEPTH: usize = 32;
@ -73,8 +73,12 @@ impl Matrix3x3 {
    /// Check if this is the identity matrix.
    #[inline]
    pub fn is_identity(&self) -> bool {
-        self.a == 1.0 && self.b == 0.0 && self.c == 0.0 &&
-        self.d == 1.0 && self.e == 0.0 && self.f == 0.0
+        self.a == 1.0
+            && self.b == 0.0
+            && self.c == 0.0
+            && self.d == 1.0
+            && self.e == 0.0
+            && self.f == 0.0
    }

    /// Multiply this matrix by another (this * other).
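
Reviewer note: the module docs above give the point transform as `x' = a*x + c*y + e`, `y' = b*x + d*y + f`. A minimal sketch of that convention follows, assuming the PDF row-vector form `[x y 1] * M`. The type `Matrix` and the methods `apply` and `then` are hypothetical names for illustration and may not match how `Matrix3x3` orders its own multiplication.

```rust
/// Hypothetical stand-in for the crate's matrix type: the six PDF
/// coefficients [a b c d e f].
#[derive(Debug, Clone, Copy, PartialEq)]
struct Matrix {
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    e: f64,
    f: f64,
}

impl Matrix {
    fn new(a: f64, b: f64, c: f64, d: f64, e: f64, f: f64) -> Self {
        Self { a, b, c, d, e, f }
    }

    /// Transform a point: x' = a*x + c*y + e, y' = b*x + d*y + f.
    fn apply(&self, x: f64, y: f64) -> (f64, f64) {
        (
            self.a * x + self.c * y + self.e,
            self.b * x + self.d * y + self.f,
        )
    }

    /// Compose so that `self` is applied first and `next` second
    /// (row-vector convention: the product self * next).
    fn then(&self, next: &Matrix) -> Matrix {
        Matrix {
            a: self.a * next.a + self.b * next.c,
            b: self.a * next.b + self.b * next.d,
            c: self.c * next.a + self.d * next.c,
            d: self.c * next.b + self.d * next.d,
            e: self.e * next.a + self.f * next.c + next.e,
            f: self.e * next.b + self.f * next.d + next.f,
        }
    }
}
```

Scaling by (2, 3) and then translating by (10, 20) maps the point (1, 1) to (12, 23). Reversing the order gives (22, 63), which is why the composition order matters.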
--- a/crates/pdftract-core/src/hybrid.rs
+++ b/crates/pdftract-core/src/hybrid.rs
@ -22,7 +22,7 @@
 //!
 //! IoU = area(A ∩ B) / area(A ∪ B)

-use crate::classify::{CellIndex, PageClassification, PageClass};
+use crate::classify::{CellIndex, PageClass, PageClassification};
 use image::{GrayImage, ImageBuffer, Luma};
 use std::collections::BTreeSet;

@ -42,13 +42,15 @@ pub struct Span {
    pub text: String,
 }

-/// Source of a span - either vector extraction or OCR.
+/// Source of a span - either vector extraction, OCR, or assisted OCR.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum SpanSource {
    /// Text extracted from content stream (Phase 3).
    Vector,
    /// Text extracted via OCR (Phase 5).
    Ocr,
+    /// Text extracted via assisted OCR with position validation (Phase 5.5).
+    OcrAssisted,
 }

 impl Span {
@ -72,6 +74,11 @@ impl Span {
        Self::new(bbox, confidence, SpanSource::Ocr, text)
    }

+    /// Create a span with assisted OCR source (position-validated).
+    pub fn ocr_assisted(bbox: [f64; 4], confidence: f32, text: String) -> Self {
+        Self::new(bbox, confidence, SpanSource::OcrAssisted, text)
+    }
+
    /// Get the width of the span's bbox.
    #[inline]
    pub fn width(&self) -> f64 {
@ -191,11 +198,15 @@ pub fn merge_vector_and_ocr_spans(vector_spans: &[Span], ocr_spans: &[Span]) ->

        // Primary sort: Y (top to bottom = descending Y in PDF coordinates)
        // Note: In PDF coordinates, Y=0 is at the bottom, so higher Y means higher on page
-        b_center_y.partial_cmp(&a_center_y).unwrap_or(std::cmp::Ordering::Equal)
+        b_center_y
+            .partial_cmp(&a_center_y)
+            .unwrap_or(std::cmp::Ordering::Equal)
            .then_with(|| {
                let a_center_x = (a.bbox[0] + a.bbox[2]) / 2.0;
                let b_center_x = (b.bbox[0] + b.bbox[2]) / 2.0;
-                a_center_x.partial_cmp(&b_center_x).unwrap_or(std::cmp::Ordering::Equal)
+                a_center_x
+                    .partial_cmp(&b_center_x)
+                    .unwrap_or(std::cmp::Ordering::Equal)
            })
    });

@ -279,11 +290,10 @@ pub fn get_hybrid_cells(classification: &PageClassification) -> Vec<CellIndex> {
    }

    match &classification.hybrid_cells {
-        Some(cells) => {
-            cells.iter()
-                .map(|&flat| CellIndex::from_flat(flat))
-                .collect()
-        }
+        Some(cells) => cells
+            .iter()
+            .map(|&flat| CellIndex::from_flat(flat))
+            .collect(),
        None => Vec::new(),
    }
 }
@ -323,7 +333,8 @@ pub fn compute_cell_crops(
    let cell_width = page_width / 8.0;
    let cell_height = page_height / 8.0;

-    cells.iter()
+    cells
+        .iter()
        .map(|cell| {
            // Cell coordinates in PDF space
            // col 0 = left, row 0 = top
@ -357,7 +368,12 @@ pub trait OcrCallback: Send + Sync {
    /// # Returns
    ///
    /// A vector of OCR spans found in this cell, or an error if OCR fails.
-    fn ocr_cell(&self, cell_image: &GrayImage, cell: CellIndex, dpi: u32) -> Result<Vec<Span>, String>;
+    fn ocr_cell(
+        &self,
+        cell_image: &GrayImage,
+        cell: CellIndex,
+        dpi: u32,
+    ) -> Result<Vec<Span>, String>;
 }

 /// Mock OCR callback for testing that tracks call counts.
@ -369,8 +385,14 @@ struct MockOcrCallback {

 #[cfg(test)]
 impl OcrCallback for MockOcrCallback {
-    fn ocr_cell(&self, _cell_image: &GrayImage, _cell: CellIndex, _dpi: u32) -> Result<Vec<Span>, String> {
-        self.call_count.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
+    fn ocr_cell(
+        &self,
+        _cell_image: &GrayImage,
+        _cell: CellIndex,
+        _dpi: u32,
+    ) -> Result<Vec<Span>, String> {
+        self.call_count
+            .fetch_add(1, std::sync::atomic::Ordering::SeqCst);
        Ok(self.output_spans.clone())
    }
 }
@ -441,13 +463,7 @@ pub fn process_hybrid_page(
    // For each hybrid cell: crop and run OCR
    for cell in hybrid_cells {
        // Crop the cell from the rendered page
-        let cell_image = crop_cell_from_page(
-            page_image,
-            page_width_pt,
-            page_height_pt,
-            cell,
-            dpi,
-        );
+        let cell_image = crop_cell_from_page(page_image, page_width_pt, page_height_pt, cell, dpi);

        // Run OCR on this cell
        match ocr_callback.ocr_cell(&cell_image, cell, dpi) {
@ -510,7 +526,12 @@ mod tests {

    #[test]
    fn test_span_new() {
-        let span = Span::new([10.0, 20.0, 50.0, 40.0], 0.9, SpanSource::Vector, "test".to_string());
+        let span = Span::new(
+            [10.0, 20.0, 50.0, 40.0],
+            0.9,
+            SpanSource::Vector,
+            "test".to_string(),
+        );
        assert_eq!(span.bbox, [10.0, 20.0, 50.0, 40.0]);
        assert_eq!(span.confidence, 0.9);
        assert_eq!(span.source, SpanSource::Vector);
@ -541,12 +562,12 @@ mod tests {

    #[test]
    fn test_merge_no_overlap() {
-        let vector = vec![
-            Span::vector([0.0, 0.0, 10.0, 10.0], 0.9, "vector".to_string()),
-        ];
-        let ocr = vec![
-            Span::ocr([20.0, 20.0, 30.0, 30.0], 0.8, "ocr".to_string()),
-        ];
+        let vector = vec![Span::vector(
+            [0.0, 0.0, 10.0, 10.0],
+            0.9,
+            "vector".to_string(),
+        )];
+        let ocr = vec![Span::ocr([20.0, 20.0, 30.0, 30.0], 0.8, "ocr".to_string())];

        let result = merge_vector_and_ocr_spans(&vector, &ocr);
        assert_eq!(result.len(), 2);
@ -555,9 +576,11 @@ mod tests {
    #[test]
    fn test_merge_iou_06_vector_kept() {
        // IoU = 0.6 > 0.5, vector confidence >= 0.5 -> vector kept, OCR dropped
-        let vector = vec![
-            Span::vector([0.0, 0.0, 100.0, 100.0], 0.9, "vector text".to_string()),
-        ];
+        let vector = vec![Span::vector(
+            [0.0, 0.0, 100.0, 100.0],
+            0.9,
+            "vector text".to_string(),
+        )];
        let ocr = vec![
             // OCR overlaps by 60%: intersection 60x100 = 6000, union (10000 + 6000 - 6000) = 10000, IoU = 0.6
            // bbox [40, 0, 100, 100] overlaps [0, 0, 100, 100] by 60x100
@ -573,9 +596,11 @@ mod tests {
    #[test]
    fn test_merge_iou_03_both_kept() {
        // IoU = 0.3 < 0.5 -> both kept
-        let vector = vec![
-            Span::vector([0.0, 0.0, 100.0, 100.0], 0.9, "vector".to_string()),
-        ];
+        let vector = vec![Span::vector(
+            [0.0, 0.0, 100.0, 100.0],
+            0.9,
+            "vector".to_string(),
+        )];
        let ocr = vec![
            // OCR overlaps by 30%: [70, 0, 100, 100] overlaps [0, 0, 100, 100] by 30x100
            Span::ocr([70.0, 0.0, 100.0, 100.0], 0.7, "ocr".to_string()),
@ -591,16 +616,20 @@ mod tests {
    #[test]
    fn test_merge_iou_06_low_vector_confidence_ocr_kept() {
        // IoU = 0.6 > 0.5, but vector confidence < 0.5 -> OCR kept
-        let vector = vec![
-            Span::vector([0.0, 0.0, 100.0, 100.0], 0.2, "bad vector".to_string()),
-        ];
-        let ocr = vec![
-            Span::ocr([40.0, 0.0, 100.0, 100.0], 0.7, "ocr text".to_string()),
-        ];
+        let vector = vec![Span::vector(
+            [0.0, 0.0, 100.0, 100.0],
+            0.2,
+            "bad vector".to_string(),
+        )];
+        let ocr = vec![Span::ocr(
+            [40.0, 0.0, 100.0, 100.0],
+            0.7,
+            "ocr text".to_string(),
+        )];

        let result = merge_vector_and_ocr_spans(&vector, &ocr);
        assert_eq!(result.len(), 2); // Both kept because vector confidence is low
-        // Verify both are present
+                                     // Verify both are present
        assert!(result.iter().any(|s| s.source == SpanSource::Vector));
        assert!(result.iter().any(|s| s.source == SpanSource::Ocr));
    }
@ -621,10 +650,7 @@ mod tests {

    #[test]
    fn test_get_hybrid_cells_non_hybrid() {
-        let classification = PageClassification::new(
-            crate::classify::PageClass::Vector,
-            0.9,
-        );
+        let classification = PageClassification::new(crate::classify::PageClass::Vector, 0.9);
        assert!(get_hybrid_cells(&classification).is_empty());
    }

@ -648,7 +674,7 @@ mod tests {
    #[test]
    fn test_compute_cell_crops() {
        let mut cells = BTreeSet::new();
-        cells.insert(0);  // row 0, col 0 (top-left)
+        cells.insert(0); // row 0, col 0 (top-left)
        cells.insert(63); // row 7, col 7 (bottom-right)

        let classification = PageClassification::hybrid(0.75, cells);
@ -691,7 +717,7 @@ mod tests {

        // Cell should be 1/8 of page dimensions
        assert_eq!(cell.width(), 100); // 800 / 8
-        assert_eq!(cell.height(), 75);  // 600 / 8
+        assert_eq!(cell.height(), 75); // 600 / 8
    }

    #[test]
@ -712,9 +738,11 @@ mod tests {

    #[test]
    fn test_merge_multiple_ocr_spans() {
-        let vector = vec![
-            Span::vector([0.0, 0.0, 100.0, 100.0], 0.9, "vector".to_string()),
-        ];
+        let vector = vec![Span::vector(
+            [0.0, 0.0, 100.0, 100.0],
+            0.9,
+            "vector".to_string(),
+        )];
        let ocr = vec![
            Span::ocr([200.0, 0.0, 300.0, 100.0], 0.8, "ocr1".to_string()),
            Span::ocr([400.0, 0.0, 500.0, 100.0], 0.8, "ocr2".to_string()),
@ -756,7 +784,11 @@ mod tests {
        // Create mock OCR callback that tracks call count
        let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
        let mock_spans = vec![
-            Span::ocr([50.0, 100.0, 200.0, 120.0], 0.8, "Scanned Text 1".to_string()),
+            Span::ocr(
+                [50.0, 100.0, 200.0, 120.0],
+                0.8,
+                "Scanned Text 1".to_string(),
+            ),
            Span::ocr([50.0, 50.0, 200.0, 70.0], 0.8, "Scanned Text 2".to_string()),
        ];
        let mock_ocr = MockOcrCallback {
@ -780,8 +812,11 @@ mod tests {

        // Verify OCR was called exactly 48 times (6 rows * 8 cols)
        // NOT 64 times (full page)
-        assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 48,
-            "OCR should run only on scanned cells (48), not entire page (64)");
+        assert_eq!(
+            call_count.load(std::sync::atomic::Ordering::SeqCst),
+            48,
+            "OCR should run only on scanned cells (48), not entire page (64)"
+        );

        // Verify result contains both vector and OCR spans
        assert!(result.iter().any(|s| s.source == SpanSource::Vector));
@ -806,9 +841,11 @@ mod tests {
        let classification = PageClassification::hybrid(0.75, cells);

        // Create vector spans that overlap with OCR region
-        let vector_spans = vec![
-            Span::vector([50.0, 50.0, 150.0, 70.0], 0.9, "Vector Text".to_string()),
-        ];
+        let vector_spans = vec![Span::vector(
+            [50.0, 50.0, 150.0, 70.0],
+            0.9,
+            "Vector Text".to_string(),
+        )];

        // Create mock OCR that produces overlapping text (IoU > 0.5)
        // OCR bbox [40, 40, 160, 80] overlaps vector bbox [50, 50, 150, 70]
@ -820,9 +857,11 @@ mod tests {
        // Intersection = [50, 50, 150, 70] = 100 * 20 = 2000
        // Union = (110*30) + (100*20) - 2000 = 3300 + 2000 - 2000 = 3300
        // IoU = 2000 / 3300 = 0.606 > 0.5
-        let mock_spans = vec![
-            Span::ocr([45.0, 45.0, 155.0, 75.0], 0.7, "OCR Text".to_string()),
-        ];
+        let mock_spans = vec![Span::ocr(
+            [45.0, 45.0, 155.0, 75.0],
+            0.7,
+            "OCR Text".to_string(),
+        )];
        let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
        let mock_ocr = MockOcrCallback {
            call_count,
@ -845,7 +884,11 @@ mod tests {

        // With IoU > 0.5 and vector confidence >= 0.5, vector should win
        // Result should have only 1 span (the vector span)
-        assert_eq!(result.len(), 1, "Should have only 1 span after merge (vector wins)");
+        assert_eq!(
+            result.len(),
+            1,
+            "Should have only 1 span after merge (vector wins)"
+        );
        assert_eq!(result[0].source, SpanSource::Vector);
        assert_eq!(result[0].text, "Vector Text");
    }
@ -860,14 +903,18 @@ mod tests {
        let classification = PageClassification::hybrid(0.75, cells);

        // Vector span with low confidence
-        let vector_spans = vec![
-            Span::vector([50.0, 50.0, 150.0, 70.0], 0.2, "Bad Vector".to_string()),
-        ];
+        let vector_spans = vec![Span::vector(
+            [50.0, 50.0, 150.0, 70.0],
+            0.2,
+            "Bad Vector".to_string(),
+        )];

        // OCR span with high confidence, overlapping vector
-        let mock_spans = vec![
-            Span::ocr([45.0, 45.0, 155.0, 75.0], 0.7, "Good OCR".to_string()),
-        ];
+        let mock_spans = vec![Span::ocr(
+            [45.0, 45.0, 155.0, 75.0],
+            0.7,
+            "Good OCR".to_string(),
+        )];
        let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
        let mock_ocr = MockOcrCallback {
            call_count,
@ -888,7 +935,11 @@ mod tests {

        // With IoU > 0.5 but vector confidence < 0.5, OCR should be kept
        // Result should have 2 spans (both vector and OCR kept)
-        assert_eq!(result.len(), 2, "Both vector and OCR should be kept when vector confidence is low");
+        assert_eq!(
+            result.len(),
+            2,
+            "Both vector and OCR should be kept when vector confidence is low"
+        );
        assert!(result.iter().any(|s| s.source == SpanSource::Vector));
        assert!(result.iter().any(|s| s.source == SpanSource::Ocr));
    }
@ -898,9 +949,11 @@ mod tests {
        // Test that non-hybrid classifications return only vector spans

        let classification = PageClassification::new(PageClass::Vector, 0.9);
-        let vector_spans = vec![
-            Span::vector([50.0, 50.0, 150.0, 70.0], 0.9, "Vector Only".to_string()),
-        ];
+        let vector_spans = vec![Span::vector(
+            [50.0, 50.0, 150.0, 70.0],
+            0.9,
+            "Vector Only".to_string(),
+        )];

        let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
        let mock_ocr = MockOcrCallback {
@ -934,9 +987,11 @@ mod tests {
        // Test hybrid classification with empty hybrid_cells

        let classification = PageClassification::hybrid(0.75, BTreeSet::new());
-        let vector_spans = vec![
-            Span::vector([50.0, 50.0, 150.0, 70.0], 0.9, "Vector".to_string()),
-        ];
+        let vector_spans = vec![Span::vector(
+            [50.0, 50.0, 150.0, 70.0],
+            0.9,
+            "Vector".to_string(),
+        )];

        let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
        let mock_ocr = MockOcrCallback {
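
Note on the IoU arithmetic in the merge tests above: the comment computes intersection-over-union for the vector box [50, 50, 150, 70] and the OCR box [45, 45, 155, 75]. The following is a minimal sketch of that calculation, not the crate's implementation. The `BBox` alias and the `area` and `iou` helpers are hypothetical names, and `f32` coordinates in `[x0, y0, x1, y1]` order are an assumption.

```rust
/// Axis-aligned box as [x0, y0, x1, y1]. Hypothetical alias for illustration.
type BBox = [f32; 4];

/// Area of a box; a degenerate or inverted box counts as zero area.
fn area(b: BBox) -> f32 {
    (b[2] - b[0]).max(0.0) * (b[3] - b[1]).max(0.0)
}

/// Intersection-over-union of two boxes, in the range 0.0..=1.0.
fn iou(a: BBox, b: BBox) -> f32 {
    // The intersection is bounded by the larger minimum and the smaller maximum.
    let inter: BBox = [
        a[0].max(b[0]),
        a[1].max(b[1]),
        a[2].min(b[2]),
        a[3].min(b[3]),
    ];
    let inter_area = area(inter);
    let union = area(a) + area(b) - inter_area;
    if union <= 0.0 {
        0.0
    } else {
        inter_area / union
    }
}
```

With the two boxes from the test, the intersection is 2000, the union is 3300, and `iou` returns 2000 / 3300, which is above the 0.5 threshold the tests rely on.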
--- a/crates/pdftract-core/src/layout/caption.rs
+++ b/crates/pdftract-core/src/layout/caption.rs
@ -84,9 +84,9 @@ impl PageContext {
    /// Create a new page context with default values.
    pub fn new() -> Self {
        Self {
-            page_body_median: 12.0,  // Typical body text is ~12pt
-            line_height: 14.0,       // Typical line spacing is ~1.2x font size
-            num_columns: 1,          // Default single-column layout
+            page_body_median: 12.0, // Typical body text is ~12pt
+            line_height: 14.0,      // Typical line spacing is ~1.2x font size
+            num_columns: 1,         // Default single-column layout
        }
    }

@ -180,7 +180,11 @@ pub fn classify_page_captions(blocks: &mut [Block], ctx: &PageContext) {

        // Update previous block for next iteration
        // Note: we use a reference to the block before any modification
-        prev_block = if i < blocks.len() { Some(&blocks[i]) } else { None };
+        prev_block = if i < blocks.len() {
+            Some(&blocks[i])
+        } else {
+            None
+        };
    }
 }

@ -206,7 +210,13 @@ mod tests {
    fn test_caption_immediately_below_figure() {
        // Figure at y=[100, 200], caption at y=[90, 100] (1 line below)
        let figure = make_figure([50.0, 100.0, 150.0, 200.0], 0);
-        let caption = make_block("paragraph", "Figure 1: A chart", 9.0, [50.0, 90.0, 150.0, 100.0], 0);
+        let caption = make_block(
+            "paragraph",
+            "Figure 1: A chart",
+            9.0,
+            [50.0, 90.0, 150.0, 100.0],
+            0,
+        );

        let ctx = PageContext::with_values(12.0, 10.0, 1);

@ -217,7 +227,13 @@ mod tests {
    fn test_caption_too_far_below_figure() {
        // Figure at y=[100, 200], caption at y=[70, 80] (3 lines below = 30pt)
        let figure = make_figure([50.0, 100.0, 150.0, 200.0], 0);
-        let caption = make_block("paragraph", "Figure 1: A chart", 9.0, [50.0, 70.0, 150.0, 80.0], 0);
+        let caption = make_block(
+            "paragraph",
+            "Figure 1: A chart",
+            9.0,
+            [50.0, 70.0, 150.0, 80.0],
+            0,
+        );

        let ctx = PageContext::with_values(12.0, 10.0, 1);

@ -228,7 +244,13 @@ mod tests {
    fn test_caption_font_not_smaller() {
        // Caption with same font size as body text
        let figure = make_figure([50.0, 100.0, 150.0, 200.0], 0);
-        let not_caption = make_block("paragraph", "Figure 1: A chart", 12.0, [50.0, 90.0, 150.0, 100.0], 0);
+        let not_caption = make_block(
+            "paragraph",
+            "Figure 1: A chart",
+            12.0,
+            [50.0, 90.0, 150.0, 100.0],
+            0,
+        );

        let ctx = PageContext::with_values(12.0, 10.0, 1);

@ -239,7 +261,13 @@ mod tests {
    fn test_caption_different_column() {
        // Figure in column 0, caption in column 1 (two-column layout)
        let figure = make_figure([50.0, 100.0, 150.0, 200.0], 0);
-        let caption = make_block("paragraph", "Figure 1: A chart", 9.0, [200.0, 90.0, 300.0, 100.0], 1);
+        let caption = make_block(
+            "paragraph",
+            "Figure 1: A chart",
+            9.0,
+            [200.0, 90.0, 300.0, 100.0],
+            1,
+        );

        let ctx = PageContext::with_values(12.0, 10.0, 2);

@ -258,7 +286,13 @@ mod tests {
    #[test]
    fn test_caption_above_figure() {
        // Caption positioned above the figure (not detected in v0.1.0)
-        let caption = make_block("paragraph", "Figure 1: A chart", 9.0, [50.0, 200.0, 150.0, 210.0], 0);
+        let caption = make_block(
+            "paragraph",
+            "Figure 1: A chart",
+            9.0,
+            [50.0, 200.0, 150.0, 210.0],
+            0,
+        );
        let figure = make_figure([50.0, 100.0, 150.0, 200.0], 0);

        let ctx = PageContext::with_values(12.0, 10.0, 1);
@ -269,9 +303,21 @@ mod tests {
    #[test]
    fn test_page_classification() {
        let mut blocks = vec![
-            make_figure([50.0, 100.0, 150.0, 200.0], 0),  // Figure
-            make_block("paragraph", "Figure 1: A chart", 9.0, [50.0, 90.0, 150.0, 100.0], 0),  // Caption
-            make_block("paragraph", "Next paragraph", 12.0, [50.0, 70.0, 150.0, 80.0], 0),  // Regular text
+            make_figure([50.0, 100.0, 150.0, 200.0], 0), // Figure
+            make_block(
+                "paragraph",
+                "Figure 1: A chart",
+                9.0,
+                [50.0, 90.0, 150.0, 100.0],
+                0,
+            ), // Caption
+            make_block(
+                "paragraph",
+                "Next paragraph",
+                12.0,
+                [50.0, 70.0, 150.0, 80.0],
+                0,
+            ), // Regular text
        ];

        let ctx = PageContext::with_values(12.0, 10.0, 1);
@ -280,7 +326,7 @@ mod tests {

        assert_eq!(blocks[0].kind, "figure");
        assert_eq!(blocks[1].kind, "caption");
-        assert_eq!(blocks[2].kind, "paragraph");  // Unchanged
+        assert_eq!(blocks[2].kind, "paragraph"); // Unchanged
    }

    #[test]
--- a/crates/pdftract-core/src/layout/line.rs
+++ b/crates/pdftract-core/src/layout/line.rs
@ -254,10 +254,7 @@ mod tests {
    #[test]
    fn test_union_bboxes_nested() {
        // Small box inside larger box
-        let bboxes = vec![
-            [0.0, 0.0, 100.0, 100.0],
-            [25.0, 25.0, 75.0, 75.0],
-        ];
+        let bboxes = vec![[0.0, 0.0, 100.0, 100.0], [25.0, 25.0, 75.0, 75.0]];
        let result = union_bboxes(&bboxes);
        // Union should be the larger box
        assert_eq!(result, Some([0.0, 0.0, 100.0, 100.0]));
@ -266,10 +263,7 @@ mod tests {
    #[test]
    fn test_union_bboxes_disjoint() {
        // Two disjoint boxes
-        let bboxes = vec![
-            [0.0, 0.0, 50.0, 50.0],
-            [100.0, 100.0, 150.0, 150.0],
-        ];
+        let bboxes = vec![[0.0, 0.0, 50.0, 50.0], [100.0, 100.0, 150.0, 150.0]];
        let result = union_bboxes(&bboxes);
        assert_eq!(result, Some([0.0, 0.0, 150.0, 150.0]));
    }
--- a/crates/pdftract-core/src/layout/mod.rs
+++ b/crates/pdftract-core/src/layout/mod.rs
@ -12,6 +12,6 @@ pub mod caption;
 pub mod line;
 pub mod readability;

-pub use caption::{Block, PageContext, classify_caption, classify_page_captions};
-pub use line::{Line, LineDirection, compute_baseline, union_bboxes, HasBBox};
+pub use caption::{classify_caption, classify_page_captions, Block, PageContext};
+pub use line::{compute_baseline, union_bboxes, HasBBox, Line, LineDirection};
 pub use readability::{aggregate_page_readability, ScoredSpan};
--- a/crates/pdftract-core/src/layout/readability.rs
+++ b/crates/pdftract-core/src/layout/readability.rs
@ -234,10 +234,7 @@ mod tests {

    #[test]
    fn test_empty_strings() {
-        let spans = vec![
-            TestSpan::new("", 0.5),
-            TestSpan::new("", 0.8),
-        ];
+        let spans = vec![TestSpan::new("", 0.5), TestSpan::new("", 0.8)];
        // All empty -> total_chars = 0 -> return 0.0
        assert_eq!(aggregate_page_readability(&spans), 0.0);
    }
@ -282,10 +279,7 @@ mod tests {

    #[test]
    fn test_all_zero_scores() {
-        let spans = vec![
-            TestSpan::new("a", 0.0),
-            TestSpan::new("b", 0.0),
-        ];
+        let spans = vec![TestSpan::new("a", 0.0), TestSpan::new("b", 0.0)];
        assert_eq!(aggregate_page_readability(&spans), 0.0);
    }

@ -304,7 +298,10 @@ mod tests {
            TestSpan::new("b".repeat(10), 0.5),
        ];

-        assert_eq!(aggregate_page_readability(&spans1), aggregate_page_readability(&spans2));
+        assert_eq!(
+            aggregate_page_readability(&spans1),
+            aggregate_page_readability(&spans2)
+        );
    }

    #[test]
@ -328,8 +325,8 @@ mod tests {
    fn test_zero_width_joiner() {
        // Test zero-width joiner and combining marks
        let spans = vec![
-            TestSpan::new("café", 0.9),  // 4 chars: c a f é
-            TestSpan::new("नमस्ते", 0.8),  // 6 chars (Hindi namaste)
+            TestSpan::new("café", 0.9), // 4 chars: c a f é
+            TestSpan::new("नमस्ते", 0.8), // 6 chars (Hindi namaste)
        ];
        // Total = 10 chars, half = 5
        // Cumsum after first = 4, not > 5
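
Note on the character counts in the test comments above: the counts of 4 and 6 match what Rust's `str::chars` yields, which is Unicode scalar values rather than bytes or grapheme clusters. The sketch below assumes the readability code counts characters this way, which the diff does not show. `char_len` is a hypothetical helper, and the strings are written with escapes so the code points are unambiguous.

```rust
/// Number of Unicode scalar values in `s`, as iterated by `str::chars`.
/// Hypothetical helper for illustration only.
fn char_len(s: &str) -> usize {
    s.chars().count()
}

/// "café" with a precomposed U+00E9.
const CAFE_PRECOMPOSED: &str = "caf\u{e9}";
/// "café" spelled as 'e' followed by combining acute accent U+0301.
const CAFE_DECOMPOSED: &str = "cafe\u{301}";
/// Hindi "namaste": NA, MA, SA, VIRAMA, TA, VOWEL SIGN E.
const NAMASTE: &str = "\u{928}\u{92e}\u{938}\u{94d}\u{924}\u{947}";
```

Under this counting, the precomposed spelling has 4 characters and the decomposed one has 5, so results depend on the normalization form of the extracted text. The Hindi word has 6 scalar values even though it renders as fewer visible clusters.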
--- a/crates/pdftract-core/src/markdown.rs
+++ b/crates/pdftract-core/src/markdown.rs
@ -46,8 +46,10 @@ use std::sync::OnceLock;
 fn anchor_regex() -> &'static Regex {
    static REGEX: OnceLock<Regex> = OnceLock::new();
    REGEX.get_or_init(|| {
-        Regex::new(r"<!--\s*pdftract:\s*page=(\d+)\s+block=(\d+)\s+bbox=\[([\d.,]+)\]\s+kind=(\w+)\s*-->")
-            .expect("invalid ANCHOR_REGEX")
+        Regex::new(
+            r"<!--\s*pdftract:\s*page=(\d+)\s+block=(\d+)\s+bbox=\[([\d.,]+)\]\s+kind=(\w+)\s*-->",
+        )
+        .expect("invalid ANCHOR_REGEX")
    })
 }

@ -71,7 +73,12 @@ pub struct Anchor {
 impl Anchor {
    /// Create a new anchor from components.
    pub fn new(page: usize, block: usize, bbox: [f32; 4], kind: String) -> Self {
-        Self { page, block, bbox, kind }
+        Self {
+            page,
+            block,
+            bbox,
+            kind,
+        }
    }

    /// Format this anchor as an HTML comment.
@ -90,7 +97,13 @@ impl Anchor {
    pub fn to_comment(&self) -> String {
        format!(
            "<!-- pdftract: page={} block={} bbox=[{:.1},{:.1},{:.1},{:.1}] kind={} -->",
-            self.page, self.block, self.bbox[0], self.bbox[1], self.bbox[2], self.bbox[3], self.kind
+            self.page,
+            self.block,
+            self.bbox[0],
+            self.bbox[1],
+            self.bbox[2],
+            self.bbox[3],
+            self.kind
        )
    }
 }
@ -194,7 +207,12 @@ fn parse_bbox(s: &str) -> Option<[f32; 4]> {
 /// # Returns
 ///
 /// A markdown string with optional anchor.
-pub fn block_to_markdown(block: &BlockJson, page_index: usize, block_index: usize, include_anchor: bool) -> String {
+pub fn block_to_markdown(
+    block: &BlockJson,
+    page_index: usize,
+    block_index: usize,
+    include_anchor: bool,
+) -> String {
    let mut result = String::new();

    // Add anchor comment if requested
@ -202,7 +220,12 @@ pub fn block_to_markdown(block: &BlockJson, page_index: usize, block_index: usiz
        let anchor = Anchor::new(
            page_index,
            block_index,
-            [block.bbox[0] as f32, block.bbox[1] as f32, block.bbox[2] as f32, block.bbox[3] as f32],
+            [
+                block.bbox[0] as f32,
+                block.bbox[1] as f32,
+                block.bbox[2] as f32,
+                block.bbox[3] as f32,
+            ],
            block.kind.clone(),
        );
        result.push_str(&anchor.to_comment());
@ -251,7 +274,12 @@ pub fn block_to_markdown(block: &BlockJson, page_index: usize, block_index: usiz
 /// # Returns
 ///
 /// A markdown string with all blocks from the page.
-pub fn page_to_markdown(blocks: &[BlockJson], page_index: usize, include_anchor: bool, include_page_break: bool) -> String {
+pub fn page_to_markdown(
+    blocks: &[BlockJson],
+    page_index: usize,
+    include_anchor: bool,
+    include_page_break: bool,
+) -> String {
    let mut result = String::new();

    for (block_index, block) in blocks.iter().enumerate() {
@ -288,15 +316,26 @@ mod tests {
    fn test_anchor_to_comment() {
        let anchor = Anchor::new(3, 12, [72.0, 640.5, 540.0, 672.0], "heading".to_string());
        let comment = anchor.to_comment();
-        assert_eq!(comment, "<!-- pdftract: page=3 block=12 bbox=[72.0,640.5,540.0,672.0] kind=heading -->");
+        assert_eq!(
+            comment,
+            "<!-- pdftract: page=3 block=12 bbox=[72.0,640.5,540.0,672.0] kind=heading -->"
+        );
    }

    #[test]
    fn test_anchor_to_comment_round_bbox() {
-        let anchor = Anchor::new(0, 0, [72.123, 640.567, 540.999, 672.111], "paragraph".to_string());
+        let anchor = Anchor::new(
+            0,
+            0,
+            [72.123, 640.567, 540.999, 672.111],
+            "paragraph".to_string(),
+        );
        let comment = anchor.to_comment();
        // Should be rounded to 1 decimal place
-        assert_eq!(comment, "<!-- pdftract: page=0 block=0 bbox=[72.1,640.6,541.0,672.1] kind=paragraph -->");
+        assert_eq!(
+            comment,
+            "<!-- pdftract: page=0 block=0 bbox=[72.1,640.6,541.0,672.1] kind=paragraph -->"
+        );
    }

    #[test]
@ -342,16 +381,23 @@ Some text."#;

    #[test]
    fn test_parse_anchors_whitespace_tolerant() {
-        let md = r#"<!--  pdftract:  page=0  block=0  bbox=[72.0,640.5,540.0,672.0]  kind=heading  -->"#;
+        let md =
+            r#"<!--  pdftract:  page=0  block=0  bbox=[72.0,640.5,540.0,672.0]  kind=heading  -->"#;
        let anchors = parse_anchors(md);
        assert_eq!(anchors.len(), 1);
    }

    #[test]
    fn test_parse_bbox() {
-        assert_eq!(parse_bbox("72.0,640.5,540.0,672.0"), Some([72.0, 640.5, 540.0, 672.0]));
+        assert_eq!(
+            parse_bbox("72.0,640.5,540.0,672.0"),
+            Some([72.0, 640.5, 540.0, 672.0])
+        );
        assert_eq!(parse_bbox("0,0,100,100"), Some([0.0, 0.0, 100.0, 100.0]));
-        assert_eq!(parse_bbox("72.0, 640.5, 540.0, 672.0"), Some([72.0, 640.5, 540.0, 672.0])); // with spaces
+        assert_eq!(
+            parse_bbox("72.0, 640.5, 540.0, 672.0"),
+            Some([72.0, 640.5, 540.0, 672.0])
+        ); // with spaces
        assert_eq!(parse_bbox("invalid"), None);
        assert_eq!(parse_bbox("1,2,3"), None); // too few values
        assert_eq!(parse_bbox("1,2,3,4,5"), None); // too many values
@ -369,7 +415,9 @@ Some text."#;
        };

        let md = block_to_markdown(&block, 0, 0, true);
-        assert!(md.contains("<!-- pdftract: page=0 block=0 bbox=[72.0,640.5,540.0,672.0] kind=heading -->"));
+        assert!(md.contains(
+            "<!-- pdftract: page=0 block=0 bbox=[72.0,640.5,540.0,672.0] kind=heading -->"
+        ));
        assert!(md.contains("## Chapter 1"));
    }

@ -438,16 +486,14 @@ Some text."#;

    #[test]
    fn test_roundtrip_extract_and_parse() {
-        let blocks = vec![
-            BlockJson {
-                kind: "heading".to_string(),
-                text: "Chapter 1".to_string(),
-                bbox: [72.0, 640.5, 540.0, 672.0],
-                level: Some(2),
-                table_index: None,
-                receipt: None,
-            },
-        ];
+        let blocks = vec![BlockJson {
+            kind: "heading".to_string(),
+            text: "Chapter 1".to_string(),
+            bbox: [72.0, 640.5, 540.0, 672.0],
+            level: Some(2),
+            table_index: None,
+            receipt: None,
+        }];

        let md = page_to_markdown(&blocks, 3, true, false);
        let anchors = parse_anchors(&md);
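
Note on the bbox parsing exercised by `test_parse_bbox` above: the assertions require exactly four comma-separated numbers, with whitespace tolerated around each value. The following is a minimal sketch that satisfies those assertions using only the standard library. It is not the crate's `parse_bbox`, whose body is not part of this diff, and the name `parse_bbox_sketch` is hypothetical.

```rust
/// Parse "x0,y0,x1,y1" into four f32 values.
/// Returns None on a non-numeric field or on any count other than four.
fn parse_bbox_sketch(s: &str) -> Option<[f32; 4]> {
    let mut out = [0.0f32; 4];
    let mut count = 0;
    for part in s.split(',') {
        if count == 4 {
            // A fifth field means the input is malformed.
            return None;
        }
        // Trim so that "72.0, 640.5" is accepted as well as "72.0,640.5".
        out[count] = part.trim().parse::<f32>().ok()?;
        count += 1;
    }
    if count == 4 {
        Some(out)
    } else {
        None
    }
}
```

Filling a fixed-size array avoids allocating a `Vec` for each anchor, and the early return on a fifth field keeps the check for too many values in the same pass.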
--- a/crates/pdftract-core/src/ocr.rs
+++ b/crates/pdftract-core/src/ocr.rs
@ -204,7 +204,10 @@ fn resolve_tessdata_dir() -> Option<PathBuf> {
 ///
 /// - `detect_available_languages` for pack detection logic
 /// - Phase 5.4 in the plan for OCR language pack handling
-pub fn validate_ocr_languages(requested_langs: &[String], diagnostics: &mut Vec<crate::diagnostics::Diagnostic>) -> String {
+pub fn validate_ocr_languages(
+    requested_langs: &[String],
+    diagnostics: &mut Vec<crate::diagnostics::Diagnostic>,
+) -> String {
    let available = detect_available_languages();

    // Track which requested languages are available
@ -217,12 +220,10 @@ pub fn validate_ocr_languages(requested_langs: &[String], diagnostics: &mut Vec<
        } else {
            missing_langs.push(lang);
            // Emit diagnostic for missing language
-            diagnostics.push(
-                crate::diagnostics::Diagnostic::with_dynamic_no_offset(
-                    crate::diagnostics::DiagCode::OcrLanguageUnavailable,
-                    format!("Requested OCR language pack '{}' is not installed", lang),
-                )
-            );
+            diagnostics.push(crate::diagnostics::Diagnostic::with_dynamic_no_offset(
+                crate::diagnostics::DiagCode::OcrLanguageUnavailable,
+                format!("Requested OCR language pack '{}' is not installed", lang),
+            ));
        }
    }

@ -242,12 +243,10 @@ pub fn validate_ocr_languages(requested_langs: &[String], diagnostics: &mut Vec<
            return "eng".to_string();
        } else {
            // No languages available at all - this will cause Tesseract init to fail
-            diagnostics.push(
-                crate::diagnostics::Diagnostic::with_dynamic_no_offset(
-                    crate::diagnostics::DiagCode::OcrLanguageUnavailable,
-                    "No OCR language packs available (including fallback 'eng')".to_string(),
-                )
-            );
+            diagnostics.push(crate::diagnostics::Diagnostic::with_dynamic_no_offset(
+                crate::diagnostics::DiagCode::OcrLanguageUnavailable,
+                "No OCR language packs available (including fallback 'eng')".to_string(),
+            ));
            return "eng".to_string(); // Still return eng; Tesseract will fail with clear error
        }
    }
@ -418,7 +417,8 @@ impl TessState {
            .map_err(|e| format!("Invalid language string: {}", e))?;

        let init_result = if let Some(ref path) = tessdata_path {
-            let path_str = path.to_str()
+            let path_str = path
+                .to_str()
                .ok_or_else(|| format!("Tessdata path contains invalid UTF-8: {:?}", path))?;
            let path_cstr = CString::new(path_str)
                .map_err(|e| format!("Invalid tessdata path string: {}", e))?;
@ -432,9 +432,7 @@ impl TessState {
            format!(
                "Failed to initialize Tesseract (language='{}', tessdata_path={:?}): {}. \
                 Ensure language data files are installed (see `pdftract doctor tesseract-langs`).",
-                opts.language,
-                tessdata_path,
-                e
+                opts.language, tessdata_path, e
            )
        })?;

@ -523,15 +521,16 @@ pub fn borrow_or_init(opts: &TessOpts) -> std::cell::RefMut<'static, Option<Tess
        match state_ref.as_ref() {
            // No cached instance - initialize
            None => {
-                *state_ref = Some(TessState::new(opts.clone())
-                    .expect("Tesseract initialization failed"));
+                *state_ref =
+                    Some(TessState::new(opts.clone()).expect("Tesseract initialization failed"));
            }
            // Cached instance exists - check if opts match
            Some(cached) => {
                if cached.opts() != opts {
                    // Opts changed - reinitialize
-                    *state_ref = Some(TessState::new(opts.clone())
-                        .expect("Tesseract reinitialization failed"));
+                    *state_ref = Some(
+                        TessState::new(opts.clone()).expect("Tesseract reinitialization failed"),
+                    );
                }
                // else: opts match, reuse cached instance
            }
@ -653,7 +652,11 @@ mod tests {
                let _state = borrow_or_init(&opts);
            }

-            assert_eq!(init_count(), 1, "Should have exactly 1 init (first call only)");
+            assert_eq!(
+                init_count(),
+                1,
+                "Should have exactly 1 init (first call only)"
+            );
        });

        if init_result.is_err() {
@ -724,7 +727,10 @@ mod tests {
                count
            );

-            println!("Multithreaded test: {} inits for 100 pages across rayon workers", count);
+            println!(
+                "Multithreaded test: {} inits for 100 pages across rayon workers",
+                count
+            );
        });

        if init_result.is_err() {
@ -1028,7 +1034,12 @@ impl HocrWord {

        // Step 5: Add cell origin if this is from a hybrid cell OCR
        let (pdf_x0, pdf_y0, pdf_x1, pdf_y1) = if let Some([cell_x, cell_y]) = cell_origin {
-            (pdf_x0 + cell_x, pdf_y0 + cell_y, pdf_x1 + cell_x, pdf_y1 + cell_y)
+            (
+                pdf_x0 + cell_x,
+                pdf_y0 + cell_y,
+                pdf_x1 + cell_x,
+                pdf_y1 + cell_y,
+            )
        } else {
            (pdf_x0, pdf_y0, pdf_x1, pdf_y1)
        };
@ -1220,10 +1231,7 @@ fn is_ocrx_word(element: &quick_xml::events::BytesStart) -> bool {
 }

 /// Get an attribute value from an element.
-fn get_attribute<'a>(
-    element: &'a quick_xml::events::BytesStart<'a>,
-    name: &str,
-) -> Option<String> {
+fn get_attribute<'a>(element: &'a quick_xml::events::BytesStart<'a>, name: &str) -> Option<String> {
    element
        .attributes()
        .filter_map(|a| a.ok())
@ -1250,13 +1258,17 @@ fn parse_title_attribute(title: &str) -> Result<([u32; 4], u8), String> {
                // Parse bbox coordinates: "bbox x0 y0 x1 y1"
                let coords: Vec<&str> = parts.collect();
                if coords.len() >= 4 {
-                    let x0 = coords[0].parse::<u32>()
+                    let x0 = coords[0]
+                        .parse::<u32>()
                        .map_err(|_| format!("Invalid bbox x0: {}", coords[0]))?;
-                    let y0 = coords[1].parse::<u32>()
+                    let y0 = coords[1]
+                        .parse::<u32>()
                        .map_err(|_| format!("Invalid bbox y0: {}", coords[1]))?;
-                    let x1 = coords[2].parse::<u32>()
+                    let x1 = coords[2]
+                        .parse::<u32>()
                        .map_err(|_| format!("Invalid bbox x1: {}", coords[2]))?;
-                    let y1 = coords[3].parse::<u32>()
+                    let y1 = coords[3]
+                        .parse::<u32>()
                        .map_err(|_| format!("Invalid bbox y1: {}", coords[3]))?;

                    bbox = Some([x0, y0, x1, y1]);
@ -1265,7 +1277,8 @@ fn parse_title_attribute(title: &str) -> Result<([u32; 4], u8), String> {
            Some("x_wconf") => {
                // Parse confidence: "x_wconf NNN"
                if let Some(conf_str) = parts.next() {
-                    let conf = conf_str.parse::<u8>()
+                    let conf = conf_str
+                        .parse::<u8>()
                        .map_err(|_| format!("Invalid x_wconf: {}", conf_str))?;
                    confidence = Some(conf);
                }
@ -1540,7 +1553,12 @@ mod hocr_tests {
            let y = (i / 600) * 30;
            hocr.push_str(&format!(
                "<span class='ocrx_word' title='bbox {} {} {} {}; x_wconf {}'>word{}</span>",
-                x, y, x + 50, y + 20, 85 + (i % 15), i
+                x,
+                y,
+                x + 50,
+                y + 20,
+                85 + (i % 15),
+                i
            ));
        }
        hocr.push_str("</body></html>");
@ -1553,8 +1571,11 @@ mod hocr_tests {
        assert_eq!(words.len(), 1000);

        // Should be very fast (asserted bound: < 50ms for 1000 words)
-        assert!(elapsed < std::time::Duration::from_millis(50),
-            "HOCR parsing took {:?}, expected < 50ms", elapsed);
+        assert!(
+            elapsed < std::time::Duration::from_millis(50),
+            "HOCR parsing took {:?}, expected < 50ms",
+            elapsed
+        );
    }

    #[test]
@ -1609,7 +1630,10 @@ mod hocr_tests {
        if let Ok(quick_xml::events::Event::Start(e)) = reader.read_event_into(&mut buf) {
            assert_eq!(get_attribute(&e, "class"), Some("ocrx_word".to_string()));
            assert_eq!(get_attribute(&e, "id"), Some("test".to_string()));
-            assert_eq!(get_attribute(&e, "title"), Some("bbox 0 0 50 20".to_string()));
+            assert_eq!(
+                get_attribute(&e, "title"),
+                Some("bbox 0 0 50 20".to_string())
+            );
            assert_eq!(get_attribute(&e, "missing"), None);
        }
    }
@ -1632,15 +1656,31 @@ mod hocr_tests {
        let bbox = word.to_pdf_bbox(300, 792.0, None, None);

        // Check X coordinates (unchanged by Y-flip)
-        assert!((bbox[0] - 0.0).abs() < 0.1, "x0 should be ~0.0, got {}", bbox[0]);
-        assert!((bbox[2] - 21.6).abs() < 0.1, "x1 should be ~21.6, got {}", bbox[2]);
+        assert!(
+            (bbox[0] - 0.0).abs() < 0.1,
+            "x0 should be ~0.0, got {}",
+            bbox[0]
+        );
+        assert!(
+            (bbox[2] - 21.6).abs() < 0.1,
+            "x1 should be ~21.6, got {}",
+            bbox[2]
+        );

        // Check Y coordinates (flipped)
        // After padding subtraction: y1_pt = 20 * 0.24 = 4.8, so pdf_y0 = 792 - 4.8 = 787.2
        // y0_pt = 0, so pdf_y1 = 792 - 0 = 792
-        assert!((bbox[1] - 787.2).abs() < 0.1, "y0 should be ~787.2, got {}", bbox[1]);
-        assert!((bbox[3] - 792.0).abs() < 0.1, "y1 should be ~792.0, got {}", bbox[3]);
+        assert!(
+            (bbox[1] - 787.2).abs() < 0.1,
+            "y0 should be ~787.2, got {}",
+            bbox[1]
+        );
+        assert!(
+            (bbox[3] - 792.0).abs() < 0.1,
+            "y1 should be ~792.0, got {}",
+            bbox[3]
+        );
    }

    #[test]
@ -1688,9 +1728,15 @@ mod hocr_tests {
        let bbox = word.to_pdf_bbox(300, 792.0, None, None);

        // After padding subtraction, x0 and y0 should be at 0 (page origin)
-        assert!((bbox[0] - 0.0).abs() < 0.1, "x0 should be ~0.0 after padding subtraction");
+        assert!(
+            (bbox[0] - 0.0).abs() < 0.1,
+            "x0 should be ~0.0 after padding subtraction"
+        );
        // y0 should be near page height (top of page after Y-flip)
-        assert!(bbox[1] > 780.0, "y0 should be near top of page after Y-flip");
+        assert!(
+            bbox[1] > 780.0,
+            "y0 should be near top of page after Y-flip"
+        );
    }

    #[test]
@ -1705,17 +1751,29 @@ mod hocr_tests {
        // At 300 DPI: 100px * 72/300 = 24pt
        let bbox_300 = word.to_pdf_bbox(300, 792.0, None, None);
        let width_300 = bbox_300[2] - bbox_300[0];
-        assert!((width_300 - 24.0).abs() < 0.1, "Width at 300 DPI should be ~24pt, got {}", width_300);
+        assert!(
+            (width_300 - 24.0).abs() < 0.1,
+            "Width at 300 DPI should be ~24pt, got {}",
+            width_300
+        );

        // At 200 DPI: 100px * 72/200 = 36pt
        let bbox_200 = word.to_pdf_bbox(200, 792.0, None, None);
        let width_200 = bbox_200[2] - bbox_200[0];
-        assert!((width_200 - 36.0).abs() < 0.1, "Width at 200 DPI should be ~36pt, got {}", width_200);
+        assert!(
+            (width_200 - 36.0).abs() < 0.1,
+            "Width at 200 DPI should be ~36pt, got {}",
+            width_200
+        );

        // At 400 DPI: 100px * 72/400 = 18pt
        let bbox_400 = word.to_pdf_bbox(400, 792.0, None, None);
        let width_400 = bbox_400[2] - bbox_400[0];
-        assert!((width_400 - 18.0).abs() < 0.1, "Width at 400 DPI should be ~18pt, got {}", width_400);
+        assert!(
+            (width_400 - 18.0).abs() < 0.1,
+            "Width at 400 DPI should be ~18pt, got {}",
+            width_400
+        );
    }

    #[test]
@ -1736,11 +1794,15 @@ mod hocr_tests {
        let bbox = word.to_pdf_bbox(300, 99.0, None, Some(cell_origin));

        // X should be offset by cell origin
-        assert!((bbox[0] - (229.5 + 10.0 * 72.0 / 300.0)).abs() < 1.0,
-            "x0 should include cell origin offset");
+        assert!(
+            (bbox[0] - (229.5 + 10.0 * 72.0 / 300.0)).abs() < 1.0,
+            "x0 should include cell origin offset"
+        );
        // Y should be offset by cell origin (note: cell height is 99pt)
-        assert!((bbox[1] - (594.0 + 10.0 * 72.0 / 300.0)).abs() < 1.0,
-            "y0 should include cell origin offset");
+        assert!(
+            (bbox[1] - (594.0 + 10.0 * 72.0 / 300.0)).abs() < 1.0,
+            "y0 should include cell origin offset"
+        );
    }

    #[test]
@ -1776,8 +1838,10 @@ mod hocr_tests {
        // After 90-degree rotation, the bbox should be transformed
        // The exact values depend on the rotation implementation
        // Just verify that the rotation changes the coordinates
-        assert!(bbox_rot_90[0] != bbox_no_rot[0] || bbox_rot_90[1] != bbox_no_rot[1],
-            "Rotation should change coordinates");
+        assert!(
+            bbox_rot_90[0] != bbox_no_rot[0] || bbox_rot_90[1] != bbox_no_rot[1],
+            "Rotation should change coordinates"
+        );
    }

    #[test]
@ -1825,8 +1889,14 @@ mod hocr_tests {
        let bbox_invalid = word.to_pdf_bbox(300, 792.0, Some(45), None); // 45° is not supported

        // Invalid rotation should return unchanged bbox
-        assert!((bbox_invalid[0] - bbox_no_rot[0]).abs() < 0.01, "Invalid rotation should not change x0");
-        assert!((bbox_invalid[1] - bbox_no_rot[1]).abs() < 0.01, "Invalid rotation should not change y0");
+        assert!(
+            (bbox_invalid[0] - bbox_no_rot[0]).abs() < 0.01,
+            "Invalid rotation should not change x0"
+        );
+        assert!(
+            (bbox_invalid[1] - bbox_no_rot[1]).abs() < 0.01,
+            "Invalid rotation should not change y0"
+        );
    }

    #[test]
@ -1851,8 +1921,16 @@ mod hocr_tests {

            // At 300 DPI: 40px = 9.6pt, 20px = 4.8pt
            // Allow some tolerance for floating-point errors
-            assert!((width - 9.6).abs() < 0.2, "Width should be ~9.6pt at {}° rotation", rot);
-            assert!((height - 4.8).abs() < 0.2, "Height should be ~4.8pt at {}° rotation", rot);
+            assert!(
+                (width - 9.6).abs() < 0.2,
+                "Width should be ~9.6pt at {}° rotation",
+                rot
+            );
+            assert!(
+                (height - 4.8).abs() < 0.2,
+                "Height should be ~4.8pt at {}° rotation",
+                rot
+            );
        }
    }
 }
@ -1952,11 +2030,7 @@ pub fn run_tesseract(
        .into_iter()
        .map(|word| {
            let pdf_bbox = word.to_pdf_bbox(dpi, page_height_pt, None, None);
-            crate::hybrid::Span::ocr(
-                pdf_bbox,
-                word.confidence(),
-                word.text,
-            )
+            crate::hybrid::Span::ocr(pdf_bbox, word.confidence(), word.text)
        })
        .collect();

@ -2016,11 +2090,7 @@ pub fn run_tesseract_on_cell(
        .into_iter()
        .map(|word| {
            let pdf_bbox = word.to_pdf_bbox(dpi, cell_height_pt, None, Some(cell_origin));
-            crate::hybrid::Span::ocr(
-                pdf_bbox,
-                word.confidence(),
-                word.text,
-            )
+            crate::hybrid::Span::ocr(pdf_bbox, word.confidence(), word.text)
        })
        .collect();

@ -2041,9 +2111,7 @@ mod integration_tests {

        let opts = TessOpts::default();

-        let result = std::panic::catch_unwind(|| {
-            run_tesseract(&img, 300, 792.0, &opts)
-        });
+        let result = std::panic::catch_unwind(|| run_tesseract(&img, 300, 792.0, &opts));

        if result.is_err() {
            // Tesseract not available - skip gracefully
@ -2064,9 +2132,8 @@ mod integration_tests {
        let opts = TessOpts::default();
        let cell_origin = [100.0, 200.0];

-        let result = std::panic::catch_unwind(|| {
-            run_tesseract_on_cell(&img, 300, 99.0, cell_origin, &opts)
-        });
+        let result =
+            std::panic::catch_unwind(|| run_tesseract_on_cell(&img, 300, 99.0, cell_origin, &opts));

        if result.is_err() {
            println!("Skipping test_run_tesseract_on_cell_offset: Tesseract not available");
@ -2160,7 +2227,9 @@ pub fn calculate_wer(ocr_output: &str, ground_truth: &str) -> f64 {
 /// A `Vec<String>` of normalized words.
 fn normalize_text(text: &str) -> Vec<String> {
    // Define punctuation to strip
-    let punct = ['.', ',', '!', '?', ';', ':', '"', '\'', '(', ')', '[', ']', '{', '}'];
+    let punct = [
+        '.', ',', '!', '?', ';', ':', '"', '\'', '(', ')', '[', ']', '{', '}',
+    ];

    text.to_lowercase()
        .split_whitespace()
@ -2202,9 +2271,9 @@ fn word_edit_distance(ocr: &[String], reference: &[String]) -> (usize, usize, us
                dp[i][j] = dp[i - 1][j - 1]; // No operation needed
            } else {
                dp[i][j] = [
-                    dp[i - 1][j] + 1,      // Deletion
-                    dp[i][j - 1] + 1,      // Insertion
-                    dp[i - 1][j - 1] + 1,  // Substitution
+                    dp[i - 1][j] + 1,     // Deletion
+                    dp[i][j - 1] + 1,     // Insertion
+                    dp[i - 1][j - 1] + 1, // Substitution
                ]
                .into_iter()
                .min()
@ -2241,14 +2310,285 @@ fn word_edit_distance(ocr: &[String], reference: &[String]) -> (usize, usize, us
            j -= 1;
        } else {
            // Default case (shouldn't happen in valid backtracking)
-            if i > 0 { i -= 1; }
-            if j > 0 { j -= 1; }
+            if i > 0 {
+                i -= 1;
+            }
+            if j > 0 {
+                j -= 1;
+            }
        }
    }

    (substitutions, insertions, deletions)
 }

+// ============ Assisted OCR Validation Filter (Phase 5.5.2) ============
+
+use crate::content_stream::Glyph;
+
+/// Distance threshold for assisted-OCR position validation (in PDF points).
+///
+/// If the center-to-center distance between an OCR word and the nearest
+/// vector glyph is less than this value, the OCR word is accepted with its
+/// full confidence. Otherwise, confidence is capped at 0.4.
+///
+/// 5 pt is approximately one space-character width at 12 pt font size.
+const ASSISTED_OCR_DISTANCE_PT: f64 = 5.0;
+
+/// Confidence cap for OCR words that fail position validation.
+///
+/// This value is below the 0.5 threshold used in bbox-merge (Phase 5.2.4),
+/// so OCR spans that fail position validation are not preferred over
+/// legitimate vector spans.
+const ASSISTED_OCR_CONFIDENCE_CAP: f32 = 0.4;
+
+/// Minimum glyph count to justify building a KD-tree.
+///
+/// Below this, a linear scan is faster; not yet consulted (the filter always scans linearly).
+const ASSISTED_OCR_KDTREE_THRESHOLD: usize = 100;
+
+/// Validate OCR words against vector glyph position hints.
+///
+/// This function implements the per-word validation filter for the
+/// BrokenVector assisted-OCR path (Phase 5.5.2). For each Tesseract word,
+/// it finds the nearest vector glyph bbox center and checks the distance:
+///
+/// - If distance < 5 pt: accept word with full OCR confidence
+/// - If distance >= 5 pt: cap confidence at 0.4
+///
+/// The 5pt threshold filters OCR text where positions disagree with the
+/// vector layer, indicating either OCR-of-OCR garbage or hallucinated text.
+///
+/// # Arguments
+///
+/// * `hocr_words` - OCR words from Tesseract, in pixels (converted via `dpi`, `page_height_pt`)
+/// * `vector_glyphs` - Position hints from Phase 3 (PositionHint mode)
+///
+/// # Returns
+///
+/// A `Vec<Span>` with `SpanSource::OcrAssisted` and adjusted confidence scores.
+/// The output preserves HOCR document order.
+///
+/// # Performance
+///
+/// - Current: O(N*M) linear scan for any glyph count (N = OCR words, M = glyphs)
+/// - For >= 100 glyphs: a KD-tree could give O(N*log(M)) (future optimization)
+///
+/// # Examples
+///
+/// ```ignore
+/// use pdftract_core::ocr::{validate_ocr_with_position_hints, HocrWord};
+/// use pdftract_core::content_stream::Glyph;
+///
+/// // OCR words from Tesseract, in pixel coordinates at 300 DPI
+/// let words = vec![
+///     HocrWord { text: "hello".to_string(), bbox_px: [20, 20, 40, 40], confidence_0_100: 95 },
+/// ];
+///
+/// // Position hint from Phase 3, placed where the word lands in PDF space
+/// let glyphs = vec![
+///     Glyph::position_hint(words[0].to_pdf_bbox(300, 792.0, None, None)),
+/// ];
+///
+/// let spans = validate_ocr_with_position_hints(&words, &glyphs, 300, 792.0);
+/// // Word center coincides with the glyph center (distance 0 < 5 pt) -> full confidence
+/// assert!((spans[0].confidence - 0.95).abs() < f32::EPSILON);
+/// ```
+///
+/// # See also
+///
+/// - Phase 5.5 pipeline step 3 (plan line 1935)
+/// - `Glyph::position_hint` for creating position-hint glyphs
+pub fn validate_ocr_with_position_hints(
+    hocr_words: &[HocrWord],
+    vector_glyphs: &[Glyph],
+    dpi: u32,
+    page_height_pt: f64,
+) -> Vec<crate::hybrid::Span> {
+    // Build list of vector glyph bbox centers for nearest-neighbor lookup
+    let glyph_centers: Vec<(f64, f64)> = vector_glyphs
+        .iter()
+        .map(|g| {
+            let bx = g.bbox;
+            ((bx[0] + bx[2]) / 2.0, (bx[1] + bx[3]) / 2.0)
+        })
+        .collect();
+
+    // For each OCR word, find nearest glyph and validate distance
+    hocr_words
+        .iter()
+        .map(|word| {
+            let pdf_bbox = word.to_pdf_bbox(dpi, page_height_pt, None, None);
+            let word_center = (
+                (pdf_bbox[0] + pdf_bbox[2]) / 2.0,
+                (pdf_bbox[1] + pdf_bbox[3]) / 2.0,
+            );
+
+            // Find nearest vector glyph center (linear scan - fast enough for N < 100)
+            let min_distance = glyph_centers
+                .iter()
+                .map(|&gx| {
+                    let dx = gx.0 - word_center.0;
+                    let dy = gx.1 - word_center.1;
+                    (dx * dx + dy * dy).sqrt()
+                })
+                .min_by(|a, b| a.total_cmp(b)) // f64 is not Ord, so plain .min() does not compile
+                .unwrap_or(f64::MAX); // No glyphs -> max distance
+
+            // Apply validation: cap confidence if distance >= 5pt
+            let ocr_confidence = word.confidence();
+            let adjusted_confidence = if min_distance < ASSISTED_OCR_DISTANCE_PT {
+                ocr_confidence
+            } else {
+                ocr_confidence.min(ASSISTED_OCR_CONFIDENCE_CAP)
+            };
+
+            crate::hybrid::Span::ocr_assisted(pdf_bbox, adjusted_confidence, word.text.clone())
+        })
+        .collect()
+}
+
+#[cfg(test)]
+mod assisted_ocr_tests {
+    use super::*;
+
+    #[test]
+    fn test_validation_filter_near_glyph() {
+        // Place the glyph exactly where the OCR word lands in PDF space (distance 0 < 5pt)
+        let word = HocrWord {
+            text: "hello".to_string(),
+            bbox_px: [20, 20, 40, 40], // 20px x 20px box; 4.8pt x 4.8pt at 300 DPI
+            confidence_0_100: 95,
+        };
+        let glyphs = vec![Glyph::position_hint(word.to_pdf_bbox(300, 792.0, None, None))];
+
+        let spans = validate_ocr_with_position_hints(&[word], &glyphs, 300, 792.0);
+
+        assert_eq!(spans.len(), 1);
+        // Should accept full confidence since distance < 5pt
+        assert!((spans[0].confidence - 0.95).abs() < f32::EPSILON);
+        assert_eq!(spans[0].source, crate::hybrid::SpanSource::OcrAssisted);
+        assert_eq!(spans[0].text, "hello");
+    }
+
+    #[test]
+    fn test_validation_filter_far_from_glyph() {
+        // OCR word center lands far more than 5pt from the glyph centered at (100, 200)
+        let glyphs = vec![Glyph::position_hint([95.0, 195.0, 105.0, 205.0])];
+        let word = HocrWord {
+            text: "world".to_string(),
+            bbox_px: [500, 500, 550, 520], // Far from glyph
+            confidence_0_100: 95,
+        };
+
+        let spans = validate_ocr_with_position_hints(&[word], &glyphs, 300, 792.0);
+
+        assert_eq!(spans.len(), 1);
+        // Should cap confidence at 0.4 since distance >= 5pt
+        assert_eq!(spans[0].confidence, ASSISTED_OCR_CONFIDENCE_CAP);
+        assert_eq!(spans[0].source, crate::hybrid::SpanSource::OcrAssisted);
+    }
+
+    #[test]
+    fn test_validation_filter_confidence_already_below_cap() {
+        // OCR word with low confidence (30%) far from glyph should stay at 30%
+        let glyphs = vec![Glyph::position_hint([95.0, 195.0, 105.0, 205.0])];
+        let word = HocrWord {
+            text: "test".to_string(),
+            bbox_px: [500, 500, 550, 520],
+            confidence_0_100: 30,
+        };
+
+        let spans = validate_ocr_with_position_hints(&[word], &glyphs, 300, 792.0);
+
+        assert_eq!(spans.len(), 1);
+        // Should keep original confidence (already below cap)
+        assert_eq!(spans[0].confidence, 0.3);
+    }
+
+    #[test]
+    fn test_validation_filter_no_glyphs() {
+        // No position hints available -> cap all words
+        let glyphs: Vec<Glyph> = vec![];
+        let word = HocrWord {
+            text: "orphan".to_string(),
+            bbox_px: [100, 100, 150, 120],
+            confidence_0_100: 90,
+        };
+
+        let spans = validate_ocr_with_position_hints(&[word], &glyphs, 300, 792.0);
+
+        assert_eq!(spans.len(), 1);
+        // No glyphs -> max distance -> cap confidence
+        assert_eq!(spans[0].confidence, ASSISTED_OCR_CONFIDENCE_CAP);
+    }
+
+    #[test]
+    fn test_validation_filter_multiple_words_preserves_order() {
+        // Test that HOCR document order is preserved; glyphs sit where "first" and "third" land
+        let words = vec![
+            HocrWord {
+                text: "first".to_string(),
+                bbox_px: [20, 20, 40, 40],
+                confidence_0_100: 90,
+            },
+            HocrWord {
+                text: "second".to_string(),
+                bbox_px: [500, 500, 550, 520], // Far from any glyph
+                confidence_0_100: 85,
+            },
+            HocrWord {
+                text: "third".to_string(),
+                bbox_px: [60, 20, 80, 40],
+                confidence_0_100: 95,
+            },
+        ];
+
+        let glyphs = vec![
+            Glyph::position_hint(words[0].to_pdf_bbox(300, 792.0, None, None)),
+            Glyph::position_hint(words[2].to_pdf_bbox(300, 792.0, None, None)),
+        ];
+
+        let spans = validate_ocr_with_position_hints(&words, &glyphs, 300, 792.0);
+
+        assert_eq!(spans.len(), 3);
+        assert_eq!(spans[0].text, "first");
+        assert_eq!(spans[1].text, "second");
+        assert_eq!(spans[2].text, "third");
+
+        // First and third should have full confidence (near glyphs)
+        assert!((spans[0].confidence - 0.9).abs() < f32::EPSILON);
+        assert!((spans[2].confidence - 0.95).abs() < f32::EPSILON);
+
+        // Second should be capped (far from glyphs)
+        assert_eq!(spans[1].confidence, ASSISTED_OCR_CONFIDENCE_CAP);
+    }
+
+    #[test]
+    fn test_validation_filter_distance_threshold() {
+        // A word far beyond the 5pt threshold must be capped
+        let glyphs = vec![Glyph::position_hint([100.0, 200.0, 110.0, 210.0])];
+
+        // Word center lands well over 5pt from the glyph center
+        let word_far = HocrWord {
+            text: "far".to_string(),
+            bbox_px: [1000, 1000, 1050, 1020],
+            confidence_0_100: 95,
+        };
+
+        let spans = validate_ocr_with_position_hints(&[word_far], &glyphs, 300, 792.0);
+        assert_eq!(spans[0].confidence, ASSISTED_OCR_CONFIDENCE_CAP);
+    }
+
+    #[test]
+    fn test_assisted_ocr_constants() {
+        // Verify the constants match the plan specification
+        assert_eq!(ASSISTED_OCR_DISTANCE_PT, 5.0);
+        assert_eq!(ASSISTED_OCR_CONFIDENCE_CAP, 0.4);
+        assert_eq!(ASSISTED_OCR_KDTREE_THRESHOLD, 100);
+    }
+}
+
 #[cfg(test)]
 mod wer_tests {
    use super::*;
@ -2304,13 +2644,19 @@ mod wer_tests {
    #[test]
    fn test_calculate_wer_empty_reference_nonempty_ocr() {
        let wer = calculate_wer("some text", "");
-        assert_eq!(wer, 1.0, "Non-empty OCR with empty reference should have WER = 1");
+        assert_eq!(
+            wer, 1.0,
+            "Non-empty OCR with empty reference should have WER = 1"
+        );
    }

    #[test]
    fn test_calculate_wer_empty_ocr_nonempty_reference() {
        let wer = calculate_wer("", "some text");
-        assert_eq!(wer, 1.0, "Empty OCR with non-empty reference should have WER = 1");
+        assert_eq!(
+            wer, 1.0,
+            "Empty OCR with non-empty reference should have WER = 1"
+        );
    }

    #[test]
@ -2375,7 +2721,11 @@ mod wer_tests {
    #[test]
    fn test_word_edit_distance_insertion_deletion() {
        let ocr = vec!["hello".to_string(), "there".to_string()];
-        let reference = vec!["hello".to_string(), "world".to_string(), "there".to_string()];
+        let reference = vec![
+            "hello".to_string(),
+            "world".to_string(),
+            "there".to_string(),
+        ];
        let (sub, ins, del) = word_edit_distance(&ocr, &reference);
        // "world" deleted from reference, but also could be seen as insertion
        // The algorithm counts it as:
--- a/crates/pdftract-core/src/options.rs
+++ b/crates/pdftract-core/src/options.rs
@ -3,9 +3,9 @@
 //! This module defines the options that control how PDFs are extracted,
 //! including the receipts mode for cryptographic provenance tracking.

-use serde::{Deserialize, Serialize};
 #[cfg(feature = "schemars")]
 use schemars::JsonSchema;
+use serde::{Deserialize, Serialize};

 /// Receipt generation mode.
 ///
--- a/crates/pdftract-core/src/parser/catalog.rs
+++ b/crates/pdftract-core/src/parser/catalog.rs
@ -4,10 +4,10 @@
 //! including Pages, Outlines, MarkInfo, StructTreeRoot, AcroForm, Names,
 //! Metadata, PageLabels, OCProperties, OpenAction, AA, and Version entries.

-use crate::parser::object::{ObjRef, PdfObject, intern};
-use crate::parser::xref::XrefResolver;
-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
+use crate::parser::object::{intern, ObjRef, PdfObject};
 use crate::parser::ocg::{parse_oc_properties, OcProperties};
+use crate::parser::xref::XrefResolver;

 /// Result type for catalog parsing.
 pub type Result<T> = std::result::Result<T, Vec<Diagnostic>>;
@ -150,9 +150,19 @@ impl PageLabelStyle {

        let mut result = String::new();
        let values = [
-            (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
-            (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
-            (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
+            (1000, "M"),
+            (900, "CM"),
+            (500, "D"),
+            (400, "CD"),
+            (100, "C"),
+            (90, "XC"),
+            (50, "L"),
+            (40, "XL"),
+            (10, "X"),
+            (9, "IX"),
+            (5, "V"),
+            (4, "IV"),
+            (1, "I"),
        ];

        for (val, sym) in values {
@ -208,24 +218,26 @@ impl PageLabel {
    fn parse(obj: &PdfObject) -> Option<Self> {
        let dict = obj.as_dict()?;

-        let style = dict.get("S")
+        let style = dict
+            .get("S")
            .and_then(|o| o.as_name())
            .and_then(PageLabelStyle::from_name)
            .unwrap_or(PageLabelStyle::Decimal);

-        let prefix = dict.get("P")
-            .and_then(|o| {
-                // Prefix can be either a String or a Name
-                o.as_string()
-                    .and_then(|bytes| String::from_utf8(bytes.to_vec()).ok())
-                    .or_else(|| o.as_name().map(|s| s.to_string()))
-            });
+        let prefix = dict.get("P").and_then(|o| {
+            // Prefix can be either a String or a Name
+            o.as_string()
+                .and_then(|bytes| String::from_utf8(bytes.to_vec()).ok())
+                .or_else(|| o.as_name().map(|s| s.to_string()))
+        });

-        let start = dict.get("St")
-            .and_then(|o| o.as_int())
-            .unwrap_or(1);
+        let start = dict.get("St").and_then(|o| o.as_int()).unwrap_or(1);

-        Some(PageLabel { style, prefix, start })
+        Some(PageLabel {
+            style,
+            prefix,
+            start,
+        })
    }

    /// Format a label for a given page index.
@ -332,7 +344,8 @@ impl PageLabelsTree {
    ///
    /// Returns the label for the most recent key <= page_index.
    pub fn get_label(&self, page_index: i64) -> Option<&PageLabel> {
-        self.get_label_with_start(page_index).map(|(label, _)| label)
+        self.get_label_with_start(page_index)
+            .map(|(label, _)| label)
    }

    /// Get all labels as a slice.
@ -402,7 +415,8 @@ impl Catalog {

    /// Add a diagnostic to the catalog.
    fn emit_diagnostic(&mut self, code: DiagCode, message: String) {
-        self.diagnostics.push(Diagnostic::with_dynamic_no_offset(code, message));
+        self.diagnostics
+            .push(Diagnostic::with_dynamic_no_offset(code, message));
    }
 }

@ -476,7 +490,10 @@ pub fn parse_catalog(resolver: &XrefResolver, root_ref: ObjRef) -> Result<Catalo
            // Emit STRUCT_MISSING_KEY diagnostic and return empty catalog
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::StructMissingKey,
-                format!("STRUCT_MISSING_KEY: /Pages is not a reference (type: {})", other.type_name()),
+                format!(
+                    "STRUCT_MISSING_KEY: /Pages is not a reference (type: {})",
+                    other.type_name()
+                ),
            ));
            catalog.diagnostics = diagnostics;
            return Ok(catalog);
@ -624,11 +641,26 @@ mod tests {

    #[test]
    fn test_page_label_style_from_name() {
-        assert_eq!(PageLabelStyle::from_name("D"), Some(PageLabelStyle::Decimal));
-        assert_eq!(PageLabelStyle::from_name("R"), Some(PageLabelStyle::RomanUppercase));
-        assert_eq!(PageLabelStyle::from_name("r"), Some(PageLabelStyle::RomanLowercase));
-        assert_eq!(PageLabelStyle::from_name("A"), Some(PageLabelStyle::LettersUppercase));
-        assert_eq!(PageLabelStyle::from_name("a"), Some(PageLabelStyle::LettersLowercase));
+        assert_eq!(
+            PageLabelStyle::from_name("D"),
+            Some(PageLabelStyle::Decimal)
+        );
+        assert_eq!(
+            PageLabelStyle::from_name("R"),
+            Some(PageLabelStyle::RomanUppercase)
+        );
+        assert_eq!(
+            PageLabelStyle::from_name("r"),
+            Some(PageLabelStyle::RomanLowercase)
+        );
+        assert_eq!(
+            PageLabelStyle::from_name("A"),
+            Some(PageLabelStyle::LettersUppercase)
+        );
+        assert_eq!(
+            PageLabelStyle::from_name("a"),
+            Some(PageLabelStyle::LettersLowercase)
+        );
        assert_eq!(PageLabelStyle::from_name("X"), None);
    }

@ -687,26 +719,56 @@ mod tests {
        let mut tree = PageLabelsTree::new();

        // Page 0-2: roman numerals (i, ii, iii)
-        tree.labels.push((0, PageLabel {
-            style: PageLabelStyle::RomanLowercase,
-            prefix: None,
-            start: 1,
-        }));
+        tree.labels.push((
+            0,
+            PageLabel {
+                style: PageLabelStyle::RomanLowercase,
+                prefix: None,
+                start: 1,
+            },
+        ));

        // Page 3+: decimal (1, 2, 3, ...)
-        tree.labels.push((3, PageLabel {
-            style: PageLabelStyle::Decimal,
-            prefix: None,
-            start: 1,
-        }));
+        tree.labels.push((
+            3,
+            PageLabel {
+                style: PageLabelStyle::Decimal,
+                prefix: None,
+                start: 1,
+            },
+        ));

        // Test lookups using format_absolute for correct relative indexing
-        assert_eq!(tree.get_label_with_start(0).map(|(l, start)| l.format_absolute(0, start)), Some("i".to_string()));
-        assert_eq!(tree.get_label_with_start(1).map(|(l, start)| l.format_absolute(1, start)), Some("ii".to_string()));
-        assert_eq!(tree.get_label_with_start(2).map(|(l, start)| l.format_absolute(2, start)), Some("iii".to_string()));
-        assert_eq!(tree.get_label_with_start(3).map(|(l, start)| l.format_absolute(3, start)), Some("1".to_string()));
-        assert_eq!(tree.get_label_with_start(4).map(|(l, start)| l.format_absolute(4, start)), Some("2".to_string()));
-        assert_eq!(tree.get_label_with_start(5).map(|(l, start)| l.format_absolute(5, start)), Some("3".to_string()));
+        assert_eq!(
+            tree.get_label_with_start(0)
+                .map(|(l, start)| l.format_absolute(0, start)),
+            Some("i".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(1)
+                .map(|(l, start)| l.format_absolute(1, start)),
+            Some("ii".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(2)
+                .map(|(l, start)| l.format_absolute(2, start)),
+            Some("iii".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(3)
+                .map(|(l, start)| l.format_absolute(3, start)),
+            Some("1".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(4)
+                .map(|(l, start)| l.format_absolute(4, start)),
+            Some("2".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(5)
+                .map(|(l, start)| l.format_absolute(5, start)),
+            Some("3".to_string())
+        );
    }

    #[test]
@ -782,7 +844,10 @@ mod tests {
        // Empty catalog should have pages_ref = ObjRef::new(0, 0) from Default
        assert_eq!(catalog.pages_ref, ObjRef::new(0, 0));
        // Should have STRUCT_MISSING_KEY diagnostic
-        assert!(catalog.diagnostics.iter().any(|d| d.message.contains("STRUCT_MISSING_KEY")));
+        assert!(catalog
+            .diagnostics
+            .iter()
+            .any(|d| d.message.contains("STRUCT_MISSING_KEY")));
    }

    #[test]
@ -926,22 +991,40 @@ mod tests {
    fn test_page_labels_tree_with_prefix() {
        let mut tree = PageLabelsTree::new();

-        tree.labels.push((0, PageLabel {
-            style: PageLabelStyle::RomanLowercase,
-            prefix: Some("front-".to_string()),
-            start: 1,
-        }));
+        tree.labels.push((
+            0,
+            PageLabel {
+                style: PageLabelStyle::RomanLowercase,
+                prefix: Some("front-".to_string()),
+                start: 1,
+            },
+        ));

-        tree.labels.push((3, PageLabel {
-            style: PageLabelStyle::Decimal,
-            prefix: None,
-            start: 1,
-        }));
+        tree.labels.push((
+            3,
+            PageLabel {
+                style: PageLabelStyle::Decimal,
+                prefix: None,
+                start: 1,
+            },
+        ));

        // Test with prefix using format_absolute for correct relative indexing
-        assert_eq!(tree.get_label_with_start(0).map(|(l, start)| l.format_absolute(0, start)), Some("front-i".to_string()));
-        assert_eq!(tree.get_label_with_start(1).map(|(l, start)| l.format_absolute(1, start)), Some("front-ii".to_string()));
-        assert_eq!(tree.get_label_with_start(3).map(|(l, start)| l.format_absolute(3, start)), Some("1".to_string()));
+        assert_eq!(
+            tree.get_label_with_start(0)
+                .map(|(l, start)| l.format_absolute(0, start)),
+            Some("front-i".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(1)
+                .map(|(l, start)| l.format_absolute(1, start)),
+            Some("front-ii".to_string())
+        );
+        assert_eq!(
+            tree.get_label_with_start(3)
+                .map(|(l, start)| l.format_absolute(3, start)),
+            Some("1".to_string())
+        );
    }

    // Phase 7.1.4 Coverage Check Tests
@ -955,9 +1038,18 @@ mod tests {

    #[test]
    fn test_reading_order_algorithm_from_str() {
-        assert_eq!(ReadingOrderAlgorithm::from_str("struct_tree"), Some(ReadingOrderAlgorithm::StructTree));
-        assert_eq!(ReadingOrderAlgorithm::from_str("xy_cut"), Some(ReadingOrderAlgorithm::XyCut));
-        assert_eq!(ReadingOrderAlgorithm::from_str("docstrum"), Some(ReadingOrderAlgorithm::Docstrum));
+        assert_eq!(
+            ReadingOrderAlgorithm::from_str("struct_tree"),
+            Some(ReadingOrderAlgorithm::StructTree)
+        );
+        assert_eq!(
+            ReadingOrderAlgorithm::from_str("xy_cut"),
+            Some(ReadingOrderAlgorithm::XyCut)
+        );
+        assert_eq!(
+            ReadingOrderAlgorithm::from_str("docstrum"),
+            Some(ReadingOrderAlgorithm::Docstrum)
+        );
        assert_eq!(ReadingOrderAlgorithm::from_str("unknown"), None);
        assert_eq!(ReadingOrderAlgorithm::from_str(""), None);
    }
@ -1030,12 +1122,25 @@ mod proptests {
            Just(PdfObject::Null),
            any::<bool>().prop_map(PdfObject::Bool),
            any::<i64>().prop_map(PdfObject::Integer),
-            any::<f64>().prop_map(|f| if f.is_finite() { PdfObject::Real(f) } else { PdfObject::Real(0.0) }),
+            any::<f64>().prop_map(|f| if f.is_finite() {
+                PdfObject::Real(f)
+            } else {
+                PdfObject::Real(0.0)
+            }),
            prop::collection::vec(any::<u8>(), 0..100).prop_map(|v| PdfObject::String(Box::new(v))),
            "[a-zA-Z]{1,20}".prop_map(|s| PdfObject::Name(intern(&s))),
            prop::collection::vec(any::<u8>(), 0..100).prop_map(|bytes| {
                // Try to create a valid name from the bytes
-                let name: String = bytes.iter().map(|&b| if b.is_ascii_alphanumeric() { b as char } else { '_' }).collect();
+                let name: String = bytes
+                    .iter()
+                    .map(|&b| {
+                        if b.is_ascii_alphanumeric() {
+                            b as char
+                        } else {
+                            '_'
+                        }
+                    })
+                    .collect();
                PdfObject::Name(intern(&name))
            }),
        ]
@ -1043,14 +1148,13 @@ mod proptests {

    /// Strategy to generate arbitrary dictionaries for catalog fuzzing.
    fn arb_catalog_dict() -> impl Strategy<Value = indexmap::IndexMap<Arc<str>, PdfObject>> {
-        prop::collection::hash_map("[a-zA-Z]{1,10}", arb_pdf_object(0), 0..10)
-            .prop_map(|map| {
-                let mut index_map = indexmap::IndexMap::new();
-                for (k, v) in map {
-                    index_map.insert(k.into(), v);
-                }
-                index_map
-            })
+        prop::collection::hash_map("[a-zA-Z]{1,10}", arb_pdf_object(0), 0..10).prop_map(|map| {
+            let mut index_map = indexmap::IndexMap::new();
+            for (k, v) in map {
+                index_map.insert(k.into(), v);
+            }
+            index_map
+        })
    }

    proptest! {
--- a/crates/pdftract-core/src/parser/diagnostic.rs
+++ b/crates/pdftract-core/src/parser/diagnostic.rs
@ -101,7 +101,12 @@ impl Diagnostic {
    }

    /// Create a new diagnostic with a specific code.
-    pub fn new_with_code(code: DiagCode, severity: Severity, phase: impl Into<String>, message: impl Into<String>) -> Self {
+    pub fn new_with_code(
+        code: DiagCode,
+        severity: Severity,
+        phase: impl Into<String>,
+        message: impl Into<String>,
+    ) -> Self {
        Diagnostic {
            code,
            severity,
@ -131,7 +136,11 @@ impl Diagnostic {
    }

    /// Create an error diagnostic with a specific code.
-    pub fn error_with_code(code: DiagCode, phase: impl Into<String>, message: impl Into<String>) -> Self {
+    pub fn error_with_code(
+        code: DiagCode,
+        phase: impl Into<String>,
+        message: impl Into<String>,
+    ) -> Self {
        Diagnostic {
            code,
            severity: Severity::Error,
--- a/crates/pdftract-core/src/parser/lexer/mod.rs
+++ b/crates/pdftract-core/src/parser/lexer/mod.rs
@ -3,7 +3,7 @@
 //! This module provides the lexer that converts raw PDF byte sequences into tokens.
 //! PDF is byte-oriented; position tracking is byte-level, not character-level.

-use crate::diagnostics::{Diagnostic as Diag, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic as Diag};
 use std::str::FromStr;

 /// Token produced by the PDF lexer.
@ -386,7 +386,10 @@ impl<'a> Lexer<'a> {
    /// Internal: Skip whitespace and comments.
    fn skip_whitespace_and_comments(&mut self) {
        loop {
-            let had_whitespace = self.bytes.first().map_or(false, |&b| Self::is_pdf_whitespace(b));
+            let had_whitespace = self
+                .bytes
+                .first()
+                .map_or(false, |&b| Self::is_pdf_whitespace(b));
            let had_comment = self.bytes.first() == Some(&b'%');

            self.consume_whitespace();
@ -398,7 +401,11 @@ impl<'a> Lexer<'a> {
            }
            // If we consumed a comment, there might be more whitespace after it
            // If we consumed whitespace, there might be a comment after it
-            if self.bytes.first().map_or(true, |&b| !Self::is_pdf_whitespace(b) && b != b'%') {
+            if self
+                .bytes
+                .first()
+                .map_or(true, |&b| !Self::is_pdf_whitespace(b) && b != b'%')
+            {
                break;
            }
        }
@ -411,7 +418,9 @@ impl<'a> Lexer<'a> {
        // Check for "true"
        if self.bytes.starts_with(b"true") {
            let next_after = self.bytes.get(4);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(4);
                return Some(Token::Bool(true));
            }
@ -419,7 +428,9 @@ impl<'a> Lexer<'a> {
        // Check for "trailer"
        if self.bytes.starts_with(b"trailer") {
            let next_after = self.bytes.get(7);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(7);
                return Some(Token::Keyword(b"trailer".to_vec()));
            }
@ -432,7 +443,9 @@ impl<'a> Lexer<'a> {
        // Check for "false"
        if self.bytes.starts_with(b"false") {
            let next_after = self.bytes.get(5);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(5);
                return Some(Token::Bool(false));
            }
@ -445,7 +458,9 @@ impl<'a> Lexer<'a> {
        // Check for "xref"
        if self.bytes.starts_with(b"xref") {
            let next_after = self.bytes.get(4);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(4);
                return Some(Token::Keyword(b"xref".to_vec()));
            }
@ -458,7 +473,9 @@ impl<'a> Lexer<'a> {
        // Check for "%%EOF" - the PDF end-of-file marker
        if self.bytes.starts_with(b"%%EOF") {
            let next_after = self.bytes.get(5);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(5);
                return Some(Token::Keyword(b"%%EOF".to_vec()));
            }
@ -609,7 +626,10 @@ impl<'a> Lexer<'a> {
                    self.diagnostics.push(Diag::with_dynamic(
                        DiagCode::StructIntegerOverflow,
                        start as u64,
-                        format!("Integer '{}' exceeds i64 range, clamped to i64::MAX", num_str),
+                        format!(
+                            "Integer '{}' exceeds i64 range, clamped to i64::MAX",
+                            num_str
+                        ),
                    ));
                    self.advance(consumed);
                    Some(Token::Integer(i64::MAX))
@ -959,7 +979,9 @@ impl<'a> Lexer<'a> {
        // Check for "stream"
        if self.bytes.starts_with(b"stream") {
            let next_after = self.bytes.get(6);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(6);
                // Validate stream header: must be followed by \n or \r\n
                // PDF spec 7.3.8.1: stream keyword must be followed by \n or \r\n
@ -996,7 +1018,9 @@ impl<'a> Lexer<'a> {
        // Check for "startxref"
        if self.bytes.starts_with(b"startxref") {
            let next_after = self.bytes.get(9);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(9);
                return Some(Token::Keyword(b"startxref".to_vec()));
            }
@ -1009,7 +1033,9 @@ impl<'a> Lexer<'a> {
        // Check for "endstream"
        if self.bytes.starts_with(b"endstream") {
            let next_after = self.bytes.get(9);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(9);
                return Some(Token::EndStream);
            }
@ -1017,7 +1043,9 @@ impl<'a> Lexer<'a> {
        // Check for "endobj"
        if self.bytes.starts_with(b"endobj") {
            let next_after = self.bytes.get(6);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(6);
                return Some(Token::EndObj);
            }
@ -1030,7 +1058,9 @@ impl<'a> Lexer<'a> {
        // Check for "obj"
        if self.bytes.starts_with(b"obj") {
            let next_after = self.bytes.get(3);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(3);
                return Some(Token::Obj);
            }
@ -1042,7 +1072,9 @@ impl<'a> Lexer<'a> {
    fn lex_r_keyword(&mut self) -> Option<Token> {
        // Check for "R" (indirect reference)
        let next_after = self.bytes.get(1);
-        if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+        if next_after.map_or(true, |&b| {
+            Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+        }) {
            self.advance(1);
            Some(Token::IndirectRef)
        } else {
@ -1054,7 +1086,9 @@ impl<'a> Lexer<'a> {
        // Check for "null"
        if self.bytes.starts_with(b"null") {
            let next_after = self.bytes.get(4);
-            if next_after.map_or(true, |&b| Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)) {
+            if next_after.map_or(true, |&b| {
+                Self::is_pdf_whitespace(b) || Self::is_pdf_delimiter(b)
+            }) {
                self.advance(4);
                return Some(Token::Null);
            }
@ -1205,8 +1239,13 @@ mod tests {
        let mut lexer = Lexer::new(b"stream body");
        assert_eq!(lexer.next_token(), Some(Token::Stream));
        let diags = lexer.take_diagnostics();
-        assert!(!diags.is_empty(), "Should emit diagnostic for stream without proper line ending");
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidStreamHeader));
+        assert!(
+            !diags.is_empty(),
+            "Should emit diagnostic for stream without proper line ending"
+        );
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidStreamHeader));
    }

    #[test]
@ -1247,7 +1286,10 @@ mod tests {
    #[test]
    fn string_literal_simple_text() {
        let mut lexer = Lexer::new(b"(Hello World)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"Hello World".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"Hello World".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1274,14 +1316,20 @@ mod tests {
    #[test]
    fn string_literal_escape_tab() {
        let mut lexer = Lexer::new(b"(col1\\tcol2)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"col1\tcol2".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"col1\tcol2".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

    #[test]
    fn string_literal_escape_backspace() {
        let mut lexer = Lexer::new(b"(abc\\bdef)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"abc\x08def".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"abc\x08def".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1298,21 +1346,30 @@ mod tests {
    #[test]
    fn string_literal_escape_backslash() {
        let mut lexer = Lexer::new(b"(path\\\\file)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"path\\file".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"path\\file".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

    #[test]
    fn string_literal_escape_left_paren() {
        let mut lexer = Lexer::new(b"(\\(nested))");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"(nested)".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"(nested)".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

    #[test]
    fn string_literal_escape_right_paren() {
        let mut lexer = Lexer::new(b"(\\)not_end)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b")not_end".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b")not_end".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1340,7 +1397,10 @@ mod tests {
    #[test]
    fn string_literal_octal_escape_non_octal_following() {
        let mut lexer = Lexer::new(b"(abc\\10A)");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"abc\x08A".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"abc\x08A".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1443,7 +1503,10 @@ mod tests {
    fn hex_string_mixed_case() {
        let mut lexer = Lexer::new(b"<aBcD>");
        // aB=0xAB, cD=0xCD
-        assert_eq!(lexer.next_token(), Some(Token::String(b"\xAB\xCD".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"\xAB\xCD".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1459,7 +1522,10 @@ mod tests {
    fn hex_string_odd_length_multiple_nibbles() {
        let mut lexer = Lexer::new(b"<48657>");
        // 48=0x48, 65=0x65, 7=0x70 (dangling nibble becomes HIGH nibble with LOW nibble 0)
-        assert_eq!(lexer.next_token(), Some(Token::String(b"\x48\x65\x70".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"\x48\x65\x70".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1501,7 +1567,10 @@ mod tests {
    #[test]
    fn hex_string_all_zero_bytes() {
        let mut lexer = Lexer::new(b"<000000>");
-        assert_eq!(lexer.next_token(), Some(Token::String(b"\x00\x00\x00".to_vec())));
+        assert_eq!(
+            lexer.next_token(),
+            Some(Token::String(b"\x00\x00\x00".to_vec()))
+        );
        assert_eq!(lexer.next_token(), Some(Token::Eof));
    }

@ -1579,15 +1648,16 @@ mod tests {
        use proptest::prelude::*;

        // Generate random byte sequences that start with < (but not << to avoid dict start)
-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
-            // Ensure the input starts with '<' but NOT '<<'
-            // Insert '<' at the start, and ensure the second byte is not '<'
-            bytes.insert(0, b'<');
-            if bytes.len() > 1 && bytes[1] == b'<' {
-                bytes[1] = b'>'; // Change second byte to something non-'<'
-            }
-            bytes
-        });
+        let test_strategy =
+            prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
+                // Ensure the input starts with '<' but NOT '<<'
+                // Insert '<' at the start, and ensure the second byte is not '<'
+                bytes.insert(0, b'<');
+                if bytes.len() > 1 && bytes[1] == b'<' {
+                    bytes[1] = b'>'; // Change second byte to something non-'<'
+                }
+                bytes
+            });

        proptest!(|(bytes in test_strategy)| {
            // This should never panic
@ -1621,9 +1691,8 @@ mod tests {
        }

        // Generate valid hex strings and test roundtrip
-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..100).prop_map(|bytes| {
-            encode_hex_string(&bytes)
-        });
+        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..100)
+            .prop_map(|bytes| encode_hex_string(&bytes));

        proptest!(|(encoded in test_strategy)| {
            let mut lexer = Lexer::new(&encoded);
@ -1650,11 +1719,12 @@ mod tests {
    fn proptest_string_never_panics_on_random_bytes() {
        use proptest::prelude::*;

-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
-            // Ensure the input starts with '(' to trigger string lexing
-            bytes.insert(0, b'(');
-            bytes
-        });
+        let test_strategy =
+            prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
+                // Ensure the input starts with '(' to trigger string lexing
+                bytes.insert(0, b'(');
+                bytes
+            });

        proptest!(|(bytes in test_strategy)| {
            // This should never panic
@ -1670,14 +1740,17 @@ mod tests {
        // Strategy for generating valid literal strings
        // We generate bytes that can appear in a PDF string and wrap them in parens
        let test_strategy = prop::collection::vec(
-            prop::num::u8::ANY
-                .prop_filter("avoid unprintable and special chars that make testing hard", |&b| {
+            prop::num::u8::ANY.prop_filter(
+                "avoid unprintable and special chars that make testing hard",
+                |&b| {
                    // Allow most bytes, but filter out some that make roundtripping difficult
                    // We include parens but balance them manually
                    !matches!(b, 0x00 | 0x01..=0x08 | 0x0B | 0x0E..=0x1F)
-                }),
+                },
+            ),
            0..100,
-        ).prop_map(|mut bytes| {
+        )
+        .prop_map(|mut bytes| {
            // Balance parentheses: for every '(' we add a ')'
            let mut depth = 0i32;
            let mut result = Vec::new();
@ -1814,7 +1887,10 @@ mod tests {
            panic!("Expected Name token");
        }
        let diags = lexer.take_diagnostics();
-        assert!(diags.is_empty(), "Expected no diagnostics for exactly 127 bytes");
+        assert!(
+            diags.is_empty(),
+            "Expected no diagnostics for exactly 127 bytes"
+        );
    }

    #[test]
@ -1834,7 +1910,10 @@ mod tests {
            panic!("Expected Name token");
        }
        let diags = lexer.take_diagnostics();
-        assert!(diags.is_empty(), "Expected no diagnostics: 124 A's + #41 = 127 raw bytes");
+        assert!(
+            diags.is_empty(),
+            "Expected no diagnostics: 124 A's + #41 = 127 raw bytes"
+        );
    }

    #[test]
@ -1964,11 +2043,12 @@ mod tests {
    fn name_proptest_never_panics_on_random_bytes() {
        use proptest::prelude::*;

-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
-            // Ensure the input starts with '/' to trigger name lexing
-            bytes.insert(0, b'/');
-            bytes
-        });
+        let test_strategy =
+            prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
+                // Ensure the input starts with '/' to trigger name lexing
+                bytes.insert(0, b'/');
+                bytes
+            });

        proptest!(|(bytes in test_strategy)| {
            // This should never panic
@ -1981,10 +2061,11 @@ mod tests {
    fn name_proptest_always_produces_valid_token() {
        use proptest::prelude::*;

-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
-            bytes.insert(0, b'/');
-            bytes
-        });
+        let test_strategy =
+            prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
+                bytes.insert(0, b'/');
+                bytes
+            });

        proptest!(|(bytes in test_strategy)| {
            let mut lexer = Lexer::new(&bytes);
@ -2142,7 +2223,9 @@ mod tests {
        assert!(matches!(token, Some(Token::Integer(0)) | Some(Token::Null)));
        let diags = lexer.take_diagnostics();
        assert!(!diags.is_empty());
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidNumber));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidNumber));
    }

    #[test]
@ -2159,10 +2242,15 @@ mod tests {
        let mut lexer = Lexer::new(b"1.2.3");
        let token = lexer.next_token();
        // Should consume up to second dot and emit diagnostic
-        assert!(matches!(token, Some(Token::Integer(0)) | Some(Token::Real(_))));
+        assert!(matches!(
+            token,
+            Some(Token::Integer(0)) | Some(Token::Real(_))
+        ));
        let diags = lexer.take_diagnostics();
        assert!(!diags.is_empty());
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidNumber));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidNumber));
    }

    #[test]
@ -2173,7 +2261,9 @@ mod tests {
        assert!(matches!(token, Some(Token::Integer(0)) | Some(Token::Null)));
        let diags = lexer.take_diagnostics();
        assert!(!diags.is_empty());
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidNumber));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidNumber));
    }

    #[test]
@ -2191,16 +2281,20 @@ mod tests {
        use proptest::prelude::*;

        // Generate random byte sequences starting with numeric characters
-        let test_strategy = prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
-            // Ensure the input starts with a numeric-start character (+, -, ., 0-9)
-            if bytes.is_empty() {
-                bytes.push(b'1');
-            } else {
-                let numeric_starts = [b'+', b'-', b'.', b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8', b'9'];
-                bytes[0] = numeric_starts[bytes[0] as usize % numeric_starts.len()];
-            }
-            bytes
-        });
+        let test_strategy =
+            prop::collection::vec(prop::num::u8::ANY, 0..1000).prop_map(|mut bytes| {
+                // Ensure the input starts with a numeric-start character (+, -, ., 0-9)
+                if bytes.is_empty() {
+                    bytes.push(b'1');
+                } else {
+                    let numeric_starts = [
+                        b'+', b'-', b'.', b'0', b'1', b'2', b'3', b'4', b'5', b'6', b'7', b'8',
+                        b'9',
+                    ];
+                    bytes[0] = numeric_starts[bytes[0] as usize % numeric_starts.len()];
+                }
+                bytes
+            });

        proptest!(|(bytes in test_strategy)| {
            // This should never panic
--- a/crates/pdftract-core/src/parser/marked_content.rs
+++ b/crates/pdftract-core/src/parser/marked_content.rs
@ -17,9 +17,9 @@
 //!
 //! Coverage = claimed_mcids / total_mcids

-use crate::parser::object::PdfObject;
-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::parser::lexer::Lexer;
+use crate::parser::object::PdfObject;
 use std::collections::HashSet;

 /// Result type for marked content operations.
@ -81,7 +81,8 @@ impl McidTracker {

    /// Add a diagnostic.
    fn emit_diagnostic(&mut self, code: DiagCode, message: String) {
-        self.diagnostics.push(Diagnostic::with_dynamic_no_offset(code, message));
+        self.diagnostics
+            .push(Diagnostic::with_dynamic_no_offset(code, message));
    }

    /// Get all diagnostics emitted during tracking.
@ -184,7 +185,11 @@ impl CoverageResult {
 /// # Returns
 ///
 /// A `CoverageResult` containing the coverage ratio and fallback decision.
-pub fn compute_coverage(page_index: usize, total_mcids: usize, claimed_mcids: usize) -> CoverageResult {
+pub fn compute_coverage(
+    page_index: usize,
+    total_mcids: usize,
+    claimed_mcids: usize,
+) -> CoverageResult {
    CoverageResult::new(page_index, total_mcids, claimed_mcids)
 }

@ -412,7 +417,10 @@ mod tests {
        assert_eq!(result.claimed_mcids, 0);
        assert_eq!(result.coverage, 0.0);
        assert!(result.should_fallback); // No MCIDs = fallback
-        assert!(result.fallback_diagnostic().unwrap().contains("no marked-content sequences"));
+        assert!(result
+            .fallback_diagnostic()
+            .unwrap()
+            .contains("no marked-content sequences"));
    }

    #[test]
--- a/crates/pdftract-core/src/parser/marked_content_operators.rs
+++ b/crates/pdftract-core/src/parser/marked_content_operators.rs
@ -8,12 +8,12 @@
 //! - BDC /Tag <<props>> or BDC /Tag /PropName: begin marked content with properties
 //! - EMC: end marked content (pop top frame)

-use crate::parser::object::{PdfObject, ObjRef};
+use crate::diagnostics::{DiagCode, Diagnostic};
+use crate::parser::marked_content_stack::{MarkedContentFrame, MarkedContentStack};
+use crate::parser::object::{ObjRef, PdfObject};
 use crate::parser::resources::ResourceDict;
-use crate::parser::marked_content_stack::{MarkedContentStack, MarkedContentFrame};
-use crate::diagnostics::{Diagnostic, DiagCode};
-use std::sync::Arc;
 use indexmap::IndexMap;
+use std::sync::Arc;

 /// Parse BMC operator (begin marked content).
 ///
@ -245,10 +245,9 @@ mod tests {
    fn test_parse_bdc_with_property_name_found() {
        let mut stack = MarkedContentStack::new();
        let mut resources = ResourceDict::new();
-        resources.properties.insert(
-            Arc::from("MyProps"),
-            ObjRef::new(10, 0),
-        );
+        resources
+            .properties
+            .insert(Arc::from("MyProps"), ObjRef::new(10, 0));

        // Property name resolution requires full resolver, so this returns None
        assert!(parse_bdc(
@ -366,7 +365,12 @@ mod tests {
        // Outer BDC with MCID
        let mut props1 = IndexMap::new();
        props1.insert(intern("/MCID"), PdfObject::Integer(1));
-        parse_bdc(&mut stack, Arc::from("P"), &PdfObject::Dict(Box::new(props1)), &ResourceDict::new());
+        parse_bdc(
+            &mut stack,
+            Arc::from("P"),
+            &PdfObject::Dict(Box::new(props1)),
+            &ResourceDict::new(),
+        );

        // Inner BMC
        parse_bmc(&mut stack, Arc::from("Span"));
@ -400,7 +404,12 @@ mod tests {
        let mut props = IndexMap::new();
        props.insert(intern("/MCID"), PdfObject::Integer(5));

-        parse_bdc(&mut stack, Arc::from("/P"), &PdfObject::Dict(Box::new(props)), &ResourceDict::new());
+        parse_bdc(
+            &mut stack,
+            Arc::from("/P"),
+            &PdfObject::Dict(Box::new(props)),
+            &ResourceDict::new(),
+        );

        assert_eq!(stack.depth(), 1);
        assert_eq!(stack.innermost_frame().unwrap().tag, "/P");
--- a/crates/pdftract-core/src/parser/marked_content_stack.rs
+++ b/crates/pdftract-core/src/parser/marked_content_stack.rs
@ -6,7 +6,7 @@
 //! Per PDF spec section 14.5, the marked-content stack is independent of the
 //! graphics state stack — q/Q operators do not affect it.

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};

 /// Maximum depth of marked-content stack (prevents stack overflow).
 const MAX_MC_DEPTH: usize = 64;
@ -73,7 +73,11 @@ impl MarkedContentStack {
        if self.stack.len() >= MAX_MC_DEPTH {
            self.diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::MarkedContentDepthExceeded,
-                format!("Marked-content stack depth {} exceeds limit {}", self.stack.len() + 1, MAX_MC_DEPTH),
+                format!(
+                    "Marked-content stack depth {} exceeds limit {}",
+                    self.stack.len() + 1,
+                    MAX_MC_DEPTH
+                ),
            ));
            false
        } else {
@ -89,7 +93,11 @@ impl MarkedContentStack {
        if self.stack.len() >= MAX_MC_DEPTH {
            self.diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::MarkedContentDepthExceeded,
-                format!("Marked-content stack depth {} exceeds limit {}", self.stack.len() + 1, MAX_MC_DEPTH),
+                format!(
+                    "Marked-content stack depth {} exceeds limit {}",
+                    self.stack.len() + 1,
+                    MAX_MC_DEPTH
+                ),
            ));
            false
        } else {
@ -117,9 +125,7 @@ impl MarkedContentStack {
    ///
    /// Returns the MCID of the topmost frame that has one.
    pub fn innermost_mcid(&self) -> Option<u32> {
-        self.stack.iter()
-            .rev()
-            .find_map(|frame| frame.mcid)
+        self.stack.iter().rev().find_map(|frame| frame.mcid)
    }

    /// Get the innermost (top) frame, if any.
@ -247,7 +253,10 @@ mod tests {
        assert!(!stack.push_bmc("overflow".to_string()));
        assert_eq!(stack.depth(), MAX_MC_DEPTH);
        assert!(!stack.diagnostics().is_empty());
-        assert_eq!(stack.diagnostics().last().unwrap().code, DiagCode::MarkedContentDepthExceeded);
+        assert_eq!(
+            stack.diagnostics().last().unwrap().code,
+            DiagCode::MarkedContentDepthExceeded
+        );
    }

    #[test]
--- a/crates/pdftract-core/src/parser/mod.rs
+++ b/crates/pdftract-core/src/parser/mod.rs
@ -2,49 +2,50 @@
 //!
 //! This module provides the lexer and object parser for reading PDF documents.

+pub mod catalog;
 pub mod diagnostic;
 pub mod lexer;
+pub mod marked_content;
+pub mod marked_content_operators;
+pub mod marked_content_stack;
 pub mod object;
 pub mod objstm;
-pub mod xref;
-pub mod catalog;
-pub mod stream;
-pub mod secrets;
-pub mod pages;
-pub mod outline;
-pub mod resources;
 pub mod ocg;
+pub mod outline;
+pub mod pages;
+pub mod resources;
+pub mod secrets;
+pub mod stream;
 pub mod struct_tree;
-pub mod marked_content;
-pub mod marked_content_stack;
-pub mod marked_content_operators;
+pub mod xref;

 // Re-export from the unified diagnostics module (Phase 1.6)
-pub use crate::diagnostics::{Diagnostic, Severity, DiagCode, ObjRef};
-pub use object::{PdfObject};
-pub use objstm::{ObjectStmParser, ObjStmCacheEntry, ObjStmResult, ObjStmError};
-pub use xref::{
-    XrefResolver, XrefEntry, ResolveError, ResolveResult, XrefSection,
-    parse_traditional_xref, parse_xref_stream, merge_hybrid, is_hybrid_trailer,
-    LinearizationInfo, detect_linearization, load_xref_linearized, merge_linearized_xrefs,
-    load_xref_with_prev_chain,
-};
-pub use catalog::{Catalog, MarkInfo, PageLabel, PageLabelsTree, PageLabelStyle, ReadingOrderAlgorithm, parse_catalog};
-pub use ocg::{OcProperties, OcGroup, Ocmd, OcmdPolicy, BaseState, parse_oc_properties};
-pub use resources::{ResourceDict, merge_resources, extract_resources};
-pub use pages::{PageDict, flatten_page_tree, DEFAULT_MEDIABOX};
-pub use struct_tree::{
-    StructureType, StructElemNode, StructTreeRoot, RoleMap, Kid,
-    BlockKind, MappingResult, ParentTreeResolver, ParentTreeEntry,
-    parse_struct_tree, structure_type_to_block_kind, map_element_to_block, is_artifact,
-    check_coverage_for_pages, CoverageCheckResult,
+pub use crate::diagnostics::{DiagCode, Diagnostic, ObjRef, Severity};
+pub use catalog::{
+    parse_catalog, Catalog, MarkInfo, PageLabel, PageLabelStyle, PageLabelsTree,
+    ReadingOrderAlgorithm,
 };
 pub use marked_content::{
-    McidTracker, CoverageResult, compute_coverage, compute_coverage_from_sets,
+    compute_coverage, compute_coverage_from_sets, CoverageResult, McidTracker,
 };
+pub use marked_content_operators::{parse_bdc, parse_bmc, parse_emc};
 pub use marked_content_stack::{MarkedContentFrame, MarkedContentStack};
-pub use marked_content_operators::{parse_bmc, parse_bdc, parse_emc};
+pub use object::PdfObject;
+pub use objstm::{ObjStmCacheEntry, ObjStmError, ObjStmResult, ObjectStmParser};
+pub use ocg::{parse_oc_properties, BaseState, OcGroup, OcProperties, Ocmd, OcmdPolicy};
+pub use pages::{flatten_page_tree, PageDict, DEFAULT_MEDIABOX};
+pub use resources::{extract_resources, merge_resources, ResourceDict};
 pub use stream::{
-    StreamDecoder, FlateDecoder, ASCII85Decoder, ASCIIHexDecoder, CryptDecoder, PassthroughDecoder,
-    normalize_filter_name, get_decoder, FilterError, DEFAULT_MAX_DECOMPRESS_BYTES,
+    get_decoder, normalize_filter_name, ASCII85Decoder, ASCIIHexDecoder, CryptDecoder, FilterError,
+    FlateDecoder, PassthroughDecoder, StreamDecoder, DEFAULT_MAX_DECOMPRESS_BYTES,
+};
+pub use struct_tree::{
+    check_coverage_for_pages, is_artifact, map_element_to_block, parse_struct_tree,
+    structure_type_to_block_kind, BlockKind, CoverageCheckResult, Kid, MappingResult,
+    ParentTreeEntry, ParentTreeResolver, RoleMap, StructElemNode, StructTreeRoot, StructureType,
+};
+pub use xref::{
+    detect_linearization, is_hybrid_trailer, load_xref_linearized, load_xref_with_prev_chain,
+    merge_hybrid, merge_linearized_xrefs, parse_traditional_xref, parse_xref_stream,
+    LinearizationInfo, ResolveError, ResolveResult, XrefEntry, XrefResolver, XrefSection,
 };
--- a/crates/pdftract-core/src/parser/object/mod.rs
+++ b/crates/pdftract-core/src/parser/object/mod.rs
@ -2,8 +2,8 @@
 //!
 //! This module defines the core PDF object types and the object reference type.

-pub mod types;
 pub mod parser;
+pub mod types;

-pub use types::{ObjRef, PdfObject, PdfDict, PdfStream, PdfIndirect, intern};
 pub use parser::ObjectParser;
+pub use types::{intern, ObjRef, PdfDict, PdfIndirect, PdfObject, PdfStream};
--- a/crates/pdftract-core/src/parser/object/parser.rs
+++ b/crates/pdftract-core/src/parser/object/parser.rs
@ -3,9 +3,9 @@
 //! This module provides the parser that converts tokens from the lexer
 //! into PDF objects.

-use super::types::{intern, ObjRef, PdfDict, PdfObject, PdfStream, PdfIndirect};
+use super::types::{intern, ObjRef, PdfDict, PdfIndirect, PdfObject, PdfStream};
+use crate::diagnostics::{DiagCode, Diagnostic as Diag};
 use crate::parser::lexer::{Lexer, Token};
-use crate::diagnostics::{Diagnostic as Diag, DiagCode};

 /// Maximum nesting depth for dictionaries and arrays.
 ///
@ -233,7 +233,10 @@ impl<'a> ObjectParser<'a> {
                                        // Missing value - insert PdfNull
                                        self.diagnostics.push(Diag::with_dynamic_no_offset(
                                            DiagCode::StructInvalidDictValue,
-                                            format!("Dictionary key '{}' has no value, inserting null", key),
+                                            format!(
+                                                "Dictionary key '{}' has no value, inserting null",
+                                                key
+                                            ),
                                        ));
                                        dict.insert(key, PdfObject::Null);
                                        break; // End of dict
@ -258,7 +261,10 @@ impl<'a> ObjectParser<'a> {
                                ));
                                // Skip the invalid token and the next token (would-be value)
                                let _ = self.lexer.next_token();
-                                if !matches!(self.lexer.peek_token(), Some(Token::DictEnd) | Some(Token::Eof) | None) {
+                                if !matches!(
+                                    self.lexer.peek_token(),
+                                    Some(Token::DictEnd) | Some(Token::Eof) | None
+                                ) {
                                    let _ = self.lexer.next_token();
                                }
                                expecting_key = true;
@ -281,13 +287,18 @@ impl<'a> ObjectParser<'a> {
            let offset = self.lexer.position();

            // Try to get /Length from the dict
-            let len_hint = dict.get("Length").and_then(|obj| obj.as_int()).map(|i| i as u64);
+            let len_hint = dict
+                .get("Length")
+                .and_then(|obj| obj.as_int())
+                .map(|i| i as u64);

            // Skip the stream body
            self.skip_stream_body(len_hint);

            // Parse the stream object
-            return Some(PdfObject::Stream(Box::new(PdfStream::new(dict, offset, len_hint))));
+            return Some(PdfObject::Stream(Box::new(PdfStream::new(
+                dict, offset, len_hint,
+            ))));
        }

        Some(PdfObject::Dict(Box::new(dict)))
@ -315,7 +326,10 @@ impl<'a> ObjectParser<'a> {
            if actual_skipped < len_usize {
                self.diagnostics.push(Diag::with_dynamic_no_offset(
                    DiagCode::StructUnexpectedEof,
-                    format!("Stream truncated at EOF: expected {} bytes, got {}", len, actual_skipped),
+                    format!(
+                        "Stream truncated at EOF: expected {} bytes, got {}",
+                        len, actual_skipped
+                    ),
                ));
            }
        } else {
@ -337,7 +351,10 @@ impl<'a> ObjectParser<'a> {
            Some(other) => {
                self.diagnostics.push(Diag::with_dynamic_no_offset(
                    DiagCode::StructUnexpectedByte,
-                    format!("Expected endstream keyword after stream body, found {:?}", other),
+                    format!(
+                        "Expected endstream keyword after stream body, found {:?}",
+                        other
+                    ),
                ));
                // Try to recover by scanning forward for EndStream
                self.scan_to_endstream();
@ -639,7 +656,10 @@ impl<'a> ObjectParser<'a> {
                }
                // Now we're at the end of the first integer (object number)
                // Skip the digits of the object number (and optional minus sign)
-                while scan_back > 0 && (remaining[scan_back - 1].is_ascii_digit() || remaining[scan_back - 1] == b'-') {
+                while scan_back > 0
+                    && (remaining[scan_back - 1].is_ascii_digit()
+                        || remaining[scan_back - 1] == b'-')
+                {
                    scan_back -= 1;
                }
                // scan_back now points to the start of the object number
@ -738,11 +758,14 @@ mod tests {
    fn test_parse_array_of_integers() {
        let mut parser = ObjectParser::new(b"[ 1 2 3 ]");
        let obj = parser.parse_direct_object();
-        assert_eq!(obj, Some(PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(1),
-            PdfObject::Integer(2),
-            PdfObject::Integer(3),
-        ]))));
+        assert_eq!(
+            obj,
+            Some(PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(1),
+                PdfObject::Integer(2),
+                PdfObject::Integer(3),
+            ])))
+        );
    }

    #[test]
@ -825,7 +848,9 @@ mod tests {
            assert_eq!(dict.len(), 1);
            assert_eq!(dict.get("Type"), Some(&PdfObject::Null));
            let diags = parser.take_diagnostics();
-            assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidDictValue));
+            assert!(diags
+                .iter()
+                .any(|d| d.code == DiagCode::StructInvalidDictValue));
        } else {
            panic!("Expected dict, got {:?}", obj);
        }
@ -838,7 +863,9 @@ mod tests {
        if let Some(PdfObject::Dict(dict)) = obj {
            assert_eq!(dict.len(), 0);
            let diags = parser.take_diagnostics();
-            assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidDictKey));
+            assert!(diags
+                .iter()
+                .any(|d| d.code == DiagCode::StructInvalidDictKey));
        } else {
            panic!("Expected dict, got {:?}", obj);
        }
@ -925,7 +952,9 @@ mod tests {

        // Should have emitted STRUCT_DEPTH_EXCEEDED diagnostic
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructDepthExceeded));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructDepthExceeded));
    }

    #[test]
@ -950,7 +979,9 @@ mod tests {

        // Should have emitted STRUCT_INVALID_DICT_VALUE diagnostic for missing value
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidDictValue));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidDictValue));
    }

    #[test]
@ -961,7 +992,9 @@ mod tests {
        // Should return PdfNull with diagnostic
        assert_eq!(obj, Some(PdfObject::Null));
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
    }

    #[test]
@ -997,7 +1030,11 @@ mod tests {
                Just("true".to_string()),
                Just("false".to_string()),
                any::<i64>().prop_map(|n| n.to_string()),
-                any::<f64>().prop_map(|f| if f.is_finite() { f.to_string() } else { "0.0".to_string() }),
+                any::<f64>().prop_map(|f| if f.is_finite() {
+                    f.to_string()
+                } else {
+                    "0.0".to_string()
+                }),
                // Names
                "[a-zA-Z]{1,10}".prop_map(|s| format!("/{}", s)),
                // Strings
@ -1108,7 +1145,9 @@ mod tests {

        // Should have emitted STRUCT_INTEGER_OVERFLOW diagnostic
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructIntegerOverflow));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructIntegerOverflow));
    }

    #[test]
@ -1123,7 +1162,9 @@ mod tests {

        // Should have emitted STRUCT_INTEGER_OVERFLOW diagnostic
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructIntegerOverflow));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructIntegerOverflow));
    }

    #[test]
@ -1137,7 +1178,9 @@ mod tests {

        // Should have emitted STRUCT_INVALID_INDIRECT_HEADER diagnostic
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
    }

    #[test]
@ -1150,7 +1193,9 @@ mod tests {

        // Should have emitted STRUCT_INVALID_INDIRECT_HEADER diagnostic
        let diags = parser.take_diagnostics();
-        assert!(diags.iter().any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
+        assert!(diags
+            .iter()
+            .any(|d| d.code == DiagCode::StructInvalidIndirectHeader));
    }

    #[test]
--- a/crates/pdftract-core/src/parser/object/types.rs
+++ b/crates/pdftract-core/src/parser/object/types.rs
@ -126,7 +126,11 @@ impl PdfStream {
    /// Create a new stream.
    #[inline]
    pub fn new(dict: PdfDict, offset: u64, len_hint: Option<u64>) -> Self {
-        Self { dict, offset, len_hint }
+        Self {
+            dict,
+            offset,
+            len_hint,
+        }
    }

    /// Get the /Filter entry from the stream dictionary.
@ -149,16 +153,18 @@ impl PdfStream {
            }
            PdfObject::Array(arr) => arr
                .iter()
-                .filter_map(|obj| obj.as_name().map(|n| {
-                    // Strip leading slash from filter name for normalization
-                    let name_str: &str = n.as_ref();
-                    let stripped = if name_str.starts_with('/') {
-                        &name_str[1..]
-                    } else {
-                        name_str
-                    };
-                    stripped.to_string()
-                }))
+                .filter_map(|obj| {
+                    obj.as_name().map(|n| {
+                        // Strip leading slash from filter name for normalization
+                        let name_str: &str = n.as_ref();
+                        let stripped = if name_str.starts_with('/') {
+                            &name_str[1..]
+                        } else {
+                            name_str
+                        };
+                        stripped.to_string()
+                    })
+                })
                .collect(),
            _ => return None,
        })
@ -521,7 +527,10 @@ mod tests {
        let obj = PdfObject::Dict(Box::new(dict.clone()));

        assert!(obj.as_dict().is_some());
-        assert_eq!(obj.as_dict().unwrap().get("Type").unwrap().as_name(), Some("Page"));
+        assert_eq!(
+            obj.as_dict().unwrap().get("Type").unwrap().as_name(),
+            Some("Page")
+        );
        assert_eq!(PdfObject::Integer(42).as_dict(), None);
    }

@ -544,7 +553,11 @@ mod tests {

    #[test]
    fn test_as_array() {
-        let arr = vec![PdfObject::Integer(1), PdfObject::Integer(2), PdfObject::Integer(3)];
+        let arr = vec![
+            PdfObject::Integer(1),
+            PdfObject::Integer(2),
+            PdfObject::Integer(3),
+        ];
        let obj = PdfObject::Array(Box::new(arr.clone()));

        assert!(obj.as_array().is_some());
@ -639,7 +652,10 @@ mod tests {
    fn test_pdf_object_indirect_variant() {
        let obj_ref = ObjRef::new(5, 1);
        let inner = PdfObject::Name(intern("Test"));
-        let indirect = PdfIndirect { id: obj_ref, obj: inner };
+        let indirect = PdfIndirect {
+            id: obj_ref,
+            obj: inner,
+        };
        let obj = PdfObject::Indirect(Box::new(indirect));

        assert!(obj.as_indirect().is_some());
--- a/crates/pdftract-core/src/parser/objstm.rs
+++ b/crates/pdftract-core/src/parser/objstm.rs
@ -29,9 +29,9 @@
 use std::collections::{HashMap, HashSet};
 use std::sync::{Arc, RwLock};

-use crate::parser::object::{ObjRef, PdfObject, PdfStream, ObjectParser};
+use crate::diagnostics::{DiagCode, Diagnostic};
+use crate::parser::object::{ObjRef, ObjectParser, PdfObject, PdfStream};
 use crate::parser::stream::{decode_stream, ExtractionOptions, PdfSource};
-use crate::diagnostics::{Diagnostic, DiagCode};

 /// Maximum depth for `/Extends` chain to prevent adversarial deep chains.
 const MAX_EXTENDS_DEPTH: u8 = 16;
@ -58,9 +58,15 @@ impl std::fmt::Display for ObjStmError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            ObjStmError::MissingKey { key } => write!(f, "Missing required key: {}", key),
-            ObjStmError::InvalidFormat { msg } => write!(f, "Invalid object stream format: {}", msg),
-            ObjStmError::CircularRef { obj_ref } => write!(f, "Circular reference in /Extends chain at {}", obj_ref),
-            ObjStmError::DepthExceeded { max } => write!(f, "Extends chain depth exceeded (max {})", max),
+            ObjStmError::InvalidFormat { msg } => {
+                write!(f, "Invalid object stream format: {}", msg)
+            }
+            ObjStmError::CircularRef { obj_ref } => {
+                write!(f, "Circular reference in /Extends chain at {}", obj_ref)
+            }
+            ObjStmError::DepthExceeded { max } => {
+                write!(f, "Extends chain depth exceeded (max {})", max)
+            }
            ObjStmError::DecompressionFailed => write!(f, "Stream decompression failed"),
        }
    }
@ -184,13 +190,11 @@ impl ObjectStmParser {
        // Load the object stream
        let stream = match resolve_fn(host_objstm_ref) {
            Some(s) => s,
-            None => return PdfObject::Null,    // Not found
+            None => return PdfObject::Null, // Not found
        };

        // Create a wrapper that handles the recursion properly
-        let resolve_wrapper = |ref_obj: ObjRef| -> Option<PdfStream> {
-            resolve_fn(ref_obj)
-        };
+        let resolve_wrapper = |ref_obj: ObjRef| -> Option<PdfStream> { resolve_fn(ref_obj) };

        match self.load_object_stream_impl(
            host_objstm_ref,
@ -207,15 +211,13 @@ impl ObjectStmParser {
                }

                // Return the requested object by 0-based index
-                entry.get(embedded_index as usize)
+                entry
+                    .get(embedded_index as usize)
                    .map(|(_, obj)| obj.clone())
                    .unwrap_or(PdfObject::Null)
            }
            Err(e) => {
-                self.emit_diagnostic(
-                    e.diag_code(),
-                    format!("Object stream error: {}", e),
-                );
+                self.emit_diagnostic(e.diag_code(), format!("Object stream error: {}", e));
                PdfObject::Null
            }
        }
@ -257,9 +259,7 @@ impl ObjectStmParser {
        }

        // Create a wrapper that handles the recursion properly
-        let resolve_wrapper = |ref_obj: ObjRef| -> Option<PdfStream> {
-            resolve_fn(ref_obj)
-        };
+        let resolve_wrapper = |ref_obj: ObjRef| -> Option<PdfStream> { resolve_fn(ref_obj) };

        match self.load_object_stream_impl(
            obj_stm_ref,
@ -302,12 +302,17 @@ impl ObjectStmParser {

        // Check for circular reference
        if in_progress.contains(&obj_stm_ref) {
-            return Err(ObjStmError::CircularRef { obj_ref: obj_stm_ref });
+            return Err(ObjStmError::CircularRef {
+                obj_ref: obj_stm_ref,
+            });
        }

        // Check cache first
        {
-            let cache = self.cache.read().map_err(|_| ObjStmError::DecompressionFailed)?;
+            let cache = self
+                .cache
+                .read()
+                .map_err(|_| ObjStmError::DecompressionFailed)?;
            if let Some(cached) = cache.get(&obj_stm_ref) {
                // Return the cached Arc directly (no clone)
                return Ok(cached.clone());
@ -323,7 +328,9 @@ impl ObjectStmParser {
        let n = stream_dict
            .get("/N")
            .and_then(|obj| obj.as_int())
-            .ok_or_else(|| ObjStmError::MissingKey { key: "/N".to_string() })? as u32;
+            .ok_or_else(|| ObjStmError::MissingKey {
+                key: "/N".to_string(),
+            })? as u32;

        let first = stream_dict
            .get("/First")
@ -344,7 +351,11 @@ impl ObjectStmParser {
        }

        #[cfg(test)]
-        eprintln!("DEBUG: decompressed {} bytes, first: {:?}", decompressed.len(), decompressed.get(0..20));
+        eprintln!(
+            "DEBUG: decompressed {} bytes, first: {:?}",
+            decompressed.len(),
+            decompressed.get(0..20)
+        );

        if decompressed.is_empty() {
            in_progress.remove(&obj_stm_ref);
@ -356,7 +367,11 @@ impl ObjectStmParser {
            in_progress.remove(&obj_stm_ref);
            self.emit_diagnostic(
                DiagCode::StructInvalidObjstm,
-                format!("ObjStm /First offset {} exceeds decompressed size {}", first, decompressed.len()),
+                format!(
+                    "ObjStm /First offset {} exceeds decompressed size {}",
+                    first,
+                    decompressed.len()
+                ),
            );
            return Ok(Arc::new(Vec::new()));
        }
@ -421,7 +436,10 @@ impl ObjectStmParser {
            let remaining = &decompressed[obj_start..];

            #[cfg(test)]
-            eprintln!("DEBUG: Parsing object {} at offset {}, remaining bytes: {:?}", obj_number, obj_start, remaining);
+            eprintln!(
+                "DEBUG: Parsing object {} at offset {}, remaining bytes: {:?}",
+                obj_number, obj_start, remaining
+            );

            let mut obj_parser = ObjectParser::new(remaining);

@ -478,12 +496,16 @@ impl ObjectStmParser {
                    Err(ObjStmError::CircularRef { .. }) => {
                        // Propagate circular reference errors
                        in_progress.remove(&obj_stm_ref);
-                        return Err(ObjStmError::CircularRef { obj_ref: extends_ref });
+                        return Err(ObjStmError::CircularRef {
+                            obj_ref: extends_ref,
+                        });
                    }
                    Err(ObjStmError::DepthExceeded { .. }) => {
                        // Propagate depth exceeded errors
                        in_progress.remove(&obj_stm_ref);
-                        return Err(ObjStmError::DepthExceeded { max: MAX_EXTENDS_DEPTH });
+                        return Err(ObjStmError::DepthExceeded {
+                            max: MAX_EXTENDS_DEPTH,
+                        });
                    }
                    Err(_) => {
                        // Failed to parse parent - just use our objects
@ -594,7 +616,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(2));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        // Create a PdfStream with the dict and offset 0 (for MemorySource)
        let stream = PdfStream::new(dict.clone(), 0, Some(compressed.len() as u64));
@ -606,18 +631,13 @@ mod tests {
        // Mock resolve function that returns the stream
        let obj_stm_ref = ObjRef::new(10, 0);
        let stream_clone = stream.clone();
-        let result = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(stream_clone.clone())
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(stream_clone.clone())
+            } else {
+                None
+            }
+        });

        assert!(result.is_ok());
        let entry = result.unwrap();
@ -706,7 +726,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(10));
        dict.insert(intern("/First"), PdfObject::Integer(first as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        // Create a PdfStream with the dict and offset 0 (for MemorySource)
        let stream = PdfStream::new(dict.clone(), 0, Some(compressed.len() as u64));
@ -716,18 +739,13 @@ mod tests {

        let obj_stm_ref = ObjRef::new(10, 0);
        let stream_clone = stream.clone();
-        let result = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(stream_clone.clone())
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(stream_clone.clone())
+            } else {
+                None
+            }
+        });

        assert!(result.is_ok());
        let entry = result.unwrap();
@ -754,12 +772,7 @@ mod tests {
        let source = MemorySource::new(vec![0u8; 100]);
        let parser = ObjectStmParser::default();

-        let result = parser.load_object_stream(
-            ObjRef::new(1, 0),
-            &stream,
-            &source,
-            |_| None,
-        );
+        let result = parser.load_object_stream(ObjRef::new(1, 0), &stream, &source, |_| None);

        assert!(matches!(result, Err(ObjStmError::MissingKey { key }) if key == "/N"));
    }
@ -773,12 +786,7 @@ mod tests {
        let source = MemorySource::new(vec![0u8; 100]);
        let parser = ObjectStmParser::default();

-        let result = parser.load_object_stream(
-            ObjRef::new(1, 0),
-            &stream,
-            &source,
-            |_| None,
-        );
+        let result = parser.load_object_stream(ObjRef::new(1, 0), &stream, &source, |_| None);

        assert!(matches!(result, Err(ObjStmError::MissingKey { key }) if key == "/First"));
    }
@ -799,18 +807,13 @@ mod tests {
        // Mock resolve function that returns the same stream (circular reference)
        let self_ref = ObjRef::new(1, 0);
        let stream_clone = stream.clone();
-        let result = parser.load_object_stream(
-            self_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == self_ref {
-                    Some(stream_clone.clone())
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(self_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == self_ref {
+                Some(stream_clone.clone())
+            } else {
+                None
+            }
+        });

        assert!(matches!(result, Err(ObjStmError::CircularRef { .. })));
    }
@ -838,7 +841,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(2));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        let stream = PdfStream::new(dict.clone(), 0, Some(compressed.len() as u64));

@ -849,18 +855,13 @@ mod tests {
        let stream_clone = stream.clone();

        // First call - should load and cache
-        let result1 = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(stream_clone.clone())
-                } else {
-                    None
-                }
-            },
-        );
+        let result1 = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(stream_clone.clone())
+            } else {
+                None
+            }
+        });

        assert!(result1.is_ok());
        let entry1 = result1.unwrap();
@ -893,9 +894,15 @@ mod tests {
        let mut parent_dict = PdfDict::new();
        parent_dict.insert(intern("/Type"), PdfObject::Name(intern("/ObjStm")));
        parent_dict.insert(intern("/N"), PdfObject::Integer(3));
-        parent_dict.insert(intern("/First"), PdfObject::Integer(parent_header.len() as i64));
+        parent_dict.insert(
+            intern("/First"),
+            PdfObject::Integer(parent_header.len() as i64),
+        );
        parent_dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        parent_dict.insert(intern("/Length"), PdfObject::Integer(parent_compressed.len() as i64));
+        parent_dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(parent_compressed.len() as i64),
+        );

        // Create child ObjStm (objects 4-5) that extends parent
        let child_header = b"4 0 5 4";
@ -913,9 +920,15 @@ mod tests {
        let mut child_dict = PdfDict::new();
        child_dict.insert(intern("/Type"), PdfObject::Name(intern("/ObjStm")));
        child_dict.insert(intern("/N"), PdfObject::Integer(2));
-        child_dict.insert(intern("/First"), PdfObject::Integer(child_header.len() as i64));
+        child_dict.insert(
+            intern("/First"),
+            PdfObject::Integer(child_header.len() as i64),
+        );
        child_dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        child_dict.insert(intern("/Length"), PdfObject::Integer(child_compressed.len() as i64));
+        child_dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(child_compressed.len() as i64),
+        );
        child_dict.insert(intern("/Extends"), PdfObject::Ref(parent_ref));

        let parser = ObjectStmParser::default();
@ -927,29 +940,16 @@ mod tests {
        let parent_dict_clone = parent_dict.clone();
        let child_stream = PdfStream::new(child_dict_clone.clone(), 0, None);

-        let result = parser.load_object_stream(
-            child_ref,
-            &child_stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == parent_ref {
-                    // Return parent stream
-                    Some(PdfStream::new(
-                        parent_dict_clone.clone(),
-                        0,
-                        None,
-                    ))
-                } else if ref_obj == child_ref {
-                    Some(PdfStream::new(
-                        child_dict_clone.clone(),
-                        0,
-                        None,
-                    ))
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(child_ref, &child_stream, &source, move |ref_obj| {
+            if ref_obj == parent_ref {
+                // Return parent stream
+                Some(PdfStream::new(parent_dict_clone.clone(), 0, None))
+            } else if ref_obj == child_ref {
+                Some(PdfStream::new(child_dict_clone.clone(), 0, None))
+            } else {
+                None
+            }
+        });

        // The test may not fully work due to source limitations,
        // but it verifies the /Extends handling doesn't crash
@ -979,7 +979,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(2));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        let source = MemorySource::new(compressed);
        let parser = ObjectStmParser::default();
@ -1053,7 +1056,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(3));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        let source = MemorySource::new(compressed);
        let parser = ObjectStmParser::default();
@ -1061,22 +1067,13 @@ mod tests {
        let obj_stm_ref = ObjRef::new(10, 0);
        let dict_clone = dict.clone();
        let stream = PdfStream::new(dict.clone(), 0, Some(compressed_len));
-        let result = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(PdfStream::new(
-                        dict_clone.clone(),
-                        0,
-                        Some(compressed_len),
-                    ))
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(PdfStream::new(dict_clone.clone(), 0, Some(compressed_len)))
+            } else {
+                None
+            }
+        });

        // Should succeed with partial objects
        assert!(result.is_ok());
@ -1121,7 +1118,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(2));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        // Create parser with very small decompression limit
        let parser = ObjectStmParser::new(max_bytes);
@ -1130,22 +1130,13 @@ mod tests {
        let obj_stm_ref = ObjRef::new(10, 0);
        let dict_clone = dict.clone();
        let stream = PdfStream::new(dict.clone(), 0, None);
-        let result = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(PdfStream::new(
-                        dict_clone.clone(),
-                        0,
-                        None,
-                    ))
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(PdfStream::new(dict_clone.clone(), 0, None))
+            } else {
+                None
+            }
+        });

        // The result should be ok (we get what we can before hitting the limit)
        // but diagnostics should be emitted
@ -1183,7 +1174,10 @@ mod tests {
        dict.insert(intern("/N"), PdfObject::Integer(1));
        dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        let source = MemorySource::new(compressed);
        let parser = ObjectStmParser::default();
@ -1191,22 +1185,13 @@ mod tests {
        let obj_stm_ref = ObjRef::new(10, 0);
        let dict_clone = dict.clone();
        let stream = PdfStream::new(dict.clone(), 0, None);
-        let result = parser.load_object_stream(
-            obj_stm_ref,
-            &stream,
-            &source,
-            move |ref_obj| {
-                if ref_obj == obj_stm_ref {
-                    Some(PdfStream::new(
-                        dict_clone.clone(),
-                        0,
-                        None,
-                    ))
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_ref, &stream, &source, move |ref_obj| {
+            if ref_obj == obj_stm_ref {
+                Some(PdfStream::new(dict_clone.clone(), 0, None))
+            } else {
+                None
+            }
+        });

        assert!(result.is_ok());
        let entry = result.unwrap();
@@ -1238,7 +1223,10 @@ mod tests {
        base_dict.insert(intern("/N"), PdfObject::Integer(1));
        base_dict.insert(intern("/First"), PdfObject::Integer(header.len() as i64));
        base_dict.insert(intern("/Filter"), PdfObject::Name(intern("/FlateDecode")));
-        base_dict.insert(intern("/Length"), PdfObject::Integer(compressed.len() as i64));
+        base_dict.insert(
+            intern("/Length"),
+            PdfObject::Integer(compressed.len() as i64),
+        );

        // Create a chain of ObjStms where each extends the previous
        // We'll create 18 dicts (0-17), each extending the previous
@@ -1247,7 +1235,10 @@ mod tests {
            let mut dict = base_dict.clone();
            if i > 0 {
                // This ObjStm extends the previous one
-                dict.insert(intern("/Extends"), PdfObject::Ref(ObjRef::new(100 + (i as u32) - 1, 0)));
+                dict.insert(
+                    intern("/Extends"),
+                    PdfObject::Ref(ObjRef::new(100 + (i as u32) - 1, 0)),
+                );
            }
            dicts.push(dict);
        }
@@ -1259,20 +1250,15 @@ mod tests {
        let obj_stm_17_ref = ObjRef::new(117, 0);
        let stream_17 = PdfStream::new(dicts[17].clone(), 0, None);

-        let result = parser.load_object_stream(
-            obj_stm_17_ref,
-            &stream_17,
-            &source,
-            |ref_obj| {
-                // Return a stream for any ref in the chain
-                if ref_obj.object >= 100 && ref_obj.object <= 117 {
-                    let idx = (ref_obj.object - 100) as usize;
-                    Some(PdfStream::new(dicts[idx].clone(), 0, None))
-                } else {
-                    None
-                }
-            },
-        );
+        let result = parser.load_object_stream(obj_stm_17_ref, &stream_17, &source, |ref_obj| {
+            // Return a stream for any ref in the chain
+            if ref_obj.object >= 100 && ref_obj.object <= 117 {
+                let idx = (ref_obj.object - 100) as usize;
+                Some(PdfStream::new(dicts[idx].clone(), 0, None))
+            } else {
+                None
+            }
+        });

        // Should fail with DepthExceeded
        assert!(matches!(result, Err(ObjStmError::DepthExceeded { .. })));
--- a/crates/pdftract-core/src/parser/ocg.rs
+++ b/crates/pdftract-core/src/parser/ocg.rs
@@ -8,9 +8,9 @@

 use std::collections::HashMap;

-use crate::parser::{Diagnostic, DiagCode};
 use crate::parser::object::{intern, ObjRef, PdfDict, PdfObject};
 use crate::parser::xref::XrefResolver;
+use crate::parser::{DiagCode, Diagnostic};

 /// Base state for OCG visibility in the default configuration.
 ///
@@ -102,15 +102,13 @@ impl Ocmd {
        // Parse /OCGs (can be a single ref or an array)
        let ocgs = match dict.get("OCGs") {
            Some(PdfObject::Ref(ref_)) => vec![*ref_],
-            Some(PdfObject::Array(arr)) => arr
-                .iter()
-                .filter_map(|o| o.as_ref())
-                .collect(),
+            Some(PdfObject::Array(arr)) => arr.iter().filter_map(|o| o.as_ref()).collect(),
            _ => return None,
        };

        // Parse /P (policy; defaults to AnyOn if absent per spec)
-        let policy = dict.get("P")
+        let policy = dict
+            .get("P")
            .and_then(|o| o.as_name())
            .and_then(OcmdPolicy::from_name)
            .unwrap_or(OcmdPolicy::AnyOn);
@@ -153,7 +151,8 @@ impl OcGroup {

        // Parse /Name (required per spec, but we handle missing)
        if let Some(name_obj) = dict.get("Name") {
-            group.name = name_obj.as_string()
+            group.name = name_obj
+                .as_string()
                .or_else(|| name_obj.as_name().map(|s| s.as_bytes()))
                .and_then(|bytes| String::from_utf8(bytes.to_vec()).ok());
        }
@@ -245,7 +244,8 @@ impl OcProperties {

    /// Evaluate an OCMD policy against current OCG states.
    fn evaluate_ocmd_policy(&self, ocmd: &Ocmd) -> bool {
-        let ocg_states: Vec<bool> = ocmd.ocgs
+        let ocg_states: Vec<bool> = ocmd
+            .ocgs
            .iter()
            .map(|&ref_| self.is_visible(ref_))
            .collect();
@@ -279,10 +279,7 @@ impl Default for OcProperties {
 /// # Returns
 /// An `OcProperties` struct containing the parsed OCG information.
 /// If `oc_props_ref` is None, returns `OcProperties::not_present()`.
-pub fn parse_oc_properties(
-    resolver: &XrefResolver,
-    oc_props_ref: Option<ObjRef>,
-) -> OcProperties {
+pub fn parse_oc_properties(resolver: &XrefResolver, oc_props_ref: Option<ObjRef>) -> OcProperties {
    let oc_props_ref = match oc_props_ref {
        Some(r) => r,
        None => return OcProperties::not_present(),
@@ -316,7 +313,10 @@ pub fn parse_oc_properties(
        None => {
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::StructUnexpectedEof,
-                format!("/OCProperties is not a dictionary (type: {})", oc_props_obj.type_name()),
+                format!(
+                    "/OCProperties is not a dictionary (type: {})",
+                    oc_props_obj.type_name()
+                ),
            ));
            oc_properties.diagnostics = diagnostics;
            return oc_properties;
@@ -325,10 +325,7 @@ pub fn parse_oc_properties(

    // Parse /OCGs array (required per spec)
    let ocg_refs: Vec<ObjRef> = match oc_props_dict.get("OCGs") {
-        Some(PdfObject::Array(arr)) => arr
-            .iter()
-            .filter_map(|o| o.as_ref())
-            .collect(),
+        Some(PdfObject::Array(arr)) => arr.iter().filter_map(|o| o.as_ref()).collect(),
        Some(other) => {
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::StructUnexpectedEof,
@@ -385,14 +382,17 @@ pub fn parse_oc_properties(
    };

    // Parse /BaseState (defaults to ON if absent)
-    oc_properties.base_state = default_config.get("BaseState")
+    oc_properties.base_state = default_config
+        .get("BaseState")
        .and_then(|o| o.as_name())
        .and_then(BaseState::from_name)
        .unwrap_or(BaseState::On);

    // Initialize all OCGs to base state
    for &ocg_ref in &ocg_refs {
-        oc_properties.default_visibility.insert(ocg_ref, oc_properties.base_state.as_bool());
+        oc_properties
+            .default_visibility
+            .insert(ocg_ref, oc_properties.base_state.as_bool());
    }

    // Apply /ON array (overrides BaseState for these OCGs)
@@ -433,7 +433,10 @@ mod tests {
    fn make_test_ocg(obj_ref: ObjRef, name: &str, intent: Option<&str>) -> PdfObject {
        let mut dict = PdfDict::new();
        dict.insert(intern("Type"), PdfObject::Name(intern("OCG")));
-        dict.insert(intern("Name"), PdfObject::String(Box::new(name.as_bytes().to_vec())));
+        dict.insert(
+            intern("Name"),
+            PdfObject::String(Box::new(name.as_bytes().to_vec())),
+        );
        if let Some(i) = intent {
            dict.insert(intern("Intent"), PdfObject::Name(intern(i)));
        }
@@ -444,7 +447,10 @@ mod tests {
    fn test_base_state_from_name() {
        assert_eq!(BaseState::from_name("ON"), Some(BaseState::On));
        assert_eq!(BaseState::from_name("OFF"), Some(BaseState::Off));
-        assert_eq!(BaseState::from_name("Unchanged"), Some(BaseState::Unchanged));
+        assert_eq!(
+            BaseState::from_name("Unchanged"),
+            Some(BaseState::Unchanged)
+        );
        assert_eq!(BaseState::from_name("Invalid"), None);
    }

@@ -495,10 +501,13 @@ mod tests {

        // Create /OCProperties dict
        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+            ])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("ON")));
@@ -527,10 +536,13 @@ mod tests {
        resolver.cache_object(ocg2_ref, make_test_ocg(ocg2_ref, "Layer2", None));

        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+            ])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("OFF")));
@@ -559,18 +571,24 @@ mod tests {
        resolver.cache_object(ocg3_ref, make_test_ocg(ocg3_ref, "Layer3", None));

        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-            PdfObject::Ref(ocg3_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+                PdfObject::Ref(ocg3_ref),
+            ])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("OFF")));
-        default_config.insert(intern("ON"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        default_config.insert(
+            intern("ON"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+            ])),
+        );
        oc_props_dict.insert(intern("D"), PdfObject::Dict(Box::new(default_config)));

        let oc_props_ref = ObjRef::new(1, 0);
@@ -595,16 +613,20 @@ mod tests {
        resolver.cache_object(ocg2_ref, make_test_ocg(ocg2_ref, "Layer2", None));

        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+            ])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("ON")));
-        default_config.insert(intern("OFF"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        default_config.insert(
+            intern("OFF"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(ocg2_ref)])),
+        );
        oc_props_dict.insert(intern("D"), PdfObject::Dict(Box::new(default_config)));

        let oc_props_ref = ObjRef::new(1, 0);
@@ -626,19 +648,22 @@ mod tests {
        resolver.cache_object(ocg1_ref, make_test_ocg(ocg1_ref, "Layer1", None));

        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(ocg1_ref)])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("OFF")));
        // OCG in both /ON and /OFF: /OFF wins per spec
-        default_config.insert(intern("ON"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-        ])));
-        default_config.insert(intern("OFF"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-        ])));
+        default_config.insert(
+            intern("ON"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(ocg1_ref)])),
+        );
+        default_config.insert(
+            intern("OFF"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(ocg1_ref)])),
+        );
        oc_props_dict.insert(intern("D"), PdfObject::Dict(Box::new(default_config)));

        let oc_props_ref = ObjRef::new(1, 0);
@@ -658,9 +683,10 @@ mod tests {
        resolver.cache_object(ocg1_ref, make_test_ocg(ocg1_ref, "TestLayer", None));

        let mut oc_props_dict = PdfDict::new();
-        oc_props_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-        ])));
+        oc_props_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(ocg1_ref)])),
+        );

        let mut default_config = PdfDict::new();
        default_config.insert(intern("BaseState"), PdfObject::Name(intern("ON")));
@@ -699,10 +725,13 @@ mod tests {

        let mut ocmd_dict = PdfDict::new();
        ocmd_dict.insert(intern("Type"), PdfObject::Name(intern("OCMD")));
-        ocmd_dict.insert(intern("OCGs"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(ocg1_ref),
-            PdfObject::Ref(ocg2_ref),
-        ])));
+        ocmd_dict.insert(
+            intern("OCGs"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(ocg1_ref),
+                PdfObject::Ref(ocg2_ref),
+            ])),
+        );
        ocmd_dict.insert(intern("P"), PdfObject::Name(intern("AllOn")));

        let ocmd = Ocmd::parse(&PdfObject::Dict(Box::new(ocmd_dict)));
@@ -789,11 +818,17 @@ mod tests {
    fn test_ocg_group_parse() {
        let mut ocg_dict = PdfDict::new();
        ocg_dict.insert(intern("Type"), PdfObject::Name(intern("OCG")));
-        ocg_dict.insert(intern("Name"), PdfObject::String(Box::new(b"TestLayer".to_vec())));
-        ocg_dict.insert(intern("Intent"), PdfObject::Array(Box::new(vec![
-            PdfObject::Name(intern("View")),
-            PdfObject::Name(intern("Design")),
-        ])));
+        ocg_dict.insert(
+            intern("Name"),
+            PdfObject::String(Box::new(b"TestLayer".to_vec())),
+        );
+        ocg_dict.insert(
+            intern("Intent"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Name(intern("View")),
+                PdfObject::Name(intern("Design")),
+            ])),
+        );

        let group = OcGroup::parse(&PdfObject::Dict(Box::new(ocg_dict)), &mut Vec::new());

--- a/crates/pdftract-core/src/parser/outline.rs
+++ b/crates/pdftract-core/src/parser/outline.rs
@@ -9,10 +9,10 @@
 //! - /Count indicates open (positive) or closed (negative) state
 //! - /Dest or /A specify the destination

+use crate::diagnostics::{DiagCode, Diagnostic};
 use crate::parser::object::{ObjRef, PdfObject};
 use crate::parser::pages::PageDict;
 use crate::parser::xref::XrefResolver;
-use crate::diagnostics::{Diagnostic, DiagCode};
 use std::collections::HashSet;

 /// Maximum depth of outline nesting to prevent stack overflow.
@@ -173,12 +173,10 @@ fn decode_pdf_string(bytes: &[u8]) -> Result<String> {
 /// Decode UTF-16BE string with BOM (bytes after 0xFE 0xFF).
 fn decode_utf16be_bom(bytes: &[u8]) -> Result<String> {
    if bytes.len() % 2 != 0 {
-        return Err(vec![
-            Diagnostic::with_static_no_offset(
-                DiagCode::StructInvalidUtf16,
-                "STRUCT_INVALID_UTF16: UTF-16BE string has odd length",
-            )
-        ]);
+        return Err(vec![Diagnostic::with_static_no_offset(
+            DiagCode::StructInvalidUtf16,
+            "STRUCT_INVALID_UTF16: UTF-16BE string has odd length",
+        )]);
    }

    let utf16_chars: Vec<u16> = bytes
@@ -187,12 +185,10 @@ fn decode_utf16be_bom(bytes: &[u8]) -> Result<String> {
        .collect();

    String::from_utf16(&utf16_chars).map_err(|_| {
-        vec![
-            Diagnostic::with_static_no_offset(
-                DiagCode::StructInvalidUtf16,
-                "STRUCT_INVALID_UTF16: Invalid UTF-16BE sequence",
-            )
-        ]
+        vec![Diagnostic::with_static_no_offset(
+            DiagCode::StructInvalidUtf16,
+            "STRUCT_INVALID_UTF16: Invalid UTF-16BE sequence",
+        )]
    })
 }

@@ -246,252 +242,252 @@ fn decode_pdfdocencoding(bytes: &[u8]) -> Result<String> {
    // Key: octal value from spec, Value: Unicode codepoint
    fn pdfdoc_override(byte: u8) -> Option<char> {
        match byte {
-            0o010 => Some('\u{0000}'),      // NUL
-            0o011 => Some('\u{0001}'),      // SOH
-            0o012 => Some('\u{0002}'),      // STX
-            0o013 => Some('\u{0003}'),      // ETX
-            0o014 => Some('\u{0004}'),      // EOT
-            0o015 => Some('\u{0005}'),      // ENQ
-            0o016 => Some('\u{0006}'),      // ACK
-            0o017 => Some('\u{0007}'),      // BEL
-            0o020 => Some('\u{0008}'),      // BS
-            0o021 => Some('\u{0009}'),      // HT
-            0o022 => Some('\u{000A}'),      // LF
-            0o023 => Some('\u{000B}'),      // VT
-            0o024 => Some('\u{000C}'),      // FF
-            0o025 => Some('\u{000D}'),      // CR
-            0o026 => Some('\u{000E}'),      // SO
-            0o027 => Some('\u{000F}'),      // SI
-            0o030 => Some('\u{0010}'),      // DLE
-            0o031 => Some('\u{0011}'),      // DC1
-            0o032 => Some('\u{0012}'),      // DC2
-            0o033 => Some('\u{0013}'),      // DC3
-            0o034 => Some('\u{0014}'),      // DC4
-            0o035 => Some('\u{0015}'),      // NAK
-            0o036 => Some('\u{0016}'),      // SYN
-            0o037 => Some('\u{0017}'),      // ETB
-            0o040 => Some('\u{0020}'),      // Space (same as Latin-1)
-            0o041 => Some('\u{0021}'),      // !
-            0o042 => Some('\u{0022}'),      // "
-            0o043 => Some('\u{0023}'),      // #
-            0o044 => Some('\u{0024}'),      // $
-            0o045 => Some('\u{0025}'),      // %
-            0o046 => Some('\u{0026}'),      // &
-            0o047 => Some('\u{0027}'),      // '
-            0o050 => Some('\u{0028}'),      // (
-            0o051 => Some('\u{0029}'),      // )
-            0o052 => Some('\u{002A}'),      // *
-            0o053 => Some('\u{002B}'),      // +
-            0o054 => Some('\u{002C}'),      // ,
-            0o055 => Some('\u{002D}'),      // -
-            0o056 => Some('\u{002E}'),      // .
-            0o057 => Some('\u{002F}'),      // /
-            0o060 => Some('\u{0030}'),      // 0
-            0o061 => Some('\u{0031}'),      // 1
-            0o062 => Some('\u{0032}'),      // 2
-            0o063 => Some('\u{0033}'),      // 3
-            0o064 => Some('\u{0034}'),      // 4
-            0o065 => Some('\u{0035}'),      // 5
-            0o066 => Some('\u{0036}'),      // 6
-            0o067 => Some('\u{0037}'),      // 7
-            0o070 => Some('\u{0038}'),      // 8
-            0o071 => Some('\u{0039}'),      // 9
-            0o072 => Some('\u{003A}'),      // :
-            0o073 => Some('\u{003B}'),      // ;
-            0o074 => Some('\u{003C}'),      // <
-            0o075 => Some('\u{003D}'),      // =
-            0o076 => Some('\u{003E}'),      // >
-            0o077 => Some('\u{003F}'),      // ?
-            0o100 => Some('\u{0040}'),      // @
-            0o101 => Some('\u{0041}'),      // A
-            0o102 => Some('\u{0042}'),      // B
-            0o103 => Some('\u{0043}'),      // C
-            0o104 => Some('\u{0044}'),      // D
-            0o105 => Some('\u{0045}'),      // E
-            0o106 => Some('\u{0046}'),      // F
-            0o107 => Some('\u{0047}'),      // G
-            0o110 => Some('\u{0048}'),      // H
-            0o111 => Some('\u{0049}'),      // I
-            0o112 => Some('\u{004A}'),      // J
-            0o113 => Some('\u{004B}'),      // K
-            0o114 => Some('\u{004C}'),      // L
-            0o115 => Some('\u{004D}'),      // M
-            0o116 => Some('\u{004E}'),      // N
-            0o117 => Some('\u{004F}'),      // O
-            0o120 => Some('\u{0050}'),      // P
-            0o121 => Some('\u{0051}'),      // Q
-            0o122 => Some('\u{0052}'),      // R
-            0o123 => Some('\u{0053}'),      // S
-            0o124 => Some('\u{0054}'),      // T
-            0o125 => Some('\u{0055}'),      // U
-            0o126 => Some('\u{0056}'),      // V
-            0o127 => Some('\u{0057}'),      // W
-            0o130 => Some('\u{0058}'),      // X
-            0o131 => Some('\u{0059}'),      // Y
-            0o132 => Some('\u{005A}'),      // Z
-            0o133 => Some('\u{005B}'),      // [
-            0o134 => Some('\u{005C}'),      // \
-            0o135 => Some('\u{005D}'),      // ]
-            0o136 => Some('\u{005E}'),      // ^
-            0o137 => Some('\u{005F}'),      // _
-            0o140 => Some('\u{0060}'),      // `
-            0o141 => Some('\u{0061}'),      // a
-            0o142 => Some('\u{0062}'),      // b
-            0o143 => Some('\u{0063}'),      // c
-            0o144 => Some('\u{0064}'),      // d
-            0o145 => Some('\u{0065}'),      // e
-            0o146 => Some('\u{0066}'),      // f
-            0o147 => Some('\u{0067}'),      // g
-            0o150 => Some('\u{0068}'),      // h
-            0o151 => Some('\u{0069}'),      // i
-            0o152 => Some('\u{006A}'),      // j
-            0o153 => Some('\u{006B}'),      // k
-            0o154 => Some('\u{006C}'),      // l
-            0o155 => Some('\u{006D}'),      // m
-            0o156 => Some('\u{006E}'),      // n
-            0o157 => Some('\u{006F}'),      // o
-            0o160 => Some('\u{0070}'),      // p
-            0o161 => Some('\u{0071}'),      // q
-            0o162 => Some('\u{0072}'),      // r
-            0o163 => Some('\u{0073}'),      // s
-            0o164 => Some('\u{0074}'),      // t
-            0o165 => Some('\u{0075}'),      // u
-            0o166 => Some('\u{0076}'),      // v
-            0o167 => Some('\u{0077}'),      // w
-            0o170 => Some('\u{0078}'),      // x
-            0o171 => Some('\u{0079}'),      // y
-            0o172 => Some('\u{007A}'),      // z
-            0o173 => Some('\u{007B}'),      // {
-            0o174 => Some('\u{007C}'),      // |
-            0o175 => Some('\u{007D}'),      // }
-            0o176 => Some('\u{007E}'),      // ~
-            0o200 => Some('\u{2022}'),      // Bullet
-            0o201 => Some('\u{2020}'),      // Dagger
-            0o202 => Some('\u{2021}'),      // Double Dagger
-            0o203 => Some('\u{2026}'),      // Ellipsis
-            0o204 => Some('\u{2014}'),      // Em Dash
-            0o205 => Some('\u{2013}'),      // En Dash
-            0o206 => Some('\u{0192}'),      // Florin
-            0o207 => Some('\u{2044}'),      // Fraction
-            0o210 => Some('\u{2039}'),      // Single Left Angle Quote
-            0o211 => Some('\u{203A}'),      // Single Right Angle Quote
-            0o212 => Some('\u{201C}'),      // Double Left Quote
-            0o213 => Some('\u{201D}'),      // Double Right Quote
-            0o214 => Some('\u{2018}'),      // Single Left Quote
-            0o215 => Some('\u{2019}'),      // Single Right Quote
-            0o216 => Some('\u{201A}'),      // Single Low-9 Quote
-            0o217 => Some('\u{2122}'),      // Trademark
-            0o220 => Some('\u{FB01}'),      // fi ligature
-            0o221 => Some('\u{FB02}'),      // fl ligature
-            0o222 => Some('\u{0141}'),      // L with stroke
-            0o223 => Some('\u{0152}'),      // OE ligature
-            0o224 => Some('\u{0133}'),      // oe ligature
-            0o225 => Some('\u{0178}'),      // Y with diaeresis
-            0o226 => Some('\u{00A1}'),      // Inverted exclamation
-            0o227 => Some('\u{00BF}'),      // Inverted question mark
-            0o230 => Some('\u{00A1}'),      // Inverted exclamation (duplicate in spec)
-            0o231 => Some('\u{00BF}'),      // Inverted question mark (duplicate in spec)
-            0o232 => Some('\u{00A2}'),      // Cent sign
-            0o233 => Some('\u{00A3}'),      // Pound sign
-            0o234 => Some('\u{00A5}'),      // Yen sign
-            0o235 => Some('\u{20A7}'),      // Peseta sign (changed in PDF 2.0, using original)
-            0o236 => Some('\u{0192}'),      // Florin (duplicate)
-            0o240 => Some('\u{00E6}'),      // ae ligature
-            0o241 => Some('\u{0153}'),      // OE ligature (duplicate)
-            0o242 => Some('\u{0178}'),      // Y with diaeresis (duplicate)
-            0o243 => Some('\u{00C1}'),      // A with acute
-            0o244 => Some('\u{00C2}'),      // A with circumflex
-            0o245 => Some('\u{00C4}'),      // A with diaeresis
-            0o246 => Some('\u{00C0}'),      // A with grave
-            0o247 => Some('\u{00C5}'),      // A with ring
-            0o250 => Some('\u{00C7}'),      // C with cedilla
-            0o251 => Some('\u{00C9}'),      // E with acute
-            0o252 => Some('\u{00C9}'),      // E with acute (duplicate, using correct value)
-            0o253 => Some('\u{00CA}'),      // E with circumflex
-            0o254 => Some('\u{00CB}'),      // E with diaeresis
-            0o255 => Some('\u{00C8}'),      // E with grave
-            0o256 => Some('\u{00CD}'),      // I with acute
-            0o257 => Some('\u{00CE}'),      // I with circumflex
-            0o260 => Some('\u{00CF}'),      // I with diaeresis
-            0o261 => Some('\u{00CC}'),      // I with grave
-            0o262 => Some('\u{00D1}'),      // N with tilde
-            0o263 => Some('\u{00D3}'),      // O with acute
-            0o264 => Some('\u{00D4}'),      // O with circumflex
-            0o265 => Some('\u{00D6}'),      // O with diaeresis
-            0o266 => Some('\u{00D2}'),      // O with grave
-            0o267 => Some('\u{00D8}'),      // O with stroke
-            0o270 => Some('\u{0152}'),      // OE ligature (duplicate)
-            0o271 => Some('\u{00D5}'),      // O with tilde
-            0o272 => Some('\u{00D7}'),      // Multiplication
-            0o273 => Some('\u{00F7}'),      // Division
-            0o274 => Some('\u{0178}'),      // Y with diaeresis (duplicate)
-            0o275 => Some('\u{00E1}'),      // a with acute
-            0o276 => Some('\u{00E2}'),      // a with circumflex
-            0o277 => Some('\u{00E4}'),      // a with diaeresis
-            0o300 => Some('\u{00E0}'),      // a with grave
-            0o301 => Some('\u{00E5}'),      // a with ring
-            0o302 => Some('\u{00E7}'),      // c with cedilla
-            0o303 => Some('\u{00E9}'),      // e with acute
-            0o304 => Some('\u{00EA}'),      // e with circumflex
-            0o305 => Some('\u{00EB}'),      // e with diaeresis
-            0o306 => Some('\u{00E8}'),      // e with grave
-            0o307 => Some('\u{00ED}'),      // i with acute
-            0o310 => Some('\u{00EE}'),      // i with circumflex
-            0o311 => Some('\u{00EF}'),      // i with diaeresis
-            0o312 => Some('\u{00EC}'),      // i with grave
-            0o313 => Some('\u{00F1}'),      // n with tilde
-            0o314 => Some('\u{00F3}'),      // o with acute
-            0o315 => Some('\u{00F4}'),      // o with circumflex
-            0o316 => Some('\u{00F6}'),      // o with diaeresis
-            0o317 => Some('\u{00F2}'),      // o with grave
-            0o320 => Some('\u{00F8}'),      // o with stroke
-            0o321 => Some('\u{0153}'),      // oe ligature
-            0o322 => Some('\u{00F5}'),      // o with tilde
-            0o323 => Some('\u{00DF}'),      // Sharp s
-            0o324 => Some('\u{007B}'),      // { (duplicate)
-            0o325 => Some('\u{007D}'),      // } (duplicate)
-            0o326 => Some('\u{00A1}'),      // Inverted exclamation (duplicate)
-            0o327 => Some('\u{00BF}'),      // Inverted question mark (duplicate)
-            0o330 => Some('\u{0161}'),      // s with caron
-            0o331 => Some('\u{017D}'),      // Z with caron
-            0o332 => Some('\u{00A9}'),      // Copyright
-            0o333 => Some('\u{00AE}'),      // Registered
-            0o334 => Some('\u{2122}'),      // Trademark (duplicate)
-            0o335 => Some('\u{2212}'),      // Minus sign
-            0o336 => Some('\u{2012}'),      // Figure dash
-            0o337 => Some('\u{0452}'),      // Serbian soft sign
-            0o340 => Some('\u{0452}'),      // Serbian soft sign (duplicate)
-            0o341 => Some('\u{2013}'),      // En dash (duplicate)
-            0o342 => Some('\u{2014}'),      // Em dash (duplicate)
-            0o343 => Some('\u{201C}'),      // Double left quote (duplicate)
-            0o344 => Some('\u{201D}'),      // Double right quote (duplicate)
-            0o345 => Some('\u{2018}'),      // Single left quote (duplicate)
-            0o346 => Some('\u{2019}'),      // Single right quote (duplicate)
-            0o347 => Some('\u{2022}'),      // Bullet (duplicate)
-            0o350 => Some('\u{201A}'),      // Single low-9 quote (duplicate)
-            0o351 => Some('\u{2039}'),      // Single left angle quote (duplicate)
-            0o352 => Some('\u{203A}'),      // Single right angle quote (duplicate)
-            0o353 => Some('\u{2026}'),      // Ellipsis (duplicate)
-            0o354 => Some('\u{2020}'),      // Dagger (duplicate)
-            0o355 => Some('\u{2021}'),      // Double dagger (duplicate)
-            0o356 => Some('\u{20AC}'),      // Euro sign (PDF 1.4+)
-            0o357 => Some('\u{2030}'),      // Per mille
-            0o360 => Some('\u{0160}'),      // S with caron
-            0o361 => Some('\u{017E}'),      // z with caron
-            0o362 => Some('\u{0161}'),      // s with caron (duplicate)
-            0o363 => Some('\u{017D}'),      // Z with caron (duplicate)
-            0o364 => Some('\u{0178}'),      // Y with diaeresis (duplicate)
-            0o365 => Some('\u{00A1}'),      // Inverted exclamation (duplicate)
-            0o366 => Some('\u{00BF}'),      // Inverted question mark (duplicate)
-            0o367 => Some('\u{2212}'),      // Minus sign (duplicate)
-            0o370 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o371 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o372 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o373 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o374 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o375 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o376 => Some('\u{0000}'),      // Should be "unused" but using null
-            0o377 => Some('\u{0000}'),      // Should be "unused" but using null
+            0o010 => Some('\u{0000}'), // NUL
+            0o011 => Some('\u{0001}'), // SOH
+            0o012 => Some('\u{0002}'), // STX
+            0o013 => Some('\u{0003}'), // ETX
+            0o014 => Some('\u{0004}'), // EOT
+            0o015 => Some('\u{0005}'), // ENQ
+            0o016 => Some('\u{0006}'), // ACK
+            0o017 => Some('\u{0007}'), // BEL
+            0o020 => Some('\u{0008}'), // BS
+            0o021 => Some('\u{0009}'), // HT
+            0o022 => Some('\u{000A}'), // LF
+            0o023 => Some('\u{000B}'), // VT
+            0o024 => Some('\u{000C}'), // FF
+            0o025 => Some('\u{000D}'), // CR
+            0o026 => Some('\u{000E}'), // SO
+            0o027 => Some('\u{000F}'), // SI
+            0o030 => Some('\u{0010}'), // DLE
+            0o031 => Some('\u{0011}'), // DC1
+            0o032 => Some('\u{0012}'), // DC2
+            0o033 => Some('\u{0013}'), // DC3
+            0o034 => Some('\u{0014}'), // DC4
+            0o035 => Some('\u{0015}'), // NAK
+            0o036 => Some('\u{0016}'), // SYN
+            0o037 => Some('\u{0017}'), // ETB
+            0o040 => Some('\u{0020}'), // Space (same as Latin-1)
+            0o041 => Some('\u{0021}'), // !
+            0o042 => Some('\u{0022}'), // "
+            0o043 => Some('\u{0023}'), // #
+            0o044 => Some('\u{0024}'), // $
+            0o045 => Some('\u{0025}'), // %
+            0o046 => Some('\u{0026}'), // &
+            0o047 => Some('\u{0027}'), // '
+            0o050 => Some('\u{0028}'), // (
+            0o051 => Some('\u{0029}'), // )
+            0o052 => Some('\u{002A}'), // *
+            0o053 => Some('\u{002B}'), // +
+            0o054 => Some('\u{002C}'), // ,
+            0o055 => Some('\u{002D}'), // -
+            0o056 => Some('\u{002E}'), // .
+            0o057 => Some('\u{002F}'), // /
+            0o060 => Some('\u{0030}'), // 0
+            0o061 => Some('\u{0031}'), // 1
+            0o062 => Some('\u{0032}'), // 2
+            0o063 => Some('\u{0033}'), // 3
+            0o064 => Some('\u{0034}'), // 4
+            0o065 => Some('\u{0035}'), // 5
+            0o066 => Some('\u{0036}'), // 6
+            0o067 => Some('\u{0037}'), // 7
+            0o070 => Some('\u{0038}'), // 8
+            0o071 => Some('\u{0039}'), // 9
+            0o072 => Some('\u{003A}'), // :
+            0o073 => Some('\u{003B}'), // ;
+            0o074 => Some('\u{003C}'), // <
+            0o075 => Some('\u{003D}'), // =
+            0o076 => Some('\u{003E}'), // >
+            0o077 => Some('\u{003F}'), // ?
+            0o100 => Some('\u{0040}'), // @
+            0o101 => Some('\u{0041}'), // A
+            0o102 => Some('\u{0042}'), // B
+            0o103 => Some('\u{0043}'), // C
+            0o104 => Some('\u{0044}'), // D
+            0o105 => Some('\u{0045}'), // E
+            0o106 => Some('\u{0046}'), // F
+            0o107 => Some('\u{0047}'), // G
+            0o110 => Some('\u{0048}'), // H
+            0o111 => Some('\u{0049}'), // I
+            0o112 => Some('\u{004A}'), // J
+            0o113 => Some('\u{004B}'), // K
+            0o114 => Some('\u{004C}'), // L
+            0o115 => Some('\u{004D}'), // M
+            0o116 => Some('\u{004E}'), // N
+            0o117 => Some('\u{004F}'), // O
+            0o120 => Some('\u{0050}'), // P
+            0o121 => Some('\u{0051}'), // Q
+            0o122 => Some('\u{0052}'), // R
+            0o123 => Some('\u{0053}'), // S
+            0o124 => Some('\u{0054}'), // T
+            0o125 => Some('\u{0055}'), // U
+            0o126 => Some('\u{0056}'), // V
+            0o127 => Some('\u{0057}'), // W
+            0o130 => Some('\u{0058}'), // X
+            0o131 => Some('\u{0059}'), // Y
+            0o132 => Some('\u{005A}'), // Z
+            0o133 => Some('\u{005B}'), // [
+            0o134 => Some('\u{005C}'), // \
+            0o135 => Some('\u{005D}'), // ]
+            0o136 => Some('\u{005E}'), // ^
+            0o137 => Some('\u{005F}'), // _
+            0o140 => Some('\u{0060}'), // `
+            0o141 => Some('\u{0061}'), // a
+            0o142 => Some('\u{0062}'), // b
+            0o143 => Some('\u{0063}'), // c
+            0o144 => Some('\u{0064}'), // d
+            0o145 => Some('\u{0065}'), // e
+            0o146 => Some('\u{0066}'), // f
+            0o147 => Some('\u{0067}'), // g
+            0o150 => Some('\u{0068}'), // h
+            0o151 => Some('\u{0069}'), // i
+            0o152 => Some('\u{006A}'), // j
+            0o153 => Some('\u{006B}'), // k
+            0o154 => Some('\u{006C}'), // l
+            0o155 => Some('\u{006D}'), // m
+            0o156 => Some('\u{006E}'), // n
+            0o157 => Some('\u{006F}'), // o
+            0o160 => Some('\u{0070}'), // p
+            0o161 => Some('\u{0071}'), // q
+            0o162 => Some('\u{0072}'), // r
+            0o163 => Some('\u{0073}'), // s
+            0o164 => Some('\u{0074}'), // t
+            0o165 => Some('\u{0075}'), // u
+            0o166 => Some('\u{0076}'), // v
+            0o167 => Some('\u{0077}'), // w
+            0o170 => Some('\u{0078}'), // x
+            0o171 => Some('\u{0079}'), // y
+            0o172 => Some('\u{007A}'), // z
+            0o173 => Some('\u{007B}'), // {
+            0o174 => Some('\u{007C}'), // |
+            0o175 => Some('\u{007D}'), // }
+            0o176 => Some('\u{007E}'), // ~
+            0o200 => Some('\u{2022}'), // Bullet
+            0o201 => Some('\u{2020}'), // Dagger
+            0o202 => Some('\u{2021}'), // Double Dagger
+            0o203 => Some('\u{2026}'), // Ellipsis
+            0o204 => Some('\u{2014}'), // Em Dash
+            0o205 => Some('\u{2013}'), // En Dash
+            0o206 => Some('\u{0192}'), // Florin
+            0o207 => Some('\u{2044}'), // Fraction
+            0o210 => Some('\u{2039}'), // Single Left Angle Quote
+            0o211 => Some('\u{203A}'), // Single Right Angle Quote
+            0o212 => Some('\u{201C}'), // Double Left Quote
+            0o213 => Some('\u{201D}'), // Double Right Quote
+            0o214 => Some('\u{2018}'), // Single Left Quote
+            0o215 => Some('\u{2019}'), // Single Right Quote
+            0o216 => Some('\u{201A}'), // Single Low-9 Quote
+            0o217 => Some('\u{2122}'), // Trademark
+            0o220 => Some('\u{FB01}'), // fi ligature
+            0o221 => Some('\u{FB02}'), // fl ligature
+            0o222 => Some('\u{0141}'), // L with stroke
+            0o223 => Some('\u{0152}'), // OE ligature
+            0o224 => Some('\u{0133}'), // ij ligature (U+0133)
+            0o225 => Some('\u{0178}'), // Y with diaeresis
+            0o226 => Some('\u{00A1}'), // Inverted exclamation
+            0o227 => Some('\u{00BF}'), // Inverted question mark
+            0o230 => Some('\u{00A1}'), // Inverted exclamation (duplicate in spec)
+            0o231 => Some('\u{00BF}'), // Inverted question mark (duplicate in spec)
+            0o232 => Some('\u{00A2}'), // Cent sign
+            0o233 => Some('\u{00A3}'), // Pound sign
+            0o234 => Some('\u{00A5}'), // Yen sign
+            0o235 => Some('\u{20A7}'), // Peseta sign (changed in PDF 2.0, using original)
+            0o236 => Some('\u{0192}'), // Florin (duplicate)
+            0o240 => Some('\u{00E6}'), // ae ligature
+            0o241 => Some('\u{0153}'), // oe ligature (lowercase)
+            0o242 => Some('\u{0178}'), // Y with diaeresis (duplicate)
+            0o243 => Some('\u{00C1}'), // A with acute
+            0o244 => Some('\u{00C2}'), // A with circumflex
+            0o245 => Some('\u{00C4}'), // A with diaeresis
+            0o246 => Some('\u{00C0}'), // A with grave
+            0o247 => Some('\u{00C5}'), // A with ring
+            0o250 => Some('\u{00C7}'), // C with cedilla
+            0o251 => Some('\u{00C9}'), // E with acute
+            0o252 => Some('\u{00C9}'), // E with acute (duplicate, using correct value)
+            0o253 => Some('\u{00CA}'), // E with circumflex
+            0o254 => Some('\u{00CB}'), // E with diaeresis
+            0o255 => Some('\u{00C8}'), // E with grave
+            0o256 => Some('\u{00CD}'), // I with acute
+            0o257 => Some('\u{00CE}'), // I with circumflex
+            0o260 => Some('\u{00CF}'), // I with diaeresis
+            0o261 => Some('\u{00CC}'), // I with grave
+            0o262 => Some('\u{00D1}'), // N with tilde
+            0o263 => Some('\u{00D3}'), // O with acute
+            0o264 => Some('\u{00D4}'), // O with circumflex
+            0o265 => Some('\u{00D6}'), // O with diaeresis
+            0o266 => Some('\u{00D2}'), // O with grave
+            0o267 => Some('\u{00D8}'), // O with stroke
+            0o270 => Some('\u{0152}'), // OE ligature (duplicate)
+            0o271 => Some('\u{00D5}'), // O with tilde
+            0o272 => Some('\u{00D7}'), // Multiplication
+            0o273 => Some('\u{00F7}'), // Division
+            0o274 => Some('\u{0178}'), // Y with diaeresis (duplicate)
+            0o275 => Some('\u{00E1}'), // a with acute
+            0o276 => Some('\u{00E2}'), // a with circumflex
+            0o277 => Some('\u{00E4}'), // a with diaeresis
+            0o300 => Some('\u{00E0}'), // a with grave
+            0o301 => Some('\u{00E5}'), // a with ring
+            0o302 => Some('\u{00E7}'), // c with cedilla
+            0o303 => Some('\u{00E9}'), // e with acute
+            0o304 => Some('\u{00EA}'), // e with circumflex
+            0o305 => Some('\u{00EB}'), // e with diaeresis
+            0o306 => Some('\u{00E8}'), // e with grave
+            0o307 => Some('\u{00ED}'), // i with acute
+            0o310 => Some('\u{00EE}'), // i with circumflex
+            0o311 => Some('\u{00EF}'), // i with diaeresis
+            0o312 => Some('\u{00EC}'), // i with grave
+            0o313 => Some('\u{00F1}'), // n with tilde
+            0o314 => Some('\u{00F3}'), // o with acute
+            0o315 => Some('\u{00F4}'), // o with circumflex
+            0o316 => Some('\u{00F6}'), // o with diaeresis
+            0o317 => Some('\u{00F2}'), // o with grave
+            0o320 => Some('\u{00F8}'), // o with stroke
+            0o321 => Some('\u{0153}'), // oe ligature
+            0o322 => Some('\u{00F5}'), // o with tilde
+            0o323 => Some('\u{00DF}'), // Sharp s
+            0o324 => Some('\u{007B}'), // { (duplicate)
+            0o325 => Some('\u{007D}'), // } (duplicate)
+            0o326 => Some('\u{00A1}'), // Inverted exclamation (duplicate)
+            0o327 => Some('\u{00BF}'), // Inverted question mark (duplicate)
+            0o330 => Some('\u{0161}'), // s with caron
+            0o331 => Some('\u{017D}'), // Z with caron
+            0o332 => Some('\u{00A9}'), // Copyright
+            0o333 => Some('\u{00AE}'), // Registered
+            0o334 => Some('\u{2122}'), // Trademark (duplicate)
+            0o335 => Some('\u{2212}'), // Minus sign
+            0o336 => Some('\u{2012}'), // Figure dash
+            0o337 => Some('\u{0452}'), // Cyrillic small letter dje
+            0o340 => Some('\u{0452}'), // Cyrillic small letter dje (duplicate)
+            0o341 => Some('\u{2013}'), // En dash (duplicate)
+            0o342 => Some('\u{2014}'), // Em dash (duplicate)
+            0o343 => Some('\u{201C}'), // Double left quote (duplicate)
+            0o344 => Some('\u{201D}'), // Double right quote (duplicate)
+            0o345 => Some('\u{2018}'), // Single left quote (duplicate)
+            0o346 => Some('\u{2019}'), // Single right quote (duplicate)
+            0o347 => Some('\u{2022}'), // Bullet (duplicate)
+            0o350 => Some('\u{201A}'), // Single low-9 quote (duplicate)
+            0o351 => Some('\u{2039}'), // Single left angle quote (duplicate)
+            0o352 => Some('\u{203A}'), // Single right angle quote (duplicate)
+            0o353 => Some('\u{2026}'), // Ellipsis (duplicate)
+            0o354 => Some('\u{2020}'), // Dagger (duplicate)
+            0o355 => Some('\u{2021}'), // Double dagger (duplicate)
+            0o356 => Some('\u{20AC}'), // Euro sign (PDF 1.4+)
+            0o357 => Some('\u{2030}'), // Per mille
+            0o360 => Some('\u{0160}'), // S with caron
+            0o361 => Some('\u{017E}'), // z with caron
+            0o362 => Some('\u{0161}'), // s with caron (duplicate)
+            0o363 => Some('\u{017D}'), // Z with caron (duplicate)
+            0o364 => Some('\u{0178}'), // Y with diaeresis (duplicate)
+            0o365 => Some('\u{00A1}'), // Inverted exclamation (duplicate)
+            0o366 => Some('\u{00BF}'), // Inverted question mark (duplicate)
+            0o367 => Some('\u{2212}'), // Minus sign (duplicate)
+            0o370 => Some('\u{0000}'), // Should be "unused" but using null
+            0o371 => Some('\u{0000}'), // Should be "unused" but using null
+            0o372 => Some('\u{0000}'), // Should be "unused" but using null
+            0o373 => Some('\u{0000}'), // Should be "unused" but using null
+            0o374 => Some('\u{0000}'), // Should be "unused" but using null
+            0o375 => Some('\u{0000}'), // Should be "unused" but using null
+            0o376 => Some('\u{0000}'), // Should be "unused" but using null
+            0o377 => Some('\u{0000}'), // Should be "unused" but using null
            _ => None,
        }
    }
@ -596,7 +592,10 @@ fn parse_outline_recursive(
    if !visited.insert(node_ref) {
        diagnostics.push(Diagnostic::with_dynamic_no_offset(
            DiagCode::StructCircularRef,
-            format!("STRUCT_CIRCULAR_REF: Cycle detected at outline node {}", node_ref),
+            format!(
+                "STRUCT_CIRCULAR_REF: Cycle detected at outline node {}",
+                node_ref
+            ),
        ));
        return None;
    }
@ -605,7 +604,10 @@ fn parse_outline_recursive(
    if depth >= MAX_OUTLINE_DEPTH {
        diagnostics.push(Diagnostic::with_dynamic_no_offset(
            DiagCode::StructDepthExceeded,
-            format!("STRUCT_DEPTH_EXCEEDED: Outline depth exceeds limit of {}", MAX_OUTLINE_DEPTH),
+            format!(
+                "STRUCT_DEPTH_EXCEEDED: Outline depth exceeds limit of {}",
+                MAX_OUTLINE_DEPTH
+            ),
        ));
        return None;
    }
@ -645,7 +647,10 @@ fn parse_outline_recursive(
        None => {
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::StructMissingKey,
-                format!("STRUCT_MISSING_KEY: Outline node {} missing /Title", node_ref),
+                format!(
+                    "STRUCT_MISSING_KEY: Outline node {} missing /Title",
+                    node_ref
+                ),
            ));
            String::from("<missing title>")
        }
@ -879,7 +884,9 @@ mod tests {
        let result = decode_pdf_string(&utf16be);
        assert!(result.is_err());
        let diags = result.unwrap_err();
-        assert!(diags.iter().any(|d| d.message.contains("STRUCT_INVALID_UTF16")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.contains("STRUCT_INVALID_UTF16")));
    }

    #[test]
@ -1000,7 +1007,10 @@ mod tests {

        // Create a simple outline item
        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Chapter 1".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Chapter 1".to_vec())),
+        );
        outline_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
            dest.push(PdfObject::Ref(ObjRef::new(10, 0)));
@ -1030,7 +1040,10 @@ mod tests {

        // Create an outline item with /Count
        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Section".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Section".to_vec())),
+        );
        outline_dict.insert(intern("Count"), PdfObject::Integer(-3)); // Collapsed with 3 descendants
        outline_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
@ -1059,7 +1072,10 @@ mod tests {

        // Create child outline
        let mut child_dict = IndexMap::new();
-        child_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Section 1.1".to_vec())));
+        child_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Section 1.1".to_vec())),
+        );
        child_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
            dest.push(PdfObject::Ref(ObjRef::new(12, 0)));
@ -1071,7 +1087,10 @@ mod tests {

        // Create parent outline with /First pointing to child
        let mut parent_dict = IndexMap::new();
-        parent_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Chapter 1".to_vec())));
+        parent_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Chapter 1".to_vec())),
+        );
        parent_dict.insert(intern("First"), PdfObject::Ref(ObjRef::new(101, 0)));
        parent_dict.insert(intern("Count"), PdfObject::Integer(1)); // One child

@ -1097,7 +1116,10 @@ mod tests {

        // Level 3: Grandchild
        let mut grandchild_dict = IndexMap::new();
-        grandchild_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Section 1.1.1".to_vec())));
+        grandchild_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Section 1.1.1".to_vec())),
+        );
        grandchild_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
            dest.push(PdfObject::Ref(ObjRef::new(10, 0)));
@ -1105,11 +1127,17 @@ mod tests {
            PdfObject::Array(Box::new(dest))
        });

-        resolver.cache_object(ObjRef::new(102, 0), PdfObject::Dict(Box::new(grandchild_dict)));
+        resolver.cache_object(
+            ObjRef::new(102, 0),
+            PdfObject::Dict(Box::new(grandchild_dict)),
+        );

        // Level 2: Child with /First pointing to grandchild
        let mut child_dict = IndexMap::new();
-        child_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Section 1.1".to_vec())));
+        child_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Section 1.1".to_vec())),
+        );
        child_dict.insert(intern("First"), PdfObject::Ref(ObjRef::new(102, 0)));
        child_dict.insert(intern("Count"), PdfObject::Integer(1));

@ -1117,7 +1145,10 @@ mod tests {

        // Level 1: Parent with /First pointing to child
        let mut parent_dict = IndexMap::new();
-        parent_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Chapter 1".to_vec())));
+        parent_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Chapter 1".to_vec())),
+        );
        parent_dict.insert(intern("First"), PdfObject::Ref(ObjRef::new(101, 0)));
        parent_dict.insert(intern("Count"), PdfObject::Integer(2));

@ -1145,7 +1176,10 @@ mod tests {

        // Create second sibling
        let mut sibling2_dict = IndexMap::new();
-        sibling2_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Chapter 2".to_vec())));
+        sibling2_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Chapter 2".to_vec())),
+        );
        sibling2_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
            dest.push(PdfObject::Ref(ObjRef::new(11, 0)));
@ -1153,11 +1187,17 @@ mod tests {
            PdfObject::Array(Box::new(dest))
        });

-        resolver.cache_object(ObjRef::new(101, 0), PdfObject::Dict(Box::new(sibling2_dict)));
+        resolver.cache_object(
+            ObjRef::new(101, 0),
+            PdfObject::Dict(Box::new(sibling2_dict)),
+        );

        // Create first sibling with /Next pointing to second
        let mut sibling1_dict = IndexMap::new();
-        sibling1_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Chapter 1".to_vec())));
+        sibling1_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Chapter 1".to_vec())),
+        );
        sibling1_dict.insert(intern("Next"), PdfObject::Ref(ObjRef::new(101, 0)));
        sibling1_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
@ -1166,7 +1206,10 @@ mod tests {
            PdfObject::Array(Box::new(dest))
        });

-        resolver.cache_object(ObjRef::new(100, 0), PdfObject::Dict(Box::new(sibling1_dict)));
+        resolver.cache_object(
+            ObjRef::new(100, 0),
+            PdfObject::Dict(Box::new(sibling1_dict)),
+        );

        // Create outlines root
        let mut root_dict = IndexMap::new();
@ -1188,16 +1231,28 @@ mod tests {

        // Create an outline that forms a cycle: 100 -> 101 -> 100
        let mut outline1_dict = IndexMap::new();
-        outline1_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Outline 1".to_vec())));
+        outline1_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Outline 1".to_vec())),
+        );
        outline1_dict.insert(intern("Next"), PdfObject::Ref(ObjRef::new(101, 0)));

-        resolver.cache_object(ObjRef::new(100, 0), PdfObject::Dict(Box::new(outline1_dict)));
+        resolver.cache_object(
+            ObjRef::new(100, 0),
+            PdfObject::Dict(Box::new(outline1_dict)),
+        );

        let mut outline2_dict = IndexMap::new();
-        outline2_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Outline 2".to_vec())));
+        outline2_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Outline 2".to_vec())),
+        );
        outline2_dict.insert(intern("Next"), PdfObject::Ref(ObjRef::new(100, 0))); // Cycle back

-        resolver.cache_object(ObjRef::new(101, 0), PdfObject::Dict(Box::new(outline2_dict)));
+        resolver.cache_object(
+            ObjRef::new(101, 0),
+            PdfObject::Dict(Box::new(outline2_dict)),
+        );

        // Create outlines root
        let mut root_dict = IndexMap::new();
@ -1208,7 +1263,9 @@ mod tests {
        // Should get both outlines before detecting the cycle
        assert_eq!(outlines.len(), 2);
        // Should have a cycle diagnostic
-        assert!(diags.iter().any(|d| d.message.contains("STRUCT_CIRCULAR_REF")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.contains("STRUCT_CIRCULAR_REF")));
    }

    #[test]
@ -1236,7 +1293,9 @@ mod tests {
        let (outlines, diags) = parse_outlines(&resolver, Some(ObjRef::new(99, 0)), &pages);
        assert_eq!(outlines.len(), 1);
        assert_eq!(outlines[0].title, "<missing title>");
-        assert!(diags.iter().any(|d| d.message.contains("STRUCT_MISSING_KEY")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.contains("STRUCT_MISSING_KEY")));
    }

    #[test]
@ -1257,7 +1316,10 @@ mod tests {
        action_dict.insert(intern("D"), PdfObject::Array(Box::new(goto_dest)));

        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"GoTo Test".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"GoTo Test".to_vec())),
+        );
        outline_dict.insert(intern("A"), PdfObject::Dict(Box::new(action_dict)));

        resolver.cache_object(ObjRef::new(100, 0), PdfObject::Dict(Box::new(outline_dict)));
@ -1289,10 +1351,16 @@ mod tests {
        // Create an outline with /A /URI action
        let mut action_dict = IndexMap::new();
        action_dict.insert(intern("S"), PdfObject::Name(intern("URI")));
-        action_dict.insert(intern("URI"), PdfObject::String(Box::new(b"https://example.com".to_vec())));
+        action_dict.insert(
+            intern("URI"),
+            PdfObject::String(Box::new(b"https://example.com".to_vec())),
+        );

        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"External Link".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"External Link".to_vec())),
+        );
        outline_dict.insert(intern("A"), PdfObject::Dict(Box::new(action_dict)));

        resolver.cache_object(ObjRef::new(100, 0), PdfObject::Dict(Box::new(outline_dict)));
@ -1306,7 +1374,9 @@ mod tests {
        assert_eq!(outlines.len(), 1);
        assert_eq!(outlines[0].title, "External Link");
        assert_eq!(outlines[0].dest_page, None);
-        assert!(diags.iter().any(|d| d.message.contains("STRUCT_NON_GOTO_OUTLINE")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.contains("STRUCT_NON_GOTO_OUTLINE")));
    }

    #[test]
@ -1316,7 +1386,10 @@ mod tests {

        // Create an outline with a named destination (string instead of page ref)
        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Named Dest".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Named Dest".to_vec())),
+        );
        outline_dict.insert(intern("Dest"), PdfObject::Name(intern("Chapter1")));

        resolver.cache_object(ObjRef::new(100, 0), PdfObject::Dict(Box::new(outline_dict)));
@ -1329,7 +1402,9 @@ mod tests {
        let (outlines, diags) = parse_outlines(&resolver, Some(ObjRef::new(99, 0)), &pages);
        assert_eq!(outlines.len(), 1);
        assert_eq!(outlines[0].dest_page, None);
-        assert!(diags.iter().any(|d| d.message.contains("STRUCT_UNRESOLVED_DESTINATION")));
+        assert!(diags
+            .iter()
+            .any(|d| d.message.contains("STRUCT_UNRESOLVED_DESTINATION")));
    }

    #[test]
@ -1383,7 +1458,10 @@ mod tests {

        // Create an outline with /XYZ destination where left/top/zoom are null
        let mut outline_dict = IndexMap::new();
-        outline_dict.insert(intern("Title"), PdfObject::String(Box::new(b"Null Values".to_vec())));
+        outline_dict.insert(
+            intern("Title"),
+            PdfObject::String(Box::new(b"Null Values".to_vec())),
+        );
        outline_dict.insert(intern("Dest"), {
            let mut dest = Vec::new();
            dest.push(PdfObject::Ref(ObjRef::new(10, 0)));
--- a/crates/pdftract-core/src/parser/pages.rs
+++ b/crates/pdftract-core/src/parser/pages.rs
@ -10,10 +10,10 @@
 //! - Inheritance is "last-write-wins" at each level (child overrides parent)
 //! - If a required inheritable attribute is missing and not inherited, use a safe default

-use crate::parser::object::{ObjRef, PdfObject, PdfDict, intern};
+use crate::diagnostics::{DiagCode, Diagnostic};
+use crate::parser::object::{intern, ObjRef, PdfDict, PdfObject};
+use crate::parser::resources::{merge_resources, ResourceDict};
 use crate::parser::xref::XrefResolver;
-use crate::diagnostics::{Diagnostic, DiagCode};
-use crate::parser::resources::{ResourceDict, merge_resources};
 use std::collections::HashSet;
 use std::sync::Arc;

@ -156,7 +156,10 @@ fn count_pages_walk(
    if depth > MAX_PAGES_DEPTH {
        diagnostics.push(Diagnostic::with_dynamic_no_offset(
            DiagCode::StructDepthExceeded,
-            format!("STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels", MAX_PAGES_DEPTH),
+            format!(
+                "STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels",
+                MAX_PAGES_DEPTH
+            ),
        ));
        return 0;
    }
@ -165,7 +168,10 @@ fn count_pages_walk(
    if visited.contains(&node_ref) {
        diagnostics.push(Diagnostic::with_dynamic_no_offset(
            DiagCode::StructCircularRef,
-            format!("STRUCT_CIRCULAR_REF: /Pages node {} already visited", node_ref),
+            format!(
+                "STRUCT_CIRCULAR_REF: /Pages node {} already visited",
+                node_ref
+            ),
        ));
        return 0;
    }
@ -190,9 +196,7 @@ fn count_pages_walk(
        }
    };

-    let node_type = dict.get("Type")
-        .and_then(|o| o.as_name())
-        .unwrap_or("");
+    let node_type = dict.get("Type").and_then(|o| o.as_name()).unwrap_or("");

    match node_type {
        "Page" => {
@ -226,7 +230,8 @@ fn count_pages_walk(
                    PdfObject::Ref(ref_) => *ref_,
                    PdfObject::Dict(_) => {
                        // Direct dictionary - count as a page if it's a /Page
-                        let kid_type = kid.as_dict()
+                        let kid_type = kid
+                            .as_dict()
                            .and_then(|d| d.get("Type"))
                            .and_then(|o| o.as_name())
                            .unwrap_or("");
@ -241,7 +246,7 @@ fn count_pages_walk(
            }
            total
        }
-        _ => 0
+        _ => 0,
    }
 }

@ -297,7 +302,8 @@ pub fn flatten_page_tree(resolver: &XrefResolver, pages_ref: ObjRef) -> Result<V
    };

    // Extract /Count if present (for validation later)
-    let declared_count = pages_obj.as_dict()
+    let declared_count = pages_obj
+        .as_dict()
        .and_then(|d| d.get("Count"))
        .and_then(|o| o.as_int())
        .unwrap_or(0);
@ -359,7 +365,10 @@ fn walk_page_tree(
    if depth > MAX_PAGES_DEPTH {
        diagnostics.push(Diagnostic::with_dynamic_no_offset(
            DiagCode::StructDepthExceeded,
-            format!("STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels", MAX_PAGES_DEPTH),
+            format!(
+                "STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels",
+                MAX_PAGES_DEPTH
+            ),
        ));
        return Vec::new();
    }
@ -373,9 +382,7 @@ fn walk_page_tree(
    };

    // Check /Type to determine if this is /Pages or /Page
-    let node_type = dict.get("Type")
-        .and_then(|o| o.as_name())
-        .unwrap_or("");
+    let node_type = dict.get("Type").and_then(|o| o.as_name()).unwrap_or("");

    // Save the inherited state before merging this node's attributes
    let parent_inherited = inherited.clone();
@ -423,7 +430,10 @@ fn walk_page_tree(
                        if visited.contains(ref_) {
                            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                                DiagCode::StructCircularRef,
-                                format!("STRUCT_CIRCULAR_REF: /Pages node {} already visited", ref_),
+                                format!(
+                                    "STRUCT_CIRCULAR_REF: /Pages node {} already visited",
+                                    ref_
+                                ),
                            ));
                            continue;
                        }
@ -434,7 +444,10 @@ fn walk_page_tree(
                            Err(e) => {
                                diagnostics.push(Diagnostic::with_dynamic_no_offset(
                                    DiagCode::StructMissingKey,
-                                    format!("STRUCT_MISSING_KEY: Failed to resolve /Kids entry {}: {}", ref_, e),
+                                    format!(
+                                        "STRUCT_MISSING_KEY: Failed to resolve /Kids entry {}: {}",
+                                        ref_, e
+                                    ),
                                ));
                                continue;
                            }
@ -479,7 +492,11 @@ fn walk_page_tree(
 ///
 /// Per PDF spec 7.7.3.4, only MediaBox, CropBox, Resources, and Rotate are inheritable.
 /// This function updates the `inherited` accumulator with any values present in `dict`.
-fn merge_inherited_attrs(dict: &PdfDict, inherited: &mut InheritedAttrs, diagnostics: &mut Vec<Diagnostic>) {
+fn merge_inherited_attrs(
+    dict: &PdfDict,
+    inherited: &mut InheritedAttrs,
+    diagnostics: &mut Vec<Diagnostic>,
+) {
    // MediaBox (inheritable)
    if let Some(mb) = parse_rect(dict.get("MediaBox")) {
        inherited.media_box = Some(mb);
@ -501,7 +518,10 @@ fn merge_inherited_attrs(dict: &PdfDict, inherited: &mut InheritedAttrs, diagnos
        if rot % 90 != 0 {
            diagnostics.push(Diagnostic::with_dynamic_no_offset(
                DiagCode::PageInvalidRotate,
-                format!("STRUCT_INVALID_ROTATE: /Rotate value {} is not a multiple of 90", rot),
+                format!(
+                    "STRUCT_INVALID_ROTATE: /Rotate value {} is not a multiple of 90",
+                    rot
+                ),
            ));
            // Round down to a multiple of 90 (floor toward negative infinity)
            inherited.rotate = ((rot as f64 / 90.0).floor() as i64 * 90) as i32;
@ -515,7 +535,11 @@ fn merge_inherited_attrs(dict: &PdfDict, inherited: &mut InheritedAttrs, diagnos
 ///
 /// This function extracts all page-level attributes, substituting defaults for
 /// missing values and emitting diagnostics where appropriate.
-fn build_page_dict(page_obj: &PdfObject, inherited: &InheritedAttrs, diagnostics: &mut Vec<Diagnostic>) -> PageDict {
+fn build_page_dict(
+    page_obj: &PdfObject,
+    inherited: &InheritedAttrs,
+    diagnostics: &mut Vec<Diagnostic>,
+) -> PageDict {
    let dict = match page_obj.as_dict() {
        Some(d) => d,
        None => {
@ -578,7 +602,10 @@ fn build_page_dict(page_obj: &PdfObject, inherited: &InheritedAttrs, diagnostics
            diagnostics.push(Diagnostic::with_dynamic(
                DiagCode::PageInvalidRotate,
                0,
-                format!("Page {} has /Rotate value {} (not a multiple of 90)", obj_ref, rot),
+                format!(
+                    "Page {} has /Rotate value {} (not a multiple of 90)",
+                    obj_ref, rot
+                ),
            ));
            // Round down to a multiple of 90 (floor toward negative infinity)
            rotate = ((rot as f64 / 90.0).floor() as i64 * 90) as i32;
@ -602,20 +629,20 @@ fn build_page_dict(page_obj: &PdfObject, inherited: &InheritedAttrs, diagnostics

    // Annots: collect array of references
    let annots = if let Some(PdfObject::Array(arr)) = dict.get("Annots") {
-        arr.iter()
-            .filter_map(|o| o.as_ref())
-            .collect()
+        arr.iter().filter_map(|o| o.as_ref()).collect()
    } else {
        Vec::new()
    };

    // ActualText (from tagged PDF)
-    let actual_text = dict.get("ActualText")
+    let actual_text = dict
+        .get("ActualText")
        .and_then(|o| o.as_string())
        .and_then(|s| String::from_utf8(s.to_vec()).ok());

    // Lang (language identifier)
-    let lang = dict.get("Lang")
+    let lang = dict
+        .get("Lang")
        .and_then(|o| o.as_string())
        .and_then(|s| String::from_utf8(s.to_vec()).ok());

@ -623,7 +650,8 @@ fn build_page_dict(page_obj: &PdfObject, inherited: &InheritedAttrs, diagnostics
    let aa = dict.get("AA").cloned();

    // StructParents: for StructTree MCID resolution (Phase 7.1.4)
-    let struct_parents = dict.get("StructParents")
+    let struct_parents = dict
+        .get("StructParents")
        .and_then(|o| o.as_int())
        .map(|i| i as i32);

@ -654,10 +682,22 @@ fn parse_rect(obj: Option<&PdfObject>) -> Option<[f64; 4]> {
        return None;
    }

-    let x1 = arr[0].as_int().map(|i| i as f64).or_else(|| arr[0].as_real())?;
-    let y1 = arr[1].as_int().map(|i| i as f64).or_else(|| arr[1].as_real())?;
-    let x2 = arr[2].as_int().map(|i| i as f64).or_else(|| arr[2].as_real())?;
-    let y2 = arr[3].as_int().map(|i| i as f64).or_else(|| arr[3].as_real())?;
+    let x1 = arr[0]
+        .as_int()
+        .map(|i| i as f64)
+        .or_else(|| arr[0].as_real())?;
+    let y1 = arr[1]
+        .as_int()
+        .map(|i| i as f64)
+        .or_else(|| arr[1].as_real())?;
+    let x2 = arr[2]
+        .as_int()
+        .map(|i| i as f64)
+        .or_else(|| arr[2].as_real())?;
+    let y2 = arr[3]
+        .as_int()
+        .map(|i| i as f64)
+        .or_else(|| arr[3].as_real())?;

    Some([x1, y1, x2, y2])
 }
@ -673,11 +713,7 @@ fn parse_contents_array(obj: Option<&PdfObject>) -> Vec<ObjRef> {
    match obj {
        None => Vec::new(),
        Some(PdfObject::Ref(ref_)) => vec![*ref_],
-        Some(PdfObject::Array(arr)) => {
-            arr.iter()
-                .filter_map(|o| o.as_ref())
-                .collect()
-        }
+        Some(PdfObject::Array(arr)) => arr.iter().filter_map(|o| o.as_ref()).collect(),
        Some(PdfObject::Stream(_)) => {
            // Direct stream is illegal - should be indirect
            // Return empty; diagnostics would be emitted by parser
@ -771,7 +807,10 @@ mod tests {
    #[test]
    fn test_parse_contents_single_ref() {
        let ref_obj = PdfObject::Ref(ObjRef::new(10, 0));
-        assert_eq!(parse_contents_array(Some(&ref_obj)), vec![ObjRef::new(10, 0)]);
+        assert_eq!(
+            parse_contents_array(Some(&ref_obj)),
+            vec![ObjRef::new(10, 0)]
+        );
    }

    #[test]
@ -780,10 +819,10 @@ mod tests {
            PdfObject::Ref(ObjRef::new(10, 0)),
            PdfObject::Ref(ObjRef::new(11, 0)),
        ]));
-        assert_eq!(parse_contents_array(Some(&arr)), vec![
-            ObjRef::new(10, 0),
-            ObjRef::new(11, 0),
-        ]);
+        assert_eq!(
+            parse_contents_array(Some(&arr)),
+            vec![ObjRef::new(10, 0), ObjRef::new(11, 0),]
+        );
    }

    #[test]
@ -831,13 +870,16 @@ mod tests {
        let mut grandparent_dict = grandparent.as_dict().unwrap().clone();
        grandparent_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(parent_ref)]))
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(parent_ref)])),
        );

        let mut parent_dict = parent.as_dict().unwrap().clone();
        parent_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(page1_ref), PdfObject::Ref(page2_ref)]))
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(page1_ref),
+                PdfObject::Ref(page2_ref),
+            ])),
        );

        resolver.cache_object(grandparent_ref, PdfObject::Dict(Box::new(grandparent_dict)));
@ -861,11 +903,7 @@ mod tests {
        let pages_ref = ObjRef::new(1, 0);

        // /Pages with no MediaBox
-        let pages = make_pages_dict(
-            vec![make_page_dict(None, None)],
-            1,
-            None,
-        );
+        let pages = make_pages_dict(vec![make_page_dict(None, None)], 1, None);

        resolver.cache_object(pages_ref, pages);

@ -960,7 +998,7 @@ mod tests {
        // /Count says 5, but we only have 1 page
        let pages = make_pages_dict(
            vec![make_page_dict(Some(DEFAULT_MEDIABOX), None)],
-            5,  // Wrong count
+            5, // Wrong count
            Some(DEFAULT_MEDIABOX),
        );

@ -992,22 +1030,31 @@ mod tests {
        // Create child2 with a valid page and a reference to child1 (creating cycle)
        let mut child2_dict = PdfDict::new();
        child2_dict.insert(intern("Type"), PdfObject::Name(intern("Pages")));
-        child2_dict.insert(intern("Kids"), PdfObject::Array(Box::new(vec![
-            PdfObject::Ref(page_ref),
-            PdfObject::Ref(child1_ref),  // This will cause a cycle
-        ])));
+        child2_dict.insert(
+            intern("Kids"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(page_ref),
+                PdfObject::Ref(child1_ref), // This will cause a cycle
+            ])),
+        );
        child2_dict.insert(intern("Count"), PdfObject::Integer(2));

        // Create child1 that references child2 (the other half of the cycle)
        let mut child1_dict = PdfDict::new();
        child1_dict.insert(intern("Type"), PdfObject::Name(intern("Pages")));
-        child1_dict.insert(intern("Kids"), PdfObject::Array(Box::new(vec![PdfObject::Ref(child2_ref)])));
+        child1_dict.insert(
+            intern("Kids"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(child2_ref)])),
+        );
        child1_dict.insert(intern("Count"), PdfObject::Integer(1));

        // Create parent that references child1
        let mut parent_dict = PdfDict::new();
        parent_dict.insert(intern("Type"), PdfObject::Name(intern("Pages")));
-        parent_dict.insert(intern("Kids"), PdfObject::Array(Box::new(vec![PdfObject::Ref(child1_ref)])));
+        parent_dict.insert(
+            intern("Kids"),
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(child1_ref)])),
+        );
        parent_dict.insert(intern("Count"), PdfObject::Integer(2));
        parent_dict.insert(intern("MediaBox"), make_rect_array(DEFAULT_MEDIABOX));

@ -1043,7 +1090,10 @@ mod tests {
        grandparent.insert(intern("Type"), PdfObject::Name(intern("Pages")));
        grandparent.insert(intern("Kids"), PdfObject::Array(Box::new(vec![])));
        grandparent.insert(intern("Count"), PdfObject::Integer(2));
-        grandparent.insert(intern("Resources"), PdfObject::Dict(Box::new(grandparent_resources)));
+        grandparent.insert(
+            intern("Resources"),
+            PdfObject::Dict(Box::new(grandparent_resources)),
+        );
        grandparent.insert(intern("MediaBox"), make_rect_array(DEFAULT_MEDIABOX));

        // Parent /Pages adds /F2
@ -1057,7 +1107,10 @@ mod tests {
        parent.insert(intern("Type"), PdfObject::Name(intern("Pages")));
        parent.insert(intern("Kids"), PdfObject::Array(Box::new(vec![])));
        parent.insert(intern("Count"), PdfObject::Integer(2));
-        parent.insert(intern("Resources"), PdfObject::Dict(Box::new(parent_resources)));
+        parent.insert(
+            intern("Resources"),
+            PdfObject::Dict(Box::new(parent_resources)),
+        );

        // Page 1 adds /F3 and overrides /F1
        let page1_ref = ObjRef::new(3, 0);
@ -1070,7 +1123,10 @@ mod tests {
        let mut page1 = PdfDict::new();
        page1.insert(intern("Type"), PdfObject::Name(intern("Page")));
        page1.insert(intern("MediaBox"), make_rect_array(DEFAULT_MEDIABOX));
-        page1.insert(intern("Resources"), PdfObject::Dict(Box::new(page1_resources)));
+        page1.insert(
+            intern("Resources"),
+            PdfObject::Dict(Box::new(page1_resources)),
+        );

        // Page 2 has no resources (should inherit all)
        let page2_ref = ObjRef::new(4, 0);
@ -1082,13 +1138,16 @@ mod tests {
        let mut grandparent_dict = grandparent.clone();
        grandparent_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(parent_ref)]))
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(parent_ref)])),
        );

        let mut parent_dict = parent.clone();
        parent_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(page1_ref), PdfObject::Ref(page2_ref)]))
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(page1_ref),
+                PdfObject::Ref(page2_ref),
+            ])),
        );

        resolver.cache_object(grandparent_ref, PdfObject::Dict(Box::new(grandparent_dict)));
@ -1103,18 +1162,39 @@ mod tests {

        // Page 1: should have F1 (overridden), F2 (inherited), F3 (new), Im1 (inherited)
        assert_eq!(pages_vec[0].resources.fonts.len(), 3);
-        assert_eq!(pages_vec[0].resources.fonts.get(&intern("F1")), Some(&ObjRef::new(15, 0))); // Overridden
-        assert_eq!(pages_vec[0].resources.fonts.get(&intern("F2")), Some(&ObjRef::new(11, 0))); // Inherited from parent
-        assert_eq!(pages_vec[0].resources.fonts.get(&intern("F3")), Some(&ObjRef::new(12, 0))); // New on page
+        assert_eq!(
+            pages_vec[0].resources.fonts.get(&intern("F1")),
+            Some(&ObjRef::new(15, 0))
+        ); // Overridden
+        assert_eq!(
+            pages_vec[0].resources.fonts.get(&intern("F2")),
+            Some(&ObjRef::new(11, 0))
+        ); // Inherited from parent
+        assert_eq!(
+            pages_vec[0].resources.fonts.get(&intern("F3")),
+            Some(&ObjRef::new(12, 0))
+        ); // New on page
        assert_eq!(pages_vec[0].resources.xobjects.len(), 1);
-        assert_eq!(pages_vec[0].resources.xobjects.get(&intern("Im1")), Some(&ObjRef::new(20, 0))); // Inherited from grandparent
+        assert_eq!(
+            pages_vec[0].resources.xobjects.get(&intern("Im1")),
+            Some(&ObjRef::new(20, 0))
+        ); // Inherited from grandparent

        // Page 2: should have all inherited resources (F1, F2, Im1)
        assert_eq!(pages_vec[1].resources.fonts.len(), 2);
-        assert_eq!(pages_vec[1].resources.fonts.get(&intern("F1")), Some(&ObjRef::new(10, 0))); // From grandparent
-        assert_eq!(pages_vec[1].resources.fonts.get(&intern("F2")), Some(&ObjRef::new(11, 0))); // From parent
+        assert_eq!(
+            pages_vec[1].resources.fonts.get(&intern("F1")),
+            Some(&ObjRef::new(10, 0))
+        ); // From grandparent
+        assert_eq!(
+            pages_vec[1].resources.fonts.get(&intern("F2")),
+            Some(&ObjRef::new(11, 0))
+        ); // From parent
        assert_eq!(pages_vec[1].resources.xobjects.len(), 1);
-        assert_eq!(pages_vec[1].resources.xobjects.get(&intern("Im1")), Some(&ObjRef::new(20, 0))); // From grandparent
+        assert_eq!(
+            pages_vec[1].resources.xobjects.get(&intern("Im1")),
+            Some(&ObjRef::new(20, 0))
+        ); // From grandparent
    }

    #[test]
@ -1134,7 +1214,10 @@ mod tests {
        parent.insert(intern("Type"), PdfObject::Name(intern("Pages")));
        parent.insert(intern("Kids"), PdfObject::Array(Box::new(vec![])));
        parent.insert(intern("Count"), PdfObject::Integer(2));
-        parent.insert(intern("Resources"), PdfObject::Dict(Box::new(parent_resources)));
+        parent.insert(
+            intern("Resources"),
+            PdfObject::Dict(Box::new(parent_resources)),
+        );
        parent.insert(intern("MediaBox"), make_rect_array(DEFAULT_MEDIABOX));

        // Two pages without /Resources
@ -1152,7 +1235,10 @@ mod tests {
        let mut parent_dict = parent.clone();
        parent_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(page1_ref), PdfObject::Ref(page2_ref)]))
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Ref(page1_ref),
+                PdfObject::Ref(page2_ref),
+            ])),
        );

        resolver.cache_object(parent_ref, PdfObject::Dict(Box::new(parent_dict)));
@ -1166,13 +1252,22 @@ mod tests {

        // Both pages should have inherited F1 from parent
        assert_eq!(pages_vec[0].resources.fonts.len(), 1);
-        assert_eq!(pages_vec[0].resources.fonts.get(&intern("F1")), Some(&ObjRef::new(10, 0)));
+        assert_eq!(
+            pages_vec[0].resources.fonts.get(&intern("F1")),
+            Some(&ObjRef::new(10, 0))
+        );
        assert_eq!(pages_vec[1].resources.fonts.len(), 1);
-        assert_eq!(pages_vec[1].resources.fonts.get(&intern("F1")), Some(&ObjRef::new(10, 0)));
+        assert_eq!(
+            pages_vec[1].resources.fonts.get(&intern("F1")),
+            Some(&ObjRef::new(10, 0))
+        );

        // Verify Arc pointer sharing: when pages have no resources,
        // they should share the same Arc instance (memory efficiency)
-        assert!(Arc::ptr_eq(&pages_vec[0].resources, &pages_vec[1].resources));
+        assert!(Arc::ptr_eq(
+            &pages_vec[0].resources,
+            &pages_vec[1].resources
+        ));
    }

    #[test]
@ -1187,7 +1282,10 @@ mod tests {
        root.insert(intern("Type"), PdfObject::Name(intern("Pages")));
        root.insert(intern("Kids"), PdfObject::Array(Box::new(vec![])));
        root.insert(intern("Count"), PdfObject::Integer(1));
-        root.insert(intern("Resources"), PdfObject::Dict(Box::new(root_resources)));
+        root.insert(
+            intern("Resources"),
+            PdfObject::Dict(Box::new(root_resources)),
+        );
        root.insert(intern("MediaBox"), make_rect_array(DEFAULT_MEDIABOX));

        // Page without /Resources
@ -1200,7 +1298,7 @@ mod tests {
        let mut root_dict = root.clone();
        root_dict.insert(
            intern("Kids"),
-            PdfObject::Array(Box::new(vec![PdfObject::Ref(page_ref)]))
+            PdfObject::Array(Box::new(vec![PdfObject::Ref(page_ref)])),
        );

        resolver.cache_object(root_ref, PdfObject::Dict(Box::new(root_dict)));
@ -1253,7 +1351,10 @@ impl<'a> LazyPageIter<'a> {
    /// Create a new lazy page iterator starting from the given /Pages reference.
    ///
    /// This resolves the root /Pages node and initializes the traversal stack.
-    pub fn new(resolver: &'a XrefResolver, pages_ref: ObjRef) -> std::result::Result<Self, Vec<Diagnostic>> {
+    pub fn new(
+        resolver: &'a XrefResolver,
+        pages_ref: ObjRef,
+    ) -> std::result::Result<Self, Vec<Diagnostic>> {
        let mut visited = HashSet::new();
        let mut diagnostics = Vec::new();

@ -1309,7 +1410,10 @@ impl<'a> Iterator for LazyPageIter<'a> {
            if self.stack.len() > MAX_PAGES_DEPTH as usize {
                self.diagnostics.push(Diagnostic::with_dynamic_no_offset(
                    DiagCode::StructDepthExceeded,
-                    format!("STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels", MAX_PAGES_DEPTH),
+                    format!(
+                        "STRUCT_DEPTH_EXCEEDED: /Pages nesting exceeds {} levels",
+                        MAX_PAGES_DEPTH
+                    ),
                ));
                continue;
            }
@ -1322,9 +1426,7 @@ impl<'a> Iterator for LazyPageIter<'a> {
                }
            };

-            let node_type = dict.get("Type")
-                .and_then(|o| o.as_name())
-                .unwrap_or("");
+            let node_type = dict.get("Type").and_then(|o| o.as_name()).unwrap_or("");

            // Save the inherited state before merging this node's attributes
            let parent_inherited = inherited.clone();
@ -1369,7 +1471,11 @@ impl<'a> Iterator for LazyPageIter<'a> {
                    // We need to push kids[kid_idx+1..] first, then process kid at kid_idx
                    if kid_idx + 1 < kids_array.len() {
                        // Clone node before moving it to avoid borrow checker error
-                        self.stack.push((node.clone(), pages_parent_inherited.clone(), kid_idx + 1));
+                        self.stack.push((
+                            node.clone(),
+                            pages_parent_inherited.clone(),
+                            kid_idx + 1,
+                        ));
                    }

                    // Push the current kid onto stack
@ -1383,7 +1489,10 @@ impl<'a> Iterator for LazyPageIter<'a> {
                                if self.visited.contains(ref_) {
                                    self.diagnostics.push(Diagnostic::with_dynamic_no_offset(
                                        DiagCode::StructCircularRef,
-                                        format!("STRUCT_CIRCULAR_REF: /Pages node {} already visited", ref_),
+                                        format!(
+                                            "STRUCT_CIRCULAR_REF: /Pages node {} already visited",
+                                            ref_
+                                        ),
                                    ));
                                    inherited = parent_inherited;
                                    continue;
@ -1445,12 +1554,15 @@ mod proptests {
        dict.insert(intern("Kids"), PdfObject::Array(Box::new(kids)));
        dict.insert(intern("Count"), PdfObject::Integer(count));
        if let Some(mb) = media_box {
-            dict.insert(intern("MediaBox"), PdfObject::Array(Box::new(vec![
-                PdfObject::Real(mb[0]),
-                PdfObject::Real(mb[1]),
-                PdfObject::Real(mb[2]),
-                PdfObject::Real(mb[3]),
-            ])));
+            dict.insert(
+                intern("MediaBox"),
+                PdfObject::Array(Box::new(vec![
+                    PdfObject::Real(mb[0]),
+                    PdfObject::Real(mb[1]),
+                    PdfObject::Real(mb[2]),
+                    PdfObject::Real(mb[3]),
+                ])),
+            );
        }
        PdfObject::Dict(Box::new(dict))
    }
@ -1460,12 +1572,15 @@ mod proptests {
        let mut dict = PdfDict::new();
        dict.insert(intern("Type"), PdfObject::Name(intern("Page")));
        if let Some(mb) = media_box {
-            dict.insert(intern("MediaBox"), PdfObject::Array(Box::new(vec![
-                PdfObject::Real(mb[0]),
-                PdfObject::Real(mb[1]),
-                PdfObject::Real(mb[2]),
-                PdfObject::Real(mb[3]),
-            ])));
+            dict.insert(
+                intern("MediaBox"),
+                PdfObject::Array(Box::new(vec![
+                    PdfObject::Real(mb[0]),
+                    PdfObject::Real(mb[1]),
+                    PdfObject::Real(mb[2]),
+                    PdfObject::Real(mb[3]),
+                ])),
+            );
        }
        if let Some(rot) = rotate {
            dict.insert(intern("Rotate"), PdfObject::Integer(rot));
@ -1485,36 +1600,46 @@ mod proptests {
            prop::option::of(-1000i64..1000),
            prop::option::of(arb_rect()),
            prop::option::of(arb_rect()),
-        ).prop_map(|(media_box, rotate, crop_box, bleed_box)| {
-            let mut dict = PdfDict::new();
-            dict.insert(intern("Type"), PdfObject::Name(intern("Page")));
-            dict.insert(intern("MediaBox"), PdfObject::Array(Box::new(vec![
-                PdfObject::Real(media_box[0]),
-                PdfObject::Real(media_box[1]),
-                PdfObject::Real(media_box[2]),
-                PdfObject::Real(media_box[3]),
-            ])));
-            if let Some(rot) = rotate {
-                dict.insert(intern("Rotate"), PdfObject::Integer(rot));
-            }
-            if let Some(cb) = crop_box {
-                dict.insert(intern("CropBox"), PdfObject::Array(Box::new(vec![
-                    PdfObject::Real(cb[0]),
-                    PdfObject::Real(cb[1]),
-                    PdfObject::Real(cb[2]),
-                    PdfObject::Real(cb[3]),
-                ])));
-            }
-            if let Some(bb) = bleed_box {
-                dict.insert(intern("BleedBox"), PdfObject::Array(Box::new(vec![
-                    PdfObject::Real(bb[0]),
-                    PdfObject::Real(bb[1]),
-                    PdfObject::Real(bb[2]),
-                    PdfObject::Real(bb[3]),
-                ])));
-            }
-            dict
-        })
+        )
+            .prop_map(|(media_box, rotate, crop_box, bleed_box)| {
+                let mut dict = PdfDict::new();
+                dict.insert(intern("Type"), PdfObject::Name(intern("Page")));
+                dict.insert(
+                    intern("MediaBox"),
+                    PdfObject::Array(Box::new(vec![
+                        PdfObject::Real(media_box[0]),
+                        PdfObject::Real(media_box[1]),
+                        PdfObject::Real(media_box[2]),
+                        PdfObject::Real(media_box[3]),
+                    ])),
+                );
+                if let Some(rot) = rotate {
+                    dict.insert(intern("Rotate"), PdfObject::Integer(rot));
+                }
+                if let Some(cb) = crop_box {
+                    dict.insert(
+                        intern("CropBox"),
+                        PdfObject::Array(Box::new(vec![
+                            PdfObject::Real(cb[0]),
+                            PdfObject::Real(cb[1]),
+                            PdfObject::Real(cb[2]),
+                            PdfObject::Real(cb[3]),
+                        ])),
+                    );
+                }
+                if let Some(bb) = bleed_box {
+                    dict.insert(
+                        intern("BleedBox"),
+                        PdfObject::Array(Box::new(vec![
+                            PdfObject::Real(bb[0]),
+                            PdfObject::Real(bb[1]),
+                            PdfObject::Real(bb[2]),
+                            PdfObject::Real(bb[3]),
+                        ])),
+                    );
+                }
+                dict
+            })
    }

    /// Strategy to generate /Pages dictionaries with direct /Kids.
@ -1527,9 +1652,10 @@ mod proptests {
            dict.insert(intern("Count"), PdfObject::Integer(0));

            if let Some(page) = maybe_page {
-                dict.insert(intern("Kids"), PdfObject::Array(Box::new(vec![
-                    PdfObject::Dict(Box::new(page))
-                ])));
+                dict.insert(
+                    intern("Kids"),
+                    PdfObject::Array(Box::new(vec![PdfObject::Dict(Box::new(page))])),
+                );
                dict.insert(intern("Count"), PdfObject::Integer(1));
            } else {
                dict.insert(intern("Kids"), PdfObject::Array(Box::new(vec![])));
--- a/crates/pdftract-core/src/parser/resources.rs
+++ b/crates/pdftract-core/src/parser/resources.rs
@ -7,9 +7,9 @@
 //! containing all resources from its ancestor /Pages nodes, with per-key
 //! last-write-wins semantics at the page level.

-use crate::parser::object::{ObjRef, PdfObject, PdfDict, intern};
-use std::sync::Arc;
+use crate::parser::object::{intern, ObjRef, PdfDict, PdfObject};
 use indexmap::IndexMap;
+use std::sync::Arc;

 /// A merged resource dictionary for a page.
 ///
@ -290,8 +290,8 @@ mod tests {

        assert_eq!(merged.fonts.len(), 3);
        assert_eq!(merged.fonts.get(&intern("F1")), Some(&ObjRef::new(10, 0))); // Overridden
-        assert_eq!(merged.fonts.get(&intern("F2")), Some(&ObjRef::new(2, 0)));  // Inherited
-        assert_eq!(merged.fonts.get(&intern("F3")), Some(&ObjRef::new(3, 0)));  // New
+        assert_eq!(merged.fonts.get(&intern("F2")), Some(&ObjRef::new(2, 0))); // Inherited
+        assert_eq!(merged.fonts.get(&intern("F3")), Some(&ObjRef::new(3, 0))); // New
    }

    #[test]
@ -307,8 +307,14 @@ mod tests {
        let merged = merge_resources(&ancestor, &PdfObject::Dict(Box::new(child_resources)));

        assert_eq!(merged.xobjects.len(), 2);
-        assert_eq!(merged.xobjects.get(&intern("Im1")), Some(&ObjRef::new(5, 0)));
-        assert_eq!(merged.xobjects.get(&intern("Im2")), Some(&ObjRef::new(6, 0)));
+        assert_eq!(
+            merged.xobjects.get(&intern("Im1")),
+            Some(&ObjRef::new(5, 0))
+        );
+        assert_eq!(
+            merged.xobjects.get(&intern("Im2")),
+            Some(&ObjRef::new(6, 0))
+        );
    }

    #[test]
@ -321,11 +327,14 @@ mod tests {

        // Inline color space array: [/CalRGB << /Gamma [1 1 1] >>]
        let mut gamma_arr = PdfDict::new();
-        gamma_arr.insert(intern("Gamma"), PdfObject::Array(Box::new(vec![
-            PdfObject::Integer(1),
-            PdfObject::Integer(1),
-            PdfObject::Integer(1),
-        ])));
+        gamma_arr.insert(
+            intern("Gamma"),
+            PdfObject::Array(Box::new(vec![
+                PdfObject::Integer(1),
+                PdfObject::Integer(1),
+                PdfObject::Integer(1),
+            ])),
+        );

        child_cs.insert(
            intern("CS1"),
--- a/crates/pdftract-core/src/parser/secrets.rs
+++ b/crates/pdftract-core/src/parser/secrets.rs
@ -16,7 +16,7 @@
 //! CI should run: `rg "expose_secret\(\)" crates/ --type rust` and fail the
 //! build if any matches are found outside of these approved locations.

-use secrecy::{SecretString, ExposeSecret};
+use secrecy::{ExposeSecret, SecretString};
 use sha2::{Digest, Sha256};

 /// A fingerprint of a secret value for use in audit logs.
@ -91,7 +91,10 @@ mod tests {
    fn test_fingerprint_display() {
        let fp = SecretFingerprint::from_str("test");
        let display = format!("{}", fp);
-        assert!(!display.contains("test"), "fingerprint doesn't contain secret");
+        assert!(
+            !display.contains("test"),
+            "fingerprint doesn't contain secret"
+        );
        assert_eq!(display.len(), 64, "SHA-256 produces 64 hex chars");
    }
 }
--- a/crates/pdftract-core/src/parser/stream.rs
+++ b/crates/pdftract-core/src/parser/stream.rs
--- a/crates/pdftract-core/src/parser/struct_tree.rs
+++ b/crates/pdftract-core/src/parser/struct_tree.rs
--- a/crates/pdftract-core/src/parser/xref.rs
+++ b/crates/pdftract-core/src/parser/xref.rs
--- a/crates/pdftract-core/src/preprocess.rs
+++ b/crates/pdftract-core/src/preprocess.rs
@ -14,7 +14,7 @@

 #![cfg(feature = "ocr")]

-use crate::diagnostics::{Diagnostic, DiagCode};
+use crate::diagnostics::{DiagCode, Diagnostic};
 use image::{GrayImage, ImageBuffer, Luma};
 use std::ffi::c_float;

@ -114,8 +114,8 @@ const DESKEW_MAX_RANGE_DEG: f64 = 15.0;
 /// ```
 pub fn deskew(image: &GrayImage) -> Result<(GrayImage, f64, Vec<Diagnostic>)> {
    use leptonica_plumbing::leptonica_sys::{
-        pixDestroy, pixFindSkewAndDeskew, pixGetWidth, pixGetHeight, pixGetDepth,
-        Pix, l_float32, l_int32,
+        l_float32, l_int32, pixDestroy, pixFindSkewAndDeskew, pixGetDepth, pixGetHeight,
+        pixGetWidth, Pix,
    };

    let mut diagnostics = Vec::new();
@ -157,7 +157,10 @@ pub fn deskew(image: &GrayImage) -> Result<(GrayImage, f64, Vec<Diagnostic>)> {
            pixDestroy(pix);
            diagnostics.push(Diagnostic::with_static_no_offset(
                DiagCode::ImgDeskewOutOfRange,
-                format!("Skew angle {}° exceeds detection range (±{}°)", angle_deg, DESKEW_MAX_RANGE_DEG),
+                format!(
+                    "Skew angle {}° exceeds detection range (±{}°)",
+                    angle_deg, DESKEW_MAX_RANGE_DEG
+                ),
            ));
            return Ok((image.clone(), angle_deg, diagnostics));
        }
@ -180,9 +183,7 @@ pub fn deskew(image: &GrayImage) -> Result<(GrayImage, f64, Vec<Diagnostic>)> {
 ///
 /// Creates an 8-bit grayscale Pix from the image data.
 fn grayimage_to_pix(image: &GrayImage) -> Result<*mut Pix> {
-    use leptonica_plumbing::leptonica_sys::{
-        pixCreate, pixDestroy, pixGetData, Pix,
-    };
+    use leptonica_plumbing::leptonica_sys::{pixCreate, pixDestroy, pixGetData, Pix};
    use std::ptr;

    let width = image.width() as i32;
@ -231,7 +232,7 @@ fn grayimage_to_pix(image: &GrayImage) -> Result<*mut Pix> {
 /// Expects an 8-bit grayscale Pix.
 fn pix_to_grayimage(pix: *mut Pix) -> Result<GrayImage> {
    use leptonica_plumbing::leptonica_sys::{
-        pixGetData, pixGetWidth, pixGetHeight, pixGetDepth, Pix,
+        pixGetData, pixGetDepth, pixGetHeight, pixGetWidth, Pix,
    };

    unsafe {
@ -323,7 +324,9 @@ mod tests {
        let (deskewed, angle, diagnostics) = deskew(&img).expect("Deskew failed");

        assert!(angle.abs() < 0.1, "Angle should be near 0°, got {}", angle);
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgDeskewOutOfRange));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgDeskewOutOfRange));
    }

    #[test]
@ -343,7 +346,9 @@ mod tests {

        // Check that the Pix was created successfully
        unsafe {
-            use leptonica_plumbing::leptonica_sys::{pixGetWidth, pixGetHeight, pixGetDepth, pixDestroy};
+            use leptonica_plumbing::leptonica_sys::{
+                pixDestroy, pixGetDepth, pixGetHeight, pixGetWidth,
+            };

            assert!(!pix.is_null(), "Pix pointer should not be null");
            assert_eq!(pixGetWidth(pix) as u32, img.width());
@ -445,14 +450,24 @@ mod tests {
        let (deskewed, angle, diagnostics) = deskew(&skewed).expect("Deskew failed");

        // The detected angle should be close to 2 degrees
-        assert!((angle.abs() - 2.0).abs() < 0.5, "Detected angle {} should be close to 2°", angle);
+        assert!(
+            (angle.abs() - 2.0).abs() < 0.5,
+            "Detected angle {} should be close to 2°",
+            angle
+        );

        // After deskewing, a second pass should detect near-zero skew
        let (_, second_angle, _) = deskew(&deskewed).expect("Second deskew failed");
-        assert!(second_angle.abs() < 0.1, "Second pass should detect near-zero skew, got {}", second_angle);
+        assert!(
+            second_angle.abs() < 0.1,
+            "Second pass should detect near-zero skew, got {}",
+            second_angle
+        );

        // No out-of-range diagnostic for 2 degrees
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgDeskewOutOfRange));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgDeskewOutOfRange));
    }

    #[test]
@ -462,7 +477,11 @@ mod tests {
        let (deskewed, angle, diagnostics) = deskew(&skewed).expect("Deskew failed");

        // Angle should be 0.0 because we skip deskewing for angles < 0.3 deg
-        assert_eq!(angle, 0.0, "Angle should be 0.0 for sub-threshold skew, got {}", angle);
+        assert_eq!(
+            angle, 0.0,
+            "Angle should be 0.0 for sub-threshold skew, got {}",
+            angle
+        );

        // Image should be unchanged (same dimensions and pixels)
        assert_eq!(deskewed.dimensions(), skewed.dimensions());
@ -479,8 +498,12 @@ mod tests {
        let (deskewed, angle, diagnostics) = deskew(&skewed).expect("Deskew failed");

        // Should emit the out-of-range diagnostic
-        assert!(diagnostics.iter().any(|d| d.code == DiagCode::ImgDeskewOutOfRange),
-                "Should emit IMG_DESKEW_OUT_OF_RANGE for 20-degree skew");
+        assert!(
+            diagnostics
+                .iter()
+                .any(|d| d.code == DiagCode::ImgDeskewOutOfRange),
+            "Should emit IMG_DESKEW_OUT_OF_RANGE for 20-degree skew"
+        );

        // Image dimensions should be preserved (may be different due to rotation padding,
        // but should not be the original since pixFindSkewAndDeskew will attempt to rotate)
@ -722,8 +745,7 @@ mod tests {
        // Helper to get sum from integral image
        let get_sum = |integral: &[u64], x1: usize, y1: usize, x2: usize, y2: usize| -> u64 {
            let w = width + 1;
-            integral[y2 * w + x2]
-                + integral[y1 * w + x1]
+            integral[y2 * w + x2] + integral[y1 * w + x1]
                - integral[y1 * w + x2]
                - integral[y2 * w + x1]
        };
@ -827,7 +849,10 @@ mod tests {
    /// let original: GrayImage = // ... load image
    /// let (preprocessed, diagnostics) = preprocess(&original, ImageSource::PhysicalScan)?;
    /// ```
-    pub fn preprocess(image: &GrayImage, source: ImageSource) -> Result<(GrayImage, Vec<Diagnostic>)> {
+    pub fn preprocess(
+        image: &GrayImage,
+        source: ImageSource,
+    ) -> Result<(GrayImage, Vec<Diagnostic>)> {
        let mut diagnostics = Vec::new();
        let mut current = image.clone();

@ -951,7 +976,11 @@ mod tests {
        for y in 0..100 {
            for x in 0..100 {
                let pixel = binary.get_pixel(x, y)[0];
-                assert!(pixel == 0 || pixel == 255, "Pixel should be 0 or 255, got {}", pixel);
+                assert!(
+                    pixel == 0 || pixel == 255,
+                    "Pixel should be 0 or 255, got {}",
+                    pixel
+                );
            }
        }

@ -978,7 +1007,11 @@ mod tests {
        for y in 0..100 {
            for x in 0..100 {
                let pixel = binary.get_pixel(x, y)[0];
-                assert!(pixel == 0 || pixel == 255, "Pixel should be 0 or 255, got {}", pixel);
+                assert!(
+                    pixel == 0 || pixel == 255,
+                    "Pixel should be 0 or 255, got {}",
+                    pixel
+                );
            }
        }
    }
@ -988,58 +1021,68 @@ mod tests {
        // Create an image with salt-and-pepper noise
        let mut img = GrayImage::from_pixel(100, 100, Luma([128]));
        // Add some noise
-        img.put_pixel(50, 50, Luma([0]));   // pepper
+        img.put_pixel(50, 50, Luma([0])); // pepper
        img.put_pixel(51, 50, Luma([255])); // salt
        img.put_pixel(50, 51, Luma([255])); // salt
-        img.put_pixel(51, 51, Luma([0]));   // pepper
+        img.put_pixel(51, 51, Luma([0])); // pepper

        let denoised = denoise_median(&img);

        // The noisy pixels should be closer to 128 after median filtering
        let center = denoised.get_pixel(50, 50)[0];
-        assert!(center > 64 && center < 192, "Denoised pixel should be near middle, got {}", center);
+        assert!(
+            center > 64 && center < 192,
+            "Denoised pixel should be near middle, got {}",
+            center
+        );
    }

    #[test]
    fn test_preprocess_physical_scan() {
        let img = create_horizontal_lines_image();
-        let (preprocessed, diagnostics) = preprocess(&img, ImageSource::PhysicalScan)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&img, ImageSource::PhysicalScan).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), img.width() + 20);
        assert_eq!(preprocessed.height(), img.height() + 20);

        // Diagnostics should not have errors
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
    fn test_preprocess_digital_origin() {
        let img = create_horizontal_lines_image();
-        let (preprocessed, diagnostics) = preprocess(&img, ImageSource::DigitalOrigin)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&img, ImageSource::DigitalOrigin).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), img.width() + 20);
        assert_eq!(preprocessed.height(), img.height() + 20);

        // Diagnostics should not have errors
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
    fn test_preprocess_jbig2() {
        let img = create_horizontal_lines_image();
-        let (preprocessed, diagnostics) = preprocess(&img, ImageSource::Jbig2)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&img, ImageSource::Jbig2).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), img.width() + 20);
        assert_eq!(preprocessed.height(), img.height() + 20);

        // Diagnostics should not have errors
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
@ -1067,18 +1110,21 @@ mod tests {

    /// Helper to load a fixture image.
    fn load_fixture(path: &str) -> GrayImage {
-        image::io::Reader::with_format(std::io::Cursor::new(std::fs::read(path).unwrap()), image::ImageFormat::Png)
-            .decode()
-            .unwrap()
-            .to_luma8()
+        image::io::Reader::with_format(
+            std::io::Cursor::new(std::fs::read(path).unwrap()),
+            image::ImageFormat::Png,
+        )
+        .decode()
+        .unwrap()
+        .to_luma8()
    }

    #[test]
    fn test_preprocess_skewed_2deg_deskews() {
        // Acceptance criterion: 2-deg skewed fixture deskewed within 0.1 deg
        let source = load_fixture("tests/fixtures/preprocess/skewed_2deg/source.png");
-        let (preprocessed, diagnostics) = preprocess(&source, ImageSource::PhysicalScan)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&source, ImageSource::PhysicalScan).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), source.width() + 20);
@ -1092,21 +1138,28 @@ mod tests {
            BORDER_PADDING,
            preprocessed.width() - 2 * BORDER_PADDING,
            preprocessed.height() - 2 * BORDER_PADDING,
-        ).to_image();
+        )
+        .to_image();

        let (_, second_angle, _) = deskew(&cropped).expect("Second deskew failed");
-        assert!(second_angle.abs() < 0.1, "Second pass should detect near-zero skew, got {}", second_angle);
+        assert!(
+            second_angle.abs() < 0.1,
+            "Second pass should detect near-zero skew, got {}",
+            second_angle
+        );

        // No errors in diagnostics
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
    fn test_preprocess_uneven_lighting_binarizes() {
        // Acceptance criterion: uneven-lighting binarized correctly
        let source = load_fixture("tests/fixtures/preprocess/uneven_lighting/source.png");
-        let (preprocessed, diagnostics) = preprocess(&source, ImageSource::PhysicalScan)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&source, ImageSource::PhysicalScan).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), source.width() + 20);
@ -1116,20 +1169,26 @@ mod tests {
        for y in BORDER_PADDING..preprocessed.height() - BORDER_PADDING {
            for x in BORDER_PADDING..preprocessed.width() - BORDER_PADDING {
                let pixel = preprocessed.get_pixel(x, y)[0];
-                assert!(pixel == 0 || pixel == 255, "Pixel should be binary (0 or 255), got {}", pixel);
+                assert!(
+                    pixel == 0 || pixel == 255,
+                    "Pixel should be binary (0 or 255), got {}",
+                    pixel
+                );
            }
        }

        // No errors in diagnostics
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
    fn test_preprocess_clean_digital_binarizes() {
        // Acceptance criterion: clean digital origin binarized with Otsu
        let source = load_fixture("tests/fixtures/preprocess/clean_digital/source.png");
-        let (preprocessed, diagnostics) = preprocess(&source, ImageSource::DigitalOrigin)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&source, ImageSource::DigitalOrigin).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), source.width() + 20);
@ -1139,20 +1198,26 @@ mod tests {
        for y in BORDER_PADDING..preprocessed.height() - BORDER_PADDING {
            for x in BORDER_PADDING..preprocessed.width() - BORDER_PADDING {
                let pixel = preprocessed.get_pixel(x, y)[0];
-                assert!(pixel == 0 || pixel == 255, "Pixel should be binary (0 or 255), got {}", pixel);
+                assert!(
+                    pixel == 0 || pixel == 255,
+                    "Pixel should be binary (0 or 255), got {}",
+                    pixel
+                );
            }
        }

        // No errors in diagnostics
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
    fn test_preprocess_jbig2_only_pads() {
        // Acceptance criterion: JBIG2 untouched except for border padding
        let source = load_fixture("tests/fixtures/preprocess/jbig2_scan/source.png");
-        let (preprocessed, diagnostics) = preprocess(&source, ImageSource::Jbig2)
-            .expect("Preprocess failed");
+        let (preprocessed, diagnostics) =
+            preprocess(&source, ImageSource::Jbig2).expect("Preprocess failed");

        // Should have border padding
        assert_eq!(preprocessed.width(), source.width() + 20);
@ -1163,12 +1228,18 @@ mod tests {
            for x in 0..source.width() {
                let orig = source.get_pixel(x, y)[0];
                let pad = preprocessed.get_pixel(x + BORDER_PADDING, y + BORDER_PADDING)[0];
-                assert_eq!(orig, pad, "JBIG2 inner pixel at ({}, {}) should match original", x, y);
+                assert_eq!(
+                    orig, pad,
+                    "JBIG2 inner pixel at ({}, {}) should match original",
+                    x, y
+                );
            }
        }

        // No errors in diagnostics
-        assert!(!diagnostics.iter().any(|d| d.code == DiagCode::ImgUnsupportedFormat));
+        assert!(!diagnostics
+            .iter()
+            .any(|d| d.code == DiagCode::ImgUnsupportedFormat));
    }

    #[test]
@ -1176,10 +1247,10 @@ mod tests {
        // Acceptance criterion: same input -> bit-identical output
        let source = load_fixture("tests/fixtures/preprocess/clean_digital/source.png");

-        let (result1, _) = preprocess(&source, ImageSource::DigitalOrigin)
-            .expect("First preprocess failed");
-        let (result2, _) = preprocess(&source, ImageSource::DigitalOrigin)
-            .expect("Second preprocess failed");
+        let (result1, _) =
+            preprocess(&source, ImageSource::DigitalOrigin).expect("First preprocess failed");
+        let (result2, _) =
+            preprocess(&source, ImageSource::DigitalOrigin).expect("Second preprocess failed");

        // Compare pixel-by-pixel
        assert_eq!(result1.dimensions(), result2.dimensions());
@ -1196,34 +1267,50 @@ mod tests {
    fn test_preprocess_border_padding_pixel_perfect() {
        // Acceptance criterion: padding adds exactly 10px on each side
        let source = load_fixture("tests/fixtures/preprocess/clean_digital/source.png");
-        let (preprocessed, _) = preprocess(&source, ImageSource::DigitalOrigin)
-            .expect("Preprocess failed");
+        let (preprocessed, _) =
+            preprocess(&source, ImageSource::DigitalOrigin).expect("Preprocess failed");

        // Check top border is white
        for x in 0..preprocessed.width() {
            for y in 0..BORDER_PADDING {
-                assert_eq!(preprocessed.get_pixel(x, y)[0], 255, "Top border should be white");
+                assert_eq!(
+                    preprocessed.get_pixel(x, y)[0],
+                    255,
+                    "Top border should be white"
+                );
            }
        }

        // Check bottom border is white
        for x in 0..preprocessed.width() {
            for y in preprocessed.height() - BORDER_PADDING..preprocessed.height() {
-                assert_eq!(preprocessed.get_pixel(x, y)[0], 255, "Bottom border should be white");
+                assert_eq!(
+                    preprocessed.get_pixel(x, y)[0],
+                    255,
+                    "Bottom border should be white"
+                );
            }
        }

        // Check left border is white
        for y in 0..preprocessed.height() {
            for x in 0..BORDER_PADDING {
-                assert_eq!(preprocessed.get_pixel(x, y)[0], 255, "Left border should be white");
+                assert_eq!(
+                    preprocessed.get_pixel(x, y)[0],
+                    255,
+                    "Left border should be white"
+                );
            }
        }

        // Check right border is white
        for y in 0..preprocessed.height() {
            for x in preprocessed.width() - BORDER_PADDING..preprocessed.width() {
-                assert_eq!(preprocessed.get_pixel(x, y)[0], 255, "Right border should be white");
+                assert_eq!(
+                    preprocessed.get_pixel(x, y)[0],
+                    255,
+                    "Right border should be white"
+                );
            }
        }
    }
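
Note on the border-padding test above: it checks that preprocessing adds exactly 10 white pixels on each side, so width and height each grow by 20. The sketch below shows one way such padding could work on a raw row-major grayscale buffer. It is an assumption for illustration, not the repository's implementation: `pad_with_white` is a hypothetical name, and the real code operates on `GrayImage` rather than a byte slice.

```rust
/// Padding in pixels per side. The value 10 comes from the test's
/// acceptance criterion; the `usize` type is an assumption of this sketch.
const BORDER_PADDING: usize = 10;

/// Returns a new row-major buffer with a white (255) border of
/// `BORDER_PADDING` pixels on every side, plus the new width and height.
/// Hypothetical helper for illustration only.
fn pad_with_white(pixels: &[u8], width: usize, height: usize) -> (Vec<u8>, usize, usize) {
    let new_w = width + 2 * BORDER_PADDING;
    let new_h = height + 2 * BORDER_PADDING;
    // Start from an all-white canvas, then copy each source row into place.
    let mut out = vec![255u8; new_w * new_h];
    for y in 0..height {
        let src = &pixels[y * width..(y + 1) * width];
        let dst_start = (y + BORDER_PADDING) * new_w + BORDER_PADDING;
        out[dst_start..dst_start + width].copy_from_slice(src);
    }
    (out, new_w, new_h)
}
```

With this layout, the original pixel at `(x, y)` lands at `(x + BORDER_PADDING, y + BORDER_PADDING)`, which is the offset the JBIG2 test uses when it compares inner pixels against the source.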
@ -1267,8 +1354,8 @@ mod benches {
        let img = create_a4_test_image();

        let start = Instant::now();
-        let (result, diagnostics) = preprocess(&img, ImageSource::PhysicalScan)
-            .expect("Preprocess failed");
+        let (result, diagnostics) =
+            preprocess(&img, ImageSource::PhysicalScan).expect("Preprocess failed");
        let elapsed = start.elapsed();

        println!("A4 (2480x3508) PhysicalScan preprocess time: {:?}", elapsed);
@ -1292,11 +1379,13 @@ mod benches {
        let img = create_a4_test_image();

        let start = Instant::now();
-        let (result, _) = preprocess(&img, ImageSource::DigitalOrigin)
-            .expect("Preprocess failed");
+        let (result, _) = preprocess(&img, ImageSource::DigitalOrigin).expect("Preprocess failed");
        let elapsed = start.elapsed();

-        println!("A4 (2480x3508) DigitalOrigin preprocess time: {:?}", elapsed);
+        println!(
+            "A4 (2480x3508) DigitalOrigin preprocess time: {:?}",
+            elapsed
+        );

        assert_eq!(result.width(), A4_WIDTH + 20);
        assert_eq!(result.height(), A4_HEIGHT + 20);
@ -1313,8 +1402,7 @@ mod benches {
        let img = create_a4_test_image();

        let start = Instant::now();
-        let (result, _) = preprocess(&img, ImageSource::Jbig2)
-            .expect("Preprocess failed");
+        let (result, _) = preprocess(&img, ImageSource::Jbig2).expect("Preprocess failed");
        let elapsed = start.elapsed();

        println!("A4 (2480x3508) Jbig2 preprocess time: {:?}", elapsed);
--- a/crates/pdftract-core/src/receipts/lite.rs
+++ b/crates/pdftract-core/src/receipts/lite.rs
@ -67,7 +67,8 @@ mod tests {
    fn test_lite_size_benchmark() {
        // Benchmark: verify receipt sizes are reasonable
        // In a real document, all receipts share the same pdf_fingerprint
-        let pdf_fingerprint = "pdftract-v1:a7f3b8c4d2e1f6a9b5c3d8e7f4a2b1c9d6e3f8a7b4c2d9e6f3a8b7c4d1e9f6a3b8";
+        let pdf_fingerprint =
+            "pdftract-v1:a7f3b8c4d2e1f6a9b5c3d8e7f4a2b1c9d6e3f8a7b4c2d9e6f3a8b7c4d1e9f6a3b8";
        let mut total_size = 0;

        for i in 0..100 {