pdftract/tests/fixtures/profiles/slide_deck
jedarden 21fcd902d1 feat(pdftract-2vajs): implement slide_deck profile with fixtures and tests
Implements the slide_deck document profile for PowerPoint/Keynote/Google
Slides exports as PDF. Includes 5 fixtures, expected outputs, and regression
tests.

Components:
- profiles/builtin/slide_deck/profile.yaml - Profile configuration
- tests/fixtures/profiles/slide_deck/ - 5 PDF fixtures with expected outputs
- crates/pdftract-cli/tests/test_slide_deck.rs - Regression tests (12 PASS)

Fixtures cover:
1. pitch_deck - Sales pitch (10 slides)
2. academic_lecture - Academic lecture (40 slides)
3. corporate_kickoff - Corporate kickoff (15 slides)
4. bilingual_deck - Bilingual EN/ES (12 slides)
5. googleslides_handout - Google Slides handout mode (4 pages, 3 slides/page)

Extracted fields: title, presenter, date, slide_titles

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 21:12:24 -04:00
academic_lecture-expected.json
academic_lecture.pdf
bilingual_deck-expected.json
bilingual_deck.pdf
corporate_kickoff-expected.json
corporate_kickoff.pdf
googleslides_handout-expected.json
googleslides_handout.pdf
pitch_deck-expected.json
pitch_deck.pdf
PROVENANCE.md
README.md

Slide Deck Profile Fixtures

This directory contains test fixtures for the slide deck document profile.

Fixture Types

  1. pitch_deck - Sales/product pitch deck (10 slides) with typical startup presentation structure
  2. academic_lecture - Academic lecture slides (40 slides) with technical content and Q&A slides
  3. corporate_kickoff - Corporate annual kickoff presentation (15 slides) with business metrics and roadmap
  4. bilingual_deck - Bilingual English/Spanish presentation (12 slides) testing multilingual extraction
  5. googleslides_handout - Google Slides handout mode export (3 slides per page, 4 pages total) testing multi-slide-per-page edge case
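In handout mode a deck holds at most pages × slides-per-page slides, so googleslides_handout (4 pages, 3 slides per page) covers up to 12 slides. Below is a minimal sketch of the page-to-slide arithmetic a test could use. It assumes a fixed number of slides per page and allows a partly filled last page. Both function names are hypothetical and are not part of pdftract:

```rust
use std::ops::Range;

/// Hypothetical helper: 1-based slide numbers printed on the 0-based handout
/// `page`, for a deck of `slide_count` slides laid out `slides_per_page` to a
/// page. The last page may be only partly filled; a page past the end of the
/// deck yields an empty range. Assumes `slides_per_page > 0`.
fn handout_slides_on_page(page: usize, slides_per_page: usize, slide_count: usize) -> Range<usize> {
    let first = page * slides_per_page + 1;
    let last = ((page + 1) * slides_per_page).min(slide_count);
    if first > last {
        return first..first; // empty: no slides on this page
    }
    first..last + 1 // half-open range, so `last` is included
}

/// Hypothetical helper: pages needed to print `slide_count` slides in handout
/// mode (ceiling division). Assumes `slides_per_page > 0`.
fn handout_page_count(slide_count: usize, slides_per_page: usize) -> usize {
    (slide_count + slides_per_page - 1) / slides_per_page
}
```

For example, with 11 slides at 3 per page, the fourth page would show only slides 10 and 11.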

Expected Output Format

Each fixture should have a corresponding *-expected.json file with the following structure:

{
  "metadata": {
    "document_type": "slide_deck",
    "document_type_confidence": 0.XX,
    "document_type_reasons": [...],
    "profile_name": "slide_deck",
    "profile_version": "1.0.0",
    "profile_fields": {
      "title": "...",
      "presenter": "...",
      "date": "YYYY-MM-DD",
      "slide_titles": [...]
    }
  }
}
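Below is a minimal sketch of how a test might check an expected file against this structure. It assumes the file has already been deserialised into plain Rust structs. The struct names, the [0, 1] range for document_type_confidence, and the requirement that date is always present as YYYY-MM-DD are all assumptions. None of them are pdftract's actual types or rules:

```rust
/// Hypothetical mirror of `profile_fields` in a *-expected.json file.
struct ProfileFields {
    title: String,
    presenter: String,
    date: String, // assumed always present, formatted YYYY-MM-DD
    slide_titles: Vec<String>,
}

/// Hypothetical mirror of the `metadata` object.
struct Metadata {
    document_type: String,
    document_type_confidence: f64, // assumed to lie in [0, 1]
    document_type_reasons: Vec<String>,
    profile_name: String,
    profile_version: String,
    profile_fields: ProfileFields,
}

/// True if `s` has the shape YYYY-MM-DD with month 1-12 and day 1-31.
/// Does not check month lengths or leap years.
fn is_iso_date(s: &str) -> bool {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return false;
    }
    // Every byte except the two dashes must be an ASCII digit.
    let digits_ok = b.iter().enumerate().all(|(i, c)| i == 4 || i == 7 || c.is_ascii_digit());
    if !digits_ok {
        return false;
    }
    // Safe to slice and unwrap: the string is all ASCII digits and dashes.
    let month: u32 = s[5..7].parse().unwrap();
    let day: u32 = s[8..10].parse().unwrap();
    (1..=12).contains(&month) && (1..=31).contains(&day)
}

/// Collects every schema violation instead of stopping at the first one.
fn validate(meta: &Metadata) -> Vec<String> {
    let mut errors = Vec::new();
    if meta.document_type != "slide_deck" {
        errors.push(format!("document_type is {:?}, expected \"slide_deck\"", meta.document_type));
    }
    if meta.profile_name != "slide_deck" {
        errors.push(format!("profile_name is {:?}, expected \"slide_deck\"", meta.profile_name));
    }
    // `contains` is false for NaN, so a NaN confidence is also reported.
    if !(0.0..=1.0).contains(&meta.document_type_confidence) {
        errors.push(format!("confidence {} outside [0, 1]", meta.document_type_confidence));
    }
    if !is_iso_date(&meta.profile_fields.date) {
        errors.push(format!("date {:?} is not YYYY-MM-DD", meta.profile_fields.date));
    }
    errors
}
```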

Profile Fields

The slide deck profile extracts the following fields:

  • title: Presentation title (region: middle_half, pick: largest_font)
  • presenter: Presenter name (region: bottom_half, pick: largest_font)
  • date: Presentation date (near: "Date", parse: date)
  • slide_titles: Ordered list of slide titles (pick: largest_font, collected per page)
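The region and pick rules can be sketched as follows. This is an illustration only, and it rests on several assumptions. Each page is treated as a list of text spans, each with a font size and a vertical position normalised from 0.0 (top) to 1.0 (bottom). middle_half and bottom_half are taken to mean the bands 0.25 to 0.75 and 0.5 to 1.0. On a font-size tie, the earliest span wins. None of these types or functions are pdftract's API:

```rust
/// A text span on one page; `y` is its vertical centre, normalised so 0.0 is
/// the top edge and 1.0 the bottom edge (an assumption for this sketch).
#[derive(Debug, Clone)]
struct Span {
    text: String,
    font_size: f32,
    y: f32,
}

/// Hypothetical interpretation of the profile's region names.
#[derive(Clone, Copy)]
enum Region {
    Whole,
    MiddleHalf, // assumed: 0.25 <= y <= 0.75
    BottomHalf, // assumed: 0.5 <= y <= 1.0
}

impl Region {
    fn contains(self, y: f32) -> bool {
        match self {
            Region::Whole => true,
            Region::MiddleHalf => (0.25..=0.75).contains(&y),
            Region::BottomHalf => (0.5..=1.0).contains(&y),
        }
    }
}

/// `pick: largest_font`: the span with the largest font size inside `region`.
fn pick_largest_font(spans: &[Span], region: Region) -> Option<&Span> {
    let mut best: Option<&Span> = None;
    for s in spans.iter().filter(|s| region.contains(s.y)) {
        // Strictly greater, so the earliest span wins a tie.
        if best.map_or(true, |b| s.font_size > b.font_size) {
            best = Some(s);
        }
    }
    best
}

/// `slide_titles`: the largest-font span of each page, in page order.
/// Pages with no text produce no entry (an assumption).
fn slide_titles(pages: &[Vec<Span>]) -> Vec<String> {
    pages
        .iter()
        .filter_map(|page| pick_largest_font(page, Region::Whole))
        .map(|s| s.text.clone())
        .collect()
}
```

Under these assumptions, title would come from pick_largest_font(&cover, Region::MiddleHalf) on a cover page, and presenter from Region::BottomHalf. Because slide_titles takes a single largest-font span per page, a handout page carrying three slides would yield only one title.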

Known Limitations

  • Multi-slide-per-page PDFs (handout mode) are a known limitation: page_count no longer equals the slide count (googleslides_handout, for example, places 3 slides on each of its 4 pages)
  • Slides with image-based titles or icons will not extract slide titles correctly
  • Presenter extraction often fails when slides include logos, or affiliations printed alongside the presenter's name
  • Non-English presentations may have reduced extraction accuracy
  • Google Slides exports vary in structure depending on export settings
  • Beamer (LaTeX) exports have structural signals very different from PowerPoint, Keynote, and Google Slides exports

Provenance

All fixtures are sourced from synthetic templates created for testing purposes. See PROVENANCE.md for details on each fixture.