pdftract/profiles/builtin/contract/README.md
jedarden eec40dad15 docs(pdftract-4iier): complete per-profile README documentation
Add comprehensive README files for all 9 built-in profiles (invoice,
receipt, contract, scientific_paper, slide_deck, form, bank_statement,
legal_filing, book_chapter). Each README includes:
- Match Criteria Summary: prose description of what makes a document match
- Extracted Fields table: field_name, type, description, example, source_hint
- Known Limitations: bullet list of edge cases and failure modes
- Sample Input Pointer: links to fixtures directory
- Configuration Tips: how to override via --profile or export

The xtask doc-profile skeleton generator was already implemented
and was used to generate the initial skeleton, which was then enhanced
with profile-specific human-authored content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 00:35:35 -04:00


CONTRACT Profile

Legal contract with parties, effective date, term, signatures

Match Criteria Summary

A document matches this profile when it shows the formal structure and language of a legal agreement. The classifier looks for contract-specific terminology such as "agreement is made", "terms and conditions", "effective date", "governing law", and "indemnification". Structurally, contracts are multi-page documents (typically 2-50 pages) with signature blocks in the final pages. The strongest matching signal is the combination of these legal language patterns with a detected signature block. Many contracts also include recitals, numbered sections, and a definitions section.
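
As a rough illustration of keyword-based matching, the sketch below counts how many of the phrases named above appear in a plain-text file. It is a hypothetical bash function (`contract_keyword_hits` is not part of pdftract), it ignores signature-block detection entirely, and because `grep` works line by line it misses phrases split across line breaks.

```shell
# Hypothetical sketch, not pdftract's classifier: count how many of the
# contract phrases listed above occur in a document's extracted text.
# The function name, phrase list, and scoring are illustrative assumptions.
contract_keyword_hits() {
  local text_file="$1"   # plain-text file, e.g. text already pulled from a PDF
  local hits=0
  local phrase
  for phrase in "agreement is made" "terms and conditions" "effective date" \
                "governing law" "indemnification"; do
    # -q: quiet, -i: case-insensitive, -F: match a fixed string, not a regex
    if grep -qiF -- "$phrase" "$text_file"; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits"   # 0 to 5; a higher count means more contract-like wording
}
```

For example, `contract_keyword_hits page_text.txt` prints a count from 0 to 5.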

Extracted Fields

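| Field | Type | Description | Example Value | Source Hint |
|---|---|---|---|---|
| parties | array | Contracting parties, extracted from page text using pattern matching | [...] | regex patterns |
| effective_date | date | Date the agreement takes effect, extracted from page text using pattern matching | 2024-01-15 | regex patterns |
| term | string | Duration of the agreement, extracted from page text using pattern matching | "example value" | regex patterns |
| governing_law | string | Jurisdiction whose law governs the agreement, extracted from page text using pattern matching | "example value" | regex patterns |
| signatures | array | Signatory names from the signature blocks, extracted from page text using pattern matching | [...] | region: bottom_20_percent |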
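
The two kinds of source hint in the table can be sketched roughly as follows. Both helpers are hypothetical (neither is a pdftract function): `extract_effective_date` assumes the date is already written in ISO form on a line mentioning "effective date", and `bottom_20_percent` assumes text lines arrive as `y<TAB>text` with y measured downward from the top of the page.

```shell
# Hypothetical sketches of the two source hints in the table; these helper
# names are illustrative and are not pdftract functions.

# effective_date: take the first ISO-style date (YYYY-MM-DD) on a line that
# mentions "effective date". Dates such as "January 15, 2024" would need
# another pattern.
extract_effective_date() {
  grep -i 'effective date' | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1
}

# signatures (region: bottom_20_percent): keep only text lines whose vertical
# position lies in the bottom 20% of the page. Input lines are assumed to be
# "y<TAB>text", with y measured downward from the top of the page.
bottom_20_percent() {
  local page_height="$1"
  awk -F '\t' -v h="$page_height" '$1 >= 0.8 * h { print $2 }'
}
```

For example, `printf 'Effective Date: 2024-01-15\n' | extract_effective_date` prints `2024-01-15`.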

Known Limitations

  • For contracts with more than two parties, not all parties may be extracted correctly
  • Signature extraction relies on a text layer: typed signature names are extracted, but handwritten signatures are not OCR'd
  • Complex contract structures (e.g., exhibits, appendices) may not be fully captured
  • For contracts with attached amendments or riders, only the primary agreement may be extracted
  • Non-English contracts may not match because the text patterns are English-only
  • Signature names are not extracted when the signatures are scanned images
  • Term extraction may fail for contracts with complex duration formulas (e.g., "until completion of services")
  • Governing law extraction may capture the wrong jurisdiction for federal or international agreements
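
The term limitation follows from how regex extraction works: a pattern written for a fixed duration has nothing to match in an open-ended clause. The minimal sketch below assumes a hypothetical "term of <number> <unit>" pattern (not the one pdftract ships) and shows it returning a value for a fixed term and nothing for "until completion of services".

```shell
# Hypothetical sketch (not pdftract's pattern): pull a fixed duration such as
# "term of 24 months" out of document text read from stdin.
extract_term() {
  grep -ioE 'term of [0-9]+ (day|week|month|year)s?' \
    | head -n 1 \
    | sed -E 's/^[^0-9]*//'   # drop the leading "term of", keep "24 months"
}
```

Here `printf 'The initial term of 24 months begins on signing.\n' | extract_term` prints `24 months`, while a clause such as "remains in force until completion of services" produces no output.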

Sample Input

Example fixtures demonstrating this profile are available in tests/fixtures/profiles/contract/.

See the classifier corpus for representative documents.

Configuration Tips

To override this profile:

```shell
pdftract profiles export contract > my-profile.yaml
# Edit my-profile.yaml to customize match criteria, fields, or extraction patterns
pdftract extract --profile my-profile.yaml document.pdf
```

For specific contract types (e.g., NDAs, employment agreements), consider adding contract-type-specific text patterns to improve matching. For international contracts, add region-specific governing law patterns.
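
In practice such patterns would go into the exported `my-profile.yaml`, whose schema this README does not document. The sketch below therefore shows only the matching idea in shell: a hypothetical `contract_subtype` function that checks subtype-specific phrases in order. The labels and phrase lists are illustrative assumptions, not patterns shipped with pdftract.

```shell
# Hypothetical sketch: route a contract to a subtype using subtype-specific
# phrases. The function name, labels, and phrase lists are illustrative
# assumptions, not patterns shipped with pdftract.
contract_subtype() {
  local text
  text=$(cat)   # document text from stdin
  # Check NDA phrases first, since NDAs often mention employees too.
  if printf '%s\n' "$text" | grep -qiE 'non-disclosure|confidential information'; then
    echo "nda"
  elif printf '%s\n' "$text" | grep -qiE 'employer|employee|at-will'; then
    echo "employment"
  else
    echo "general"
  fi
}
```

Ordering matters in this design: the more specific subtype is tested first so that shared vocabulary does not pull a document into the broader bucket.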


This README was auto-generated from profile.yaml. Update the Match Criteria Summary and Known Limitations sections with profile-specific guidance.