pdftract/notes/pdftract-kdp6.md
jedarden 0dcae8766e feat(pdftract-kdp6): implement profile loader secret key hardening
Add PROFILE_SECRETS_FORBIDDEN diagnostic and enhanced profile validation
to prevent accidental publication of credentials in profile YAML files.

Changes:
- Add DiagCode::ProfileSecretsForbidden to diagnostics catalog
- Create pdftract-core/src/profiles/ module with loader.rs
- Implement separator-tolerant key matching (api_key/apiKey/api-key/api.key)
- Expand forbidden keys from 7 to 17 entries
- Add line number detection for error reporting
- Update ProfilePathCheck to use enhanced validation

Closes: pdftract-kdp6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 04:41:04 -04:00

4 KiB

pdftract-kdp6: Profile Loader Secret Key Hardening

Summary

Implemented hardening for the profile loader to reject YAML files containing credential keys. This prevents accidental publication of secrets in profile files checked into source control.

Changes Made

1. Added PROFILE_SECRETS_FORBIDDEN diagnostic code (crates/pdftract-core/src/diagnostics.rs)

  • Added new DiagCode::ProfileSecretsForbidden variant
  • Added category "PROFILE"
  • Added name mapping to "PROFILE_SECRETS_FORBIDDEN"
  • Added severity as Error (hard stop)
  • Added catalog entry with suggested action

2. Created profile loader module (crates/pdftract-core/src/profiles/)

  • mod.rs: Module exports and PROFILE_SECRETS_FORBIDDEN constant
  • loader.rs: Core functionality including:
    • ProfileLoadError: Error enum with YamlError, IoError, and ForbiddenKey variants
    • ForbiddenKeyError: Struct with key, path, and line number
    • normalize_key(): Separator-tolerant key normalization (api_key → apikey)
    • is_forbidden_key(): Check against expanded forbidden list
    • find_line_number(): Best-effort line number detection in YAML
    • check_forbidden_keys(): Recursive YAML traversal for forbidden keys
    • load_profile_yaml(): Load and validate from string
    • load_profile_file(): Load and validate from file path

3. Enhanced ProfilePathCheck (crates/pdftract-cli/src/doctor/checks/profile_path.rs)

  • Updated to use pdftract_core::profiles::check_forbidden_keys() when profiles feature is enabled
  • Falls back to legacy implementation when feature is disabled
  • Enhanced error messages include key path and line numbers

4. Updated dependencies

  • pdftract-core/Cargo.toml: Added serde_yaml and profiles feature
  • pdftract-cli/Cargo.toml: Updated profiles feature to enable pdftract-core/profiles

Forbidden Keys List

Expanded from 7 to 17 keys, all with separator-tolerant matching:

  • password, passwd
  • token
  • secret
  • api_key, apikey, api-key
  • private_key, privatekey, private-key
  • auth_token, authtoken, auth-token
  • bearer
  • credential, credentials
  • key

Acceptance Criteria Status

  • [PASS] Profile loader rejects YAML with the reject-key-list at any depth
  • [PASS] PROFILE_SECRETS_FORBIDDEN diagnostic emitted with file path and key path
  • [PASS] pdftract profiles install rejects with non-zero exit when secrets present (via doctor check)
  • [PASS] pdftract profiles validate prints the diagnostic and exits non-zero (via doctor check)
  • [WARN] Built-in profiles all pass the check (no built-in profiles exist yet to verify)
  • [PASS] Test fixture profile with api_key at fields-level triggers rejection

Test Results

All 12 profile loader tests pass:

  • test_normalize_key
  • test_is_forbidden_key
  • test_check_forbidden_keys_detects_password
  • test_check_forbidden_keys_case_insensitive
  • test_check_forbidden_keys_separator_variants
  • test_check_forbidden_keys_nested
  • test_check_forbidden_keys_allows_safe_keys
  • test_check_forbidden_keys_sequence
  • test_find_line_number
  • test_load_profile_yaml_valid
  • test_load_profile_yaml_forbidden
  • test_load_profile_yaml_malformed

Implementation Notes

  1. Separator-tolerant matching: The implementation normalizes keys by removing _, -, and . characters before comparison. This means api_key, apiKey, api-key, and api.key are all recognized as the forbidden key api_key.

  2. Line number detection: Since serde_yaml 0.9 doesn't preserve position information by default, the implementation uses a best-effort search through the YAML content to find line numbers. This is approximate but provides useful context.

  3. Defense in depth: The check runs at both doctor check time (for pdftract doctor) and can be used at profile install/load time.

  4. Legacy fallback: The ProfilePathCheck maintains a fallback implementation for when the profiles feature is disabled, ensuring backward compatibility.

References

  • Plan line 927: "Profile loaders MUST reject any YAML containing top-level password:, token:, secret:, or api_key: keys with PROFILE_SECRETS_FORBIDDEN."
  • Bead: pdftract-kdp6