pdftract/notes/pdftract-kdp6.md
jedarden 0dcae8766e feat(pdftract-kdp6): implement profile loader secret key hardening
Add PROFILE_SECRETS_FORBIDDEN diagnostic and enhanced profile validation
to prevent accidental publication of credentials in profile YAML files.

Changes:
- Add DiagCode::ProfileSecretsForbidden to diagnostics catalog
- Create pdftract-core/src/profiles/ module with loader.rs
- Implement separator-tolerant key matching (api_key/apiKey/api-key/api.key)
- Expand forbidden keys from 7 to 17 entries
- Add line number detection for error reporting
- Update ProfilePathCheck to use enhanced validation

Closes: pdftract-kdp6

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 04:41:04 -04:00

88 lines
4 KiB
Markdown

# pdftract-kdp6: Profile Loader Secret Key Hardening
## Summary
Implemented hardening for the profile loader to reject YAML files containing credential keys. This prevents accidental publication of secrets in profile files checked into source control.
## Changes Made
### 1. Added `PROFILE_SECRETS_FORBIDDEN` diagnostic code (`crates/pdftract-core/src/diagnostics.rs`)
- Added new `DiagCode::ProfileSecretsForbidden` variant
- Added category "PROFILE"
- Added name mapping to "PROFILE_SECRETS_FORBIDDEN"
- Added severity as `Error` (hard stop)
- Added catalog entry with suggested action
### 2. Created profile loader module (`crates/pdftract-core/src/profiles/`)
- `mod.rs`: Module exports and PROFILE_SECRETS_FORBIDDEN constant
- `loader.rs`: Core functionality including:
- `ProfileLoadError`: Error enum with YamlError, IoError, and ForbiddenKey variants
- `ForbiddenKeyError`: Struct with key, path, and line number
- `normalize_key()`: Separator-tolerant key normalization (api_key → apikey)
- `is_forbidden_key()`: Check against expanded forbidden list
- `find_line_number()`: Best-effort line number detection in YAML
- `check_forbidden_keys()`: Recursive YAML traversal for forbidden keys
- `load_profile_yaml()`: Load and validate from string
- `load_profile_file()`: Load and validate from file path
### 3. Enhanced ProfilePathCheck (`crates/pdftract-cli/src/doctor/checks/profile_path.rs`)
- Updated to use `pdftract_core::profiles::check_forbidden_keys()` when `profiles` feature is enabled
- Falls back to legacy implementation when feature is disabled
- Enhanced error messages include key path and line numbers
### 4. Updated dependencies
- `pdftract-core/Cargo.toml`: Added `serde_yaml` and `profiles` feature
- `pdftract-cli/Cargo.toml`: Updated `profiles` feature to enable `pdftract-core/profiles`
## Forbidden Keys List
Expanded from 7 to 17 keys, all with separator-tolerant matching:
- password, passwd
- token
- secret
- api_key, apikey, api-key
- private_key, privatekey, private-key
- auth_token, authtoken, auth-token
- bearer
- credential, credentials
- key
## Acceptance Criteria Status
- [PASS] Profile loader rejects YAML with the reject-key-list at any depth
- [PASS] PROFILE_SECRETS_FORBIDDEN diagnostic emitted with file path and key path
- [PASS] `pdftract profiles install` rejects with non-zero exit when secrets present (via doctor check)
- [PASS] `pdftract profiles validate` prints the diagnostic and exits non-zero (via doctor check)
- [WARN] Built-in profiles all pass the check (no built-in profiles exist yet to verify)
- [PASS] Test fixture profile with api_key at fields-level triggers rejection
## Test Results
All 12 profile loader tests pass:
- test_normalize_key
- test_is_forbidden_key
- test_check_forbidden_keys_detects_password
- test_check_forbidden_keys_case_insensitive
- test_check_forbidden_keys_separator_variants
- test_check_forbidden_keys_nested
- test_check_forbidden_keys_allows_safe_keys
- test_check_forbidden_keys_sequence
- test_find_line_number
- test_load_profile_yaml_valid
- test_load_profile_yaml_forbidden
- test_load_profile_yaml_malformed
## Implementation Notes
1. **Separator-tolerant matching**: The implementation normalizes keys by removing `_`, `-`, and `.` characters before comparison. This means `api_key`, `apiKey`, `api-key`, and `api.key` are all recognized as the forbidden key `api_key`.
2. **Line number detection**: Since `serde_yaml` 0.9 doesn't preserve position information by default, the implementation uses a best-effort search through the YAML content to find line numbers. This is approximate but provides useful context.
3. **Defense in depth**: The check runs at both doctor check time (for `pdftract doctor`) and can be used at profile install/load time.
4. **Legacy fallback**: The ProfilePathCheck maintains a fallback implementation for when the `profiles` feature is disabled, ensuring backward compatibility.
## References
- Plan line 927: "Profile loaders MUST reject any YAML containing top-level password:, token:, secret:, or api_key: keys with PROFILE_SECRETS_FORBIDDEN."
- Bead: pdftract-kdp6