feat(bf-2f5): add comprehensive watchdog timeout mechanism

Implement a watchdog timeout system that terminates hung child
processes cleanly, with diagnostics and cleanup.

Features:
- PTY first-output timeout (default 90s): fires if the child has produced no PTY output
- Stream-json first-output timeout (default 90s): fires if the child has emitted no stream-json events
- Overall session timeout (default 3600s): caps total session runtime so a session cannot hang indefinitely
- Stop hook watchdog timeout (default 120s): fires if the Stop hook has not run after prompt injection
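The four timeouts above can be modelled as one config struct plus a pure check over what the watchdog has observed so far. This is an illustrative sketch only: `WatchdogConfig`, `Observed`, `TimeoutKind` and `expired()` are hypothetical names, and the order in which the timeouts are checked is an assumption, not necessarily what the watchdog module does.

```rust
use std::time::Duration;

/// Which watchdog timeout fired (hypothetical enum, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeoutKind {
    PtyFirstOutput,
    StreamJsonFirstOutput,
    Session,
    StopHook,
}

/// Hypothetical config holding the four timeouts with the defaults above.
#[derive(Debug, Clone)]
pub struct WatchdogConfig {
    pub pty_first_output: Duration,
    pub stream_json_first_output: Duration,
    pub session: Duration,
    pub stop_hook: Duration,
}

impl Default for WatchdogConfig {
    fn default() -> Self {
        WatchdogConfig {
            pty_first_output: Duration::from_secs(90),
            stream_json_first_output: Duration::from_secs(90),
            session: Duration::from_secs(3600),
            stop_hook: Duration::from_secs(120),
        }
    }
}

/// What the watchdog has observed at the moment of a check.
pub struct Observed {
    /// Time since the child was spawned.
    pub elapsed: Duration,
    pub saw_pty_output: bool,
    pub saw_stream_json: bool,
    /// Time since prompt injection while still waiting for the Stop hook;
    /// `None` when no prompt is pending.
    pub awaiting_stop_hook: Option<Duration>,
}

/// Return the first expired timeout, checked in an assumed priority order
/// (session cap first), or `None` while the child is still within budget.
pub fn expired(cfg: &WatchdogConfig, obs: &Observed) -> Option<TimeoutKind> {
    if obs.elapsed >= cfg.session {
        return Some(TimeoutKind::Session);
    }
    if !obs.saw_pty_output && obs.elapsed >= cfg.pty_first_output {
        return Some(TimeoutKind::PtyFirstOutput);
    }
    if !obs.saw_stream_json && obs.elapsed >= cfg.stream_json_first_output {
        return Some(TimeoutKind::StreamJsonFirstOutput);
    }
    match obs.awaiting_stop_hook {
        Some(waited) if waited >= cfg.stop_hook => Some(TimeoutKind::StopHook),
        _ => None,
    }
}
```

Keeping the check pure (no clocks, no signals) lets the timing policy be unit-tested separately from the thread that polls it.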

Timeout handling:
- Sends SIGTERM to the child process when a timeout fires
- kill_child() sends SIGTERM, then SIGKILL after a 2s grace period
- Writes a diagnostic to stderr naming which timeout fired
- Emits a stream-json error event for downstream consumers
- CleanupGuard removes the temp dir and FIFO on all exit paths
- Returns Error::Timeout and exits with code 3 so the caller's retry loop can retry
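The SIGTERM → SIGKILL escalation and the cleanup-on-every-exit-path guarantee correspond to two common Rust patterns: a poll-until-deadline loop and an RAII `Drop` guard. This is a minimal sketch, not the crate's `kill_child()` or `CleanupGuard`; `ChildSignals`, `kill_with_grace()` and `TempDirGuard` are hypothetical, and signal delivery is stubbed behind a trait (a real version would presumably call `kill(2)` through `libc` or `nix`, which is an assumption).

```rust
use std::fs;
use std::path::PathBuf;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical handle to the child. A real implementation would wrap
/// kill(pid, SIGTERM / SIGKILL) and a non-blocking waitpid; stubbed here.
pub trait ChildSignals {
    fn send_sigterm(&mut self);
    fn send_sigkill(&mut self);
    /// Non-blocking: true once the child has exited.
    fn has_exited(&mut self) -> bool;
}

#[derive(Debug, PartialEq, Eq)]
pub enum KillOutcome {
    ExitedAfterSigterm,
    Sigkilled,
}

/// Send SIGTERM, poll for exit until `grace` elapses, then send SIGKILL.
pub fn kill_with_grace<C: ChildSignals>(child: &mut C, grace: Duration) -> KillOutcome {
    child.send_sigterm();
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if child.has_exited() {
            return KillOutcome::ExitedAfterSigterm;
        }
        thread::sleep(Duration::from_millis(10));
    }
    // One last check so a child that exited right at the deadline is not killed.
    if child.has_exited() {
        return KillOutcome::ExitedAfterSigterm;
    }
    child.send_sigkill();
    KillOutcome::Sigkilled
}

/// Hypothetical RAII guard: removes the temp dir (and anything inside it,
/// such as a FIFO) when dropped, including on early `?` returns.
pub struct TempDirGuard {
    dir: PathBuf,
}

impl TempDirGuard {
    pub fn new(dir: PathBuf) -> Self {
        TempDirGuard { dir }
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Ignore errors: the directory may already have been removed.
        let _ = fs::remove_dir_all(&self.dir);
    }
}
```

`Drop` runs on normal returns and on unwinding panics but not on `std::process::exit`, so a guard like this only covers the exit-code-3 path if the process exits after the guard has gone out of scope.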

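Fixes:
- Pass temp_dir_path to Watchdog so it can monitor <temp_dir>/transcript.jsonl for stream-json output (None was passed before, so stream-json monitoring never ran in the watchdog)
- Remove the unused DEFAULT_FIRST_OUTPUT_TIMEOUT_SECS and DEFAULT_STREAM_JSON_TIMEOUT_SECS constants from Session (duplicates of the watchdog module defaults)
- Improve mock-claude binary path resolution for workspace builds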
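Passing `Some(temp_dir_path)` lets the watchdog look at `<temp_dir>/transcript.jsonl`, as the comment in the session diff says. Below is a minimal sketch of such a first-output check, assuming any non-blank line in that file counts as a stream-json event (the real watchdog may parse more strictly); `saw_stream_json()` is a hypothetical name. With `None`, the old behaviour, there is no file to watch and the check can never succeed.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

/// Hypothetical check: has the child written at least one stream-json line
/// to `<temp_dir>/transcript.jsonl`? Assumes any non-blank line is an event.
pub fn saw_stream_json(temp_dir: Option<&Path>) -> bool {
    // No temp dir (the old `None` argument): nothing can be observed.
    let dir = match temp_dir {
        Some(d) => d,
        None => return false,
    };
    let file = match File::open(dir.join("transcript.jsonl")) {
        Ok(f) => f,
        // File not created yet: no stream-json output so far.
        Err(_) => return false,
    };
    BufReader::new(file)
        .lines()
        .map_while(Result::ok)
        .any(|line| !line.trim().is_empty())
}
```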

This prevents the indefinite hang that occurs when Claude Code wedges
during session initialization or tool use, so that marathon loops and
NEEDLE can retry cleanly instead of blocking forever.
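Exit code 3 is what lets an outer loop tell a watchdog timeout apart from other failures. The sketch below models such a loop in Rust for illustration only; the retry policy (retry only on code 3, at most `max_attempts` runs, always at least one) and the name `run_with_retries()` are assumptions, not how marathon loops or NEEDLE are actually implemented.

```rust
/// Exit code for a watchdog timeout, per the commit message.
const EXIT_TIMEOUT: i32 = 3;

/// Hypothetical retry policy: rerun the session while it exits with the
/// timeout code, up to `max_attempts` runs (always at least one).
/// Returns the final exit code and the number of attempts made.
fn run_with_retries<F: FnMut(u32) -> i32>(mut run_session: F, max_attempts: u32) -> (i32, u32) {
    let mut attempt = 0;
    loop {
        attempt += 1;
        let code = run_session(attempt);
        // Any non-timeout code (success or another failure) ends the loop.
        if code != EXIT_TIMEOUT || attempt >= max_attempts {
            return (code, attempt);
        }
        eprintln!("attempt {} timed out; retrying", attempt);
    }
}
```

Retrying only on the dedicated timeout code keeps genuine failures (bad arguments, crashes) from being retried blindly.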

Bead-Id: bf-2f5
jedarden 2026-06-25 07:29:46 -04:00
parent 18dea17a4f
commit 7d40c937fb
2 changed files with 16 additions and 19 deletions

@@ -68,14 +68,6 @@ pub fn cleanup_temp_dir() {
 pub struct Session;
 impl Session {
-    /// Default first-output timeout in seconds.
-    /// If the child produces no output within this time, we assume it's hung.
-    const DEFAULT_FIRST_OUTPUT_TIMEOUT_SECS: u64 = 90;
-    /// Default stream-json first-output timeout in seconds.
-    /// If the child produces no stream-json events within this time, we assume it's hung.
-    const DEFAULT_STREAM_JSON_TIMEOUT_SECS: u64 = 90;
     /// Run a Claude Code session.
     ///
     /// # Arguments
@@ -171,11 +163,10 @@ impl Session {
             stop_hook_timeout_secs,
         );
-        // Get transcript path for stream-json monitoring (will be resolved from stop payload)
-        // For now, we don't know the transcript path, so we pass None
-        // The watchdog will monitor PTY output and overall timeout, and stream-json monitoring
-        // will be handled by the main thread via the emitter
-        let watchdog = Watchdog::new(watchdog_config, spawner.child_pid, None);
+        // Get temp directory path for stream-json monitoring
+        // The watchdog will monitor <temp_dir>/transcript.jsonl for stream-json output
+        let temp_dir_path = installer.dir_path().to_path_buf();
+        let watchdog = Watchdog::new(watchdog_config, spawner.child_pid, Some(temp_dir_path));
         let watchdog_state = watchdog.state();

@@ -8,14 +8,20 @@ use claude_print::error::Error;
 use claude_print::session::Session;
 use std::ffi::OsString;
-/// Locate the mock-claude binary compiled alongside the test binary.
-/// Test binaries live at `target/<profile>/deps/`; other bins at `target/<profile>/`.
+/// Locate the mock-claude binary.
+///
+/// In a workspace, binaries are built to the workspace target directory, not the
+/// individual project's target directory. The test binary lives at `target/<profile>/deps/`
+/// (within the project), but mock-claude is built to `<workspace-root>/target/<profile>/`.
 fn mock_claude_bin() -> std::path::PathBuf {
     // Get the test executable path
     let exe = std::env::current_exe().expect("current_exe");
-    let profile_dir = exe
-        .parent() // deps/
-        .and_then(|p| p.parent()) // target/<profile>/
-        .expect("unexpected test binary path");
+    // Walk up from the test binary to find the workspace root
+    // Test binary: <workspace>/target/<profile>/deps/watchdog-<hash>
+    // We need: <workspace>/target/<profile>/mock-claude
+    let deps_dir = exe.parent().expect("no parent"); // deps/
+    let profile_dir = deps_dir.parent().expect("no grandparent"); // target/<profile>/
     profile_dir.join("mock-claude")
 }
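The path walk in `mock_claude_bin()` can be factored into a pure function, which makes it testable without a real build tree. This is a hypothetical refactor (`mock_claude_from_exe()` is not part of the commit) that returns `None` instead of panicking, and it assumes Unix-style binary names with no `.exe` suffix.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical pure version of the walk in mock_claude_bin():
/// `<target>/<profile>/deps/<test>-<hash>` -> `<target>/<profile>/mock-claude`.
/// Returns None when the path has too few components. Assumes no `.exe` suffix.
fn mock_claude_from_exe(exe: &Path) -> Option<PathBuf> {
    let deps_dir = exe.parent()?; // <target>/<profile>/deps
    let profile_dir = deps_dir.parent()?; // <target>/<profile>
    Some(profile_dir.join("mock-claude"))
}
```

If mock-claude is a bin target of the same package as the test, Cargo also exposes its path to integration tests at compile time as `env!("CARGO_BIN_EXE_mock-claude")`, which avoids walking the directory tree at all.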