## What changed
- keep the explicit stdin-close behavior after writing so the child
still receives EOF deterministically
- on Windows, stop using `python -c` for the round-trip assertion and
instead run a native `cmd.exe` pipeline that reads one line from stdin
with `set /p` and echoes it back
- send `
` on Windows so the stdin payload matches the platform-native line
ending the shell reader expects
## Why this fixes flakiness
The failing branch-local flake was not in `spawn_pipe_process` itself.
The child exited cleanly, but the Windows ARM runner sometimes produced
an empty stdout string when the test used Python as the stdin consumer.
That makes the test sensitive to Python startup and stdin-close timing
rather than the pipe primitive we actually want to validate. Switching
the Windows path to a native `cmd.exe` reader keeps the assertion
focused on our pipe behavior: bytes written to stdin should come back on
stdout before EOF closes the process. The explicit `
` write removes line-ending ambiguity on Windows.
## Scope
- test-only
- no production logic change
## Summary
- run the split stdout/stderr PTY test through the normal shell helper
on every platform
- use a Windows-native command string instead of depending on Python to
emit split streams
- assert CRLF line endings on Windows explicitly
## Why this fixes the flake
The earlier PTY split-output test used a Python one-liner on Windows
while the rest of the file exercised shell-command behavior. That made
the test depend on runner-local Python availability and masked the real
Windows shell output shape. Using a native cmd-compatible command and
asserting the actual CRLF output makes the split stdout/stderr coverage
deterministic on Windows runners.
(Experimental)
This PR adds a first MVP for hooks, with SessionStart and Stop
The core design is:
- hooks live in a dedicated engine under codex-rs/hooks
- each hook type has its own event-specific file
- hook execution is synchronous and blocks normal turn progression while
running
- matching hooks run in parallel, then their results are aggregated into
a normalized HookRunSummary
On the AppServer side, hooks are exposed as operational metadata rather
than transcript-native items:
- new live notifications: hook/started, hook/completed
- persisted/replayed hook results live on Turn.hookRuns
- we intentionally did not add hook-specific ThreadItem variants
Hooks messages are not persisted, they remain ephemeral. The context
changes they add are (they get appended to the user's prompt)
## What changed
- The PTY Python REPL test now starts Python with a startup marker
already embedded in argv.
- The test waits for that marker in PTY output before making assertions.
## Why this fixes the flake
- The old version tried to probe the live REPL almost immediately after
spawn.
- That races PTY initialization, Python startup, and prompt buffering,
all of which vary across platforms and CI load.
- By having the child process emit a known marker as part of its own
startup path, the test gets a deterministic synchronization point that
comes from the process under test rather than from guessed timing.
## Scope
- Test-only change.
Addresses #13945
The vendored WezTerm ConPTY backend in
`codex-rs/utils/pty/src/win/mod.rs` treated `TerminateProcess` return
values backwards: nonzero success was handled as failure, and `0`
failure was handled as success.
This is likely causing a number of bugs reported against Codex running
on Windows native where processes are not cleaned up.
Enhance pty utils:
* Support closing stdin
* Separate stderr and stdout streams to allow consumers differentiate them
* Provide compatibility helper to merge both streams back into combined one
* Support specifying terminal size for pty, including on-demand resizes while process is already running
* Support terminating the process while still consuming its outputs
Hardens PTY Python REPL test and make MCP test startup deterministic
**Summary**
- `utils/pty/src/tests.rs`
- Added a REPL readiness handshake (`wait_for_python_repl_ready`) that
repeatedly sends a marker and waits for it in PTY output before sending
test commands.
- Updated `pty_python_repl_emits_output_and_exits` to:
- wait for readiness first,
- preserve startup output,
- append output collected through process exit.
- Reduces Windows/ConPTY flakiness from early stdin writes racing REPL
startup.
- `mcp-server/tests/suite/codex_tool.rs`
- Avoid remote model refresh during MCP test startup, reducing
timeout-prone nondeterminism.
This PR changes stdio MCP child processes to run in their own process
group
* Add guarded teardown in codex-rmcp-client: send SIGTERM to the group
first, then SIGKILL after a short grace period.
* Add terminate_process_group helper in process_group.rs.
* Add Unix regression test in process_group_cleanup.rs to verify wrapper
+ grandchild are reaped on client drop.
Addresses reported MCP process/thread storm: #10581
Unified exec isn't working on Linux because we don't provide the correct
arg0.
The library we use for pty management doesn't allow setting arg0
separately from executable. Use the same aliasing strategy we use for
`apply_patch` for `codex-linux-sandbox`.
Use `#[ctor]` hack to dispatch codex-linux-sandbox calls.
Addresses https://github.com/openai/codex/issues/6450