## Why
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
located `codex-execve-wrapper` by scanning `PATH` and sibling
directories. That lookup is brittle and can select the wrong binary when
the runtime environment differs from startup assumptions.
We already pass `codex-linux-sandbox` from `codex-arg0`;
`codex-execve-wrapper` should use the same startup-driven path plumbing.
## What changed
- Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
executable paths:
- `codex_linux_sandbox_exe`
- `main_execve_wrapper_exe`
- Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
top-level binaries and preserve helper paths created in
`prepend_path_entry_for_codex_aliases()`.
- Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
`tui`, `app-server`, and `mcp-server`.
- Added `main_execve_wrapper_exe` to core configuration plumbing
(`Config`, `ConfigOverrides`, and `SessionServices`).
- Updated zsh-fork shell escalation to consume the configured
`main_execve_wrapper_exe` and removed path-sniffing fallback logic.
- Updated app-server config reload paths so reloaded configs keep the
same startup-provided helper executable paths.
## References
- [`Arg0DispatchPaths`
definition](e355b43d5c/codex-rs/arg0/src/lib.rs (L20-L24))
- [`arg0_dispatch_or_else()` forwarding both
paths](e355b43d5c/codex-rs/arg0/src/lib.rs (L145-L176))
- [zsh-fork escalation using configured wrapper
path](e355b43d5c/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L109-L150))
## Testing
- `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
codex-mcp-server -p codex-app-server`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core tools::runtimes:🐚:unix_escalation:: --
--nocapture`
Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for
clarity.
Would ideally change the type to `AbsolutePathBuf`, but that can be done
later.
## Summary
- add graceful websocket app-server restart on Ctrl-C by draining until
no assistant turns are running
- stop the websocket acceptor and disconnect existing connections once
the drain condition is met
- add a websocket integration test that verifies Ctrl-C waits for an
in-flight turn before exit
## Verification
- `cargo check -p codex-app-server --quiet`
- `cargo test -p codex-app-server --test all
suite::v2::connection_handling_websocket`
- I (maxj) tested remote and local Codex.app
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations
Testing
- Not run (not requested)
## Why
This PR switches the `shell_command` zsh-fork path over to
`codex-shell-escalation` so the new shell tool can use the shared
exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
implementation that was introduced in
https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
UNIX domain sockets, which is not as tamper-proof as the FD-based
approach in `codex-shell-escalation`.
## What Changed
- Added a Unix zsh-fork runtime adapter in `core`
(`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
- runs zsh-fork commands through
`codex_shell_escalation::run_escalate_server`
- bridges exec-policy / approval decisions into `ShellActionProvider`
- executes escalated commands via a `ShellCommandExecutor` that calls
`process_exec_tool_call`
- Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
the generic `shell` tool path unchanged.
- Removed the `zsh_exec_bridge`-based session service and deleted
`core/src/zsh_exec_bridge/mod.rs`.
- Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
`codex-execve-wrapper` arg0 alias there, and removed the old
`codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
`app-server` mains.
- Added the needed `codex-shell-escalation` dependencies for `core` and
`arg0`.
## Tests
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-app-server turn_start_shell_zsh_fork --
--nocapture`
- verifies zsh-fork command execution and approval flows through the new
backend
- includes subcommand approve/decline coverage using the shared zsh
DotSlash fixture in `app-server/tests/suite/zsh`
- To test manually, I added the following to `~/.codex/config.toml`:
```toml
zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"
[features]
shell_zsh_fork = true
```
Then I ran `just c` to run the dev build of Codex with these changes and
sent it the message:
```
run `echo $0`
```
And it replied with:
```
echo $0 printed:
/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh
In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
```
so the tool appears to be wired up correctly.
## Notes
- The zsh subcommand-decline integration test now uses `rm` under a
`WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
auto-allowed by the new `shell-escalation` policy path, which no longer
produces subcommand approval prompts.
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!
<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>
## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.
update logic to rely on bundled models instead, update tests.
## Why
We already plan to remove the shell-tool MCP path, and doing that
cleanup first makes the follow-on `shell-escalation` work much simpler.
This change removes the last remaining reason to keep
`codex-rs/exec-server` around by moving the `codex-execve-wrapper`
binary and shared shell test fixtures to the crates/tests that now own
that functionality.
## What Changed
### Delete `codex-rs/exec-server`
- Remove the `exec-server` crate, including the MCP server binary,
MCP-specific modules, and its test support/test suite
- Remove `exec-server` from the `codex-rs` workspace and update
`Cargo.lock`
### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
- Move the wrapper implementation into `shell-escalation`
(`src/unix/execve_wrapper.rs`)
- Add the `codex-execve-wrapper` binary entrypoint under
`shell-escalation/src/bin/`
- Update `shell-escalation` exports/module layout so the wrapper
entrypoint is hosted there
- Move the wrapper README content from `exec-server` to
`shell-escalation/README.md`
### Move shared shell test fixtures to `app-server`
- Move the DotSlash `bash`/`zsh` test fixtures from
`exec-server/tests/suite/` to `app-server/tests/suite/`
- Update `app-server` zsh-fork tests to reference the new fixture paths
### Keep `shell-tool-mcp` as a shell-assets package
- Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
artifact contains only patched Bash/Zsh payloads (no Rust binaries)
- Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
and docs to reflect the shell-assets-only package shape
- `shell-tool-mcp-ci.yml` does not need changes because it is already
JS-only
## Verification
- `cargo shear`
- `cargo clippy -p codex-shell-escalation --tests`
- `just clippy`
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Why
The zsh integration tests were still brittle in two ways:
- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI
In particular, the Linux failures were not just test noise:
- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm
This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.
## What Changed
- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
- tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
- resolving an absolute `git` path
- running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text
## Verification
- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
- keep the per-thread app-server listener alive when the last client
unsubscribes or disconnects
- preserve listener-side active turn history so running `thread/resume`
can merge an in-progress turn snapshot after reconnect
- add `ThreadStateManager` regressions for disconnect/unsubscribe
retention and explicit thread teardown cleanup
Added unit tests, and I manually tested to confirm the fix
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
## Summary
- switch a few app-server `turn_start` tests from
`codex/event/task_complete` waits to `turn/completed` waits
- avoid matching unrelated/background `task_complete` events
- keep this flaky test fix separate from the /title feature PR
## Why
On Windows ARM CI, these tests can return early after observing a
generic `codex/event/task_complete` notification from another task. That
can leave the mock Responses server with fewer calls than expected and
fail the test with a wiremock verification mismatch.
Using `turn/completed` matches the app-server turn lifecycle
notification the tests actually care about.
## Validation
- `cargo test -p codex-app-server
turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture`
- `cargo test -p codex-app-server turn_start_exec_approval_ --
--nocapture`
- `just fmt`
## Why
`thread/resume` responses for already-running threads can be reported as
`Idle` even while a turn is still in progress. This is caused by a
timing window where the runtime watch state has not yet observed the
running-thread transition, so API clients can receive stale status
information at resume time.
Possibly related: https://github.com/openai/codex/pull/11786
## What
- Add a shared status normalization helper, `resolve_thread_status`, in
`codex-rs/app-server/src/thread_status.rs` that resolves
`Idle`/`NotLoaded` to `Active { active_flags: [] }` when an in-progress
turn is known.
- Reuse this helper across thread response paths in
`codex-rs/app-server/src/codex_message_processor.rs` (including
`thread/start`, `thread/unarchive`, `thread/read`, `thread/resume`,
`thread/fork`, and review/thread-started notification responses).
- In `handle_pending_thread_resume_request`, use both the in-memory
`active_turn_snapshot` and the resumed rollout turns to decide whether a
turn is in progress before resolving thread status for the response.
- Extend `thread_status` tests to validate the new status-resolution
behavior directly.
## Verification
- `cargo test -p codex-app-server
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
Exposes through the app server updated names set for a thread. This
enables other surfaces to use the core as the source of truth for thread
naming. `threadName` is gathered using the helper functions used to
interact with `session_index.jsonl`, and is hydrated in:
- `thread/list`
- `thread/read`
- `thread/resume`
- `thread/unarchive`
- `thread/rollback`
We don't do this for `thread/start` and `thread/fork`.
Hardens codex-rs/app-server connection lifecycle and outbound routing
for websocket clients. Fixes some FUD I was having
- Added per-connection disconnect signaling (CancellationToken) for
websocket transports.
- Split websocket handling into independent inbound/outbound tasks
coordinated by cancellation.
- Changed outbound routing so websocket connections use non-blocking
try_send; slow/full websocket writers are disconnected instead of
stalling broadcast delivery.
- Kept stdio behavior blocking-on-send (no forced disconnect) so local
stdio clients are not dropped when queues are temporarily full.
- Simplified outbound router flow by removing deferred
pending_closed_connections handling.
- Added guards to drop incoming response/notification/error messages
from unknown connections.
- Fixed listener teardown race in thread listener tasks using a
listener_generation check so stale tasks do not clear newer listeners.
Fixes
https://linear.app/openai/issue/CODEX-4966/multiclient-handle-slow-notification-consumers
## Tests
Added/updated transport tests covering:
- broadcast does not block on a slow/full websocket connection
- stdio connection waits instead of disconnecting on full queue
I (maxj) have tested manually and will retest before landing
## Summary
Adds support for a Unix socket escape hatch so we can bypass socket
allowlisting when explicitly enabled.
## Description
* added a new flag, `network.dangerously_allow_all_unix_sockets` as an
explicit escape hatch
* In codex-network-proxy, enabling that flag now allows any absolute
Unix socket path from x-unix-socket instead of requiring each path to be
explicitly allowlisted. Relative paths are still rejected.
* updated the macOS seatbelt path in core so it enforces the same Unix
socket behavior:
* allowlisted sockets generate explicit network* subpath rules
* allow-all generates a broad network* (subpath "/") rule
---------
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.
- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.
Example:
- 3 calls:
- a.com (line 443)
- b.com (line 443)
- a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443
## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
- add `LOG_FORMAT=json` support for app-server tracing logs via
`tracing_subscriber`'s built-in JSON formatter
- keep the default human-readable format unchanged and keep `RUST_LOG`
filtering behavior
- document the env var and update lockfile
thread/resume response includes latest turn with all items, in band so
no events are stale or lost
Testing
- e2e tested using app-server-test-client using flow described in
"Testing Thread Rejoin Behavior" in
codex-rs/app-server-test-client/README.md
- e2e tested in codex desktop by reconnecting to a running turn
## Why
`app/list` emits `app/list/updated` after whichever async load finishes
first (directory connectors or accessible tools). This test assumed the
directory-backed update always arrived first because it injected a tools
delay, but that assumption is not stable when the process-global Codex
Apps tools cache is already warm. In that case the accessible-tools path
can return immediately and the first notification shape flips, which
makes the assertion flaky.
Relevant code paths:
-
[`codex-rs/app-server/src/codex_message_processor.rs`](13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034))
(concurrent loads + per-load `app/list/updated` notifications)
-
[`codex-rs/core/src/mcp_connection_manager.rs`](13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197))
(Codex Apps tools cache hit path)
## What Changed
Updated
`suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags`
in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either
valid first `app/list/updated` payload:
- the directory-first snapshot
- the accessible-tools-first snapshot
The test still keeps the later assertions strict:
- the second `app/list/updated` notification must be the fully merged
result
- the final `app/list` response must match the same merged result
I also added an inline comment explaining why the first notification is
intentionally order-insensitive.
## Verification
- `cargo test -p codex-app-server`
## Why
Several tests intentionally exercise behavior while a turn is still
active. The cleanup sequence for those tests (`turn/interrupt` + waiting
for `codex/event/turn_aborted`) was duplicated across files, which made
the rationale easy to lose and the pattern easy to apply inconsistently.
This change centralizes that cleanup in one place with a single
explanatory doc comment.
## What Changed
### Added shared helper
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
- Added a doc comment explaining why explicit interrupt + terminal wait
is required for tests that intentionally leave a turn in-flight.
### Migrated call sites
Replaced duplicated interrupt/aborted blocks with the helper in:
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
- `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
- `turn_start_shell_zsh_fork_executes_command_v2`
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
- `turn_steer_returns_active_turn_id`
### Existing cleanup retained
In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:
- `turn_start_accepts_local_image_input` continues to explicitly wait
for `turn/completed` so the turn lifecycle is fully drained before test
exit.
## Verification
- `cargo test -p codex-app-server`
## Why
`thread_resume` tests can intentionally create an in-flight turn, assert
a `thread/resume` error path, and return immediately. That leaves turn
work active during teardown, which can surface as intermittent `LEAK`
failures.
Sample output that motivated this investigation (reported during test
runs):
```text
LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch
```
## What Changed
Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`:
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
Both tests now:
1. capture the running turn id from `TurnStartResponse`
2. assert the expected `thread/resume` error
3. call `turn/interrupt` for that running turn
4. wait for `codex/event/turn_aborted` before returning
## Why This Is The Correct Fix
These tests are specifically validating resume behavior while a turn is
active. They should also own cleanup of that active turn before exiting.
Explicitly interrupting and waiting for the terminal abort notification
removes teardown races and avoids relying on process-drop behavior to
clean up in-flight work.
## Repro / Verification
Repro command used for investigation:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) | test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)'
```
Observed before this change: intermittent `LEAK` in
`thread_resume_rejects_history_when_thread_is_running`.
Also verified with:
- `cargo test -p codex-app-server`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12269).
* #12271
* __->__ #12269
## Summary
Implements a configurable MCP OAuth callback URL override for `codex mcp
login` and app-server OAuth login flows, including support for non-local
callback endpoints (for example, devbox ingress URLs).
## What changed
- Added new config key: `mcp_oauth_callback_url` in
`~/.codex/config.toml`.
- OAuth authorization now uses `mcp_oauth_callback_url` as
`redirect_uri` when set.
- Callback handling validates the callback path against the configured
redirect URI path.
- Listener bind behavior is now host-aware:
- local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to
`127.0.0.1`
- non-local callback URL hosts bind to `0.0.0.0`
- `mcp_oauth_callback_port` remains supported and is used for the
listener port.
- Wired through:
- CLI MCP login flow
- App-server MCP OAuth login flow
- Skill dependency OAuth login flow
- Updated config schema and config tests.
## Why
Some environments need OAuth callbacks to land on a specific reachable
URL (for example ingress in remote devboxes), not loopback. This change
allows that while preserving local defaults for existing users.
## Backward compatibility
- No behavior change when `mcp_oauth_callback_url` is unset.
- Existing `mcp_oauth_callback_port` behavior remains intact.
- Local callback flows continue binding to loopback by default.
## Testing
- `cargo test -p codex-rmcp-client callback -- --nocapture`
- `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
- `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`
## Example config
```toml
mcp_oauth_callback_port = 5555
mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
## Why
`cargo nextest` was intermittently reporting `LEAK` for
`codex-app-server` tests even when assertions passed. This adds noise
and flakiness to local/CI signals.
Sample output used as the basis of this investigation:
```text
LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
```
## How I Reproduced
I focused on the suspect tests and ran them under `nextest` stress mode
with leak reporting enabled.
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
```
This reproduced intermittent `LEAK` statuses while tests still passed.
## What Changed
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
can explicitly close stdin.
- In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
- Wait briefly for graceful exit.
- If still running, fall back to `start_kill()` and the existing bounded
`try_wait()` loop.
- Updated send-path handling to bail if stdin is already closed.
## Why This Is the Right Fix
The leak signal was caused by child-process teardown timing, not
test-logic assertion failure. The helper previously relied mostly on
force-kill timing in `Drop`; that can race with nextest leak detection.
Closing stdin first gives `codex-app-server` a deterministic, graceful
shutdown path before force-kill. Keeping the force-kill fallback
preserves robustness if graceful shutdown does not complete in time.
## Verification
- `cargo test -p codex-app-server`
- Re-ran the stress repro above after this change: no `LEAK` statuses
observed.
- Additional high-signal stress run also showed no leaks:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
```
TL;DR
Add top-level `model_catalog_json` config support so users can supply a
local model catalog override from a JSON file path (including adding new
models) without backend changes.
### Problem
Codex previously had no clean client-side way to replace/overlay model
catalog data for local testing of model metadata and new model entries.
### Fix
- Add top-level `model_catalog_json` config field (JSON file path).
- Apply catalog entries when resolving `ModelInfo`:
1. Base resolved model metadata (remote/fallback)
2. Catalog overlay from `model_catalog_json`
3. Existing global top-level overrides (`model_context_window`,
`model_supports_reasoning_summaries`, etc.)
### Note
Will revisit per-field overrides in a follow-up
### Tests
Added tests
## Summary
- stabilize
`thread_resume_rejoins_running_thread_even_with_override_mismatch` by
using a valid delayed second SSE response instead of an intentionally
truncated stream
- set `RUST_MIN_STACK=4194304` for spawned app-server test processes in
`McpProcess` to avoid stack-sensitive CI overflows in detached review
tests
## Why
- the thread-resume assertion could race with a mocked stream-disconnect
error and intermittently observe `systemError`
- detached review startup is stack-sensitive in some CI environments;
pinning a larger stack in the test harness removes that flake without
changing product behavior
## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
- `cargo test -p codex-app-server --test all
suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
Motivation
- Today, a newly connected client has no direct way to determine the
current runtime status of threads from read/list responses alone.
- This forces clients to infer state from transient events, which can
lead to stale or inconsistent UI when reconnecting or attaching late.
Changes
- Add `status` to `thread/read` responses.
- Add `statuses` to `thread/list` responses.
- Emit `thread/status/changed` notifications with `threadId` and the new
status.
- Track runtime status for all loaded threads and default unknown
threads to `idle`.
- Update protocol/docs/tests/schema fixtures for the revised API.
Testing
- Validated protocol API changes with automated protocol tests and
regenerated schema/type fixtures.
- Validated app-server behavior with unit and integration test suites,
including status transitions and notifications.
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051
- https://github.com/openai/codex/pull/12052👈
### Summary
This PR introduces a feature-gated native shell runtime path that routes
shell execution through a patched zsh exec bridge, removing MCP-specific
behavior from the shell hot path while preserving existing
CommandExecution lifecycle semantics.
When shell_zsh_fork is enabled, shell commands run via patched zsh with
per-`execve` interception through EXEC_WRAPPER. Core receives wrapper
IPC requests over a Unix socket, applies existing approval policy, and
returns allow/deny before the subcommand executes.
### What’s included
**1) New zsh exec bridge runtime in core**
- Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for
EXEC_WRAPPER invocations.
- Per-execution Unix-socket IPC handling for wrapper requests/responses.
- Approval callback integration using existing core approval
orchestration.
- Streaming stdout/stderr deltas to existing command output event
pipeline.
- Error handling for malformed IPC, denial/abort, and execution
failures.
**2) Session lifecycle integration**
SessionServices now owns a `ZshExecBridge`.
Session startup initializes bridge state; shutdown tears it down
cleanly.
**3) Shell runtime routing (feature-gated)**
When `shell_zsh_fork` is enabled:
- Build execution env/spec as usual.
- Add wrapper socket env wiring.
- Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of
the regular shell path.
- Non-zsh-fork behavior remains unchanged.
**4) Config + feature wiring**
- Added `Feature::ShellZshFork` (under development).
- Added config support for `zsh_path` (optional absolute path to patched
zsh):
- `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema.
- Session startup validates that `zsh_path` exists/usable when zsh-fork
is enabled.
- Added startup test for missing `zsh_path` failure mode.
**5) Seatbelt/sandbox updates for wrapper IPC**
- Extended seatbelt policy generation to optionally allow outbound
connection to explicitly permitted Unix sockets.
- Wired sandboxing path to pass wrapper socket path through to seatbelt
policy generation.
- Added/updated seatbelt tests for explicit socket allow rule and
argument emission.
**6) Runtime entrypoint hooks**
- This allows the same binary to act as the zsh wrapper subprocess when
invoked via `EXEC_WRAPPER`.
**7) Tool selection behavior**
- ToolsConfig now prefers ShellCommand type when shell_zsh_fork is
enabled.
- Added test coverage for precedence with unified-exec enabled.
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051👈
- https://github.com/openai/codex/pull/12052
With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.
To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
### Summary
Ensure that we use the model value from the response header only so that
we are guaranteed with the correct slug name. We are no longer checking
against the model value from response so that we are less likely to have
false positive.
There are two different treatments - for SSE we use the header from the
response and for websocket we check top-level events.
* Add v2 server notifications `thread/archived` and `thread/unarchived`
with a `threadId` payload.
* Wire new events into `thread/archive` and `thread/unarchive` success
paths.
* Update app-server protocol/schema/docs accordingly.
Testing:
- Updated archive/unarchive end-to-end tests to verify both
notifications are emitted with the expected thread id payload.
rm `remote_models` feature flag.
We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.
Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.
`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.
**Changes**
- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.
## Testing
- [x] Existing tests pass