Currently `thread/name/set` does only work for loaded threads.
Expand the scope to also support persisted but not-yet-loaded ones for a
more predictable API surface.
This will make it possible to rename threads discovered via
`thread/list` and similar operations.
## Summary
- increase `DEFAULT_READ_TIMEOUT` in `codex_message_processor_flow` from
20s to 45s
- keep test behavior the same while avoiding platform timing flakes
## Why
Windows ARM64 CI showed these tests taking about 24s before
`task_complete`, which could fail early and produce wiremock
request-count mismatches.
## Testing
- just fmt
- cargo test -p codex-app-server codex_message_processor_flow --
--nocapture
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path
---------
Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
This change fixes a Codex app account-state sync bug where clients could
know the user was signed in but still miss the ChatGPT subscription
tier, which could lead to incorrect upgrade messaging for paid users.
The root cause was that `account/updated` only carried `authMode` while
plan information was available separately via `account/read` and
rate-limit snapshots, so this update adds `planType` to
`account/updated`, populates it consistently across login and refresh
paths.
Addresses https://github.com/openai/codex/issues/13007 and
https://github.com/openai/codex/issues/12170
There are situations where the ChatGPT auth backend might return a JWT
that contains no plan information. Most code paths already handle this
case well, but the internal implementation of the "account/read" app
server call was failing in this case (returning an error rather than
properly returning None for the plan).
This resulted in a situation where users needed to log in every time the
extension or app started even if they successfully logged in the last
time.
Summary
- allow ChatGPT-authenticated accounts to fall back to
`AccountPlanType::Unknown` when the token omits the plan claim
- add regression coverage in `app-server/tests/suite/v2/account.rs` to
confirm `account/read` returns `plan_type: Unknown` when the claim is
absent
- ensure the Rust auth helpers and fixtures treat missing plan claims as
Optional and default to `Unknown`
Stablize app list updated event so that we only send 2 updates: 1 when
installed apps become available, one when all directory apps are
available. Previously it also updates when directory apps become
available before installed apps, which cuts off installed apps.
Replay pending client requests after `thread/resume` and emit resolved
notifications when those requests clear so approval/input UI state stays
in sync after reconnects and across subscribed clients.
Affected RPCs:
- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`
- `item/tool/requestUserInput`
Motivation:
- Resumed clients need to see pending approval/input requests that were
already outstanding before the reconnect.
- Clients also need an explicit signal when a pending request resolves
or is cleared so stale UI can be removed on turn start, completion, or
interruption.
Implementation notes:
- Use pending client requests from `OutgoingMessageSender` in order to
replay them after `thread/resume` attaches the connection, using
original request ids.
- Emit `serverRequest/resolved` when pending requests are answered
or cleared by lifecycle cleanup.
- Update the app-server protocol schema, generated TypeScript bindings,
and README docs for the replay/resolution flow.
High-level test plan:
- Added automated coverage for replaying pending command execution and
file change approval requests on `thread/resume`.
- Added automated coverage for resolved notifications in command
approval, file change approval, request_user_input, turn start, and turn
interrupt flows.
- Verified schema/docs updates in the relevant protocol and app-server
tests.
Manual testing:
- Tested reconnect/resume with multiple connections.
- Confirmed state stayed in sync between connections.
## Why
CI has been intermittently failing in
`suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
because these running-thread resume tests treated `turn/started` as
proof that the thread was already active.
That signal is too early for this path. `turn/started` is emitted
optimistically from
[`turn_start`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L5757-L5767)).
In `single_client_mode`, the listener skips `current_turn_history`
tracking in
[`codex_message_processor.rs`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L6461-L6465)),
so running-thread resume still depends on `ThreadWatchManager` observing
the core `TurnStarted` event in
[`bespoke_event_handling.rs`](1103d0037e/codex-rs/app-server/src/bespoke_event_handling.rs (L152-L156)).
If `thread/resume` lands in that window, the thread can still look
`Idle` and the assertion flakes.
## What
- Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
that waits for `thread/status/changed` to report `Active` for the target
thread.
- Use that public v2 notification as the synchronization barrier in the
four running-thread resume tests instead of relying on `turn/started`.
## Follow-up
This PR keeps the fix at the test layer so we can remove the flake
without changing server behavior. A broader runtime fix should still be
considered separately, for example:
- make `turn/start` eagerly transition the thread to `Active` so
`turn/started` and `thread/status/changed` are coherent
- or revisit the `single_client_mode` guard that skips current-turn
tracking for running-thread resume
## Testing
- `cargo test -p codex-app-server thread_resume -- --nocapture`
- `for i in $(seq 1 10); do cargo test -p codex-app-server
'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch'
-- --exact --nocapture; done`
## Why
The `notify` hook payload did not identify which Codex client started
the turn. That meant downstream notification hooks could not distinguish
between completions coming from the TUI and completions coming from
app-server clients such as VS Code or Xcode. Now that the Codex App
provides its own desktop notifications, it would be nice to be able to
filter those out.
This change adds that context without changing the existing payload
shape for callers that do not know the client name, and keeps the new
end-to-end test cross-platform.
## What changed
- added an optional top-level `client` field to the legacy `notify` JSON
payload
- threaded that value through `core` and `hooks`; the internal session
and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it
to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so
the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`
## Testing
- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server
suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
-- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
still has unrelated existing failures in this environment)
## Docs
The public config reference on `developers.openai.com/codex` should
mention that the legacy `notify` payload may include a top-level
`client` field. The TUI reports `codex-tui`, and the app server reports
`initialize.clientInfo.name` when it is available.
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
## Summary
This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.
## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string
## test
built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata
## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- bundle contextual prompt injection into at most one developer message
plus one contextual user message in both:
- per-turn settings updates
- initial context insertion
- preserve `<model_switch>` across compaction by rebuilding it through
canonical initial-context injection, instead of relying on
strip/reattach hacks
- centralize contextual user fragment detection in one shared definition
table and reuse it for parsing/compaction logic
- keep `AGENTS.md` in its natural serialized format:
- `# AGENTS.md instructions for {dirname}`
- `<INSTRUCTIONS>...</INSTRUCTIONS>`
- simplify related tests/helpers and accept the expected snapshot/layout
updates from bundled multi-part messages
## Why
The goal is to converge toward a simpler, more intentional prompt shape
where contextual updates are consistently represented as one developer
envelope plus one contextual user envelope, while keeping parsing and
compaction behavior aligned with that representation.
## Notable details
- the temporary `SettingsUpdateEnvelope` wrapper was removed; these
paths now return `Vec<ResponseItem>` directly
- local/remote compaction no longer rely on model-switch strip/restore
helpers
- contextual user detection is now driven by shared fragment definitions
instead of ad hoc matcher assembly
- AGENTS/user instructions are still the same logical context; only the
synthetic `<user_instructions>` wrapper was replaced by the natural
AGENTS text format
## Testing
- `just fmt`
- `cargo test -p codex-app-server
codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
-- --exact`
- `cargo test -p codex-core
compact::tests::collect_user_messages_filters_session_prefix_entries
--lib -- --exact`
- `cargo test -p codex-core --test all
'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_developer_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::includes_user_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::resume_includes_initial_messages_and_sends_prior_items'
-- --exact`
- `cargo test -p codex-core --test all
'suite::review::review_input_isolated_from_parent_history' -- --exact`
- `cargo test -p codex-exec --test all
'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
--exact`
- `cargo test -p core_test_support
context_snapshot::tests::full_text_mode_preserves_unredacted_text --
--exact`
## Notes
- I also ran several targeted `compact`, `compact_remote`,
`prompt_caching`, `model_visible_layout`, and `event_mapping` tests
while iterating on prompt-shape changes.
- I have not claimed a clean full-workspace `cargo test` from this
environment because local sandbox/resource conditions have previously
produced unrelated failures in large workspace runs.
Currently there is no bound on the length of a user message submitted in
the TUI or through the app server interface. That means users can paste
many megabytes of text, which can lead to bad performance, hangs, and
crashes. In extreme cases, it can lead to a [kernel
panic](https://github.com/openai/codex/issues/12323).
This PR limits the length of a user input to 2**20 (about 1M)
characters. This value was chosen because it fills the entire context
window on the latest models, so accepting longer inputs wouldn't make
sense anyway.
Summary
- add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
and surface it in TUI and app server code
- block oversized submissions in the TUI submit flow and emit error
history cells when validation fails
- reject heavy app-server requests with JSON-RPC `-32602` and structured
`input_too_large` data, plus document the behavior
Testing
- ran the IDE extension with this change and verified that when I
attempt to paste a user message that's several MB long, it correctly
reports an error instead of crashing or making my computer hot.
This reverts commit https://github.com/openai/codex/pull/12633. We no
longer need this PR, because we favor sending normal exec command
approval server request with `additional_permissions` of skill
permissions instead
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
This reverts commit daf0f03ac8.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`
Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.
This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.
Closed threads will also be removed from `thread/loaded/list`.
## Why
The prior
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
assertion was brittle under Bazel: command approval payloads in the test
could include environment-dependent wrapper/command formatting
differences, which makes exact command-string matching flaky even when
behavior is correct.
(This regression was knowingly introduced in
https://github.com/openai/codex/pull/12800, but it was urgent to land
that PR.)
## What changed
- Hardened
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
in
[`turn_start_zsh_fork.rs`](https://github.com/openai/codex/blob/main/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs):
- Replaced strict `approval_command.starts_with("/bin/rm")` checks with
intent-based subcommand matching.
- Subcommand approvals are now recognized by file-target semantics
(`first.txt` or `second.txt`) plus `rm` intent.
- Parent approval recognition is now more tolerant of command-format
differences while still requiring a definitive parent command context.
- Uses a defensive loop that waits for all target subcommand decisions
and the parent approval request.
- Preserved the existing regression and unit test fixes from earlier
commits in `unix_escalation.rs` and `skill_approval.rs`.
## Verification
- Ran the zsh fork subcommand decline regression under this change:
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- Confirmed the test is now robust against approval-command-string
variation instead of hardcoding one expected command shape.
Previously, clients would call `thread/start` with dynamic_tools set,
and when a model invokes a dynamic tool, it would just make the
server->client `item/tool/call` request and wait for the client's
response to complete the tool call. This works, but it doesn't have an
`item/started` or `item/completed` event.
Now we are doing this:
- [new] emit `item/started` with `DynamicToolCall` populated with the
call arguments
- send an `item/tool/call` server request
- [new] once the client responds, emit `item/completed` with
`DynamicToolCall` populated with the response.
Also, with `persistExtendedHistory: true`, dynamic tool calls are now
reconstructable in `thread/read` and `thread/resume` as
`ThreadItem::DynamicToolCall`.
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
Add service name to the app-server so that the app can use it's own
service name
This is on thread level because later we might plan the app-server to
become a singleton on the computer
## Summary
- add graceful websocket app-server restart on Ctrl-C by draining until
no assistant turns are running
- stop the websocket acceptor and disconnect existing connections once
the drain condition is met
- add a websocket integration test that verifies Ctrl-C waits for an
in-flight turn before exit
## Verification
- `cargo check -p codex-app-server --quiet`
- `cargo test -p codex-app-server --test all
suite::v2::connection_handling_websocket`
- I (maxj) tested remote and local Codex.app
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations
Testing
- Not run (not requested)
## Why
This PR switches the `shell_command` zsh-fork path over to
`codex-shell-escalation` so the new shell tool can use the shared
exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
implementation that was introduced in
https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
UNIX domain sockets, which is not as tamper-proof as the FD-based
approach in `codex-shell-escalation`.
## What Changed
- Added a Unix zsh-fork runtime adapter in `core`
(`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
- runs zsh-fork commands through
`codex_shell_escalation::run_escalate_server`
- bridges exec-policy / approval decisions into `ShellActionProvider`
- executes escalated commands via a `ShellCommandExecutor` that calls
`process_exec_tool_call`
- Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
the generic `shell` tool path unchanged.
- Removed the `zsh_exec_bridge`-based session service and deleted
`core/src/zsh_exec_bridge/mod.rs`.
- Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
`codex-execve-wrapper` arg0 alias there, and removed the old
`codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
`app-server` mains.
- Added the needed `codex-shell-escalation` dependencies for `core` and
`arg0`.
## Tests
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-app-server turn_start_shell_zsh_fork --
--nocapture`
- verifies zsh-fork command execution and approval flows through the new
backend
- includes subcommand approve/decline coverage using the shared zsh
DotSlash fixture in `app-server/tests/suite/zsh`
- To test manually, I added the following to `~/.codex/config.toml`:
```toml
zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"
[features]
shell_zsh_fork = true
```
Then I ran `just c` to run the dev build of Codex with these changes and
sent it the message:
```
run `echo $0`
```
And it replied with:
```
echo $0 printed:
/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh
In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
```
so the tool appears to be wired up correctly.
## Notes
- The zsh subcommand-decline integration test now uses `rm` under a
`WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
auto-allowed by the new `shell-escalation` policy path, which no longer
produces subcommand approval prompts.
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!
<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>
## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.
update logic to rely on bundled models instead, update tests.
## Why
We already plan to remove the shell-tool MCP path, and doing that
cleanup first makes the follow-on `shell-escalation` work much simpler.
This change removes the last remaining reason to keep
`codex-rs/exec-server` around by moving the `codex-execve-wrapper`
binary and shared shell test fixtures to the crates/tests that now own
that functionality.
## What Changed
### Delete `codex-rs/exec-server`
- Remove the `exec-server` crate, including the MCP server binary,
MCP-specific modules, and its test support/test suite
- Remove `exec-server` from the `codex-rs` workspace and update
`Cargo.lock`
### Move `codex-execve-wrapper` into `codex-rs/shell-escalation`
- Move the wrapper implementation into `shell-escalation`
(`src/unix/execve_wrapper.rs`)
- Add the `codex-execve-wrapper` binary entrypoint under
`shell-escalation/src/bin/`
- Update `shell-escalation` exports/module layout so the wrapper
entrypoint is hosted there
- Move the wrapper README content from `exec-server` to
`shell-escalation/README.md`
### Move shared shell test fixtures to `app-server`
- Move the DotSlash `bash`/`zsh` test fixtures from
`exec-server/tests/suite/` to `app-server/tests/suite/`
- Update `app-server` zsh-fork tests to reference the new fixture paths
### Keep `shell-tool-mcp` as a shell-assets package
- Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm
artifact contains only patched Bash/Zsh payloads (no Rust binaries)
- Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`,
and docs to reflect the shell-assets-only package shape
- `shell-tool-mcp-ci.yml` does not need changes because it is already
JS-only
## Verification
- `cargo shear`
- `cargo clippy -p codex-shell-escalation --tests`
- `just clippy`
## Why
The zsh integration tests were still brittle in two ways:
- they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so
they often did not exercise the patched zsh fork that `shell-tool-mcp`
ships
- once the tests consistently used the vendored zsh fork, they exposed
real Linux-specific zsh-fork issues in CI
In particular, the Linux failures were not just test noise:
- the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux
`codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode
could receive malformed arguments
- the
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
test uses the zsh exec bridge (which talks to the parent over a Unix
socket), but Linux restricted sandbox seccomp denies `connect(2)`,
causing timeouts on `ubuntu-24.04` x86/arm
This PR makes the zsh tests consistently run against the intended
vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI
signal is meaningful.
## What Changed
- Added a single shared test-only DotSlash file for the patched zsh fork
at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing
`bash` test resource).
- Updated both app-server and exec-server zsh tests to use that shared
DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH`
dependency).
- Updated the app-server zsh-fork test helper to resolve the shared
DotSlash zsh and avoid silently falling back to host zsh.
- Kept the app-server zsh-fork tests configured via `config.toml`, using
a test wrapper path where needed to force `zsh -df` (and rewrite `-lc`
to `-c`) for the subcommand-decline test.
- Hardened the app-server subcommand-decline zsh-fork test for CI
variability:
- tolerate an extra `/responses` POST with a no-op mock response
- tolerate non-target approval ordering while remaining strict on the
two `/usr/bin/true` approvals and decline behavior
- use `DangerFullAccess` on Linux for this one test because it validates
zsh approval flow, not Linux sandbox socket restrictions
- Fixed zsh-fork process launching on Linux by preserving `req.arg0` in
`ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox`
arg0 dispatch continues to work.
- Moved `maybe_run_zsh_exec_wrapper_mode()` under
`arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode
handling coexists correctly with arg0-dispatched helper modes.
- Consolidated duplicated `dotslash -- fetch` resolution logic into
shared test support (`core/tests/common/lib.rs`).
- Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to
use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh
differences by:
- resolving an absolute `git` path
- running `git init --quiet .`
- asserting success / `.git` creation instead of relying on banner text
## Verification
- `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture`
- `cargo test -p codex-exec-server accept_elicitation -- --nocapture`
- `bazel test //codex-rs/exec-server:exec-server-all-test
--test_output=streamed --test_arg=--nocapture
--test_arg=accept_elicitation_for_prompt_rule_with_zsh`
- CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 -
x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm -
aarch64-unknown-linux-gnu` passed in [run
22291424358](https://github.com/openai/codex/actions/runs/22291424358)
## Why
`codex-rs/core/src/lib.rs` re-exported a broad set of types and modules
from `codex-protocol` and `codex-shell-command`. That made it easy for
workspace crates to import those APIs through `codex-core`, which in
turn hides dependency edges and makes it harder to reduce compile-time
coupling over time.
This change removes those public re-exports so call sites must import
from the source crates directly. Even when a crate still depends on
`codex-core` today, this makes dependency boundaries explicit and
unblocks future work to drop `codex-core` dependencies where possible.
## What Changed
- Removed public re-exports from `codex-rs/core/src/lib.rs` for:
- `codex_protocol::protocol` and related protocol/model types (including
`InitialHistory`)
- `codex_protocol::config_types` (`protocol_config_types`)
- `codex_shell_command::{bash, is_dangerous_command, is_safe_command,
parse_command, powershell}`
- Migrated workspace Rust call sites to import directly from:
- `codex_protocol::protocol`
- `codex_protocol::config_types`
- `codex_protocol::models`
- `codex_shell_command`
- Added explicit `Cargo.toml` dependencies (`codex-protocol` /
`codex-shell-command`) in crates that now import those crates directly.
- Kept `codex-core` internal modules compiling by using `pub(crate)`
aliases in `core/src/lib.rs` (internal-only, not part of the public
API).
- Updated the two utility crates that can already drop a `codex-core`
dependency edge entirely:
- `codex-utils-approval-presets`
- `codex-utils-cli`
## Verification
- `cargo test -p codex-utils-approval-presets`
- `cargo test -p codex-utils-cli`
- `cargo check --workspace --all-targets`
- `just clippy`
## Summary
- switch a few app-server `turn_start` tests from
`codex/event/task_complete` waits to `turn/completed` waits
- avoid matching unrelated/background `task_complete` events
- keep this flaky test fix separate from the /title feature PR
## Why
On Windows ARM CI, these tests can return early after observing a
generic `codex/event/task_complete` notification from another task. That
can leave the mock Responses server with fewer calls than expected and
fail the test with a wiremock verification mismatch.
Using `turn/completed` matches the app-server turn lifecycle
notification the tests actually care about.
## Validation
- `cargo test -p codex-app-server
turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture`
- `cargo test -p codex-app-server turn_start_exec_approval_ --
--nocapture`
- `just fmt`
Exposes through the app server updated names set for a thread. This
enables other surfaces to use the core as the source of truth for thread
naming. `threadName` is gathered using the helper functions used to
interact with `session_index.jsonl`, and is hydrated in:
- `thread/list`
- `thread/read`
- `thread/resume`
- `thread/unarchive`
- `thread/rollback`
We don't do this for `thread/start` and `thread/fork`.
## Why
`app/list` emits `app/list/updated` after whichever async load finishes
first (directory connectors or accessible tools). This test assumed the
directory-backed update always arrived first because it injected a tools
delay, but that assumption is not stable when the process-global Codex
Apps tools cache is already warm. In that case the accessible-tools path
can return immediately and the first notification shape flips, which
makes the assertion flaky.
Relevant code paths:
-
[`codex-rs/app-server/src/codex_message_processor.rs`](13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034))
(concurrent loads + per-load `app/list/updated` notifications)
-
[`codex-rs/core/src/mcp_connection_manager.rs`](13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197))
(Codex Apps tools cache hit path)
## What Changed
Updated
`suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags`
in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either
valid first `app/list/updated` payload:
- the directory-first snapshot
- the accessible-tools-first snapshot
The test still keeps the later assertions strict:
- the second `app/list/updated` notification must be the fully merged
result
- the final `app/list` response must match the same merged result
I also added an inline comment explaining why the first notification is
intentionally order-insensitive.
## Verification
- `cargo test -p codex-app-server`
## Why
Several tests intentionally exercise behavior while a turn is still
active. The cleanup sequence for those tests (`turn/interrupt` + waiting
for `codex/event/turn_aborted`) was duplicated across files, which made
the rationale easy to lose and the pattern easy to apply inconsistently.
This change centralizes that cleanup in one place with a single
explanatory doc comment.
## What Changed
### Added shared helper
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
- Added a doc comment explaining why explicit interrupt + terminal wait
is required for tests that intentionally leave a turn in-flight.
### Migrated call sites
Replaced duplicated interrupt/aborted blocks with the helper in:
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
- `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
- `turn_start_shell_zsh_fork_executes_command_v2`
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
- `turn_steer_returns_active_turn_id`
### Existing cleanup retained
In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:
- `turn_start_accepts_local_image_input` continues to explicitly wait
for `turn/completed` so the turn lifecycle is fully drained before test
exit.
## Verification
- `cargo test -p codex-app-server`
## Why
`thread_resume` tests can intentionally create an in-flight turn, assert
a `thread/resume` error path, and return immediately. That leaves turn
work active during teardown, which can surface as intermittent `LEAK`
failures.
Sample output that motivated this investigation (reported during test
runs):
```text
LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch
```
## What Changed
Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`:
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
Both tests now:
1. capture the running turn id from `TurnStartResponse`
2. assert the expected `thread/resume` error
3. call `turn/interrupt` for that running turn
4. wait for `codex/event/turn_aborted` before returning
## Why This Is The Correct Fix
These tests are specifically validating resume behavior while a turn is
active. They should also own cleanup of that active turn before exiting.
Explicitly interrupting and waiting for the terminal abort notification
removes teardown races and avoids relying on process-drop behavior to
clean up in-flight work.
## Repro / Verification
Repro command used for investigation:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) | test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)'
```
Observed before this change: intermittent `LEAK` in
`thread_resume_rejects_history_when_thread_is_running`.
Also verified with:
- `cargo test -p codex-app-server`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12269).
* #12271
* __->__ #12269
## Why
`cargo nextest` was intermittently reporting `LEAK` for
`codex-app-server` tests even when assertions passed. This adds noise
and flakiness to local/CI signals.
Sample output used as the basis of this investigation:
```text
LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
```
## How I Reproduced
I focused on the suspect tests and ran them under `nextest` stress mode
with leak reporting enabled.
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
```
This reproduced intermittent `LEAK` statuses while tests still passed.
## What Changed
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
can explicitly close stdin.
- In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
- Wait briefly for graceful exit.
- If still running, fall back to `start_kill()` and the existing bounded
`try_wait()` loop.
- Updated send-path handling to bail if stdin is already closed.
## Why This Is the Right Fix
The leak signal was caused by child-process teardown timing, not
test-logic assertion failure. The helper previously relied mostly on
force-kill timing in `Drop`; that can race with nextest leak detection.
Closing stdin first gives `codex-app-server` a deterministic, graceful
shutdown path before force-kill. Keeping the force-kill fallback
preserves robustness if graceful shutdown does not complete in time.
## Verification
- `cargo test -p codex-app-server`
- Re-ran the stress repro above after this change: no `LEAK` statuses
observed.
- Additional high-signal stress run also showed no leaks:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
```
## Summary
- stabilize
`thread_resume_rejoins_running_thread_even_with_override_mismatch` by
using a valid delayed second SSE response instead of an intentionally
truncated stream
- set `RUST_MIN_STACK=4194304` for spawned app-server test processes in
`McpProcess` to avoid stack-sensitive CI overflows in detached review
tests
## Why
- the thread-resume assertion could race with a mocked stream-disconnect
error and intermittently observe `systemError`
- detached review startup is stack-sensitive in some CI environments;
pinning a larger stack in the test harness removes that flake without
changing product behavior
## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
- `cargo test -p codex-app-server --test all
suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`