## Summary
- show an info message when switching collaboration modes changes the
effective model or reasoning
- include the target mode in the message (for example `... for Plan
mode.`)
- add TUI tests for model-change and reasoning-only change notifications
on mode switch
<img width="715" height="184" alt="Screenshot 2026-02-20 at 2 01 40 PM"
src="https://github.com/user-attachments/assets/18d1beb3-ab87-4e1c-9ada-a10218520420"
/>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## What
Updates the optional `cargo-nextest` install command in
`docs/install.md`:
- `cargo install cargo-nextest` -> `cargo install --locked
cargo-nextest`
## Why
The current docs command can fail during source install because recent
`cargo-nextest` releases intentionally require `--locked`.
Repro (macOS, but likely not platform-specific):
- `cargo install cargo-nextest`
- Fails with a compile error from `locked-tripwire` indicating:
- `Nextest does not support being installed without --locked`
- suggests `cargo install --locked cargo-nextest`
Using the locked command succeeds:
- `cargo install --locked cargo-nextest`
## How
Single-line docs change in `docs/install.md` to match current
`cargo-nextest` install requirements.
## Validation
- Reproduced failure locally using a temporary `CARGO_HOME` directory
(clean Cargo home)
- Example command used: `CARGO_HOME=/tmp/cargo-home-test cargo install
cargo-nextest`
- Confirmed success with `cargo install --locked cargo-nextest`
## Summary
- switch the review test SSE mock helper to use the shared hermetic mock
server setup
- ensure review tests always have a default `/v1/models` stub during
Codex session bootstrap
- remove the race that caused intermittent `/v1/models` connection
failures and flaky ETag refresh assertions
## Testing
- `just fmt`
- `cargo test -p codex-core --test all
refresh_models_on_models_etag_mismatch_and_avoid_duplicate_models_fetch`
- `cargo test -p codex-core --test all
review_uses_custom_review_model_from_config`
- repeated both targeted tests 5x in a loop
- `cargo clippy -p codex-core --tests -- -D warnings`
Hardens codex-rs/app-server connection lifecycle and outbound routing
for websocket clients. Fixes some FUD I was having
- Added per-connection disconnect signaling (CancellationToken) for
websocket transports.
- Split websocket handling into independent inbound/outbound tasks
coordinated by cancellation.
- Changed outbound routing so websocket connections use non-blocking
try_send; slow/full websocket writers are disconnected instead of
stalling broadcast delivery.
- Kept stdio behavior blocking-on-send (no forced disconnect) so local
stdio clients are not dropped when queues are temporarily full.
- Simplified outbound router flow by removing deferred
pending_closed_connections handling.
- Added guards to drop incoming response/notification/error messages
from unknown connections.
- Fixed listener teardown race in thread listener tasks using a
listener_generation check so stale tasks do not clear newer listeners.
Fixes
https://linear.app/openai/issue/CODEX-4966/multiclient-handle-slow-notification-consumers
## Tests
Added/updated transport tests covering:
- broadcast does not block on a slow/full websocket connection
- stdio connection waits instead of disconnecting on full queue
I (maxj) have tested manually and will retest before landing
Summary
- ensure destructive tool annotations short-circuit to require approval
- simplify approval logic to only require read/write + open-world when
destructive is false
- update the unit test to cover the new destructive behavior
Testing
- Not run (not requested)
## Summary
Install Node in the Bazel remote execution image using the version
pinned in `codex-rs/node-version.txt`.
## Why
`js_repl` tests run under Bazel remote execution and require a modern
Node runtime. Runner-level `setup-node` does not guarantee Node is
available (or recent enough) inside the remote worker container.
## What changed
- Updated `.github/workflows/Dockerfile.bazel` to install Node from
official tarballs at image build time.
- Added `xz-utils` for extracting `.tar.xz` archives.
- Copied `codex-rs/node-version.txt` into the image build context and
used it as the single source of truth for Node version.
- Added architecture mapping for multi-arch builds:
- `amd64 -> x64`
- `arm64 -> arm64`
- Verified install during image build with:
- `node --version`
- `npm --version`
## Impact
- Bazel remote workers should now have the required Node version
available for `js_repl` tests.
- Keeps Node version synchronized with repo policy via
`codex-rs/node-version.txt`.
## Testing
- Verified Dockerfile changes and build steps locally (build-time
commands are deterministic and fail fast on unsupported arch/version
fetch issues).
## Follow-up
- Rebuild and publish the Bazel runner image for both `linux/amd64` and
`linux/arm64`.
- Update image digests in `rbe.bzl` to roll out this runtime update in
CI.
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/12300
- ✅ `2` https://github.com/openai/codex/pull/12275
- 👉 `3` https://github.com/openai/codex/pull/12205
- ⏳ `4` https://github.com/openai/codex/pull/12185
- ⏳ `5` https://github.com/openai/codex/pull/10673
## Problem
Users without Codex access can hit a confusing local login loop. In the
denial case, the callback could fall through to generic behavior
(including a plain "Missing authorization code" page) instead of clearly
explaining that access was denied.
<img width="842" height="464" alt="Screenshot 2026-02-19 at 11 43 45 PM"
src="https://github.com/user-attachments/assets/f7a25e1d-e480-4ac2-b0ff-8bfe31003e66"
/>
<img width="842" height="464" alt="Screenshot 2026-02-19 at 11 44 53 PM"
src="https://github.com/user-attachments/assets/8a4fe6e4-b27b-483c-9f0c-60164933221d"
/>
## Scope
This PR improves local login error clarity only. It does not change
entitlement policy, RBAC rules, or who is allowed to use Codex.
## What Changed
- The local OAuth callback handler now parses `error` and
`error_description` on `/auth/callback` and exits the callback loop with
a real failure.
- Callback failures render a branded local Codex error page instead of a
generic/plain page.
- `access_denied` + `missing_codex_entitlement` is now mapped to an
explicit user-facing message telling the user Codex is not enabled for
their workspace and to contact their workspace administrator for access.
- Unknown OAuth callback errors continue to use a generic error page
while preserving the OAuth error code/details for debugging.
- Added the login error page template to Bazel assets so the local
binary can render it in Bazel builds.
## Non-goals
- No TUI onboarding/toast changes in this PR.
- No backend entitlement or policy changes.
## Tests
- Added an end-to-end `codex-login` test for `access_denied` +
`missing_codex_entitlement` and verified the page shows the actionable
admin guidance.
- Added an end-to-end `codex-login` test for a generic `access_denied`
reason to verify we keep a generic fallback page/message.
## Summary
Adds support for a Unix socket escape hatch so we can bypass socket
allowlisting when explicitly enabled.
## Description
* added a new flag, `network.dangerously_allow_all_unix_sockets` as an
explicit escape hatch
* In codex-network-proxy, enabling that flag now allows any absolute
Unix socket path from x-unix-socket instead of requiring each path to be
explicitly allowlisted. Relative paths are still rejected.
* updated the macOS seatbelt path in core so it enforces the same Unix
socket behavior:
* allowlisted sockets generate explicit network* subpath rules
* allow-all generates a broad network* (subpath "/") rule
---------
Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>
## Summary
Tighten the `js_repl` freeform Lark grammar to block the most common
malformed payload wrappers before they reach runtime validation.
## What Changed
- Replaced the overly permissive `js_repl` freeform grammar (`start:
/[\s\S]*/`) with a structured grammar that still supports:
- plain JS source
- optional first-line `// codex-js-repl:` pragma followed by JS source
- Added grammar-level filtering for common bad payload shapes by
rejecting inputs whose first significant token starts with:
- `{` (JSON object wrapper like `{"code":"..."}`)
- `"` (quoted code string)
- `` ``` `` (markdown code fences)
- Implemented the grammar without regex lookahead/lookbehind because the
API-side Lark regex engine does not support look-around.
- Added a unit test to validate the grammar shape and guard against
reintroducing unsupported lookaround.
## Why
`js_repl` is a freeform tool, but the model sometimes emits wrapped
payloads (JSON, quoted strings, markdown fences) instead of raw
JavaScript. We already reject those at runtime, but this change moves
the constraint into the tool grammar so the model is less likely to
generate invalid tool-call payloads in the first place.
## Testing
- `cargo test -p codex-core
js_repl_freeform_grammar_blocks_common_non_js_prefixes`
- `cargo test -p codex-core parse_freeform_args_rejects_`
## Notes
- This intentionally over-blocks a few uncommon valid JS starts (for
example top-level `{ ... }` blocks or top-level quoted directives like
`"use strict";`) in exchange for preventing the common wrapped-payload
mistakes.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12300
- ⏳ `2` https://github.com/openai/codex/pull/12275
- ⏳ `3` https://github.com/openai/codex/pull/12205
- ⏳ `4` https://github.com/openai/codex/pull/12185
- ⏳ `5` https://github.com/openai/codex/pull/10673
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.
- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.
Example:
- 3 calls:
- a.com (line 443)
- b.com (line 443)
- a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443
## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
- add `LOG_FORMAT=json` support for app-server tracing logs via
`tracing_subscriber`'s built-in JSON formatter
- keep the default human-readable format unchanged and keep `RUST_LOG`
filtering behavior
- document the env var and update lockfile
Summary
- simplify the macOS sleep inhibitor FFI by replacing `dlopen` / `dlsym`
/ `transmute` with normal IOKit extern calls and `SAFETY` comments
- switch to cfg-selected platform implementations
(`imp::SleepInhibitor`) instead of `Box<dyn ...>`
- check in minimal IOKit bindings generated with `bindgen` and include
them from the macOS backend
- enable direct IOKit linkage in Bazel macOS builds by registering
`IOKit` in the Bazel `osx.framework(...)` toolchain extension list
- update `Cargo.lock` and `MODULE.bazel.lock` after removing the
build-time `bindgen` dependency path
Testing
- `just fmt`
- `cargo clippy -p codex-utils-sleep-inhibitor --all-targets -- -D
warnings`
- `cargo test -p codex-utils-sleep-inhibitor`
- `bazel test //codex-rs/utils/sleep-inhibitor:all --test_output=errors`
- `just bazel-lock-update`
- `just bazel-lock-check`
Context
- follow-up to #11711 addressing Ryan's review comments
- `bindgen` is used to generate the checked-in bindings file, but not at
build time
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full urls or url paths to prevent logging
potentially sensitive data
Testing
- Not run (not requested)
## Summary
- add reasoning effort constants for the memories phase one and phase
two agents
- wire the constants into phase1 request creation and phase2 agent
configuration so the default efforts are always applied
## Testing
- Not run (not requested)
## Summary
- Add `rollout_summary_file: <generated>.md` to each thread header in
`raw_memories.md` so Phase 2 can reliably reference the canonical
rollout summary filename.
- Update the memory prompts/templates (`stage_one_system`,
`consolidation`, `read_path`) for the new task-oriented raw-memory /
MEMORY.md schema and stronger consolidation guidance.
## Details
- `codex-rs/core/src/memories/storage.rs`
- Writes the generated `rollout_summary_file` path into the per-thread
metadata header when rebuilding `raw_memories.md`.
- `codex-rs/core/src/memories/tests.rs`
- Verifies the canonical `rollout_summary_file` header is present and
ordered after `updated_at`/`cwd` in `raw_memories.md`.
- Verifies task-structured raw-memory content is preserved while the
canonical header is added.
- `codex-rs/core/templates/memories/*.md`
- Updates the stage-1 raw-memory format to task-grouped sections
(`task`, `task_group`, `task_outcome`).
- Updates Phase 2 consolidation guidance around recency (`updated_at`),
task-oriented `MEMORY.md` blocks, and richer evidence-backed
consolidation.
- Tweaks the quick memory pass wording to emphasize topics/workflows in
addition to keywords.
## Testing
- `cargo test -p codex-core memories`
We now write MCP tools from installed apps to disk cache so that they
can be picked up instantly at startup. We still do a fresh fetch from
remote MCP server but it's non blocking unless there's a cache miss.
- [x] Store apps tool cache in disk to reduce startup time.
thread/resume response includes latest turn with all items, in band so
no events are stale or lost
Testing
- e2e tested using app-server-test-client using flow described in
"Testing Thread Rejoin Behavior" in
codex-rs/app-server-test-client/README.md
- e2e tested in codex desktop by reconnecting to a running turn
## Why
`app/list` emits `app/list/updated` after whichever async load finishes
first (directory connectors or accessible tools). This test assumed the
directory-backed update always arrived first because it injected a tools
delay, but that assumption is not stable when the process-global Codex
Apps tools cache is already warm. In that case the accessible-tools path
can return immediately and the first notification shape flips, which
makes the assertion flaky.
Relevant code paths:
-
[`codex-rs/app-server/src/codex_message_processor.rs`](13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034))
(concurrent loads + per-load `app/list/updated` notifications)
-
[`codex-rs/core/src/mcp_connection_manager.rs`](13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197))
(Codex Apps tools cache hit path)
## What Changed
Updated
`suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags`
in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either
valid first `app/list/updated` payload:
- the directory-first snapshot
- the accessible-tools-first snapshot
The test still keeps the later assertions strict:
- the second `app/list/updated` notification must be the fully merged
result
- the final `app/list` response must match the same merged result
I also added an inline comment explaining why the first notification is
intentionally order-insensitive.
## Verification
- `cargo test -p codex-app-server`
## Why
Several tests intentionally exercise behavior while a turn is still
active. The cleanup sequence for those tests (`turn/interrupt` + waiting
for `codex/event/turn_aborted`) was duplicated across files, which made
the rationale easy to lose and the pattern easy to apply inconsistently.
This change centralizes that cleanup in one place with a single
explanatory doc comment.
## What Changed
### Added shared helper
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
- Added a doc comment explaining why explicit interrupt + terminal wait
is required for tests that intentionally leave a turn in-flight.
### Migrated call sites
Replaced duplicated interrupt/aborted blocks with the helper in:
- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
- `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
- `turn_start_shell_zsh_fork_executes_command_v2`
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
- `turn_steer_returns_active_turn_id`
### Existing cleanup retained
In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:
- `turn_start_accepts_local_image_input` continues to explicitly wait
for `turn/completed` so the turn lifecycle is fully drained before test
exit.
## Verification
- `cargo test -p codex-app-server`
## Why
`thread_resume` tests can intentionally create an in-flight turn, assert
a `thread/resume` error path, and return immediately. That leaves turn
work active during teardown, which can surface as intermittent `LEAK`
failures.
Sample output that motivated this investigation (reported during test
runs):
```text
LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch
```
## What Changed
Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`:
- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`
Both tests now:
1. capture the running turn id from `TurnStartResponse`
2. assert the expected `thread/resume` error
3. call `turn/interrupt` for that running turn
4. wait for `codex/event/turn_aborted` before returning
## Why This Is The Correct Fix
These tests are specifically validating resume behavior while a turn is
active. They should also own cleanup of that active turn before exiting.
Explicitly interrupting and waiting for the terminal abort notification
removes teardown races and avoids relying on process-drop behavior to
clean up in-flight work.
## Repro / Verification
Repro command used for investigation:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) | test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)'
```
Observed before this change: intermittent `LEAK` in
`thread_resume_rejects_history_when_thread_is_running`.
Also verified with:
- `cargo test -p codex-app-server`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12269).
* #12271
* __->__ #12269
## Summary
Implements the `ConfigToml.permissions.network` and uses it to populate
`NetworkProxyConfig`. We now parse a new nested permissions/network
config shape which is converted into the proxy’s runtime config.
When managed requirements exist, we still apply those constraints on top
of user settings (so managed policy still wins).
* Cleaned up the old constructor path so it now accepts both user config
+ managed constraints directly.
* Updated the reload path so live proxy config reloads respect
[permissions.network] too, while still supporting the existing top-level
[network] format.
### Behavior
- User-defined `[permissions.network]` values are now honored.
- Managed constraints still take effect and are validated against the
resulting policy.
If a first-party proc_macro crate has tests/binaries that would get
autogenerated by the macro, it was being handled incorrectly. Found by
an external OS contributor!
## Summary
Implements a configurable MCP OAuth callback URL override for `codex mcp
login` and app-server OAuth login flows, including support for non-local
callback endpoints (for example, devbox ingress URLs).
## What changed
- Added new config key: `mcp_oauth_callback_url` in
`~/.codex/config.toml`.
- OAuth authorization now uses `mcp_oauth_callback_url` as
`redirect_uri` when set.
- Callback handling validates the callback path against the configured
redirect URI path.
- Listener bind behavior is now host-aware:
- local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to
`127.0.0.1`
- non-local callback URL hosts bind to `0.0.0.0`
- `mcp_oauth_callback_port` remains supported and is used for the
listener port.
- Wired through:
- CLI MCP login flow
- App-server MCP OAuth login flow
- Skill dependency OAuth login flow
- Updated config schema and config tests.
## Why
Some environments need OAuth callbacks to land on a specific reachable
URL (for example ingress in remote devboxes), not loopback. This change
allows that while preserving local defaults for existing users.
## Backward compatibility
- No behavior change when `mcp_oauth_callback_url` is unset.
- Existing `mcp_oauth_callback_port` behavior remains intact.
- Local callback flows continue binding to loopback by default.
## Testing
- `cargo test -p codex-rmcp-client callback -- --nocapture`
- `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
- `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`
## Example config
```toml
mcp_oauth_callback_port = 5555
mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
## Summary
- The experimental Bazel CI builds fail on all platforms because askama
resolves template paths relative to `CARGO_MANIFEST_DIR`, which points
outside the Bazel sandbox. This produces errors like:
```
error: couldn't read
`codex-rs/core/src/memories/../../../../../../../../../../../work/codex/codex/codex-rs/core/templates/memories/consolidation.md`:
No such file or directory
```
- Replaced `#[derive(Template)]` + `#[template(path = "...")]` with
`include_str!` + `str::replace()` for the three affected templates
(`consolidation.md`, `stage_one_input.md`, `read_path.md`).
`include_str!` resolves paths relative to the source file, which works
correctly in both Cargo and Bazel builds.
- The templates only use simple `{{ variable }}` substitution with no
control flow or filters, so no askama functionality is lost.
- Removes the `askama` dependency from `codex-core` since it was the
only crate using it. The workspace-level dependency definition is left
in place.
- This matches the existing pattern used throughout the codebase — e.g.
`codex-rs/core/src/memories/mod.rs` already uses
`include_str!("../../templates/memories/stage_one_system.md")` for the
fourth template file.
## Test plan
- [ ] Verify Bazel (experimental) CI passes on all platforms
- [ ] Verify rust-ci (Cargo) builds and tests continue to pass
- [ ] Verify `cargo test -p codex-core` passes locally
## Why
`cargo nextest` was intermittently reporting `LEAK` for
`codex-app-server` tests even when assertions passed. This adds noise
and flakiness to local/CI signals.
Sample output used as the basis of this investigation:
```text
LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
```
## How I Reproduced
I focused on the suspect tests and ran them under `nextest` stress mode
with leak reporting enabled.
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
```
This reproduced intermittent `LEAK` statuses while tests still passed.
## What Changed
In `codex-rs/app-server/tests/common/mcp_process.rs`:
- Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
can explicitly close stdin.
- In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
- Wait briefly for graceful exit.
- If still running, fall back to `start_kill()` and the existing bounded
`try_wait()` loop.
- Updated send-path handling to bail if stdin is already closed.
## Why This Is the Right Fix
The leak signal was caused by child-process teardown timing, not
test-logic assertion failure. The helper previously relied mostly on
force-kill timing in `Drop`; that can race with nextest leak detection.
Closing stdin first gives `codex-app-server` a deterministic, graceful
shutdown path before force-kill. Keeping the force-kill fallback
preserves robustness if graceful shutdown does not complete in time.
## Verification
- `cargo test -p codex-app-server`
- Re-ran the stress repro above after this change: no `LEAK` statuses
observed.
- Additional high-signal stress run also showed no leaks:
```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
```
## Summary
- Require revised `<proposed_plan>` blocks in the same planning session
to be complete replacements, not partial/delta plans.
- Scope that cumulative replacement rule to the current planning session
only.
- Clarify that after leaving Plan mode (for example switching to Default
mode to implement) or when explicitly asked for a new plan, the model
should produce a new self-contained plan without inheriting prior plan
blocks unless requested.
## Testing
- Not run (prompt/template text-only change).
Summary
- avoid emitting metrics for features marked as `Stage::Removed`
- keep feature metrics aligned with active and planned states only
Testing
- Not run (not requested)