With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`
I made sure that `codex exec` correctly kill all the processes
This PR adds the following field to `Config`:
```rust
pub network: Option<NetworkProxy>,
```
Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).
This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
There are two concepts of apps that we load in the harness:
- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.
Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.
In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.
- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
If `NetworkConstraints` is set, then include the relevant settings on `<environment_context>`. Example:
```xml
<environment_context>
<cwd>/repo</cwd>
<shell>bash</shell>
<network enabled="true">
<allowed>api.example.com</allowed>
<allowed>*.openai.com</allowed>
<denied>blocked.example.com</denied>
</network>
</environment_context>
```
Currently, `codex-network-proxy` depends on `codex-core`, but this
should be the other way around. As a first step, refactor out
`ConfigReloader`, which should make it easier to move
`codex-rs/network-proxy/src/state.rs` to `codex-core` in a subsequent
commit.
- Plumb input modalities from model catalog through the openai model
protocol. Default to text and image.
- Conditionally add the view_image tool only if input modalities support
image.
Given that we have https://github.com/openai/codex/pull/10977, the
existing "Verify config schema fixture" step seems unnecessary. Further,
because it happens as part of the `tag-check` job (which is meant to be
fast), it slows down the entire build process because it delays the more
expensive steps from starting.
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes
For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests
Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
Fixes#9050
When a draft is stashed with Ctrl+C, we now persist the full draft state
(text elements, local image paths, and pending paste payloads) in local
history. Up/Down recall rehydrates placeholder elements and attachments
so styling remains correct and large pastes still expand on submit.
Persistent (cross‑session) history remains text‑only.
Backtrack prefills now reuse the selected user message’s text elements
and local image paths, so image placeholders/attachments rehydrate when
rolling back.
External editor replacements keep only attachments whose placeholders
remain and then normalize image placeholders to `[Image #1]..[Image #N]`
to keep the attachment mapping consistent.
Docs:
- docs/tui-chat-composer.md
Testing:
- just fix -p codex-tui
- cargo test -p codex-tui
Co-authored-by: Eric Traut <etraut@openai.com>
#10958 introduced experimental support for a network config in
`/etc/codex/requirements.toml`, so this extends `/debug-config` to
surface this information, if set, which should make it easier to debug.
## Summary
- make `turn/start` normalize
`collaborationMode.settings.developer_instructions: null` to the
built-in instructions for the selected mode
- prevent app-server clients from accidentally clearing mode-switch
developer instructions by sending `null`
- document this behavior in the v2 protocol and app-server docs
## What changed
- `codex-rs/app-server/src/codex_message_processor.rs`
- added a small `normalize_turn_start_collaboration_mode` helper
- in `turn_start`, apply normalization before `OverrideTurnContext`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- extended `turn_start_accepts_collaboration_mode_override_v2` to assert
the outgoing request includes default-mode instruction text when the
client sends `developer_instructions: null`
- `codex-rs/app-server-protocol/src/protocol/v2.rs`
- clarified `TurnStartParams.collaboration_mode` docs:
`settings.developer_instructions: null` means use built-in mode
instructions
- regenerated schema fixture:
- `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
- docs:
- `codex-rs/app-server/README.md`
- `codex-rs/docs/codex_mcp_interface.md`
Hardens PTY Python REPL test and make MCP test startup deterministic
**Summary**
- `utils/pty/src/tests.rs`
- Added a REPL readiness handshake (`wait_for_python_repl_ready`) that
repeatedly sends a marker and waits for it in PTY output before sending
test commands.
- Updated `pty_python_repl_emits_output_and_exits` to:
- wait for readiness first,
- preserve startup output,
- append output collected through process exit.
- Reduces Windows/ConPTY flakiness from early stdin writes racing REPL
startup.
- `mcp-server/tests/suite/codex_tool.rs`
- Avoid remote model refresh during MCP test startup, reducing
timeout-prone nondeterminism.
Summary
- wrap `shell -lc` executions that use a snapshot with the session shell
so the saved environment is sourced before delegating to the original
shell
- escape single quotes in the generated wrapper and add tests covering
Bash/Zsh/sh session bootstrapping
Testing
- Not run (not requested)
Summary
- add the new resume_agent collab tool path through core, protocol, and
the app server API, including the resume events
- update the schema/TypeScript definitions plus docs so resume_agent
appears in generated artifacts and README
- note that resumed agents rehydrate rollout history without overwriting
their base instructions
Testing
- Not run (not requested)
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:
```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```
Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.
### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.
### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
- `disabled` is always implicitly allowed (even if not listed).
- An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
This PR changes stdio MCP child processes to run in their own process
group
* Add guarded teardown in codex-rmcp-client: send SIGTERM to the group
first, then SIGKILL after a short grace period.
* Add terminate_process_group helper in process_group.rs.
* Add Unix regression test in process_group_cleanup.rs to verify wrapper
+ grandchild are reaped on client drop.
Addresses reported MCP process/thread storm: #10581
## Summary
Stabilize v2 review integration tests by making them hermetic with
respect to model discovery.
`app-server` review tests were intermittently timing out in CI
(especially on Windows runners) because their test config allowed remote
model refresh. During `thread/start`, the test process could issue live
`/v1/models` requests, introducing external network latency and
nondeterministic timing before review flow assertions.
This change disables remote model fetching in the review test config
helper used by these tests.
Summary:
- Rename config table from network_proxy to network.
- Flatten allowed_domains, denied_domains, allow_unix_sockets, and
allow_local_binding onto NetworkProxySettings.
- Update runtime, state constraints, tests, and README to the new config
shape.
TLDR: use new message phase field emitted by preamble-supported models
to determine whether an AgentMessage is mid-turn commentary. if so,
restore the status indicator afterwards to indicate the turn has not
completed.
### Problem
`commit_tick` hides the status indicator while streaming assistant text.
For preamble-capable models, that text can be commentary mid-turn, so
hiding was correct during streaming but restore timing mattered:
- restoring too aggressively caused jitter/flashing
- not restoring caused indicator to stay hidden before subsequent work
(tool calls, web search, etc.)
### Fix
- Add optional `phase` to `AgentMessageItem` and propagate it from
`ResponseItem::Message`
- Keep indicator hidden during streamed commit ticks, restore only when:
- assistant item completes as `phase=commentary`, and
- stream queues are idle + task is still running.
- Treat `phase=None` as final-answer behavior (no restore) to keep
existing behavior for non-preamble models
### Tests
Add/update tests for:
- no idle-tick restore without commentary completion
- commentary completion restoring status before tool begin
- snapshot coverage for preamble/status behavior
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
This PR makes `Config.apps `experimental-only and fixes a TS schema
post-processing bug that removed needed imports. The bug happened
because import pruning only checked the inner type body after filtering,
not the full alias, so `JsonValue` got dropped from `Config.ts`. We now
prune against the full alias body and added a regression test for this
scenario.
## What changed
- In `codex-rs/core/src/skills/injection.rs`, we now honor explicit
`UserInput::Skill { name, path }` first, then fall back to text mentions
only when safe.
- In `codex-rs/tui/src/bottom_pane/chat_composer.rs`, mention selection
is now token-bound (selected mention is tied to the specific inserted
`$token`), and we snapshot bindings at submit time so selection is not
lost.
- In `codex-rs/tui/src/chatwidget.rs` and
`codex-rs/tui/src/bottom_pane/mod.rs`, submit/queue paths now consume
the submit-time mention snapshot (instead of rereading cleared composer
state).
- In `codex-rs/tui/src/mention_codec.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer_history.rs`, history now
round-trips mention targets so resume restores the same selected
duplicate.
- In `codex-rs/tui/src/bottom_pane/skill_popup.rs` and
`codex-rs/tui/src/bottom_pane/chat_composer.rs`, duplicate labels are
normalized to `[Repo]` / `[App]`, app rows no longer show `Connected -`,
and description space is a bit wider.
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 56 56 PM"
src="https://github.com/user-attachments/assets/346a7eb2-a342-4a49-aec8-68dfec0c7d89"
/>
<img width="550" height="163" alt="Screenshot 2026-02-05 at 9 57 09 PM"
src="https://github.com/user-attachments/assets/5e04d9af-cccf-4932-98b3-c37183e445ed"
/>
## Before vs now
- Before: selecting a duplicate could still submit the default/repo
match, and resume could lose which duplicate was originally selected.
- Now: the exact selected target (skill path or app id) is preserved
through submit, queue/restore, and resume.
## Manual test
1. Build and run this branch locally:
- `cd /Users/daniels/code/codex/codex-rs`
- `cargo build -p codex-cli --bin codex`
- `./target/debug/codex`
2. Open mention picker with `$` and pick a duplicate entry (not the
first one).
3. Confirm duplicate UI:
- repo duplicate rows show `[Repo]`
- app duplicate rows show `[App]`
- app description does **not** start with `Connected -`
4. Submit the prompt, then press Up to restore draft and submit again.
Expected: it keeps the same selected duplicate target.
5. Use `/resume` to reopen the session and send again.
Expected: restored mention still resolves to the same duplicate target.
These fields had always been documented as experimental/unstable with
docstrings, but now let's actually use the `experimental` annotation to
be more explicit.
- thread/start.experimentalRawEvents
- thread/resume.history
- thread/resume.path
- thread/fork.path
- turn/start.collaborationMode
- account/login/start.chatgptAuthTokens
## Summary
When replaying compacted history (especially `replacement_history` from
remote compaction), we should not keep stale developer messages from
older session state. This PR trims developer-
role messages from compacted replacement history and reinjects fresh
developer instructions derived from current turn/session state.
This aligns compaction replay behavior with the intended "fresh
instructions after summary" model.
## Problem
Compaction replay had two paths:
- `Compacted { replacement_history: None }`: rebuilt with fresh initial
context
- `Compacted { replacement_history: Some(...) }`: previously used raw
replacement history as-is
The second path could carry stale developer instructions
(permissions/personality/collab-mode guidance) across session changes.
## What Changed
### 1) Added helper to refresh compacted developer instructions
- **File:** `codex-rs/core/src/compact.rs`
- **Function:** `refresh_compacted_developer_instructions(...)`
Behavior:
- remove all `ResponseItem::Message { role: "developer", .. }` from
compacted history
- append fresh developer messages from current
`build_initial_context(...)`
### 2) Applied helper in remote compaction flow
- **File:** `codex-rs/core/src/compact_remote.rs`
- After receiving compact endpoint output, refresh developer
instructions before replacing history and persisting
`replacement_history`.
### 3) Applied helper while reconstructing history from rollout
- **File:** `codex-rs/core/src/codex.rs`
- In `reconstruct_history_from_rollout(...)`, when processing
`Compacted` entries with `replacement_history`, refresh developer
instructions instead of directly replacing with raw history.
## Non-Goals / Follow-up
This PR does **not** address the existing first-turn-after-resume
double-injection behavior.
A follow-up PR will handle resume-time dedup/idempotence separately.
If you want, I can also give you a shorter “squash-merge friendly”
version of the description.
## Codex author
`codex fork 019c25e6-706e-75d1-9198-688ec00a8256`
## Problem
The first user turn can pay websocket handshake latency even when a
session has already started. We want to reduce that initial delay while
preserving turn semantics and avoiding any prompt send during startup.
Reviewer feedback also called out duplicated connect/setup paths and
unnecessary preconnect state complexity.
## Mental model
`ModelClient` owns session-scoped transport state. During session
startup, it can opportunistically warm one websocket handshake slot. A
turn-scoped `ModelClientSession` adopts that slot once if available,
restores captured sticky turn-state, and otherwise opens a websocket
through the same shared connect path.
If startup preconnect is still in flight, first turn setup awaits that
task and treats it as the first connection attempt for the turn.
Preconnect is handshake-only. The first `response.create` is still sent
only when a turn starts.
## Non-goals
This change does not make preconnect required for correctness and does
not change prompt/turn payload semantics. It also does not expand
fallback behavior beyond clearing preconnect state when fallback
activates.
## Tradeoffs
The implementation prioritizes simpler ownership and shared connection
code over header-match gating for reuse. The single-slot cache keeps
lifecycle straightforward but only benefits the immediate next turn.
Awaiting in-flight preconnect has the same app-level connect-timeout
semantics as existing websocket connect behavior (no new timeout class
introduced by this PR).
## Architecture
`core/src/client.rs`:
- Added session-level preconnect lifecycle state (`Idle` / `InFlight` /
`Ready`) carrying one warmed websocket plus optional captured
turn-state.
- Added `pre_establish_connection()` startup warmup and `preconnect()`
handshake-only setup.
- Deduped auth/provider resolution into `current_client_setup()` and
websocket handshake wiring into `connect_websocket()` /
`build_websocket_headers()`.
- Updated turn websocket path to adopt preconnect first, await in-flight
preconnect when present, then create a new websocket only when needed.
- Ensured fallback activation clears warmed preconnect state.
- Added documentation for lifecycle, ownership, sticky-routing
invariants, and timeout semantics.
`core/src/codex.rs`:
- Session startup invokes `model_client.pre_establish_connection(...)`.
- Turn metadata resolution uses the shared timeout helper.
`core/src/turn_metadata.rs`:
- Centralized shared timeout helper used by both turn-time metadata
resolution and startup preconnect metadata building.
`core/tests/common/responses.rs` + websocket test suites:
- Added deterministic handshake waiting helper (`wait_for_handshakes`)
with bounded polling.
- Added startup preconnect and in-flight preconnect reuse coverage.
- Fallback expectations now assert exactly two websocket attempts in
covered scenarios (startup preconnect + turn attempt before fallback
sticks).
## Observability
Preconnect remains best-effort and non-fatal. Existing
websocket/fallback telemetry remains in place, and debug logs now make
preconnect-await behavior and preconnect failures easier to reason
about.
## Tests
Validated with:
1. `just fmt`
2. `cargo test -p codex-core websocket_preconnect -- --nocapture`
3. `cargo test -p codex-core websocket_fallback -- --nocapture`
4. `cargo test -p codex-core
websocket_first_turn_waits_for_inflight_preconnect -- --nocapture`
## Summary
Add explicit, model-visible network policy decision metadata to blocked
proxy responses/errors.
Introduces a standardized prefix line: `CODEX_NETWORK_POLICY_DECISION
{json}`
and wires it through blocked paths for:
- HTTP requests
- HTTPS CONNECT
- SOCKS5 TCP/UDP denials
## Why
The model should see *why* a request was blocked
(reason/source/protocol/host/port) so it can choose the correct next
action.
## Notes
- This PR is intentionally independent of config-layering/network-rule
runtime integration.
- Focus is blocked decision surface only.
## Summary
This PR fixes a UI/streaming race when nudged or steer-enabled messages
are queued during an active Plan stream.
Previously, `submit_user_message_with_mode` switched collaboration mode
immediately (via `set_collaboration_mask`) even when the message was
queued. If that happened mid-Plan stream, `active_mode_kind` could flip
away from Plan before the turn finished, causing subsequent
`on_plan_delta` updates to be ignored in the UI.
Now, mode switching is deferred until the queued message is actually
submitted.
## What changed
- Added a per-message deferred mode override on `UserMessage`:
- `collaboration_mode_override: Option<CollaborationModeMask>`
- Updated `submit_user_message_with_mode` to:
- create a `UserMessage` carrying the mode override
- queue or submit that message without mutating global mode immediately
- Updated `submit_user_message` to:
- apply `collaboration_mode_override` just before constructing/sending
`Op::UserTurn`
- Kept queueing condition scoped to active Plan stream rendering:
- queue only while plan output is actively streaming in TUI
(`plan_stream_controller.is_some()`)
## Why
This preserves Plan mode for the remainder of the in-flight Plan turn,
so streamed plan deltas continue rendering correctly, while still
ensuring the follow-up queued message is sent with the intended
collaboration mode.
## Behavior after this change
- If a nudged/steer submission happens while Plan output is actively
streaming:
- message is queued
- UI stays in Plan mode for the running turn
- once dequeued/submitted, mode override is applied and the message is
sent in the intended mode
- If no Plan stream is active:
- submission proceeds immediately and mode override is applied as before
## Tests
Added/updated coverage in `tui/src/chatwidget/tests.rs`:
- `submit_user_message_with_mode_queues_while_plan_stream_is_active`
- asserts mode remains Plan while queued
- asserts mode switches to Code when queued message is actually
submitted
- `submit_user_message_with_mode_submits_when_plan_stream_is_not_active`
- `steer_enter_queues_while_plan_stream_is_active`
- `steer_enter_submits_when_plan_stream_is_not_active`
Also updated existing `UserMessage { ... }` test fixtures to include the
new field.
## Codex author
`codex fork 019c1047-d5d5-7c92-a357-6009604dc7e8`