Motivation
- Today, a newly connected client has no direct way to determine the
current runtime status of threads from read/list responses alone.
- This forces clients to infer state from transient events, which can
lead to stale or inconsistent UI when reconnecting or attaching late.
Changes
- Add `status` to `thread/read` responses.
- Add `statuses` to `thread/list` responses.
- Emit `thread/status/changed` notifications with `threadId` and the new
status.
- Track runtime status for all loaded threads and default unknown
threads to `idle`.
- Update protocol/docs/tests/schema fixtures for the revised API.
Testing
- Validated protocol API changes with automated protocol tests and
regenerated schema/type fixtures.
- Validated app-server behavior with unit and integration test suites,
including status transitions and notifications.
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051
- https://github.com/openai/codex/pull/12052👈
### Summary
This PR introduces a feature-gated native shell runtime path that routes
shell execution through a patched zsh exec bridge, removing MCP-specific
behavior from the shell hot path while preserving existing
CommandExecution lifecycle semantics.
When shell_zsh_fork is enabled, shell commands run via patched zsh with
per-`execve` interception through EXEC_WRAPPER. Core receives wrapper
IPC requests over a Unix socket, applies existing approval policy, and
returns allow/deny before the subcommand executes.
### What’s included
**1) New zsh exec bridge runtime in core**
- Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for
EXEC_WRAPPER invocations.
- Per-execution Unix-socket IPC handling for wrapper requests/responses.
- Approval callback integration using existing core approval
orchestration.
- Streaming stdout/stderr deltas to existing command output event
pipeline.
- Error handling for malformed IPC, denial/abort, and execution
failures.
**2) Session lifecycle integration**
SessionServices now owns a `ZshExecBridge`.
Session startup initializes bridge state; shutdown tears it down
cleanly.
**3) Shell runtime routing (feature-gated)**
When `shell_zsh_fork` is enabled:
- Build execution env/spec as usual.
- Add wrapper socket env wiring.
- Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of
the regular shell path.
- Non-zsh-fork behavior remains unchanged.
**4) Config + feature wiring**
- Added `Feature::ShellZshFork` (under development).
- Added config support for `zsh_path` (optional absolute path to patched
zsh):
- `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema.
- Session startup validates that `zsh_path` exists/usable when zsh-fork
is enabled.
- Added startup test for missing `zsh_path` failure mode.
**5) Seatbelt/sandbox updates for wrapper IPC**
- Extended seatbelt policy generation to optionally allow outbound
connection to explicitly permitted Unix sockets.
- Wired sandboxing path to pass wrapper socket path through to seatbelt
policy generation.
- Added/updated seatbelt tests for explicit socket allow rule and
argument emission.
**6) Runtime entrypoint hooks**
- This allows the same binary to act as the zsh wrapper subprocess when
invoked via `EXEC_WRAPPER`.
**7) Tool selection behavior**
- ToolsConfig now prefers ShellCommand type when shell_zsh_fork is
enabled.
- Added test coverage for precedence with unified-exec enabled.
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051👈
- https://github.com/openai/codex/pull/12052
With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.
To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
### Summary
Ensure that we use the model value from the response header only so that
we are guaranteed with the correct slug name. We are no longer checking
against the model value from response so that we are less likely to have
false positive.
There are two different treatments - for SSE we use the header from the
response and for websocket we check top-level events.
* Add v2 server notifications `thread/archived` and `thread/unarchived`
with a `threadId` payload.
* Wire new events into `thread/archive` and `thread/unarchive` success
paths.
* Update app-server protocol/schema/docs accordingly.
Testing:
- Updated archive/unarchive end-to-end tests to verify both
notifications are emitted with the expected thread id payload.
rm `remote_models` feature flag.
We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.
Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.
`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.
**Changes**
- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
## Summary
This feature is now reasonably stable, let's remove it so we can
simplify our upcoming iterations here.
## Testing
- [x] Existing tests pass
### What
It's currently unclear when the harness falls back to the default,
generic `ModelInfo`. This happens when the `remote_models` feature is
disabled or the model is truly unknown, and can lead to bad performance
and issues in the harness.
Add a user-facing warning when this happens so they are aware when their
setup is broken.
### Tests
Added tests, tested locally.
### What
to unblock filtering models in VSCE, change `model/list` app-server
endpoint to send all models + visibility field `showInPicker` so
filtering can be done in VSCE if desired.
### Tests
Updated tests.
## Summary
- always rejoin an in-memory running thread on `thread/resume`, even
when overrides are present
- reject `thread/resume` when `history` is provided for a running thread
- reject `thread/resume` when `path` mismatches the running thread
rollout path
- warn (but do not fail) on override mismatches for running threads
- add more `thread_resume` integration tests and fixes; including
restart-based resume-with-overrides coverage
## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all thread_resume`
- manual test with app-server-test-client
https://github.com/openai/codex/pull/11755
- manual test both stdio and websocket in app
When `app/list` is called with `force_refetch=True`, we should seed the
results with what is already cached instead of starting from an empty
list. Otherwise when we send app/list/updated events, the client will
first see an empty list of accessible apps and then get the updated one.
## Summary
This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.
Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.
No context injection/diff behavior changes are included in this PR.
## Why this change
The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.
This PR addresses both with a minimal, reviewable change.
## Changes
### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.
Files:
- `codex-rs/protocol/src/protocol.rs`
### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.
Files:
- `codex-rs/core/src/codex.rs`
### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
- sampling request path
- compaction path
Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`
### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
- deserializing old payloads with no `network`
- serializing when `network` is present
Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
## Why
`suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
was failing on Windows CI due to an unrelated process crash during shell
snapshot initialization (`tokio-runtime-worker` stack overflow).
This review test suite validates review API behavior and should not
depend on shell snapshot behavior. Keeping shell snapshot enabled in
this fixture made the test flaky for reasons outside the scenario under
test.
## What Changed
- Updated the review suite test config in
`codex-rs/app-server/tests/suite/v2/review.rs` to set:
- `shell_snapshot = false`
This keeps the review tests focused on review behavior by disabling
shell snapshot initialization in this fixture.
## Verification
- `cargo test -p codex-app-server`
- Confirmed the previously failing Windows CI job for this test now
passes on this PR.
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.
It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.
## What
- Added `ReadOnlyAccess` in protocol with:
- `Restricted { include_platform_defaults, readable_roots }`
- `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
- `ReadOnly { access: ReadOnlyAccess }`
- `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.
## Compatibility / rollout
- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
## Why
`suite::v2::app_list::list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
has been flaky in CI. The test exercises `thread/start`, which
initializes `codex_apps`. In CI/Linux, that path can reach OS
keyring-backed MCP OAuth credential lookup (`Codex MCP Credentials`) and
intermittently abort the MCP process (observed stack overflow in
`zbus`), causing the test to fail before the assertion logic runs.
## What Changed
- Updated the test config in
`codex-rs/app-server/tests/suite/v2/app_list.rs` to set
`mcp_oauth_credentials_store = "file"` in both relevant config-writing
paths:
- The in-test config override inside
`list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
- `write_connectors_config(...)`, which is used by the v2 `app_list`
test suite
- This keeps test coverage focused on thread-scoped app feature flags
while removing OS keyring/DBus dependency from this test path.
## How It Was Verified
- `cargo test -p codex-app-server`
- `cargo test -p codex-app-server
list_apps_uses_thread_feature_flag_when_thread_id_is_provided --
--nocapture`
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.
This reverts commit 47356ff83c.
## Summary
To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
- inbound handling (`transport_event_rx`)
- outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks
## Validation
Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests
<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>
---------
Co-authored-by: jif-oai <jif@openai.com>
## Why
`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.
## Net Change
- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.
## Outcome
- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.
This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.
This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.
example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```
backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off
Testing
- Not run (not requested)
…ount_id and chatgpt_plan_type
### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012
Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).
The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
Summary
- add platform-aware defaults for shell command timeouts so Windows
tests get longer waits
- keep medium timeout longer on Windows to ensure flakiness is reduced
Testing
- Not run (not requested)
## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes
## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.
## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
We support requirements on Unix, loading from
`/etc/codex/requirements.toml`. On MacOS, we also support MDM.
Now, on Windows, we'll load requirements from
`%ProgramData%\OpenAI\Codex\requirements.toml`
There are two concepts of apps that we load in the harness:
- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.
Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.
In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.
- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes
For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests
Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
## Summary
- make `turn/start` normalize
`collaborationMode.settings.developer_instructions: null` to the
built-in instructions for the selected mode
- prevent app-server clients from accidentally clearing mode-switch
developer instructions by sending `null`
- document this behavior in the v2 protocol and app-server docs
## What changed
- `codex-rs/app-server/src/codex_message_processor.rs`
- added a small `normalize_turn_start_collaboration_mode` helper
- in `turn_start`, apply normalization before `OverrideTurnContext`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- extended `turn_start_accepts_collaboration_mode_override_v2` to assert
the outgoing request includes default-mode instruction text when the
client sends `developer_instructions: null`
- `codex-rs/app-server-protocol/src/protocol/v2.rs`
- clarified `TurnStartParams.collaboration_mode` docs:
`settings.developer_instructions: null` means use built-in mode
instructions
- regenerated schema fixture:
- `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
- docs:
- `codex-rs/app-server/README.md`
- `codex-rs/docs/codex_mcp_interface.md`
## Summary
Stabilize v2 review integration tests by making them hermetic with
respect to model discovery.
`app-server` review tests were intermittently timing out in CI
(especially on Windows runners) because their test config allowed remote
model refresh. During `thread/start`, the test process could issue live
`/v1/models` requests, introducing external network latency and
nondeterministic timing before review flow assertions.
This change disables remote model fetching in the review test config
helper used by these tests.
Summary
- add a `required` flag for MCP servers everywhere config/CLI data is
touched so mandatory helpers can be round-tripped
- have `codex exec` and `codex app-server` thread start/resume fail fast
when required MCPs fail to initialize
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.
## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.
Having a dedicated `turn/steer` API eliminates all those quirks.
`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.