Command-approval clients currently infer which choices to show from
side-channel fields like `networkApprovalContext`,
`proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
the request shape harder to evolve, and it forces each client to
replicate the server's heuristics instead of receiving the exact
decision list for the prompt.
This PR introduces a mapping between `CommandExecutionApprovalDecision`
and `codex_protocol::protocol::ReviewDecision`:
```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
fn from(value: CoreReviewDecision) -> Self {
match value {
CoreReviewDecision::Approved => Self::Accept,
CoreReviewDecision::ApprovedExecpolicyAmendment {
proposed_execpolicy_amendment,
} => Self::AcceptWithExecpolicyAmendment {
execpolicy_amendment: proposed_execpolicy_amendment.into(),
},
CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
CoreReviewDecision::NetworkPolicyAmendment {
network_policy_amendment,
} => Self::ApplyNetworkPolicyAmendment {
network_policy_amendment: network_policy_amendment.into(),
},
CoreReviewDecision::Abort => Self::Cancel,
CoreReviewDecision::Denied => Self::Decline,
}
}
}
```
And updates `CommandExecutionRequestApprovalParams` to have a new field:
```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```
when, if specified, should make it easier for clients to display an
appropriate list of options in the UI.
This makes it possible for `CoreShellActionProvider::prompt()` in
`unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
adding support for `ApprovedForSession` when approving a skill script,
which was previously missing in the TUI.
Note this results in a significant change to `exec_options()` in
`approval_overlay.rs`, as the displayed options are now derived from
`available_decisions: &[ReviewDecision]`.
## What Changed
- Add `available_decisions` to
[`ExecApprovalRequestEvent`](de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)),
including helpers to derive the legacy default choices when older
senders omit the field.
- Map `codex_protocol::protocol::ReviewDecision` to app-server
`CommandExecutionApprovalDecision` and expose the ordered list as
experimental `availableDecisions` in
[`CommandExecutionRequestApprovalParams`](de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)).
- Thread optional `available_decisions` through the core approval path
so Unix shell escalation can explicitly request `ApprovedForSession` for
session-scoped approvals instead of relying on client heuristics.
[`unix_escalation.rs`](de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214))
- Update the TUI approval overlay to build its buttons from the ordered
decision list, while preserving the legacy fallback when
`available_decisions` is missing.
- Update the app-server README, test client output, and generated schema
artifacts to document and surface the new field.
## Testing
- Add `approval_overlay.rs` coverage for explicit decision lists,
including the generic `ApprovedForSession` path and network approval
options.
- Update `chatwidget/tests.rs` and app-server protocol tests to populate
the new optional field and keep older event shapes working.
## Developers Docs
- If we document `item/commandExecution/requestApproval` on
[developers.openai.com/codex](https://developers.openai.com/codex), add
experimental `availableDecisions` as the preferred source of approval
choices and note that older servers may omit it.
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`
Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.
This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.
Closed threads will also be removed from `thread/loaded/list`.
Previously, clients would call `thread/start` with dynamic_tools set,
and when a model invokes a dynamic tool, it would just make the
server->client `item/tool/call` request and wait for the client's
response to complete the tool call. This works, but it doesn't have an
`item/started` or `item/completed` event.
Now we are doing this:
- [new] emit `item/started` with `DynamicToolCall` populated with the
call arguments
- send an `item/tool/call` server request
- [new] once the client responds, emit `item/completed` with
`DynamicToolCall` populated with the response.
Also, with `persistExtendedHistory: true`, dynamic tool calls are now
reconstructable in `thread/read` and `thread/resume` as
`ThreadItem::DynamicToolCall`.
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
Migration Behavior
* Config
* Migrates settings.json into config.toml
* Only adds fields when config.toml is missing, or when those fields are
missing from the existing file
* Supported mappings:
env -> shell_environment_policy
sandbox.enabled = true -> sandbox_mode = "workspace-write"
* Skills
* Copies home and repo .claude/skills into .agents/skills
* Existing skill directories are not overwritten
* SKILL.md content is rewritten from Claude-related terms to Codex
* AgentsMd
* Repo only
* Migrates CLAUDE.md into AGENTS.md
* Detect/import only proceed when AGENTS.md is missing or present but
empty
* Content is rewritten from Claude-related terms to Codex
Add service name to the app-server so that the app can use it's own
service name
This is on thread level because later we might plan the app-server to
become a singleton on the computer
## Summary
Simplify network approvals by removing per-attempt proxy correlation and
moving to session-level approval dedupe keyed by (host, protocol, port).
Instead of encoding attempt IDs into proxy credentials/URLs, we now
treat approvals as a destination policy decision.
- Concurrent calls to the same destination share one approval prompt.
- Different destinations (or same host on different ports) get separate
prompts.
- Allow once approves the current queued request group only.
- Allow for session caches that (host, protocol, port) and auto-allows
future matching requests.
- Never policy continues to deny without prompting.
Example:
- 3 calls:
- a.com (line 443)
- b.com (line 443)
- a.com (line 443)
=> 2 prompts total (a, b), second a waits on the first decision.
- a.com:80 is treated separately from a.com line 443
## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-core tools::network_approval::tests`
- `cargo test -p codex-core` (unit tests pass; existing
integration-suite failures remain in this environment)
- add `LOG_FORMAT=json` support for app-server tracing logs via
`tracing_subscriber`'s built-in JSON formatter
- keep the default human-readable format unchanged and keep `RUST_LOG`
filtering behavior
- document the env var and update lockfile
Motivation
- Today, a newly connected client has no direct way to determine the
current runtime status of threads from read/list responses alone.
- This forces clients to infer state from transient events, which can
lead to stale or inconsistent UI when reconnecting or attaching late.
Changes
- Add `status` to `thread/read` responses.
- Add `statuses` to `thread/list` responses.
- Emit `thread/status/changed` notifications with `threadId` and the new
status.
- Track runtime status for all loaded threads and default unknown
threads to `idle`.
- Update protocol/docs/tests/schema fixtures for the revised API.
Testing
- Validated protocol API changes with automated protocol tests and
regenerated schema/type fixtures.
- Validated app-server behavior with unit and integration test suites,
including status transitions and notifications.
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
zsh fork PR stack:
- https://github.com/openai/codex/pull/12051👈
- https://github.com/openai/codex/pull/12052
With upcoming support for a fork of zsh that allows us to intercept
`execve` and run execpolicy checks for each subcommand as part of a
`CommandExecution`, it will be possible for there to be multiple
approval requests for a shell command like `/path/to/zsh -lc 'git status
&& rg \"TODO\" src && make test'`.
To support that, this PR introduces a new `approval_id` field across
core, protocol, and app-server so that we can associate approvals
properly for subcommands.
* Add v2 server notifications `thread/archived` and `thread/unarchived`
with a `threadId` payload.
* Wire new events into `thread/archive` and `thread/unarchive` success
paths.
* Update app-server protocol/schema/docs accordingly.
Testing:
- Updated archive/unarchive end-to-end tests to verify both
notifications are emitted with the expected thread id payload.
### Summary
Builiding off
5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0),
we have created a `model/rerouted` notification that captures the event
so that consumers can render as expected. Keep the `EventMsg::Warning`
path in core so that this does not affect TUI rendering.
`model/rerouted` is meant to be generic to account for future usage
including capacity planning etc.
Add per-turn notice when a request is downgraded to a fallback model due
to cyber safety checks.
**Changes**
- codex-api: Emit a ServerModel event based on the openai-model response
header and/or response payload (SSE + WebSocket), including when the
model changes mid-stream.
- core: When the server-reported model differs from the requested model,
emit a single per-turn warning explaining the reroute to gpt-5.2 and
directing users to Trusted
Access verification and the cyber safety explainer.
- app-server (v2): Surface these cyber model-routing warnings as
synthetic userMessage items with text prefixed by Warning: (and document
this behavior).
### What
to unblock filtering models in VSCE, change `model/list` app-server
endpoint to send all models + visibility field `showInPicker` so
filtering can be done in VSCE if desired.
### Tests
Updated tests.
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).
### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.
Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.
### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).
This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)
In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit
For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.
And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.
#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.
This reverts commit 47356ff83c.
## Summary
To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
- inbound handling (`transport_event_rx`)
- outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks
## Validation
Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests
<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>
---------
Co-authored-by: jif-oai <jif@openai.com>
With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`
I made sure that `codex exec` correctly kill all the processes
There are two concepts of apps that we load in the harness:
- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.
Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.
In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.
- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
## Summary
- make `turn/start` normalize
`collaborationMode.settings.developer_instructions: null` to the
built-in instructions for the selected mode
- prevent app-server clients from accidentally clearing mode-switch
developer instructions by sending `null`
- document this behavior in the v2 protocol and app-server docs
## What changed
- `codex-rs/app-server/src/codex_message_processor.rs`
- added a small `normalize_turn_start_collaboration_mode` helper
- in `turn_start`, apply normalization before `OverrideTurnContext`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- extended `turn_start_accepts_collaboration_mode_override_v2` to assert
the outgoing request includes default-mode instruction text when the
client sends `developer_instructions: null`
- `codex-rs/app-server-protocol/src/protocol/v2.rs`
- clarified `TurnStartParams.collaboration_mode` docs:
`settings.developer_instructions: null` means use built-in mode
instructions
- regenerated schema fixture:
- `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
- docs:
- `codex-rs/app-server/README.md`
- `codex-rs/docs/codex_mcp_interface.md`
Summary
- add the new resume_agent collab tool path through core, protocol, and
the app server API, including the resume events
- update the schema/TypeScript definitions plus docs so resume_agent
appears in generated artifacts and README
- note that resumed agents rehydrate rollout history without overwriting
their base instructions
Testing
- Not run (not requested)
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:
```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```
Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.
### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.
### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
- `disabled` is always implicitly allowed (even if not listed).
- An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.
## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.
Having a dedicated `turn/steer` API eliminates all those quirks.
`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.
- Adds --listen <URL> to codex app-server with two listen modes:
- stdio:// (default, existing behavior)
- ws://IP:PORT (new websocket transport)
- Refactors message routing to be connection-aware:
- Tracks per-connection session state (initialize/experimental
capability)
- Routes responses/errors to the originating connection
- Broadcasts server notifications/requests to initialized connections
- Updates initialization semantics to be per connection (not
process-global), and updates app-server docs accordingly.
- Adds websocket accept/read/write handling (JSON-RPC per text frame,
ping/pong handling, connection lifecycle events).
Testing
- Unit tests for transport URL parsing and targeted response/error
routing.
- New websocket integration test validating:
- per-connection initialization requirements
- no cross-connection response leakage
- same request IDs on different connections route independently.
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397
Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.
## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).
Old shape:
`{ "output": "dynamic-ok", "success": true }`
New shape:
```
{
"contentItems": [
{ "type": "inputText", "text": "dynamic-ok" },
{ "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
]
"success": true
}
```
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
Adding a `--experimental` flag to the `generate-ts` fct in the
app-sever.
It can be called through one of those 2 command
```
just write-app-server-schema --experimental
codex app-server generate-ts --experimental
```
## Problem being solved
- We need a single, reliable way to mark app-server API surface as
experimental so that:
1. the runtime can reject experimental usage unless the client opts in
2. generated TS/JSON schemas can exclude experimental methods/fields for
stable clients.
Right now that’s easy to drift or miss when done ad-hoc.
## How to declare experimental methods and fields
- **Experimental method**: add `#[experimental("method/name")]` to the
`ClientRequest` variant in `client_request_definitions!`.
- **Experimental field**: on the params struct, derive `ExperimentalApi`
and annotate the field with `#[experimental("method/name.field")]` + set
`inspect_params: true` for the method variant so
`ClientRequest::experimental_reason()` inspects params for experimental
fields.
## How the macro solves it
- The new derive macro lives in
`codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
`#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
attributes.
- **Structs**:
- Generates `ExperimentalApi::experimental_reason(&self)` that checks
only annotated fields.
- The “presence” check is type-aware:
- `Option<T>`: `is_some_and(...)` recursively checks inner.
- `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
- `bool`: must be `true`.
- Other types: considered present (returns `true`).
- Registers each experimental field in an `inventory` with `(type_name,
serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
that type. Field names are converted from `snake_case` to `camelCase`
for schema/TS filtering.
- **Enums**:
- Generates an exhaustive `match` returning `Some(reason)` for annotated
variants and `None` otherwise (no wildcard arm).
- **Wiring**:
- Runtime gating uses `ExperimentalApi::experimental_reason()` in
`codex-rs/app-server/src/message_processor.rs` to reject requests unless
`InitializeParams.capabilities.experimental_api == true`.
- Schema/TS export filters use the inventory list and
`EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
strip experimental methods/fields when `experimental_api` is false.