wire plugin marketplace metadata through app-server endpoints:
- `plugin/list` has `installPolicy` and `authPolicy`
- `plugin/install` has plugin-level `authPolicy`
`plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
installing.
added tests.
Summary
- add a custom deserializer so `[tools].web_search` can be a bool
(treated as disabled) or a config object
- extend core and app-server tests to cover bool handling in TOML config
Testing
- Not run (not requested)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
### Summary
This PR adds first-class ephemeral support to thread/fork, bringing it
in line with thread/start. The goal is to support one-off completions on
full forked threads without persisting them as normal user-visible
threads.
### Testing
Fixes a Codex app bug where quitting the app mid-run could leave the
reopened thread stuck in progress and non-interactable. On cold thread
resume, app-server could return an idle thread with a replayed turn
still marked in progress. This marks incomplete replayed turns as
interrupted unless the thread is actually active.
## What changed
- This PR changes only the flaky test setup for
`turn_start_notify_payload_includes_initialize_client_name`.
- Instead of shelling out to `python3` to write the notify payload, the
test uses the first-party `codex-app-server-test-notify-capture` helper.
- The helper writes `notify.json` atomically and the test waits for the
file to exist before reading it.
## Why this fixes the flake
- The old test depended on an external Python interpreter being present
and behaving consistently on every CI runner.
- It also raced the file write: the test could observe the path before
the payload had been fully written, which produced partial reads and
intermittent assertion failures.
- Moving the write into a repo-owned helper removes the external
dependency, and atomic write-plus-wait makes the handoff deterministic.
## Scope
- Test-only change.
## Summary
- stop reserving a localhost port in the websocket tests before spawning
the server
- let the app-server bind `127.0.0.1:0` itself and read back the actual
bound websocket address from stderr
- update the websocket test helpers and callers to use the discovered
address
## Why this fixes the flake
The previous harness reserved a port in the test process, dropped it,
and then asked the server process to bind that same address. On busy
runners there is a race between releasing the reservation and the child
process rebinding it, which can produce sporadic startup failures.
Binding to port `0` inside the server removes that race entirely, and
waiting for the server to report the real bound address makes the tests
connect only after the listener is actually ready.
### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.
### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`
download-cred-card-history might hit a hosted/preauthenticated service
to fetch data. That service might decide that the request requires an
end user approval the access to the personal data. It should be able to
signal to the running thread that the command in question is blocked on
user elicitation. In that case we want the exec to continue, but the
timeout to not expire on the tool call, essentially freezing time until
the user approves or rejects the command at which point the tool would
signal the app-server to decrement the outstanding elicitation count.
Now timeouts would proceed as normal.
### What's Added
- New v2 RPC methods:
- thread/increment_elicitation
- thread/decrement_elicitation
- Protocol updates in:
- codex-rs/app-server-protocol/src/protocol/common.rs
- codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
- codex-rs/app-server/src/codex_message_processor.rs
### Behavior
- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
- 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
- \>0 -> >0: remain paused.
- 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
- codex-rs/core/src/codex_thread.rs
- codex-rs/core/src/codex.rs
- codex-rs/core/src/mcp_connection_manager.rs
### Exec-server stopwatch integration
- Added centralized stopwatch tracking/controller:
- codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
- codex-rs/exec-server/src/posix/mcp.rs
- codex-rs/exec-server/src/posix/stopwatch.rs
- codex-rs/exec-server/src/posix.rs
Healthcheck endpoints for the websocket server
- serve `GET /readyz` and `GET /healthz` from the same listener used for
`--listen ws://...`
- switch the websocket listener over to `axum` upgrade handling instead
of manual socket parsing
- add websocket transport coverage for the health endpoints and document
the new behavior
Testing
- integration tests
- built and tested e2e
```
> curl -i http://127.0.0.1:9234/readyz
HTTP/1.1 200 OK
content-length: 0
date: Fri, 06 Mar 2026 19:20:23 GMT
> curl -i http://127.0.0.1:9234/healthz
HTTP/1.1 200 OK
content-length: 0
date: Fri, 06 Mar 2026 19:20:24 GMT
```
## Summary
request_permissions flows should support persisting results for the
session.
Open Question: Still deciding if we need within-turn approvals - this
adds complexity but I could see it being useful
## Testing
- [x] Updated unit tests
---------
Co-authored-by: Codex <noreply@openai.com>
add `plugin/uninstall` app-server endpoint to fully rm plugin from
plugins cache dir and rm entry from user config file.
plugin-enablement is session-scoped, so uninstalls are only picked up in
new sessions (like installs).
added tests.
## What changed
- The thread-resume replay tests now use unchecked mock sequencing so
the replay flow can complete before the test asserts.
- They also poll outbound `/responses` request counts and fail
immediately if replay emits an unexpected extra request.
## Why this fixes the flake
- The previous version asserted while the replay machinery was still
mid-flight, so the test was sometimes checking an intermediate state
instead of the completed behavior.
- Strict mock sequencing made that problem worse by forcing the test to
care about exact sub-step timing rather than about the end result.
- Letting replay settle and then asserting on stabilized request counts
makes the test validate the real contract: the replay path finishes and
does not send extra model requests.
## Scope
- Test-only change.
## What changed
- run the two plan-mode app-server tests on a multi-thread Tokio runtime
instead of the default single-thread test runtime
- stop relying on wiremock teardown expectations for `/responses` and
explicitly wait for the expected request count after the turn completes
## Why this fixes the flake
- this failure was showing up on Windows ARM as a late wiremock panic
saying the mock server saw zero `/responses` calls, but the real issue
was that the test could stall around app-server startup and only fail
during teardown
- moving these tests to the same multi-thread runtime used by the other
collaboration-mode app-server tests removes that startup scheduling race
- asserting the `/responses` count directly makes the test
deterministic: we now wait for the real POST instead of depending on a
drop-time verification that can hide the underlying timing issue
## Scope
- test-only change; no production logic changes
## What changed
- `turn_start_shell_zsh_fork_executes_command_v2` now keeps the shell
command alive with a file marker until the interrupt arrives instead of
using a command that can finish too quickly.
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
now waits for `turn/completed` before sending a fallback interrupt and
accepts the real terminal outcomes observed across platforms.
## Why this fixes the flake
- The original tests assumed a narrow ordering window: the child command
would still be running when the interrupt happened, and completion would
always arrive in one specific order.
- In CI, especially across different shells and runner speeds, those
assumptions break. Sometimes the child finishes before the interrupt;
sometimes the protocol completes while the fallback path is still arming
itself.
- Holding the command open until the interrupt and waiting for the
explicit protocol completion event makes the tests synchronize on the
behavior under test instead of on wall-clock timing.
## Scope
- Test-only change.
## What changed
- The auth/account/fuzzy-file-search test configs disable unrelated
`shell_snapshot` setup.
- The fuzzy-file-search fixture set was reduced so the stop-updates test
does less incidental work before reaching the assertion.
## Why this fixes the flake
- These failures were caused by cumulative timeout pressure, not by a
missing product-level delay.
- The old tests were paying for shell snapshot initialization and extra
fixture volume that were not part of the behavior being validated.
- Removing that incidental work keeps the same coverage but shortens the
critical path enough that the tests finish comfortably inside the
existing timeout budget, which is the right fix versus simply extending
the timeout.
## Scope
- Test-only change.
## Summary
- make
`list_apps_waits_for_accessible_data_before_emitting_directory_updates`
accept the two valid notification paths the server can emit
- keep rejecting the real bug this test is meant to catch: a
directory-only `app/list/updated` notification before accessible app
data is available
## Why this fixes the flake
The old test used a fixed `150ms` silence window and assumed the first
notification after that window had to be the fully merged final update.
In CI, scheduling occasionally lets accessible app data arrive before
directory data, so the first valid notification can be an
accessible-only interim update. That made the test fail even though the
server behavior was correct.
This change makes the test deterministic by reading notifications until
the final merged payload arrives. Any interim update is only accepted if
it contains accessible apps only; if the server ever emits inaccessible
directory data before accessible data is ready, the test still fails
immediately.
## Change type
- test-only; no production app-list logic changes
Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.
The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.
## Summary
- restore the guardian review request snapshot test and its tracked
snapshot after it was dropped from `main`
- make Bazel Rust unit-test wrappers resolve runfiles correctly on
manifest-only platforms like macOS and point Insta at the real workspace
root
- harden the shell-escalation socket-closure assertion so the musl Bazel
test no longer depends on fd reuse behavior
## Verification
- cargo test -p codex-core
guardian_review_request_layout_matches_model_visible_request_snapshot
- cargo test -p codex-shell-escalation
- bazel test //codex-rs/exec:exec-unit-tests
//codex-rs/shell-escalation:shell-escalation-unit-tests
Supersedes #13894.
---------
Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
Co-authored-by: viyatb-oai <viyatb@openai.com>
Co-authored-by: Codex <noreply@openai.com>
- production logic plus tests; cancel running tasks before clearing
pending turn state
- suppress follow-up model requests after cancellation and assert on
stabilized request counts instead of fixed sleeps
## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs
## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.
That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.
This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.
## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
Previously, we could only configure whether web search was on/off.
This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test.
The interrupt path can race with the follow-up `/responses` request that
reports the aborted tool call, so the test now allows that extra no-op
response instead of assuming there will only ever be one request. The
assertions still stay focused on the behavior the test actually cares
about: starting the zsh-forked command correctly.
Testing:
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2
-- --exact --nocapture`
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.
checks are best effort based on `codex_apps` loading, much like
`app/list`.
#### Tests
Added integration tests, tested locally.
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
## Note-- added plugin mentions via @, but that conflicts with file
mentions
depends and builds upon #13433.
- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.
### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.
Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).
This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.
### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.
In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.
In practice an elicitation is often triggered by an MCP tool call, but
not always.
### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.
### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
add `web_search_tool_type` on model_info that can be populated from
backend. will be used to filter which models can use `web_search` with
images and which cant.
added small unit test.
## Summary
- ensure `thread.resume` reuses the stored `gitInfo` instead of
rebuilding it from the live working tree
- persist and apply thread git metadata through the resume flow and add
a regression test covering branch mismatch cases
## Testing
- Not run (not requested)
The electron app doesn't start up the app-server in a particular
workspace directory.
So sandbox setup happens in the app-installed directory instead of the
project workspace.
This allows the app do specify the workspace cwd so that the sandbox
setup actually sets up the ACLs instead of exiting fast and then having
the first shell command be slow.
This adds a first-class app-server v2 `skills/changed` notification for
the existing skills live-reload signal.
Before this change, clients only had the legacy raw
`codex/event/skills_update_available` event. With this PR, v2 clients
can listen for a typed JSON-RPC notification instead of depending on the
legacy `codex/event/*` stream, which we want to remove soon.
## Summary
Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.
When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.
## What changed
- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
- Preserves original PNG/JPEG/WebP bytes.
- Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
- select original-resolution mode only when:
- the feature flag is enabled, and
- the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.
## Behavior
### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.
### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
- ⏳ `2` https://github.com/openai/codex/pull/13331
- ⏳ `3` https://github.com/openai/codex/pull/13049
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly
## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
- `suite::grep_files::grep_files_tool_collects_matches`
- `suite::grep_files::grep_files_tool_reports_empty_results`
## Summary
This removes the old app-server v1 methods and notifications we no
longer need, while keeping the small set the main codex app client still
depends on for now.
The remaining legacy surface is:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- `fuzzyFileSearch`
- `fuzzyFileSearch/sessionStart`
- `fuzzyFileSearch/sessionUpdate`
- `fuzzyFileSearch/sessionStop`
And the raw `codex/event/*` notifications emitted from core. These
notifications will be removed in a followup PR.
## What changed
- removed deprecated v1 request variants from the protocol and
app-server dispatcher
- removed deprecated typed notifications: `authStatusChange`,
`loginChatGptComplete`, and `sessionConfigured`
- updated the app-server test client to use v2 flows instead of deleted
v1 flows
- deleted legacy-only app-server test suites and added focused coverage
for `getConversationSummary`
- regenerated app-server schema fixtures and updated the MCP interface
docs to match the remaining compatibility surface
## Testing
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )
- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers
cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
Currently we emit `thread/status/changed` with `Idle` status right
before sending `thread/started` event (which also has `Idle` status in
it).
It feels that there is no point in that as client has no way to know
prior state of the thread as it didn't exist yet, so silence these kinds
of notifications.
This is a follow-up for https://github.com/openai/codex/pull/13047
## Why
We had a race where `turn/started` could be observed before the thread
had actually transitioned to `Active`. This was because we eagerly
emitted `turn/started` in the request handler for `turn/start` (and
`review/start`).
That was showing up as flaky `thread/resume` tests, but the real issue
was broader: a client could see `turn/started` and still get back an
idle thread immediately afterward.
The first idea was to eagerly call
`thread_watch_manager.note_turn_started(...)` from the `turn/start`
request path. That turns out to be unsafe, because
`submit(Op::UserInput)` only queues work. If a turn starts and completes
quickly, request-path bookkeeping can race with the real lifecycle
events and leave stale running state behind.
**The real fix** is to move `turn/started` to emit only after the turn
_actually_ starts, so we do that by waiting for the
`EventMsg::TurnStarted` notification emitted by codex core. We do this
for both `turn/start` and `review/start`.
I also verified this change is safe for our first-party codex apps -
they don't have any assumptions that `turn/started` is emitted before
the RPC response to `turn/start` (which is correct anyway).
I also removed `single_client_mode` since it isn't really necessary now.
## Testing
- `cargo test -p codex-app-server thread_resume -- --nocapture`
- `cargo test -p codex-app-server
'suite::v2::turn_start::turn_start_emits_notifications_and_accepts_model_override'
-- --exact --nocapture`
- `cargo test -p codex-app-server`
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket
---------
Co-authored-by: Codex <noreply@openai.com>