Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.
The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.
## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface
## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR
## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`
## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core
## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks
## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs
## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.
That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.
This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.
## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)
This fixes a schema export bug where two different `WebSearchAction`
types were getting merged under the same name in the app-server v2 JSON
schema bundle.
The problem was that v2 thread items use the app-server API's
`WebSearchAction` with camelCase variants like `openPage`, while
`ThreadResumeParams.history` and
`RawResponseItemCompletedNotification.item` pull in the upstream
`ResponseItem` graph, which uses the Responses API snake_case shape like
`open_page`. During bundle generation we were flattening nested
definitions into the v2 namespace by plain name, so the later definition
could silently overwrite the earlier one.
That meant clients generating code from the bundled schema could end up
with the wrong `WebSearchAction` definition for v2 thread history. In
practice this shows up on web search items reconstructed from rollout
files with persisted extended history.
This change does two things:
- Gives the upstream Responses API schema a distinct JSON schema name:
`ResponsesApiWebSearchAction`
- Makes namespace-level schema definition collisions fail loudly instead
of silently overwriting
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
Previously, we could only configure whether web search was on/off.
This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
#### What
Add structured `@plugin` parsing and TUI support for plugin mentions.
- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to a sigil parameter, still defaults to `$`
<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>
Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.
####Tests
Added tests, verified locally.
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.
checks are best effort based on `codex_apps` loading, much like
`app/list`.
#### Tests
Added integration tests, tested locally.
## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only
## Summary
This PR:
1. fixes a deserialization mismatch for macOS automation permissions in
approval payloads by making core parsing accept both supported wire
shapes for bundle IDs.
2. added `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
omitted fields deserialize to secure defaults.
## Why this change is needed
`MacOsAutomationPermission` uses `#[serde(try_from =
"MacOsAutomationPermissionDe")]`, so deserialization is controlled by
`MacOsAutomationPermissionDe`. After we aligned v2
`additionalPermissions.macos.automations` to the core shape, approval
payloads started including `{ "bundle_ids": [...] }` in some paths.
`MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
or a plain array, so object-shaped bundle IDs failed with `data did not
match any variant of untagged enum MacOsAutomationPermissionDe`. This
change restores compatibility by accepting both forms while preserving
existing normalization behavior (trim values and map empty bundle lists
to `None`).
## Validation
saw this error went away when running
```
cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'approval_policy="on-request"' \
-c 'features.shell_zsh_fork=true' \
-c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
send-message-v2 --experimental-api \
'Use $apple-notes and run scripts/notes_info now.'
```
:
```
Error: failed to deserialize ServerRequest from JSONRPCRequest
Caused by:
data did not match any variant of untagged enum MacOsAutomationPermissionDe
```
## Note-- added plugin mentions via @, but that conflicts with file
mentions
depends and builds upon #13433.
- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.
### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
This adds a first-class server request for MCP server elicitations:
`mcpServer/elicitation/request`.
Until now, MCP elicitation requests only showed up as a raw
`codex/event/elicitation_request` event from core. That made it hard for
v2 clients to handle elicitations using the same request/response flow
as other server-driven interactions (like shell and `apply_patch`
tools).
This also updates the underlying MCP elicitation request handling in
core to pass through the full MCP request (including URL and form data)
so we can expose it properly in app-server.
### Why not `item/mcpToolCall/elicitationRequest`?
This is because MCP elicitations are related to MCP servers first, and
only optionally to a specific MCP tool call.
In the MCP protocol, elicitation is a server-to-client capability: the
server sends `elicitation/create`, and the client replies with an
elicitation result. RMCP models it that way as well.
In practice an elicitation is often triggered by an MCP tool call, but
not always.
### What changed
- add `mcpServer/elicitation/request` to the v2 app-server API
- translate core `codex/event/elicitation_request` events into the new
v2 server request
- map client responses back into `Op::ResolveElicitation` so the MCP
server can continue
- update app-server docs and generated protocol schema
- add an end-to-end app-server test that covers the full round trip
through a real RMCP elicitation flow
- The new test exercises a realistic case where an MCP tool call
triggers an elicitation, the app-server emits
mcpServer/elicitation/request, the client accepts it, and the tool call
resumes and completes successfully.
### app-server API flow
- Client starts a thread with `thread/start`.
- Client starts a turn with `turn/start`.
- App-server sends `item/started` for the `mcpToolCall`.
- While that tool call is in progress, app-server sends
`mcpServer/elicitation/request`.
- Client responds to that request with `{ action: "accept" | "decline" |
"cancel" }`.
- App-server sends `serverRequest/resolved`.
- App-server sends `item/completed` for the mcpToolCall.
- App-server sends `turn/completed`.
- If the turn is interrupted while the elicitation is pending,
app-server still sends `serverRequest/resolved` before the turn
finishes.
Support marketplace.json that points to a local file, with
```
"source":
{
"source": "local",
"path": "./plugin-1"
},
```
Add a new plugin/install endpoint which add the plugin to the cache folder and enable it in config.toml.
### Overview
This PR:
- Updates `app-server-test-client` to load OTEL settings from
`$CODEX_HOME/config.toml` and initializes its own OTEL provider.
- Add real client root spans to app-server test client traces.
This updates `codex-app-server-test-client` so its Datadog traces
reflect the full client-driven flow instead of a set of server spans
stitched together under a synthetic parent.
Before this change, the test client generated a fake `traceparent` once
and reused it for every JSON-RPC request. That kept the requests in one
trace, but there was no real client span at the top, so Datadog ended up
showing the sequence in a slightly misleading way, where all RPCs were
anchored under `initialize`.
Now the test client:
- loads OTEL settings from the normal Codex config path, including
`$CODEX_HOME/config.toml` and existing --config overrides
- initializes tracing the same way other Codex binaries do when trace
export is enabled
- creates a real client root span for each scripted command
- creates per-request client spans for JSON-RPC methods like
`initialize`, `thread/start`, and `turn/start`
- injects W3C trace context from the current client span into
request.trace instead of reusing a fabricated carrier
This gives us a cleaner trace shape in Datadog:
- one trace URL for the whole scripted flow
- a visible client root span
- proper client/server parent-child relationships for each app-server
request
The electron app doesn't start up the app-server in a particular
workspace directory.
So sandbox setup happens in the app-installed directory instead of the
project workspace.
This allows the app do specify the workspace cwd so that the sandbox
setup actually sets up the ACLs instead of exiting fast and then having
the first shell command be slow.
## Summary
Changes the permission profile shape from a bare network boolean to a
nested object.
Before:
```yaml
permissions:
network: true
```
After:
```yaml
permissions:
network:
enabled: true
```
This also updates the shared Rust and app-server protocol types so
`PermissionProfile.network` is no longer `Option<bool>`, but
`Option<NetworkPermissions>` with `enabled: Option<bool>`.
## What Changed
- Updated `PermissionProfile` in `codex-rs/protocol/src/models.rs`:
- `pub network: Option<bool>` -> `pub network:
Option<NetworkPermissions>`
- Added `NetworkPermissions` with:
- `pub enabled: Option<bool>`
- Changed emptiness semantics so `network` is only considered empty when
`enabled` is `None`
- Updated skill metadata parsing to accept `permissions.network.enabled`
- Updated core permission consumers to read
`network.enabled.unwrap_or(false)` where a concrete boolean is needed
- Updated app-server v2 protocol types and regenerated schema/TypeScript
outputs
- Updated docs to mention `additionalPermissions.network.enabled`
## Why
Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.
This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.
It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.
The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.
## What Changed
Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:
```toml
[features]
personality = true
unified_exec = false
```
Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.
- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.
## Verification
- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.
## Docs
`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
## Summary
`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
merging logic.
## What changed
- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
- Preserved PermissionProfile.network when:
- compiling skill permission profiles into sandbox policies
- normalizing additional permissions
- merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
- Regenerated app-server schema fixtures for the new v2 wire shape.
This adds a first-class app-server v2 `skills/changed` notification for
the existing skills live-reload signal.
Before this change, clients only had the legacy raw
`codex/event/skills_update_available` event. With this PR, v2 clients
can listen for a typed JSON-RPC notification instead of depending on the
legacy `codex/event/*` stream, which we want to remove soon.
## Summary
Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.
When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.
## What changed
- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
- Preserves original PNG/JPEG/WebP bytes.
- Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
- select original-resolution mode only when:
- the feature flag is enabled, and
- the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.
## Behavior
### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.
### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
- ⏳ `2` https://github.com/openai/codex/pull/13331
- ⏳ `3` https://github.com/openai/codex/pull/13049
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly
## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
- `suite::grep_files::grep_files_tool_collects_matches`
- `suite::grep_files::grep_files_tool_reports_empty_results`
Realized EventMsg generated types were unintentionally removed as part
of this PR: https://github.com/openai/codex/pull/13375
Turns out our TypeScript export pipeline relied on transitively reaching
`EventMsg`. We should still export `EventMsg` explicitly since we're
still emitting `codex/event/*` events (for now, but getting dropped soon
as well).
## Summary
This removes the old app-server v1 methods and notifications we no
longer need, while keeping the small set the main codex app client still
depends on for now.
The remaining legacy surface is:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- `fuzzyFileSearch`
- `fuzzyFileSearch/sessionStart`
- `fuzzyFileSearch/sessionUpdate`
- `fuzzyFileSearch/sessionStop`
And the raw `codex/event/*` notifications emitted from core. These
notifications will be removed in a followup PR.
## What changed
- removed deprecated v1 request variants from the protocol and
app-server dispatcher
- removed deprecated typed notifications: `authStatusChange`,
`loginChatGptComplete`, and `sessionConfigured`
- updated the app-server test client to use v2 flows instead of deleted
v1 flows
- deleted legacy-only app-server test suites and added focused coverage
for `getConversationSummary`
- regenerated app-server schema fixtures and updated the MCP interface
docs to match the remaining compatibility surface
## Testing
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
## Summary
- add an `--experimental` flag to the export binary and thread the
option through TypeScript and JSON schema generation
- flatten the v2 schema bundle into a datamodel-code-generator-friendly
`codex_app_server_protocol.v2.schemas.json` export
- retarget shared helper refs to namespaced v2 definitions, add coverage
for the new export behavior, and vendor the generated schema fixtures
## Validation
- `cargo test -p codex-app-server-protocol` (71 unit tests and bin
targets passed locally; the final schema fixture integration target was
revalidated via fresh schema regeneration and a tree diff)
- `./target/debug/write_schema_fixtures --schema-root <tmpdir>`
- `diff -rq app-server-protocol/schema <tmpdir>`
## Tickets
- None
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )
- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers
cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket
---------
Co-authored-by: Codex <noreply@openai.com>
### Overview
This PR adds the first piece of tracing for app-server JSON-RPC
requests.
There are two main changes:
- JSON-RPC requests can now take an optional W3C trace context at the
top level via a `trace` field (`traceparent` / `tracestate`).
- app-server now creates a dedicated request span for every inbound
JSON-RPC request in `MessageProcessor`, and uses the request-level trace
context as the parent when present.
For compatibility with existing flows, app-server still falls back to
the TRACEPARENT env var when there is no request-level traceparent.
This PR is intentionally scoped to the app-server boundary. In a
followup, we'll actually propagate trace context through the async
handoff into core execution spans like run_turn, which will make
app-server traces much more useful.
### Spans
A few details on the app-server span shape:
- each inbound request gets its own server span
- span/resource names are based on the JSON-RPC method (`initialize`,
`thread/start`, `turn/start`, etc.)
- spans record transport (stdio vs websocket), request id, connection
id, and client name/version when available
- `initialize` stores client metadata in session state so later requests
on the same connection can reuse it
This change fixes a Codex app account-state sync bug where clients could
know the user was signed in but still miss the ChatGPT subscription
tier, which could lead to incorrect upgrade messaging for paid users.
The root cause was that `account/updated` only carried `authMode` while
plan information was available separately via `account/read` and
rate-limit snapshots, so this update adds `planType` to
`account/updated`, populates it consistently across login and refresh
paths.
Replay pending client requests after `thread/resume` and emit resolved
notifications when those requests clear so approval/input UI state stays
in sync after reconnects and across subscribed clients.
Affected RPCs:
- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`
- `item/tool/requestUserInput`
Motivation:
- Resumed clients need to see pending approval/input requests that were
already outstanding before the reconnect.
- Clients also need an explicit signal when a pending request resolves
or is cleared so stale UI can be removed on turn start, completion, or
interruption.
Implementation notes:
- Use pending client requests from `OutgoingMessageSender` in order to
replay them after `thread/resume` attaches the connection, using
original request ids.
- Emit `serverRequest/resolved` when pending requests are answered
or cleared by lifecycle cleanup.
- Update the app-server protocol schema, generated TypeScript bindings,
and README docs for the replay/resolution flow.
High-level test plan:
- Added automated coverage for replaying pending command execution and
file change approval requests on `thread/resume`.
- Added automated coverage for resolved notifications in command
approval, file change approval, request_user_input, turn start, and turn
interrupt flows.
- Verified schema/docs updates in the relevant protocol and app-server
tests.
Manual testing:
- Tested reconnect/resume with multiple connections.
- Confirmed state stayed in sync between connections.
## Why
`PermissionProfile` should describe filesystem roots as absolute paths
at the type level. Using `PathBuf` in `FileSystemPermissions` made the
shared type too permissive and blurred together three different
deserialization cases:
- skill metadata in `agents/openai.yaml`, where relative paths should
resolve against the skill directory
- app-server API payloads, where callers should have to send absolute
paths
- local tool-call payloads for commands like `shell_command` and
`exec_command`, where `additional_permissions.file_system` may
legitimately be relative to the command `workdir`
This change tightens the shared model without regressing the existing
local command flow.
## What Changed
- changed `protocol::models::FileSystemPermissions` and the app-server
`AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
- wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
relative permission roots in `agents/openai.yaml` resolve against the
containing skill directory
- kept app-server/API deserialization strict, so relative
`additionalPermissions.fileSystem.*` paths are rejected at the boundary
- restored cwd/workdir-relative deserialization for local tool-call
payloads by parsing `shell`, `shell_command`, and `exec_command`
arguments under an `AbsolutePathBufGuard` rooted at the resolved command
working directory
- simplified runtime additional-permission normalization so it only
canonicalizes and deduplicates absolute roots instead of trying to
recover relative ones later
- updated the app-server schema fixtures, `app-server/README.md`, and
the affected transport/TUI tests to match the final behavior
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
## Summary
This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.
The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.
## What changed
- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts
## Behavior
Direct `view_image` calls still return a `function_call_output` with
image content.
When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results
So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.
## Compatibility
This is intended to be backward-compatible for resumed conversations.
Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.
Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
Command-approval clients currently infer which choices to show from
side-channel fields like `networkApprovalContext`,
`proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
the request shape harder to evolve, and it forces each client to
replicate the server's heuristics instead of receiving the exact
decision list for the prompt.
This PR introduces a mapping between `CommandExecutionApprovalDecision`
and `codex_protocol::protocol::ReviewDecision`:
```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
fn from(value: CoreReviewDecision) -> Self {
match value {
CoreReviewDecision::Approved => Self::Accept,
CoreReviewDecision::ApprovedExecpolicyAmendment {
proposed_execpolicy_amendment,
} => Self::AcceptWithExecpolicyAmendment {
execpolicy_amendment: proposed_execpolicy_amendment.into(),
},
CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
CoreReviewDecision::NetworkPolicyAmendment {
network_policy_amendment,
} => Self::ApplyNetworkPolicyAmendment {
network_policy_amendment: network_policy_amendment.into(),
},
CoreReviewDecision::Abort => Self::Cancel,
CoreReviewDecision::Denied => Self::Decline,
}
}
}
```
And updates `CommandExecutionRequestApprovalParams` to have a new field:
```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```
when, if specified, should make it easier for clients to display an
appropriate list of options in the UI.
This makes it possible for `CoreShellActionProvider::prompt()` in
`unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
adding support for `ApprovedForSession` when approving a skill script,
which was previously missing in the TUI.
Note this results in a significant change to `exec_options()` in
`approval_overlay.rs`, as the displayed options are now derived from
`available_decisions: &[ReviewDecision]`.
## What Changed
- Add `available_decisions` to
[`ExecApprovalRequestEvent`](de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)),
including helpers to derive the legacy default choices when older
senders omit the field.
- Map `codex_protocol::protocol::ReviewDecision` to app-server
`CommandExecutionApprovalDecision` and expose the ordered list as
experimental `availableDecisions` in
[`CommandExecutionRequestApprovalParams`](de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)).
- Thread optional `available_decisions` through the core approval path
so Unix shell escalation can explicitly request `ApprovedForSession` for
session-scoped approvals instead of relying on client heuristics.
[`unix_escalation.rs`](de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214))
- Update the TUI approval overlay to build its buttons from the ordered
decision list, while preserving the legacy fallback when
`available_decisions` is missing.
- Update the app-server README, test client output, and generated schema
artifacts to document and surface the new field.
## Testing
- Add `approval_overlay.rs` coverage for explicit decision lists,
including the generic `ApprovedForSession` path and network approval
options.
- Update `chatwidget/tests.rs` and app-server protocol tests to populate
the new optional field and keep older event shapes working.
## Developers Docs
- If we document `item/commandExecution/requestApproval` on
[developers.openai.com/codex](https://developers.openai.com/codex), add
experimental `availableDecisions` as the preferred source of approval
choices and note that older servers may omit it.
This reverts commit https://github.com/openai/codex/pull/12633. We no
longer need this PR, because we favor sending normal exec command
approval server request with `additional_permissions` of skill
permissions instead
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags
## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`
Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.
This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.
Closed threads will also be removed from `thread/loaded/list`.
Previously, clients would call `thread/start` with dynamic_tools set,
and when a model invokes a dynamic tool, it would just make the
server->client `item/tool/call` request and wait for the client's
response to complete the tool call. This works, but it doesn't have an
`item/started` or `item/completed` event.
Now we are doing this:
- [new] emit `item/started` with `DynamicToolCall` populated with the
call arguments
- send an `item/tool/call` server request
- [new] once the client responds, emit `item/completed` with
`DynamicToolCall` populated with the response.
Also, with `persistExtendedHistory: true`, dynamic tool calls are now
reconstructable in `thread/read` and `thread/resume` as
`ThreadItem::DynamicToolCall`.