- add model and reasoning effort to app-server collab spawn items and
notifications
- regenerate app-server protocol schemas for the new fields
---------
Co-authored-by: Codex <noreply@openai.com>
## Stack
fix: fail closed for unsupported split windows sandboxing #14172
fix: preserve split filesystem semantics in linux sandbox #14173
-> fix: align core approvals with split sandbox policies #14171
refactor: centralize filesystem permissions precedence #14174
## Why This PR Exists
This PR is intentionally narrower than the title may suggest.
Most of the original split-permissions migration already landed in the
earlier `#13434 -> #13453` stack. In particular:
- `#13439` already did the broad runtime plumbing for split filesystem
and network policies.
- `#13445` already moved `apply_patch` safety onto filesystem-policy
semantics.
- `#13448` already switched macOS Seatbelt generation to split policies.
- `#13449` and `#13453` already handled Linux helper and bubblewrap
enforcement.
- `#13440` already introduced the first protocol-side helpers for
deriving effective filesystem access.
The reason this PR still exists is that after the follow-on
`[permissions]` work and the new shared precedence helper in `#14174`, a
few core approval paths were still deciding behavior from the legacy
`SandboxPolicy` projection instead of the split filesystem policy that
actually carries the carveouts.
That means this PR is mostly a cleanup and alignment pass over the
remaining core consumers, not a fresh sandbox backend migration.
## What Is Actually New Here
- make unmatched-command fallback decisions consult
`FileSystemSandboxPolicy` instead of only legacy `DangerFullAccess` /
`ReadOnly` / `WorkspaceWrite` categories
- thread `file_system_sandbox_policy` into the shell, unified-exec, and
intercepted-exec approval paths so they all use the same split-policy
semantics
- keep `apply_patch` safety on the same effective-access rules as the
shared protocol helper, rather than letting it drift through
compatibility projections
- add loader-level regression coverage proving legacy `sandbox_mode`
config still builds split policies and round-trips back without semantic
drift
## What This PR Does Not Do
This PR does not introduce new platform backend enforcement on its own.
- Linux backend parity remains in `#14173`.
- Windows fail-closed handling remains in `#14172`.
- The shared precedence/model changes live in `#14174`.
## Files To Focus On
- `core/src/exec_policy.rs`: unmatched-command fallback and approval
rendering now read the split filesystem policy directly
- `core/src/tools/sandboxing.rs`: default exec-approval requirement keys
off `FileSystemSandboxPolicy.kind`
- `core/src/tools/handlers/shell.rs`: shell approval requests now carry
the split filesystem policy
- `core/src/unified_exec/process_manager.rs`: unified-exec approval
requests now carry the split filesystem policy
- `core/src/tools/runtimes/shell/unix_escalation.rs`: intercepted exec
fallback now uses the same split-policy approval semantics
- `core/src/safety.rs`: `apply_patch` safety keeps using effective
filesystem access rather than legacy sandbox categories
- `core/src/config/config_tests.rs`: new regression coverage for legacy
`sandbox_mode` no-drift behavior through the split-policy loader
## Notes
- `core/src/codex.rs` and `core/src/codex_tests.rs` are just small
fallout updates for `RequestPermissionsResponse.scope`; they are not the
point of the PR.
- If you reviewed the earlier `#13439` / `#13445` stack, the main review
question here is simply: “are there any remaining approval or
patch-safety paths that still reconstruct semantics from legacy
`SandboxPolicy` instead of consuming the split filesystem policy
directly?”
## Testing
- cargo test -p codex-core
legacy_sandbox_mode_config_builds_split_policies_without_drift
- cargo test -p codex-core request_permissions
- cargo test -p codex-core intercepted_exec_policy
- cargo test -p codex-core
restricted_sandbox_requires_exec_approval_on_request
- cargo test -p codex-core
unmatched_on_request_uses_split_filesystem_policy_for_escalation_prompts
- cargo test -p codex-core explicit_
- cargo clippy -p codex-core --tests -- -D warnings
## Description
This PR trims `app-server-protocol`'s v1 surface down to the small set
of legacy types we still actually use.
Unfortunately, we can't delete all of them yet because:
- a few one-off v1 RPCs are still used by the Codex app
- a few of these app-server-protocol v1 types are actually imported by
core crates
This change deletes that unused RPC surface, keeps the remaining
compatibility types in place, and makes the crate root re-export only
the v1 structs that downstream crates still depend on.
## Why
The main goal here is to make the legacy protocol surface match reality.
Leaving a large pile of dead v1 structs in place makes it harder to tell
which compatibility paths are still intentional, and it keeps old
schema/types around even though nothing should be building against them
anymore.
This also gives us a cleaner boundary for future cleanup. Instead of
re-exporting all of `protocol::v1::*`, the crate now explicitly exposes
only the v1 types that are still live, which makes it much easier to see
what remains and delete more safely later.
## What changed
- Deleted the unused v1 RPC/request/response structs from
`app-server-protocol/src/protocol/v1.rs`.
- Kept the small set of v1 compatibility types that are still live,
including:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- legacy approval payloads
- config-related structs still used by downstream crates
- Replaced the blanket `pub use protocol::v1::*` export in
`app-server-protocol/src/lib.rs` with an explicit list of the remaining
supported v1 types.
- Regenerated the schema/type artifacts, which also updated the
`InitializeCapabilities` opt-out example to use `thread/started` instead
of the old `codex/event/session_configured` example.
## Validation
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
## Follow-up
The next cleanup is to keep shrinking the remaining v1 compatibility
surface as callers migrate off it. Once the remaining consumers stop
importing these legacy types, we should be able to remove more of the v1
module and eventually stop exporting it from the crate root entirely.
## Why
to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.
## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
## Description
This PR stops emitting legacy `codex/event/*` notifications from the
public app-server transports.
It's been a long time coming! app-server was still producing a raw
notification stream from core, alongside the typed app-server
notifications and server requests, for compatibility reasons. Now,
external clients should no longer be depending on those legacy
notifications, so this change removes them from the stdio and websocket
contract and updates the surrounding docs, examples, and tests to match.
### Caveat
I left the "in-process" version of app-server alone for now, since
`codex exec` was recently based on top of app-server via this in-process
form here: https://github.com/openai/codex/pull/14005
Seems like `codex exec` still consumes some legacy notifications
internally, so this branch only removes `codex/event/*` from app-server
over stdio and websockets.
## Follow-up
Once `codex exec` is fully migrated off `codex/event/*` notifications,
we'll be able to stop emitting them entirely entirely instead of just
filtering it at the external transport boundary.
Prevent binaries >500KB from being committed. And maintain an allowlist
if we need to bypass on a case-by-case basis.
I checked the currently tracked binary-like assets in the repo. There
are only 5 obvious committed binaries by extension/MIME type:
- `.github/codex-cli-splash.png`: `838,131` bytes, about `818 KiB`
- `codex-rs/vendor/bubblewrap/bubblewrap.jpg`: `40,239` bytes, about `39
KiB`
-
`codex-rs/skills/src/assets/samples/skill-creator/assets/skill-creator.png`:
`1,563` bytes
- `codex-rs/skills/src/assets/samples/openai-docs/assets/openai.png`:
`1,429` bytes
-
`codex-rs/skills/src/assets/samples/skill-installer/assets/skill-installer.png`:
`1,086` bytes
So `500 KB` looks like a good default for this repo. It would only trip
on one existing intentional asset, which keeps the allowlist small and
the policy easy to understand.
Here's a smoke-test from a throwaway branch that tries to commit a large
binary:
https://github.com/openai/codex/actions/runs/22971558828/job/66689330435?pr=14383
## Summary
This PR narrows original image detail handling to a single opt-in
feature:
- `image_detail_original` lets the model request `detail: "original"` on
supported models
- Omitting `detail` preserves the default resized behavior
The model only sees `detail: "original"` guidance when the active model
supports it:
- JS REPL instructions include the guidance and examples only on
supported models
- `view_image` only exposes a `detail` parameter when the feature and
model can use it
The image detail API is intentionally narrow and consistent across both
paths:
- `view_image.detail` supports only `"original"`; otherwise omit the
field
- `codex.emitImage(..., detail)` supports only `"original"`; otherwise
omit the field
- Unsupported explicit values fail clearly at the API boundary instead
of being silently reinterpreted
- Unsupported explicit `detail: "original"` requests fall back to normal
behavior when the feature is disabled or the model does not support
original detail
## Summary
- only trigger multi-agent fast-switch shortcuts when the composer is
empty
- keep the Option+b/f fallback for terminals that encode Option+arrow
that way
- document why the empty-composer gate preserves expected word-wise
editing behavior
## Testing
- just fmt
- cargo test -p codex-tui
Co-authored-by: Codex <noreply@openai.com>
## Summary
This PR adds two read-only path helpers to `js_repl`:
- `codex.cwd`
- `codex.homeDir`
They are exposed alongside the existing `codex.tmpDir` helper so the
REPL can reference basic host path context without reopening direct
`process` access.
## Implementation
- expose `codex.cwd` and `codex.homeDir` from the js_repl kernel
- make `codex.homeDir` come from the kernel process environment
- pass session dependency env through js_repl kernel startup so
`codex.homeDir` matches the env a shell-launched process would see
- keep existing shell `HOME` population behavior unchanged
- update js_repl prompt/docs and add runtime/integration coverage for
the new helpers
## Summary
- switch the local HTTP proxy listener from Rama's auto server to
explicit HTTP/1 so CONNECT clients skip the version-sniffing pre-read
path
- move rustls crypto-provider bootstrap into the HTTP proxy runner so
direct callers do not need hidden global init
- add a regression test that exercises a plain HTTP/1 CONNECT request
against a live loopback listener
## Summary
- defer fresh-session `build_initial_context()` until the first real
turn instead of seeding model-visible context during startup
- rely on the existing `reference_context_item == None` turn-start path
to inject full initial context on that first real turn (and again after
baseline resets such as compaction)
- add a regression test for `InitialHistory::New` and update affected
deterministic tests / snapshots around developer-message layout,
collaboration instructions, personality updates, and compact request
shapes
## Notes
- this PR does not add any special empty-thread `/compact` behavior
- most of the snapshot churn is the direct result of moving the initial
model-visible context from startup to the first real turn, so first-turn
request layouts no longer contain a pre-user startup copy of permissions
/ environment / other developer-visible context
- remote manual `/compact` with no prior user still skips the remote
compact request; local first-turn `/compact` still issues a compact
request, but that request now reflects the lack of startup-seeded
context
---------
Co-authored-by: Codex <noreply@openai.com>
- tell agents when a role pins model or reasoning effort so they know
those settings are not changeable
- add prompt-builder coverage for the locked-setting notes
## Summary
- add a per-turn `codex.turn.network_proxy` metric constant
- emit the metric from turn completion using the live managed proxy
enabled state
- add focused tests for active and inactive tag emission
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
wire plugin marketplace metadata through app-server endpoints:
- `plugin/list` has `installPolicy` and `authPolicy`
- `plugin/install` has plugin-level `authPolicy`
`plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
installing.
added tests.
## Summary
This PR fixes OTLP HTTP trace export in runtimes where the previous
exporter setup was unreliable, especially around app-server usage. It
also removes the old `codex_otel::otel_provider` compatibility shim and
switches remaining call sites over to the crate-root
`codex_otel::OtelProvider` export.
## What changed
- Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
- Add an async HTTP client path for trace export when we are already
inside a multi-thread Tokio runtime.
- Make provider shutdown flush traces before tearing down the tracer
provider.
- Add loopback coverage that verifies traces are actually sent to
`/v1/traces`:
- outside Tokio
- inside a multi-thread Tokio runtime
- inside a current-thread Tokio runtime
- Remove the `codex_otel::otel_provider` shim and update remaining
imports.
## Why
I hit cases where spans were being created correctly but never made it
to the collector. The issue turned out to be in exporter/runtime
behavior rather than the span plumbing itself. This PR narrows that gap
and gives us regression coverage for the actual export path.
Summary
- add a custom deserializer so `[tools].web_search` can be a bool
(treated as disabled) or a config object
- extend core and app-server tests to cover bool handling in TOML config
Testing
- Not run (not requested)
## What changed
- Drop failed websocket connections immediately after a terminal stream
error instead of awaiting a graceful close handshake before forwarding
the error to the caller.
- Keep the success path and the closed-connection guard behavior
unchanged.
## Why this fixes the flake
- The failing integration test waits for the second websocket stream to
surface the model error before issuing a follow-up request.
- On slower runners, the old error path awaited
`ws_stream.close().await` before sending the error downstream. If that
close handshake stalled, the test kept waiting for an error that had
already happened server-side and nextest timed it out.
- Dropping the failed websocket immediately makes the terminal error
observable right away and marks the session closed so the next request
reconnects cleanly instead of depending on a best-effort close
handshake.
## Code or test?
- This is a production logic fix in `codex-api`. The existing websocket
integration test already exercises the regression path.
- include the requested sub-agent model and reasoning effort in the
spawn begin event\n- render that metadata next to the spawned agent name
and role in the TUI transcript
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- update the code-mode handler, runner, instructions, and error text to
refer to the `exec` tool name everywhere that used to say `code_mode`
- ensure generated documentation strings and tool specs describe `exec`
and rely on the shared `PUBLIC_TOOL_NAME`
- refresh the suite tests so they invoke `exec` instead of the old name
Testing
- Not run (not requested)
## Summary
- update the guardian prompting
- clarify the guardian rejection message so an action may still proceed
if the user explicitly approves it after being informed of the risk
## Testing
- cargo run on selected examples
## What changed
- keep the explicit stdin-close behavior after writing so the child
still receives EOF deterministically
- on Windows, stop using `python -c` for the round-trip assertion and
instead run a native `cmd.exe` pipeline that reads one line from stdin
with `set /p` and echoes it back
- send `
` on Windows so the stdin payload matches the platform-native line
ending the shell reader expects
## Why this fixes flakiness
The failing branch-local flake was not in `spawn_pipe_process` itself.
The child exited cleanly, but the Windows ARM runner sometimes produced
an empty stdout string when the test used Python as the stdin consumer.
That makes the test sensitive to Python startup and stdin-close timing
rather than the pipe primitive we actually want to validate. Switching
the Windows path to a native `cmd.exe` reader keeps the assertion
focused on our pipe behavior: bytes written to stdin should come back on
stdout before EOF closes the process. The explicit `
` write removes line-ending ambiguity on Windows.
## Scope
- test-only
- no production logic change
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
Pass more params to /compact. This should give us parity with the
/responses endpoint to improve caching.
I'm torn about the MCP await. Blocking will give us parity but it seems
like we explicitly don't block on MCPs. Happy either way
### Summary
This PR adds first-class ephemeral support to thread/fork, bringing it
in line with thread/start. The goal is to support one-off completions on
full forked threads without persisting them as normal user-visible
threads.
### Testing
Summary
- document how code-mode can import `output_text`/`output_image` and
ensure `add_content` stays compatible
- add a synthetic `@openai/code_mode` module that appends content items
and validates inputs
- cover the new behavior with integration tests for structured text and
image outputs
Testing
- Not run (not requested)
- clarify the `close_agent` tool description so it nudges models to
close agents they no longer need
- keep the change scoped to the tool spec text only
Co-authored-by: Codex <noreply@openai.com>
- raise the sdk workflow job timeout from 10 to 15 minutes to reduce
false cancellations near the current limit
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- document that `@openai/code_mode` exposes
`set_max_output_tokens_per_exec_call` and that `code_mode` truncates the
final Rust-side output when the budget is exceeded
- enforce the configured budget in the Rust tool runner, reusing
truncation helpers so text-only outputs follow the unified-exec wrapper
and mixed outputs still fit within the limit
- ensure the new behavior is covered by a code-mode integration test and
string spec update
Testing
- Not run (not requested)
Summary
- drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers
to keep MCP tooling focused on the final result shape
- wire the new schema definitions through code mode, context, handlers,
and spec modules so MCP tools serialize the exact output shape expected
by the model
- extend code mode tests to cover multiple MCP call scenarios and ensure
the serialized data matches the new schema
- refresh JS runner helpers and protocol models alongside the schema
changes
Testing
- Not run (not requested)
- add `model` and `reasoning_effort` to the `spawn_agent` schema so the
values pass through
- validate requested models against `model.model` and only check that
the selected model supports the requested reasoning effort
---------
Co-authored-by: Codex <noreply@openai.com>