Add service name to the app-server so that the app can use it's own
service name
This is on thread level because later we might plan the app-server to
become a singleton on the computer
## Summary
- Preserve each skill’s raw permissions block as a permission_profile on
SkillMetadata during skill loading.
- Keep compiling that same metadata into the existing runtime
Permissions object, so current enforcement
behavior stays intact.
- When zsh-fork intercepts execution of a script that belongs to a
skill, include the skill’s
permission_profile in the exec approval request.
- This lets approval UIs show the extra filesystem access the skill
declared when prompting for approval.
## Why
In the `shell_zsh_fork` flow, `codex-shell-escalation` receives the
executable path exactly as the shell passed it to `execve()`. That path
is not guaranteed to be absolute.
For commands such as `./scripts/hello-mbolin.sh`, if the shell was
launched with a different `workdir`, resolving the intercepted `file`
against the server process working directory makes policy checks and
skill matching inspect the wrong executable. This change pushes that fix
a step further by keeping the normalized path typed as `AbsolutePathBuf`
throughout the rest of the escalation pipeline.
That makes the absolute-path invariant explicit, so later code cannot
accidentally treat the resolved executable path as an arbitrary
`PathBuf`.
## What Changed
- record the wrapper process working directory as an `AbsolutePathBuf`
- update the escalation protocol so `workdir` is explicitly absolute
while `file` remains the raw intercepted exec path
- resolve a relative intercepted `file` against the request `workdir` as
soon as the server receives the request
- thread `AbsolutePathBuf` through `EscalationPolicy`,
`CoreShellActionProvider`, and command normalization helpers so the
resolved executable path stays type-checked as absolute
- replace the `path-absolutize` dependency in `codex-shell-escalation`
with `codex-utils-absolute-path`
- add a regression test that covers a relative `file` with a distinct
`workdir`
## Verification
- `cargo test -p codex-shell-escalation`
Direct skill-script matches force `Decision::Prompt`, so skill-backed
scripts require explicit approval before they run. (Note "allow for
session" is not supported in this PR, but will be done in a follow-up.)
In the process of implementing this, I fixed an important bug:
`ShellZshFork` is supposed to keep ordinary allowed execs on the
client-side `Run` path so later `execve()` calls are still intercepted
and reviewed. After the shell-escalation port, `Decision::Allow` still
mapped to `Escalate`, which moved `zsh` to server-side execution too
early. That broke the intended flow for skill-backed scripts and made
the approval prompt depend on the wrong execution path.
## What changed
- In `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs`,
`Decision::Allow` now returns `Run` unless escalation is actually
required.
- Removed the zsh-specific `argv[0]` fallback. With the `Allow -> Run`
fix in place, zsh's later `execve()` of the script is intercepted
normally, so the skill match happens on the script path itself.
- Kept the skill-path handling in `determine_action()` focused on the
direct `program` match path.
## Verification
- Updated `shell_zsh_fork_prompts_for_skill_script_execution` in
`codex-rs/core/tests/suite/skill_approval.rs` (gated behind `cfg(unix)`)
to:
- run under `SandboxPolicy::new_workspace_write_policy()` instead of
`DangerFullAccess`
- assert the approval command contains only the script path
- assert the approved run returns both stdout and stderr markers in the
shell output
- Ran `cargo test -p codex-core
shell_zsh_fork_prompts_for_skill_script_execution -- --nocapture`
## Manual Testing
Run the dev build:
```
just codex --config zsh_path=/Users/mbolin/code/codex2/codex-rs/app-server/tests/suite/zsh --enable shell_zsh_fork
```
I have created `/Users/mbolin/.agents/skills/mbolin-test-skill` with:
```
├── scripts
│ └── hello-mbolin.sh
└── SKILL.md
```
The skill:
```
---
name: mbolin-test-skill
description: Used to exercise various features of skills.
---
When this skill is invoked, run the `hello-mbolin.sh` script and report the output.
```
The script:
```
set -e
# Note this script will fail if run with network disabled.
curl --location openai.com
```
Use `$mbolin-test-skill` to invoke the skill manually and verify that I
get prompted to run `hello-mbolin.sh`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12730).
* #12750
* __->__ #12730
## Summary
Remove js_repl/node test-skip paths and make Node setup explicit in CI
so js_repl tests always run instead of silently skipping.
## Why
We had multiple “expediency” skip paths that let js_repl tests pass
without actually exercising Node-backed behavior. This reduced CI signal
and hid runtime/environment regressions.
## What changed
### CI
- Added Node setup using `codex-rs/node-version.txt` in:
- `.github/workflows/rust-ci.yml`
- `.github/workflows/bazel.yml`
- Added a Unix PATH copy step in Bazel workflow to expose the setup-node
binary in common paths.
### js_repl test harness
- Added explicit js_repl sandbox test configuration helpers in:
- `codex-rs/core/src/tools/js_repl/mod.rs`
- `codex-rs/core/src/tools/handlers/js_repl.rs`
- Added Linux arg0 dispatch glue for js_repl tests so sandbox subprocess
entrypoint behavior is correct under Linux test execution.
### Removed skip behavior
- Deleted runtime guard function and early-return skips in js_repl tests
(`can_run_js_repl_runtime_tests` and related per-test short-circuits).
- Removed view_image integration test skip behavior:
- dropped `skip_if_no_network!(Ok(()))`
- removed “skip on Node missing/too old” branch after js_repl output
inspection.
## Impact
- js_repl/node tests now consistently execute and fail loudly when the
environment is not correctly provisioned.
- CI has stronger signal for js_repl regressions instead of false green
from conditional skips.
## Testing
- `cargo test -p codex-core` (locally) to validate js_repl
unit/integration behavior with skips removed.
- CI expected to surface any remaining environment/runtime gaps directly
(rather than masking them).
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/12300
- ✅ `2` https://github.com/openai/codex/pull/12275
- ✅ `3` https://github.com/openai/codex/pull/12205
- ✅ `4` https://github.com/openai/codex/pull/12407
- ✅ `5` https://github.com/openai/codex/pull/12372
- 👉 `6` https://github.com/openai/codex/pull/12185
- ⏳ `7` https://github.com/openai/codex/pull/10673
## Why
`ExecApprovalRequestEvent` can carry a distinct `approval_id` for
subcommand approvals, including the `execve`-intercepted zsh-fork path.
The session registers the pending approval callback under `approval_id`
when one is present, but `ChatWidget` was stashing `call_id` in the
approval modal state. When the user approved the command in the TUI, the
response was sent back with the wrong identifier, so the pending
approval could not be matched and the approval callback would not
resolve.
Note `approval_id` was introduced in
https://github.com/openai/codex/pull/12051.
## What changed
- In `tui/src/chatwidget.rs`, `ChatWidget` now uses
`ExecApprovalRequestEvent::effective_approval_id()` when constructing
`ApprovalRequest::Exec`.
- That preserves the existing behavior for normal shell and
`unified_exec` approvals, where `approval_id` is absent and the
effective id still falls back to `call_id`.
- For subcommand approvals that provide a distinct `approval_id`, the
TUI now sends back the same key that
`Session::request_command_approval()` registered.
## Verification
- Traced the approval flow end to end to confirm the same effective
approval id is now used on both sides of the round trip:
- `Session::request_command_approval()` registers the pending callback
under `approval_id.unwrap_or(call_id)`.
- `ChatWidget` now emits `Op::ExecApproval` with that same effective id.
## Summary
Stabilize `js_repl` runtime test setup in CI and move tool-facing
`js_repl` behavior coverage into integration tests.
This is a test/CI change only. No production `js_repl` behavior change
is intended.
## Why
- Bazel test sandboxes (especially on macOS) could resolve a different
`node` than the one installed by `actions/setup-node`, which caused
`js_repl` runtime/version failures.
- `js_repl` runtime tests depend on platform-specific
sandbox/test-harness behavior, so they need explicit gating in a
base-stability commit.
- Several tests in the `js_repl` unit test module were actually
black-box/tool-level behavior tests and fit better in the integration
suite.
## Changes
- Add `actions/setup-node` to the Bazel and Rust `Tests` workflows,
using the exact version pinned in the repo’s Node version file.
- In Bazel (non-Windows), pass `CODEX_JS_REPL_NODE_PATH=$(which node)`
into test env so `js_repl` uses the `actions/setup-node` runtime inside
Bazel tests.
- Add a new integration test suite for `js_repl` tool behavior and
register it in the core integration test suite module.
- Move black-box `js_repl` behavior tests into the integration suite
(persistence/TLA, builtin tool invocation, recursive self-call
rejection, `process` isolation, blocked builtin imports).
- Keep white-box manager/kernel tests in the `js_repl` unit test module.
- Gate `js_repl` runtime tests to run only on macOS and only when a
usable Node runtime is available (skip on other platforms / missing Node
in this commit).
## Impact
- Reduces `js_repl` CI failures caused by Node resolution drift in
Bazel.
- Improves test organization by separating tool-facing behavior tests
from white-box manager/kernel tests.
- Keeps the base commit stable while expanding `js_repl` runtime
coverage.
#### [git stack](https://github.com/magus/git-stack-cli)
- ✅ `1` https://github.com/openai/codex/pull/12372
- 👉 `2` https://github.com/openai/codex/pull/12407
- ⏳ `3` https://github.com/openai/codex/pull/12185
- ⏳ `4` https://github.com/openai/codex/pull/10673
This PR replaces the old `additional_permissions.fs_read/fs_write` shape
with a shared `PermissionProfile`
model and wires it through the command approval, sandboxing, protocol,
and TUI layers. The schema is adopted from the
`SkillManifestPermissions`, which is also refactored to use this unified
struct. This helps us easily expose permission profiles in app
server/core as a follow-up.
## Summary
- Fix `js_repl` so `await codex.tool("view_image", { path })` actually
attaches the image to the active turn when called from inside the JS
REPL.
- Restore the behavior expected by the existing `js_repl`
image-attachment test.
- This is a follow-up to
[#12553](https://github.com/openai/codex/pull/12553), which changed
`view_image` to return structured image content.
## Root Cause
- [#12553](https://github.com/openai/codex/pull/12553) changed
`view_image` from directly injecting a pending user image message to
returning structured `function_call_output` content items.
- The nested tool-call bridge inside `js_repl` serialized that tool
response back to the JS runtime, but it did not mirror returned image
content into the active turn.
- As a result, `view_image` appeared to succeed inside `js_repl`, but no
`input_image` was actually attached for the outer turn.
## What Changed
- Updated the nested tool-call path in `js_repl` to inspect function
tool responses for structured content items.
- When a nested tool response includes `input_image` content, `js_repl`
now injects a corresponding user `Message` into the active turn before
returning the raw tool result back to the JS runtime.
- Kept the normal JSON result flow intact, so `codex.tool(...)` still
returns the original tool output object to JavaScript.
## Why
- `js_repl` documentation and tests already assume that `view_image` can
be used from inside the REPL to attach generated images to the model.
- Without this fix, the nested call path silently dropped that
attachment behavior.
linux musl build steps in `rust-release.yml` are [currently
broken](https://github.com/openai/codex/actions/runs/22367312571)
because of linking issues due to ubsan-calling types (`jitterentropy`)
leaking into the build.
add `AWS_LC_SYS_NO_JITTER_ENTROPY=1` to the musl build step to avoid
linking those ubsan-calling types. this is a more temporary fix, we need
to clean up ubsan usage upstream so they dont leak into release-build
steps anyways.
codex's more thorough explanation below:
[pr 9859](https://github.com/openai/codex/pull/9859) added [MITM
init](https://github.com/openai/codex/pull/9859/changes#diff-db782967007060c5520651633e1ea21681d64be21f2b791d3d84519860245b97R62-R68)
in network-proxy, which wires in cert generation code (rcgen/rustls).
this didnt bump/change dep versions, but it changed symbol reachability
at link time.
for musl builds, that made aws-lc-sys’s jitterentropy objects get pulled
into the final link. those objects contain UBSan calls
(__ubsan_handle_*). musl release linking is static (*-linux-musl-gcc,
-nodefaultlibs) and does not link a musl UBSan runtime, so link fails
with undefined __ubsan_*.
before, our custom musl CI UBSan steps (install libubsan1, RUSTC_WRAPPER
+ LD_PRELOAD, partial flag scrubbing) masked some sanitizer issues.
after this pr, more aws-lc code became link-reachable, and that band-aid
wasn't enough.
## Why
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously
located `codex-execve-wrapper` by scanning `PATH` and sibling
directories. That lookup is brittle and can select the wrong binary when
the runtime environment differs from startup assumptions.
We already pass `codex-linux-sandbox` from `codex-arg0`;
`codex-execve-wrapper` should use the same startup-driven path plumbing.
## What changed
- Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper
executable paths:
- `codex_linux_sandbox_exe`
- `main_execve_wrapper_exe`
- Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to
top-level binaries and preserve helper paths created in
`prepend_path_entry_for_codex_aliases()`.
- Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`,
`tui`, `app-server`, and `mcp-server`.
- Added `main_execve_wrapper_exe` to core configuration plumbing
(`Config`, `ConfigOverrides`, and `SessionServices`).
- Updated zsh-fork shell escalation to consume the configured
`main_execve_wrapper_exe` and removed path-sniffing fallback logic.
- Updated app-server config reload paths so reloaded configs keep the
same startup-provided helper executable paths.
## References
- [`Arg0DispatchPaths`
definition](e355b43d5c/codex-rs/arg0/src/lib.rs (L20-L24))
- [`arg0_dispatch_or_else()` forwarding both
paths](e355b43d5c/codex-rs/arg0/src/lib.rs (L145-L176))
- [zsh-fork escalation using configured wrapper
path](e355b43d5c/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L109-L150))
## Testing
- `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p
codex-mcp-server -p codex-app-server`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core tools::runtimes:🐚:unix_escalation:: --
--nocapture`
Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for
clarity.
Would ideally change the type to `AbsolutePathBuf`, but that can be done
later.
## Summary
Improve `js_repl` behavior when the Node kernel hits a process-level
failure (for example, an uncaught exception or unhandled Promise
rejection).
Instead of only surfacing a generic `js_repl kernel exited unexpectedly`
after stdout EOF, `js_repl` now returns a clearer exec error for the
active request, then resets the kernel cleanly.
## Why
Some sandbox-denied operations can trigger Node errors that become
process-level failures (for example, an unhandled EventEmitter `'error'`
event). In that case:
- the kernel process exits,
- the host sees stdout EOF,
- the user gets a generic kernel-exit error,
- and the next request can briefly race with stale kernel state.
This change improves that failure mode without monkeypatching Node APIs.
## Changes
### Kernel-side (`js_repl` Node process)
- Add process-level handlers for:
- `uncaughtException`
- `unhandledRejection`
- When one of these fires:
- best-effort emit a normal `exec_result` error for the active exec
- include actionable guidance to catch/handle async errors (including
Promise rejections and EventEmitter `'error'` events)
- exit intentionally so the host can reset/restart the kernel
### Host-side (`JsReplManager`)
- Clear dead kernel state as soon as the stdout reader observes
unexpected kernel exit/EOF.
- This lets the next `js_repl` exec start a fresh kernel instead of
hitting a stale broken-pipe path.
### Tests
- Add regression coverage for:
- uncaught async exception -> exec error + kernel recovery on next exec
- Update forced-kernel-exit test to validate recovery behavior (next
exec restarts cleanly)
## Impact
- Better user-facing error for kernel crashes caused by
uncaught/unhandled async failures.
- Cleaner recovery behavior after kernel exit.
## Validation
- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_uncaught_exception_returns_exec_error_and_recovers
-- --exact`
- `cargo test -p codex-core --lib
tools::js_repl::tests::js_repl_forced_kernel_exit_recovers_on_next_exec
-- --exact`
- `just fmt`
## Summary
- add graceful websocket app-server restart on Ctrl-C by draining until
no assistant turns are running
- stop the websocket acceptor and disconnect existing connections once
the drain condition is met
- add a websocket integration test that verifies Ctrl-C waits for an
in-flight turn before exit
## Verification
- `cargo check -p codex-app-server --quiet`
- `cargo test -p codex-app-server --test all
suite::v2::connection_handling_websocket`
- I (maxj) tested remote and local Codex.app
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`codex-shell-escalation` exposed a `codex-core`-specific adapter layer
(`ShellActionProvider`, `ShellPolicyFactory`, and `run_escalate_server`)
that existed only to bridge `codex-core` to `EscalateServer`. That
indirection increased API surface and obscured crate ownership without
adding behavior.
This change moves orchestration into `codex-core` so boundaries are
clearer: `codex-shell-escalation` provides reusable escalation
primitives, and `codex-core` provides shell-tool policy decisions.
Admittedly, @pakrym rightfully requested this sort of cleanup as part of
https://github.com/openai/codex/pull/12649, though this avoids moving
all of `codex-shell-escalation` into `codex-core`.
## What changed
- Made `EscalateServer` public and exported it from `shell-escalation`.
- Removed the adapter layer from `shell-escalation`:
- deleted `shell-escalation/src/unix/core_shell_escalation.rs`
- removed exports for `ShellActionProvider`, `ShellPolicyFactory`,
`EscalationPolicyFactory`, and `run_escalate_server`
- Updated `core/src/tools/runtimes/shell/unix_escalation.rs` to:
- create `Stopwatch`/cancellation in `codex-core`
- instantiate `EscalateServer` directly
- implement `EscalationPolicy` directly on `CoreShellActionProvider`
Net effect: same escalation flow with fewer wrappers and a smaller
public API.
## Verification
- Manually reviewed the old vs. new escalation call flow to confirm
timeout/cancellation behavior and approval policy decisions are
preserved while removing wrapper types.
Increase `IMAGE_BYTES_ESTIMATE` from 340 bytes to 7,373 bytes so the
existing 4-bytes/token heuristic yields an image estimate of ~1,844
tokens instead of ~85. This makes auto-compaction more conservative for
image-heavy transcripts and avoids underestimating context usage, which
can otherwise cause compaction to fail when there is not enough free
context remaining. The new value was chosen because that's the image
resolution cap used for our latest models.
Follow-up to [#12419](https://github.com/openai/codex/pull/12419).
Refs [#11845](https://github.com/openai/codex/issues/11845).
Fixes#12128
The docs indicates that `project_root_markers` are used to discover the
project root for local config as well as `AGENTS.md`. It looks like it
was never wired up to support the latter.
Summary
- resolve project docs by walking to the configured
`project_root_markers` (or defaults) instead of assuming the Git root,
while honoring CLI overrides and handling malformed configs
- fall back to the project’s canonical path chain and add a test that
makes sure custom markers upstream of `.git` are respected
- Add a hidden `realtime_conversation` feature flag and `/realtime`
slash command for start/stop live voice sessions.
- Reuse transcription composer/footer UI for live metering, stream mic
audio, play assistant audio, render realtime user text events, and
force-close on feature disable.
---------
Co-authored-by: Codex <noreply@openai.com>
Summary
- detect skill-invoking shell commands based on the original command
string, request approvals when needed, and cache positive decisions per
session
- keep implicit skill invocation emitted after approval and keep skill
approval decline messaging centralized to the shell handler
- expand and adjust skill approval tests to cover shell-based skill
scripts while matching the new detection expectations
Testing
- Not run (not requested)
## Problem
Diff lines used only foreground colors (green/red) with no background
tinting, making them hard to scan. The gutter (line numbers) also had no
theme awareness — dimmed text was fine on dark terminals but unreadable
on light ones.
## Mental model
Each diff line now has four styled layers: **gutter** (line number),
**sign** (`+`/`-`), **content** (text), and **line background** (full
terminal width). A `DiffTheme` enum (`Dark` / `Light`) is selected once
per render by probing the terminal's queried background via
`default_bg()`. A companion `DiffColorLevel` enum (`TrueColor` /
`Ansi256` / `Ansi16`) is derived from `stdout_color_level()` and gates
which palette is used. All style helpers dispatch on `(theme,
DiffLineType, color_level)` to pick the right colors.
| Theme Picker Wide | Theme Picker Narrow |
|---|---|
| <img width="1552" height="1012" alt="image"
src="https://github.com/user-attachments/assets/231b21b7-32d4-4727-80ed-7d01924954be"
/> | <img width="795" height="1012" alt="image"
src="https://github.com/user-attachments/assets/549cacdf-daec-43c9-ad64-2a28d16d140e"
/> |
| Dark BG - 16 colors | Dark BG - 256 colors | Dark BG - True Colors |
|---|---|---|
| <img width="1552" height="1012" alt="dark-16colors"
src="https://github.com/user-attachments/assets/fba36de3-c101-47d4-9e63-88cdd00410d0"
/> | <img width="1552" height="1012" alt="dark-256colors"
src="https://github.com/user-attachments/assets/f39e4307-c6b0-49c4-b4fe-bd26d3d8e41c"
/> | <img width="1552" height="1012" alt="dark-truecolor"
src="https://github.com/user-attachments/assets/1af4ec57-04bf-4dfb-8a44-0ab5e5aaaf18"
/> |
| Light BG - 16 colors | Light BG - 256 colors | Light BG - True Colors
|
|---|---|---|
| <img width="1552" height="1012" alt="light-16colors"
src="https://github.com/user-attachments/assets/2b5423d1-74b4-4b1e-8123-7c2488ff436b"
/> | <img width="1552" height="1012" alt="light-256colors"
src="https://github.com/user-attachments/assets/c94cff9a-8d3e-42c9-bbe7-079da39953a8"
/> | <img width="1552" height="1012" alt="light-truecolor"
src="https://github.com/user-attachments/assets/f73da626-725f-4452-99ee-69ef706df2c6"
/> |
## Non-goals
- No runtime theme switching beyond what `default_bg()` already
provides.
- No change to syntax highlighting theme selection or the highlight
module.
## Tradeoffs
- Three fixed palettes (truecolor RGB, 256-color indexed, 16-color
named) are maintained rather than using `best_color` nearest-match. This
is deliberate: `supports_color::on_cached(Stream::Stdout)` can misreport
capabilities once crossterm enters the alternate screen, so hand-picked
palette entries give better visual results than automatic quantization.
- Delete lines in the syntax-highlighted path get `Modifier::DIM` to
visually recede compared to insert lines. This trades some readability
of deleted code for scan-ability of additions.
- The theme picker's diff preview sets `preserve_side_content_bg: true`
on `ListSelectionView` so diff background tints survive into the side
panel. Other popups keep the default (`false`) to preserve their
reset-background look.
## Architecture
- **Color constants** are module-level `const` items grouped by palette
tier: `DARK_TC_*` / `LIGHT_TC_*` (truecolor RGB tuples), `DARK_256_*` /
`LIGHT_256_*` (xterm indexed), with named `Color` variants used for the
16-color tier.
- **`DiffTheme`** is a private enum; `diff_theme()` probes the terminal
and `diff_theme_for_bg()` is the testable pure-function version.
- **`DiffColorLevel`** is a private enum derived from `StdoutColorLevel`
via `diff_color_level()`.
- **Palette helpers** (`add_line_bg`, `del_line_bg`, `light_gutter_fg`,
`light_add_num_bg`, `light_del_num_bg`) each take `(DiffTheme,
DiffColorLevel)` or just `DiffColorLevel` and return a `Color`.
- **Style helpers** (`style_line_bg_for`, `style_gutter_for`,
`style_sign_add`, `style_sign_del`, `style_add`, `style_del`) each take
`(DiffLineType, DiffTheme, DiffColorLevel)` or `(DiffTheme,
DiffColorLevel)` and return a `Style`.
- **`push_wrapped_diff_line_inner_with_theme_and_color_level`** is the
innermost renderer, accepting both theme and color level so tests can
exercise any combination without depending on the terminal.
- **Line-level background** is applied via
`RtLine::from(...).style(line_bg)` so the tint extends across the full
terminal width, not just the text content.
- **Theme picker integration**: `ListSelectionView` gained a
`preserve_side_content_bg` flag. When `true`, the side panel skips
`force_bg_to_terminal_bg`, letting diff preview backgrounds render
faithfully.
## Observability
No new logging. Theme selection is deterministic from `default_bg()`,
which is already queried and cached at TUI startup.
## Tests
1. **`DiffTheme` is determined per `render_change` call** — if
`default_bg()` changes mid-render (e.g. `requery_default_colors()`
fires), different file chunks could render with different themes. Low
risk in practice since re-query only happens on explicit user action.
2. **16-color tier uses named `Color` variants** (`Color::Green`,
`Color::Red`, etc.) which the terminal maps to its own palette. On
unusual terminal themes these could clash with the background.
Acceptable since 16-color terminals already have unpredictable color
rendering.
3. **Light-theme `style_add` / `style_del` set bg but no fg** — on light
terminals, non-syntax-highlighted content uses the terminal's default
foreground against a pastel background. If the terminal's default fg
happens to be very light, contrast could suffer. This is an edge case
since light-terminal users typically have dark default fg.
4. **`preserve_side_content_bg` is a general-purpose flag but only used
by the theme picker** — if other popups start using side content with
intentional backgrounds they'll need to opt in explicitly. Not a real
risk today, just a note for future callers.
#### What
Try matching `\w+`-namespaced model after `longest prefix` as heuristic
to match `ModelInfo` from list of candidates.
This shouldn't regress existing behavior:
- `gpt-5.2-codex` -> `gpt-5.2` if `gpt-5.2-codex` not present
- `gpt-5.3` -> `gpt-5` if `gpt-5.3` not present
- `gpt-9` still doesn't match anything
while being more forgiving for custom prefixes:
- `oai/gpt-5.3-codex` -> `gpt-5.3-codex`
#### Tests
Added unit test.
Fixes#12175
If a user types an npm package name with multiple `@` symbols like `npx
-y @foo/bar@latest`, the TUI currently treats this as though it's
attempting to invoke the file picker.
### What changed
- **Generalized `@` token parsing**
- `current_prefixed_token(...)` now treats `@` as a token start **only
at a whitespace boundary** (or start-of-line).
- If the cursor is on a nested `@` inside an existing
whitespace-delimited token (for example `@scope/pkg@latest`), it keeps
the surrounding token active instead of starting a new token at the
second `@`.
- It also avoids misclassifying mid-word usages like `foo@bar` as an `@`
file token.
- **Enter behavior with file popup**
- If the file-search popup is open but has **no selected match**,
pressing `Enter` now closes the popup and falls through to normal submit
behavior.
- This prevents pasted strings containing `@...` from blocking
submission just because file-search was active with no actionable
selection.
### Testing
I manually built and tested the scenarios involved with the bug report
and related use of `@` mentions to verify no regressions
## Why
This PR switches the `shell_command` zsh-fork path over to
`codex-shell-escalation` so the new shell tool can use the shared
exec-wrapper/escalation protocol instead of the `zsh_exec_bridge`
implementation that was introduced in
https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on
UNIX domain sockets, which is not as tamper-proof as the FD-based
approach in `codex-shell-escalation`.
## What Changed
- Added a Unix zsh-fork runtime adapter in `core`
(`core/src/tools/runtimes/shell/unix_escalation.rs`) that:
- runs zsh-fork commands through
`codex_shell_escalation::run_escalate_server`
- bridges exec-policy / approval decisions into `ShellActionProvider`
- executes escalated commands via a `ShellCommandExecutor` that calls
`process_exec_tool_call`
- Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to
select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving
the generic `shell` tool path unchanged.
- Removed the `zsh_exec_bridge`-based session service and deleted
`core/src/zsh_exec_bridge/mod.rs`.
- Moved exec-wrapper entrypoint dispatch to `arg0` by handling the
`codex-execve-wrapper` arg0 alias there, and removed the old
`codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and
`app-server` mains.
- Added the needed `codex-shell-escalation` dependencies for `core` and
`arg0`.
## Tests
- `cargo test -p codex-core
shell_zsh_fork_prefers_shell_command_over_unified_exec`
- `cargo test -p codex-app-server turn_start_shell_zsh_fork --
--nocapture`
- verifies zsh-fork command execution and approval flows through the new
backend
- includes subcommand approve/decline coverage using the shared zsh
DotSlash fixture in `app-server/tests/suite/zsh`
- To test manually, I added the following to `~/.codex/config.toml`:
```toml
zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh"
[features]
shell_zsh_fork = true
```
Then I ran `just c` to run the dev build of Codex with these changes and
sent it the message:
```
run `echo $0`
```
And it replied with:
```
echo $0 printed:
/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh
In this tool context, $0 reflects the script path used to invoke the shell, not just zsh.
```
so the tool appears to be wired up correctly.
## Notes
- The zsh subcommand-decline integration test now uses `rm` under a
`WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is
auto-allowed by the new `shell-escalation` policy path, which no longer
produces subcommand approval prompts.
## Summary
Introduces the initial implementation of Feature::RequestPermissions.
RequestPermissions allows the model to request that a command be run
inside the sandbox, with additional permissions, like writing to a
specific folder. Eventually this will include other rules as well, and
the ability to persist these permissions, but this PR is already quite
large - let's get the core flow working and go from there!
<img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM"
src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368"
/>
## Testing
- [x] Added tests
- [x] Tested locally
- [x] Feature
Send a request with `generate: falls` but a full set of tools and
instructions to pre-warm inference.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- tighten the memory-use decision boundary so agents skip memory only
for clearly self-contained asks
- make the quick memory pass more explicit and bounded (including a
lightweight search budget)
- add structured `<memory_citation>` requirements and examples for final
replies
- clarify memory update guidance and end-state wording for memory lookup
## Why
The previous template was directionally correct, but still left room for
inconsistent memory lookup behavior and citation formatting. This change
makes the default behavior, quick-pass scope, and citation output
contract much more explicit.
## Testing
- not run (prompt/template text change only)
Co-authored-by: jif-oai <jif@openai.com>
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.
update logic to rely on bundled models instead, update tests.
Fixes#12216
Fixes a panic in `AbsolutePathBuf::parent()` when the process hits file
descriptor exhaustion (`EMFILE` / "Too many open files").
### Root cause
`AbsolutePathBuf::parent()` was re-validating the parent path via
`from_absolute_path(...).expect(...)`.
`from_absolute_path()` calls `path_absolutize::absolutize()`, which can
depend on `std::env::current_dir()`. Under `EMFILE`, that can fail,
causing `parent()` to panic even though the parent of an absolute path
is already known.
### Change
- Stop re-absolutizing the result of `self.0.parent()`
- Construct `AbsolutePathBuf` directly from the known parent path
- Keep an invariant check with `debug_assert!(p.is_absolute())`
### Why this is safe
`self` is already an `AbsolutePathBuf`, so `self.0` is
absolute/normalized. The parent of an absolute path is expected to be
absolute, so re-running fallible normalization here is unnecessary and
can introduce unrelated panics.
- use `skills_for_cwd` lookup to scope allowed skills and build
invocation context for downstream processing
- add detection in `stream_events_utils` to classify tool calls as
implicit skill invocations per the proposal (script runners, extensions,
`scripts` dirs, and SKILL.md reads)
- deduplicate invocations per turn and emit analytics/OTEL events on the
same background queue as explicit invokes
## Summary
The test exec_resume_last_respects_cwd_filter_and_all_flag makes one
session “newest” by resuming it, but rollout updated_at is stored/sorted
at second precision. On fast CI (especially Windows), the touch could
land in the same second as initial session creation, making ordering
nondeterministic.
This change adds a short sleep before the recency-touch step so the
resumed session is guaranteed to have a later updated_at, preserving the
intended assertion without changing product behavior.
## Summary
Persist network approval allow/deny decisions as `network_rule(...)`
entries in execpolicy (not proxy config)
It adds `network_rule` parsing + append support in `codex-execpolicy`,
including `decision="prompt"` (parse-only; not compiled into proxy
allow/deny lists)
- compile execpolicy network rules into proxy allow/deny lists and
update the live proxy state on approval
- preserve requirements execpolicy `network_rule(...)` entries when
merging with file-based execpolicy
- reject broad wildcard hosts (for example `*`) for persisted
`network_rule(...)`