Commit graph

4710 commits

Author SHA1 Message Date
Ahmed Ibrahim
10bf6008f4
Stabilize thread resume replay tests (#13885)
## What changed
- The thread-resume replay tests now use unchecked mock sequencing so
the replay flow can complete before the test asserts.
- They also poll outbound `/responses` request counts and fail
immediately if replay emits an unexpected extra request.

## Why this fixes the flake
- The previous version asserted while the replay machinery was still
mid-flight, so the test was sometimes checking an intermediate state
instead of the completed behavior.
- Strict mock sequencing made that problem worse by forcing the test to
care about exact sub-step timing rather than about the end result.
- Letting replay settle and then asserting on stabilized request counts
makes the test validate the real contract: the replay path finishes and
does not send extra model requests.

## Scope
- Test-only change.
2026-03-09 10:41:23 -07:00
Ahmed Ibrahim
0dc242a672
Order websocket initialize after handshake (#13943)
## What changed
- `app-server` now sends initialize notifications to the specific
websocket connection before that connection is marked outbound-ready.
- `message_processor` now exposes the forwarding hook needed to target
that initialize delivery path.

## Why this fixes the flake
- This was a real websocket ordering bug.
- The old code allowed “connection is ready for outbound broadcasts” to
become true before the initialize notification had been routed to the
intended client.
- On CI this showed up as a race where tests would occasionally miss or
misorder initialize delivery depending on scheduler timing.
- Sending initialize to the exact connection first, then exposing it to
the general outbound path, removes that race instead of hiding it with
timing slack.

## Scope
- Production logic change.
2026-03-09 10:27:19 -07:00
Ahmed Ibrahim
6b68d1ef66
Stabilize plan item app-server tests (#14058)
## What changed
- run the two plan-mode app-server tests on a multi-thread Tokio runtime
instead of the default single-thread test runtime
- stop relying on wiremock teardown expectations for `/responses` and
explicitly wait for the expected request count after the turn completes

## Why this fixes the flake
- this failure was showing up on Windows ARM as a late wiremock panic
saying the mock server saw zero `/responses` calls, but the real issue
was that the test could stall around app-server startup and only fail
during teardown
- moving these tests to the same multi-thread runtime used by the other
collaboration-mode app-server tests removes that startup scheduling race
- asserting the `/responses` count directly makes the test
deterministic: we now wait for the real POST instead of depending on a
drop-time verification that can hide the underlying timing issue

## Scope
- test-only change; no production logic changes
2026-03-09 10:24:18 -07:00
Ahmed Ibrahim
5d9db0f995
Stabilize PTY Python REPL test (#13883)
## What changed
- The PTY Python REPL test now starts Python with a startup marker
already embedded in argv.
- The test waits for that marker in PTY output before making assertions.

## Why this fixes the flake
- The old version tried to probe the live REPL almost immediately after
spawn.
- That races PTY initialization, Python startup, and prompt buffering,
all of which vary across platforms and CI load.
- By having the child process emit a known marker as part of its own
startup path, the test gets a deterministic synchronization point that
comes from the process under test rather than from guessed timing.

## Scope
- Test-only change.
2026-03-09 10:08:36 -07:00
Ahmed Ibrahim
6052558a01
Stabilize RMCP pid file cleanup test (#13881)
## What changed
- The pid-file cleanup test now keeps polling when the pid file exists
but is still empty.
- Assertions only proceed once the wrapper has actually written the
child pid.

## Why this fixes the flake
- File creation and pid writing are not atomic as one logical action
from the test’s point of view.
- The previous test sometimes won the race and read the file in the tiny
window after creation but before the pid bytes were flushed.
- Treating “empty file” as “not ready yet” synchronizes the test on the
real event we need: the wrapper has finished publishing the child pid.

## Scope
- Test-only change.
2026-03-09 10:01:34 -07:00
Ahmed Ibrahim
615ed0e437
Stabilize zsh fork app-server tests (#13872)
## What changed
- `turn_start_shell_zsh_fork_executes_command_v2` now keeps the shell
command alive with a file marker until the interrupt arrives instead of
using a command that can finish too quickly.
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
now waits for `turn/completed` before sending a fallback interrupt and
accepts the real terminal outcomes observed across platforms.

## Why this fixes the flake
- The original tests assumed a narrow ordering window: the child command
would still be running when the interrupt happened, and completion would
always arrive in one specific order.
- In CI, especially across different shells and runner speeds, those
assumptions break. Sometimes the child finishes before the interrupt;
sometimes the protocol completes while the fallback path is still arming
itself.
- Holding the command open until the interrupt and waiting for the
explicit protocol completion event makes the tests synchronize on the
behavior under test instead of on wall-clock timing.

## Scope
- Test-only change.
2026-03-09 09:38:16 -07:00
Ahmed Ibrahim
3f1280ce1c
Reduce app-server test timeout pressure (#13884)
## What changed
- The auth/account/fuzzy-file-search test configs disable unrelated
`shell_snapshot` setup.
- The fuzzy-file-search fixture set was reduced so the stop-updates test
does less incidental work before reaching the assertion.

## Why this fixes the flake
- These failures were caused by cumulative timeout pressure, not by a
missing product-level delay.
- The old tests were paying for shell snapshot initialization and extra
fixture volume that were not part of the behavior being validated.
- Removing that incidental work keeps the same coverage but shortens the
critical path enough that the tests finish comfortably inside the
existing timeout budget, which is the right fix versus simply extending
the timeout.

## Scope
- Test-only change.
2026-03-09 09:37:41 -07:00
Charley Cunningham
f23fcd6ced
guardian initial feedback / tweaks (#13897)
## Summary
- remove the remaining model-visible guardian-specific `on-request`
prompt additions so enabling the feature does not change the main
approval-policy instructions
- neutralize user-facing guardian wording to talk about automatic
approval review / approval requests rather than a second reviewer or
only sandbox escalations
- tighten guardian retry-context handling so agent-authored
`justification` stays in the structured action JSON and is not also
injected as raw retry context
- simplify guardian review plumbing in core by deleting dead
prompt-append paths and trimming some request/transcript setup code

## Notable Changes
- delete the dead `permissions/approval_policy/guardian.md` append path
and stop threading `guardian_approval_enabled` through model-facing
developer-instruction builders
- rename the experimental feature copy to `Automatic approval review`
and update the `/experimental` snapshot text accordingly
- make approval-review status strings generic across shell, patch,
network, and MCP review types
- forward real sandbox/network retry reasons for shell and unified-exec
guardian review, but do not pass agent-authored justification as raw
retry context
- simplify `guardian.rs` by removing the one-field request wrapper,
deduping reasoning-effort selection, and cleaning up transcript entry
collection

## Testing
- `just fmt`
- full validation left to CI

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-09 09:25:24 -07:00
Ahmed Ibrahim
2bc3e52a91
Stabilize app list update ordering test (#14052)
## Summary
- make
`list_apps_waits_for_accessible_data_before_emitting_directory_updates`
accept the two valid notification paths the server can emit
- keep rejecting the real bug this test is meant to catch: a
directory-only `app/list/updated` notification before accessible app
data is available

## Why this fixes the flake
The old test used a fixed `150ms` silence window and assumed the first
notification after that window had to be the fully merged final update.
In CI, scheduling occasionally lets accessible app data arrive before
directory data, so the first valid notification can be an
accessible-only interim update. That made the test fail even though the
server behavior was correct.

This change makes the test deterministic by reading notifications until
the final merged payload arrives. Any interim update is only accepted if
it contains accessible apps only; if the server ever emits inaccessible
directory data before accessible data is ready, the test still fails
immediately.

## Change type
- test-only; no production app-list logic changes
2026-03-09 00:16:13 -07:00
Dylan Hurd
06f82c123c
feat(tui) render request_permissions calls (#14004)
## Summary
Adds support for tui rendering of request_permission calls

<img width="724" height="245" alt="Screenshot 2026-03-08 at 9 04 07 PM"
src="https://github.com/user-attachments/assets/e1997825-a496-4bfb-bbda-43d0006460a5"
/>


## Testing
- [x] Added snapshot test
2026-03-09 04:24:04 +00:00
Dylan Hurd
05332b0e96
fix(bazel) add missing app-server-client BUILD.bazel (#14027)
## Summary
Adds missing BUILD.bazel file for the new app-server-client crate

## Testing
- [x] 🤞 that this gets bazel ci to pass
2026-03-09 03:42:54 +00:00
Jack Mousseau
e6b93841c5
Add request permissions tool (#13092)
Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.

The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.
2026-03-08 20:23:06 -07:00
Charley Cunningham
4ad3b59de3
tui: clarify pending steer follow-ups (#13841)
## Summary
- split the pending input preview into labeled pending-steer and queued
follow-up sections
- explain that pending steers submit after the next tool call and that
Esc can interrupt and send them immediately
- treat Esc as an interrupt-plus-resubmit path when pending steers
exist, with updated TUI snapshots and tests

Queues and steers:
<img width="1038" height="263" alt="Screenshot 2026-03-07 at 10 17
17 PM"
src="https://github.com/user-attachments/assets/4ef433ef-27a3-4b7c-ad69-2046f6eb89e6"
/>

After pressing Esc:
<img width="1046" height="320" alt="Screenshot 2026-03-07 at 10 17
21 PM"
src="https://github.com/user-attachments/assets/0f4d89e0-b6b9-486a-9f04-b6021f169ba7"
/>

## Codex author
`codex resume 019cc6f4-2cca-7803-b717-8264526dbd97`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-08 20:13:21 -07:00
Dylan Hurd
f41b1638c9
fix(core) patch otel test (#14014)
## Summary
This test was missing the turn completion event in the responses stream,
so it was hanging. This PR fixes the issue

## Testing
- [x] This does update the test
2026-03-08 19:06:30 -07:00
Celia Chen
340f9c9ecb
app-server: include experimental skill metadata in exec approval requests (#13929)
## Summary

This change surfaces skill metadata on command approval requests so
app-server clients can tell when an approval came from a skill script
and identify the originating `SKILL.md`.

- add `skill_metadata` to exec approval events in the shared protocol
- thread skill metadata through core shell escalation and delegated
approval handling for skill-triggered approvals
- expose the field in app-server v2 as experimental `skillMetadata`
- regenerate the JSON/TypeScript schemas and cover the new field in
protocol, transport, core, and TUI tests

## Why

Skill-triggered approvals already carry skill context inside core, but
app-server clients could not see which skill caused the prompt. Sending
the skill metadata with the approval request makes it possible for
clients to present better approval UX and connect the prompt back to the
relevant skill definition.


## example event in app-server-v2
verified that we see this event when experimental api is on:
```
< {
<   "id": 11,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "additionalPermissions": {
<       "fileSystem": null,
<       "macos": {
<         "accessibility": false,
<         "automations": {
<           "bundle_ids": [
<             "com.apple.Notes"
<           ]
<         },
<         "calendar": false,
<         "preferences": "read_only"
<       },
<       "network": null
<     },
<     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
<     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
<     "skillMetadata": {
<       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
<     },
<     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
<     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
<   }
< }`
```

& verified that this is the event when experimental api is off:
```
< {
<   "id": 13,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Users/celia/code/codex/codex-rs",
<     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
<     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
<     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
<   }
< }
```
2026-03-08 18:07:46 -07:00
Eric Traut
da3689f0ef
Add in-process app server and wire up exec to use it (#14005)
This is a subset of PR #13636. See that PR for a full overview of the
architectural change.

This PR implements the in-process app server and modifies the
non-interactive "exec" entry point to use the app server.

---------

Co-authored-by: Felipe Coury <felipe.coury@gmail.com>
2026-03-08 18:43:55 -06:00
Matthew Zeng
a684a36091
[app-server] Support hot-reload user config when batch writing config. (#13839)
- [x] Support hot-reload user config when batch writing config.
2026-03-08 17:38:01 -07:00
Ahmed Ibrahim
1f150eda8b
Stabilize shell serialization tests (#13877)
## What changed
- The duration-recording fixture sleep was reduced from a large
artificial delay to `0.2s`, and the assertion floor was lowered to
`0.1s`.
- The shell tool fixtures now force `login = false` so they do not
invoke login-shell startup paths.

## Why this fixes the flake
- The old tests were paying for two kinds of noise that had nothing to
do with the feature being validated: oversized sleep time and variable
shell initialization cost.
- Login shells can pick up runner-specific startup files and incur
inconsistent startup latency.
- The test only needs to prove that we record a nontrivial duration and
preserve shell output. A shorter fixture delay plus a non-login shell
keeps that coverage while removing runner-dependent wall-clock variance.

## Scope
- Test-only change.
2026-03-08 13:37:41 -07:00
Charley Cunningham
7ba1fccfc1
fix(ci): restore guardian coverage and bazel unit tests (#13912)
## Summary
- restore the guardian review request snapshot test and its tracked
snapshot after it was dropped from `main`
- make Bazel Rust unit-test wrappers resolve runfiles correctly on
manifest-only platforms like macOS and point Insta at the real workspace
root
- harden the shell-escalation socket-closure assertion so the musl Bazel
test no longer depends on fd reuse behavior

## Verification
- cargo test -p codex-core
guardian_review_request_layout_matches_model_visible_request_snapshot
- cargo test -p codex-shell-escalation
- bazel test //codex-rs/exec:exec-unit-tests
//codex-rs/shell-escalation:shell-escalation-unit-tests

Supersedes #13894.

---------

Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com>
Co-authored-by: viyatb-oai <viyatb@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-08 12:05:19 -07:00
Eric Traut
a30edb6c17
Fix inverted Windows PTY TerminateProcess handling (#13989)
Addresses #13945

The vendored WezTerm ConPTY backend in
`codex-rs/utils/pty/src/win/mod.rs` treated `TerminateProcess` return
values backwards: nonzero success was handled as failure, and `0`
failure was handled as success.

This is likely causing a number of bugs reported against Codex running
on Windows native where processes are not cleaned up.
2026-03-08 11:52:16 -06:00
Michael Bolin
dcc4d7b634
linux-sandbox: honor split filesystem policies in bwrap (#13453)
## Why

After `#13449`, the Linux helper could receive split filesystem and
network policies, but the bubblewrap mount builder still reconstructed
filesystem access from the legacy `SandboxPolicy`.

That loses explicit unreadable carveouts under writable roots, and it
also mishandles `Root` read access paired with explicit deny carveouts.
In those cases bubblewrap could still expose paths that the split
filesystem policy intentionally blocked.

## What changed

- switched bubblewrap mount generation to consume
`FileSystemSandboxPolicy` directly at the implementation boundary;
legacy `SandboxPolicy` configs still flow through the existing
`FileSystemSandboxPolicy::from(&sandbox_policy)` bridge before reaching
bwrap
- kept the Linux helper and preflight path on the split filesystem
policy all the way into bwrap
- re-applied explicit unreadable carveouts after readable and writable
mounts so blocked subpaths still win under bubblewrap
- masked denied directories with `--tmpfs` plus `--remount-ro` and
denied files with `--ro-bind-data`, preserving the backing fd until exec
- added comments in the unreadable-root masking block to explain why the
mount order and directory/file split are intentional
- updated Linux helper call sites and tests for the split-policy bwrap
path

## Verification

- added protocol coverage for root carveouts staying scoped
- added core coverage that root-write plus deny carveouts still requires
a platform sandbox
- added bwrap unit coverage for reapplying blocked carveouts after
writable binds
- added Linux integration coverage for explicit split-policy carveouts
under bubblewrap
- validated the final branch state with `cargo test -p
codex-linux-sandbox`, `cargo clippy -p codex-linux-sandbox --all-targets
-- -D warnings`, and the PR CI reruns

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13453).
* __->__ #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 23:46:52 -08:00
Ahmed Ibrahim
dc19e78962
Stabilize abort task follow-up handling (#13874)
- production logic plus tests; cancel running tasks before clearing
pending turn state
- suppress follow-up model requests after cancellation and assert on
stabilized request counts instead of fixed sleeps
2026-03-07 22:56:00 -08:00
Michael Bolin
3b5fe5ca35
protocol: keep root carveouts sandboxed (#13452)
## Why

A restricted filesystem policy that grants `:root` read or write access
but also carries explicit deny entries should still behave like scoped
access with carveouts, not like unrestricted disk access.

Without that distinction, later platform backends cannot preserve
blocked subpaths under root-level permissions because the protocol layer
reports the policy as fully unrestricted.

## What changed

- taught `FileSystemSandboxPolicy` to treat root access plus explicit
deny entries as scoped access rather than full-disk access
- derived readable and writable roots from the filesystem root when root
access is combined with carveouts, while preserving the denied paths as
read-only subpaths
- added protocol coverage for root-write policies with carveouts and a
core sandboxing regression so those policies still require platform
sandboxing

## Verification

- added protocol coverage in `protocol/src/permissions.rs` and
`protocol/src/protocol.rs` for root access with explicit carveouts
- added platform-sandbox regression coverage in
`core/src/sandboxing/mod.rs`
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13452).
* #13453
* __->__ #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 21:15:47 -08:00
Michael Bolin
46b8d127cf
sandboxing: preserve denied paths when widening permissions (#13451)
## Why

After the split-policy plumbing landed, additional-permissions widening
still rebuilt filesystem access through the legacy projection in a few
places.

That can erase explicit deny entries and make the runtime treat a policy
as fully writable even when it still has blocked subpaths, which in turn
can skip the platform sandbox when it is still needed.

## What changed

- preserved explicit deny entries when merging additional read and write
permissions into `FileSystemSandboxPolicy`
- switched platform-sandbox selection to rely on
`FileSystemSandboxPolicy::has_full_disk_write_access()` instead of ad
hoc root-write checks
- kept the widened policy path in `core/src/exec.rs` and
`core/src/sandboxing/mod.rs` aligned so denied subpaths survive both
policy merging and sandbox selection
- added regression coverage for root-write policies that still carry
carveouts

## Verification

- added regression coverage in `core/src/sandboxing/mod.rs` showing that
root write plus carveouts still requires the platform sandbox
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13451).
* #13453
* #13452
* __->__ #13451
* #13449
* #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-08 04:29:35 +00:00
Michael Bolin
07a30da3fb
linux-sandbox: plumb split sandbox policies through helper (#13449)
## Why

The Linux sandbox helper still only accepted the legacy `SandboxPolicy`
payload.

That meant the runtime could compute split filesystem and network
policies, but the helper would immediately collapse them back to the
compatibility projection before applying seccomp or staging the
bubblewrap inner command.

## What changed

- added hidden `--file-system-sandbox-policy` and
`--network-sandbox-policy` flags alongside the legacy `--sandbox-policy`
flag so the helper can migrate incrementally
- updated the core-side Landlock wrapper to pass the split policies
explicitly when launching `codex-linux-sandbox`
- added helper-side resolution logic that accepts either the legacy
policy alone or a complete split-policy pair and normalizes that into
one effective configuration
- switched Linux helper network decisions to use `NetworkSandboxPolicy`
directly
- added `FromStr` support for the split policy types so the helper can
parse them from CLI JSON

## Verification

- added helper coverage in `linux-sandbox/src/linux_run_main_tests.rs`
for split-policy flags and policy resolution
- added CLI argument coverage in `core/src/landlock.rs`
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13449).
* #13453
* #13452
* #13451
* __->__ #13449
* #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 19:40:10 -08:00
Matthew Zeng
a4a9536fd7
[elicitations] Support always allow option for mcp tool calls. (#13807)
- [x] Support always allow option for mcp tool calls, writes to
config.toml.
- [x] Fix config hot-reload after starting a new thread for TUI.
2026-03-08 01:46:40 +00:00
sayan-oai
590cfa6176
chore: use @plugin instead of $plugin for plaintext mentions (#13921)
change plaintext plugin-mentions from `$plugin` to `@plugin`, ensure TUI
can correctly decode these from history.

tested locally, added/updated tests.
2026-03-08 01:36:39 +00:00
Michael Bolin
bf5c2f48a5
seatbelt: honor split filesystem sandbox policies (#13448)
## Why

After `#13440` and `#13445`, macOS Seatbelt policy generation was still
deriving filesystem and network behavior from the legacy `SandboxPolicy`
projection.

That projection loses explicit unreadable carveouts and conflates split
network decisions, so the generated Seatbelt policy could still be wider
than the split policy that Codex had already computed.

## What changed

- added Seatbelt entrypoints that accept `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy` directly
- built read and write policy stanzas from access roots plus excluded
subpaths so explicit unreadable carveouts survive into the generated
Seatbelt policy
- switched network policy generation to consult `NetworkSandboxPolicy`
directly
- failed closed when managed-network or proxy-constrained sessions do
not yield usable loopback proxy endpoints
- updated the macOS callers and test helpers that now need to carry the
split policies explicitly

## Verification

- added regression coverage in `core/src/seatbelt.rs` for unreadable
carveouts under both full-disk and scoped-readable policies
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13448).
* #13453
* #13452
* #13451
* #13449
* __->__ #13448
* #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-08 00:35:19 +00:00
Eric Traut
e8d7ede83c
Fix TUI context window display before first TokenCount (#13896)
The TUI was showing the raw configured `model_context_window` until the
first
`TokenCount` event arrived, even though core had already emitted the
effective
runtime window on `TurnStarted`. This made the footer, status-line
context
window, and `/status` output briefly inconsistent for models/configs
where the
effective window differs from the configured value, such as the
`gpt-5.4`
1,000,000-token override reported in #13623.

Update the TUI to cache `TurnStarted.model_context_window` immediately
so
pre-token-count displays use the runtime effective window, and add
regression
coverage for the startup path.

---------

Co-authored-by: Charles Cunningham <ccunningham@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-07 17:01:47 -07:00
Dylan Hurd
92f7541624
fix(ci) fix guardian ci (#13911)
## Summary
#13910 was merged with some unused imports, let's fix this

## Testing
- [x] Let's make sure CI is green

---------

Co-authored-by: Charles Cunningham <ccunningham@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-07 23:34:56 +00:00
Dylan Hurd
1c888709b5
fix(core) rm guardian snapshot test (#13910)
## Summary
This test is good, but flakey and we have to figure out some bazel build
issues. Let's get CI back go green and then land a stable version!

## Test Summary
- [x] CI Passes
2026-03-07 14:28:54 -08:00
jif-oai
b9a2e40001
tmp: drop artifact skills (#13851) 2026-03-07 18:04:05 +01:00
Charley Cunningham
e84ee33cc0
Add guardian approval MVP (#13692)
## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface

## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR

## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`

## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core

## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks

## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-07 05:40:10 -08:00
jif-oai
cf143bf71e
feat: simplify DB further (#13771) 2026-03-07 03:48:36 -08:00
Michael Bolin
5ceff6588e
safety: honor filesystem policy carveouts in apply_patch (#13445)
## Why

`apply_patch` safety approval was still checking writable paths through
the legacy `SandboxPolicy` projection.

That can hide explicit `none` carveouts when a split filesystem policy
projects back to compatibility `ExternalSandbox`, which leaves one more
approval path that can auto-approve writes inside paths that are
intentionally blocked.

## What changed

- passed `turn.file_system_sandbox_policy` into `assess_patch_safety`
- changed writable-path checks to derive effective access from
`FileSystemSandboxPolicy` instead of the legacy `SandboxPolicy`
- made those checks reject explicit unreadable roots before considering
broad write access or writable roots
- added regression coverage showing that an `ExternalSandbox`
compatibility projection still asks for approval when the split
filesystem policy blocks a subpath

## Verification

- `cargo test -p codex-core safety::tests::`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core --all-targets -- -D warnings`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13445).
* #13453
* #13452
* #13451
* #13449
* #13448
* __->__ #13445
* #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 08:01:08 +00:00
Eric Traut
8df4d9b3b2
Add Fast mode status-line indicator (#13670)
Addresses feature request #13660

Adds new option to `/statusline` so the status line can display "fast
on" or "fast off"

Summary
- introduce a `FastMode` status-line item so `/statusline` can render
explicit `Fast on`/`Fast off` text for the service tier
- wire the item into the picker metadata and resolve its string from
`ChatWidget` without adding any unrelated `thread-name` logic or storage
changes
- ensure the refresh paths keep the cached footer in sync when the
service tier (fast mode) changes

Testing
- Manually tested

Here's what it looks like when enabled:

<img width="366" height="75" alt="image"
src="https://github.com/user-attachments/assets/7f992d2b-6dab-49ed-aa43-ad496f56f193"
/>
2026-03-07 00:42:08 -07:00
iceweasel-oai
4b4f61d379
app-server: require absolute cwd for windowsSandbox/setupStart (#13833)
## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs

## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.

That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.

This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.

## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)
2026-03-06 22:47:08 -08:00
Celia Chen
b0ce16c47a
fix(core): respect reject policy by approval source for skill scripts (#13816)
## Summary
- distinguish reject-policy handling for prefix-rule approvals versus
sandbox approvals in Unix shell escalation
- keep prompting for skill-script execution when `rules=true` but
`sandbox_approval=false`, instead of denying the command up front
- add regression coverage for both skill-script reject-policy paths in
`codex-rs/core/tests/suite/skill_approval.rs`
2026-03-06 21:43:14 -08:00
Michael Bolin
b52c18e414
protocol: derive effective file access from filesystem policies (#13440)
## Why

`#13434` and `#13439` introduce split filesystem and network policies,
but the only code that could answer basic filesystem questions like "is
access effectively unrestricted?" or "which roots are readable and
writable for this cwd?" still lived on the legacy `SandboxPolicy` path.

That would force later backends to either keep projecting through
`SandboxPolicy` or duplicate path-resolution logic. This PR moves those
queries onto `FileSystemSandboxPolicy` itself so later runtime and
platform changes can consume the split policy directly.

## What changed

- added `FileSystemSandboxPolicy` helpers for full-read/full-write
checks, platform-default reads, readable roots, writable roots, and
explicit unreadable roots resolved against a cwd
- added a shared helper for the default read-only carveouts under
writable roots so the legacy and split-policy paths stay aligned
- added protocol coverage for full-access detection and derived
readable, writable, and unreadable roots

## Verification

- added protocol coverage in `protocol/src/protocol.rs` and
`protocol/src/permissions.rs` for full-root access and derived
filesystem roots
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13440).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* __->__ #13440
* #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 03:49:29 +00:00
Michael Bolin
22ac6b9aaa
sandboxing: plumb split sandbox policies through runtime (#13439)
## Why

`#13434` introduces split `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy`, but the runtime still made most execution-time
sandbox decisions from the legacy `SandboxPolicy` projection.

That projection loses information about combinations like unrestricted
filesystem access with restricted network access. In practice, that
means the runtime can choose the wrong platform sandbox behavior or set
the wrong network-restriction environment for a command even when config
has already separated those concerns.

This PR carries the split policies through the runtime so sandbox
selection, process spawning, and exec handling can consult the policy
that actually matters.

## What changed

- threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
`TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
unified exec, and app-server exec overrides
- updated sandbox selection in `core/src/sandboxing/mod.rs` and
`core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
`NetworkSandboxPolicy`, rather than inferring behavior only from the
legacy `SandboxPolicy`
- updated process spawning in `core/src/spawn.rs` and the platform
wrappers to use `NetworkSandboxPolicy` when deciding whether to set
`CODEX_SANDBOX_NETWORK_DISABLED`
- kept additional-permissions handling and legacy `ExternalSandbox`
compatibility projections aligned with the split policies, including
explicit user-shell execution and Windows restricted-token routing
- updated callers across `core`, `app-server`, and `linux-sandbox` to
pass the split policies explicitly

## Verification

- added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
verify `RunUserShellCommand` does not inherit
`CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
- added coverage in `core/src/exec.rs` for Windows restricted-token
sandbox selection when the legacy projection is `ExternalSandbox`
- updated Linux sandbox coverage in
`linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
exec path
- verified the current PR state with `just clippy`




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* __->__ #13439

---------

Co-authored-by: viyatb-oai <viyatb@openai.com>
2026-03-07 02:30:21 +00:00
viyatb-oai
25fa974166
fix: support managed network allowlist controls (#12752)
## Summary
- treat `requirements.toml` `allowed_domains` and `denied_domains` as
managed network baselines for the proxy
- in restricted modes by default, build the effective runtime policy
from the managed baseline plus user-configured allowlist and denylist
entries, so common hosts can be pre-approved without blocking later user
expansion
- add `experimental_network.managed_allowed_domains_only = true` to pin
the effective allowlist to managed entries, ignore user allowlist
additions, and hard-deny non-managed domains without prompting
- apply `managed_allowed_domains_only` anywhere managed network
enforcement is active, including full access, while continuing to
respect denied domains from all sources
- add regression coverage for merged-baseline behavior, managed-only
behavior, and full-access managed-only enforcement

## Behavior
Assuming `requirements.toml` defines both
`experimental_network.allowed_domains` and
`experimental_network.denied_domains`.

### Default mode
- By default, the effective allowlist is
`experimental_network.allowed_domains` plus user or persisted allowlist
additions.
- By default, the effective denylist is
`experimental_network.denied_domains` plus user or persisted denylist
additions.
- Allowlist misses can go through the network approval flow.
- Explicit denylist hits and local or private-network blocks are still
hard-denied.
- When `experimental_network.managed_allowed_domains_only = true`, only
managed `allowed_domains` are respected, user allowlist additions are
ignored, and non-managed domains are hard-denied without prompting.
- Denied domains continue to be respected from all sources.

### Full access
- With managed requirements present, the effective allowlist is pinned
to `experimental_network.allowed_domains`.
- With managed requirements present, the effective denylist is pinned to
`experimental_network.denied_domains`.
- There is no allowlist-miss approval path in full access.
- Explicit denylist hits are hard-denied.
- `experimental_network.managed_allowed_domains_only = true` now also
applies in full access, so managed-only behavior remains in effect
anywhere managed network enforcement is active.
2026-03-06 17:52:54 -08:00
viyatb-oai
5deaf9409b
fix: avoid invoking git before project trust is established (#13804)
## Summary
- resolve trust roots by inspecting `.git` entries on disk instead of
spawning `git rev-parse --git-common-dir`
- keep regular repo and linked-worktree trust inheritance behavior
intact
- add a synthetic regression test that proves worktree trust resolution
works without a real git command

## Testing
- `just fmt`
- `cargo test -p codex-core resolve_root_git_project_for_trust`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
- `cargo test -p codex-core` (fails in this environment on unrelated
managed-config `DangerFullAccess` tests in `codex::tests`,
`tools::js_repl::tests`, and `unified_exec::tests`)
2026-03-06 17:46:23 -08:00
Owen Lin
90469d0a23
feat(app-server-protocol): address naming conflicts in json schema exporter (#13819)
This fixes a schema export bug where two different `WebSearchAction`
types were getting merged under the same name in the app-server v2 JSON
schema bundle.

The problem was that v2 thread items use the app-server API's
`WebSearchAction` with camelCase variants like `openPage`, while
`ThreadResumeParams.history` and
`RawResponseItemCompletedNotification.item` pull in the upstream
`ResponseItem` graph, which uses the Responses API snake_case shape like
`open_page`. During bundle generation we were flattening nested
definitions into the v2 namespace by plain name, so the later definition
could silently overwrite the earlier one.

That meant clients generating code from the bundled schema could end up
with the wrong `WebSearchAction` definition for v2 thread history. In
practice this shows up on web search items reconstructed from rollout
files with persisted extended history.

This change does two things:
- Gives the upstream Responses API schema a distinct JSON schema name:
`ResponsesApiWebSearchAction`
- Makes namespace-level schema definition collisions fail loudly instead
of silently overwriting
2026-03-07 01:33:46 +00:00
Ruslan Nigmatullin
e9bd8b20a1
app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
2026-03-06 17:30:17 -08:00
Rohan Mehta
61098c7f51
Allow full web search tool config (#13675)
Previously, we could only configure whether web search was on/off.

This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.
2026-03-07 00:50:50 +00:00
Celia Chen
8b81284975
fix(core): skip exec approval for permissionless skill scripts (#13791)
## Summary

- Treat skill scripts with no permission profile, or an explicitly empty
one, as permissionless and run them with the turn's existing sandbox
instead of forcing an exec approval prompt.
- Keep the approval flow unchanged for skills that do declare additional
permissions.
- Update the skill approval tests to assert that permissionless skill
scripts do not prompt on either the initial run or a rerun.

## Why

Permissionless skills should inherit the current turn sandbox directly.
Prompting for exec approval in that case adds friction without granting
any additional capability.
2026-03-06 16:40:41 -08:00
xl-openai
0243734300
feat: Add curated plugin marketplace + Metadata Cleanup. (#13712)
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
2026-03-06 19:39:35 -05:00
Owen Lin
289ed549cf
chore(otel): rename OtelManager to SessionTelemetry (#13808)
## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.

## Why

`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).

`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.

## Validation

- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
2026-03-06 16:23:30 -08:00
Michael Bolin
3794363cac
fix: include libcap-dev dependency when creating a devcontainer for building Codex (#13814)
I mainly use the devcontainer to be able to run `cargo clippy --tests`
locally for Linux.

We still need to make it possible to run clippy from Bazel so I don't
need to do this!
2026-03-06 16:21:14 -08:00
Ahmed Ibrahim
a11c59f634
Add realtime startup context override (#13796)
- add experimental_realtime_ws_startup_context to override or disable
realtime websocket startup context
- preserve generated startup context when unset and cover the new
override paths in tests
2026-03-06 16:00:30 -08:00