Commit graph

4626 commits

Author SHA1 Message Date
Dylan Hurd
84f4e7b39d
fix(subagents) share execpolicy by default (#13702)
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of coming Permissions changes, though I think this
is closer to the end state that we'd want, which is that execpolicy
changes to one permissions profile should be synced across threads.

## Testing
- [x] Added integration test

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 06:42:26 +00:00
viyatb-oai
a3613035f3
Pin setup-zig GitHub Action to immutable SHA (#14858)
### Motivation
- Pinning the action to an immutable commit SHA reduces the risk of
arbitrary code execution in runners with repository access and secrets.

### Description
- Replaced `uses: mlugg/setup-zig@v2` with `uses:
mlugg/setup-zig@d1434d0886 # v2` in three
workflow files.
- Updated the following files: `.github/workflows/rust-ci.yml`,
`.github/workflows/rust-release.yml`, and
`.github/workflows/shell-tool-mcp.yml` to reference the immutable SHA
while preserving the original `v2` intent in a trailing comment.
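
A minimal sketch of how such pins could be audited, not part of this change: it scans workflow text for `uses:` references whose ref is not a full 40-character commit SHA. The function name `unpinned_actions` and the regexes are illustrative assumptions.

```python
import re

# Matches `uses: owner/repo@ref` (optionally as a list item) and captures
# the action name and the ref after "@". Hypothetical helper, not repo code.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^@\s]+)@(\S+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, action, ref) for each `uses:` not pinned to a full SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue
        action, ref = match.groups()
        # A mutable tag or branch such as `v2` fails the full-SHA check.
        if not FULL_SHA_RE.match(ref):
            findings.append((lineno, action, ref))
    return findings
```

Running it over each workflow file would flag any remaining tag-based references such as `mlugg/setup-zig@v2`.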

### Testing
- No automated tests were run because this is a workflow-only change and
does not affect repository source code, so CI validation will occur on
the next workflow execution.

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69763f570234832d9c67b1b66a27c78d)
2026-03-17 22:40:14 -07:00
Andrei Eternal
6fef421654
[hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
- this allows blocking the user's prompts from executing, and also
prevents them from entering history
- handles the edge case where you can both prevent the user's prompt AND
add n amount of additionalContexts
- refactors some old code into common.rs where hooks overlap
functionality
- moves additionalContext out of user messages; it is now delivered via
developer messages instead
- handles queued messages correctly

Sample hook for testing - if you write "[block-user-submit]" this hook
will stop the thread:

example run
```
› sup


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (completed)
  warning: wizard-tower UserPromptSubmit demo inspected: sup
  hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' exactly once near the end.

• Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
  lanterns lit


› and [block-user-submit]


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (stopped)
  warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
  stop: Wizard Tower demo block: remove [block-user-submit] to continue.
```

.codex/config.toml
```
[features]
codex_hooks = true
```

.codex/hooks.json
```
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
            "timeoutSec": 10,
            "statusMessage": "reading the observatory notes"
          }
        ]
      }
    ]
  }
}
```

.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3

import json
import sys
from pathlib import Path


def prompt_from_payload(payload: dict) -> str:
    prompt = payload.get("prompt")
    if isinstance(prompt, str) and prompt.strip():
        return prompt.strip()

    event = payload.get("event")
    if isinstance(event, dict):
        user_prompt = event.get("user_prompt")
        if isinstance(user_prompt, str):
            return user_prompt.strip()

    return ""


def main() -> int:
    payload = json.load(sys.stdin)
    prompt = prompt_from_payload(payload)
    cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"

    if "[block-user-submit]" in prompt:
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                    ),
                    "decision": "block",
                    "reason": (
                        "Wizard Tower demo block: remove [block-user-submit] to continue."
                    ),
                }
            )
        )
        return 0

    prompt_preview = prompt or "(empty prompt)"
    if len(prompt_preview) > 80:
        prompt_preview = f"{prompt_preview[:77]}..."

    print(
        json.dumps(
            {
                "systemMessage": (
                    f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                ),
                "hookSpecificOutput": {
                    "hookEventName": "UserPromptSubmit",
                    "additionalContext": (
                        "Wizard Tower UserPromptSubmit demo fired. "
                        "For this reply only, include the exact phrase "
                        "'observatory lanterns lit' exactly once near the end."
                    ),
                },
            }
        )
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
2026-03-17 22:09:22 -07:00
Charley Cunningham
226241f035
Use workspace requirements for guardian prompt override (#14727)
## Summary
- move `guardian_developer_instructions` from managed config into
workspace-managed `requirements.toml`
- have guardian continue using the override when present and otherwise
fall back to the bundled local guardian prompt
- keep the generalized prompt-quality improvements in the shared
guardian default prompt
- update requirements parsing, layering, schema, and tests for the new
source of truth

## Context
This replaces the earlier managed-config / MDM rollout plan.

The intended rollout path is workspace-managed requirements, including
cloud enterprise policies, rather than backend model metadata, Statsig,
or Jamf-managed config. That keeps the default/fallback behavior local
to `codex-rs` while allowing faster policy updates through the
enterprise requirements plane.

This is intentionally an admin-managed policy input, not a user
preference: the guardian prompt should come either from the bundled
`codex-rs` default or from enterprise-managed `requirements.toml`, and
normal user/project/session config should not override it.

## Updating The OpenAI Prompt
After this lands, the OpenAI-specific guardian prompt should be updated
through the workspace Policies UI at `/codex/settings/policies` rather
than through Jamf or codex-backend model metadata.

Operationally:
- open the workspace Policies editor as a Codex admin
- edit the default `requirements.toml` policy, or a higher-precedence
group-scoped override if we ever want different behavior for a subset of
users
- set `guardian_developer_instructions = """..."""` to the full
OpenAI-specific guardian prompt text
- save the policy; codex-backend stores the raw TOML and `codex-rs`
fetches the effective requirements file from `/wham/config/requirements`

When updating the OpenAI-specific prompt, keep it aligned with the
shared default guardian policy in `codex-rs` except for intentional
OpenAI-only additions.

## Testing
- `cargo check --tests -p codex-core -p codex-config -p
codex-cloud-requirements --message-format short`
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo fmt`
- `git diff --check`

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 22:05:41 -07:00
Ahmed Ibrahim
3ce879c646
Handle realtime conversation end in the TUI (#14903)
- close live realtime sessions on errors, ctrl-c, and active meter
removal
- centralize TUI realtime cleanup and avoid duplicate follow-up close
info

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 21:04:58 -07:00
pakrym-oai
770616414a
Prefer websockets when providers support them (#13592)
Remove all flags and model settings.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 19:46:44 -07:00
viyatb-oai
d950543e65
feat: support restricted ReadOnlyAccess in elevated Windows sandbox (#14610)
## Summary
- support legacy `ReadOnlyAccess::Restricted` on Windows in the elevated
setup/runner backend
- keep the unelevated restricted-token backend on the legacy full-read
model only, and fail closed for restricted read-only policies there
- keep the legacy full-read Windows path unchanged while deriving
narrower read roots only for elevated restricted-read policies
- honor `include_platform_defaults` by adding backend-managed Windows
system roots only when requested, while always keeping helper roots and
the command `cwd` readable
- preserve `workspace-write` semantics by keeping writable roots
readable when restricted read access is in use in the elevated backend
- document the current Windows boundary: legacy `SandboxPolicy` is
supported on both backends, while richer split-only carveouts still fail
closed instead of running with weaker enforcement

## Testing
- `cargo test -p codex-windows-sandbox`
- `cargo check -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc`
- `cargo clippy -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc -- -D warnings`
- `cargo test -p codex-core windows_restricted_token_`

## Notes
- local `cargo test -p codex-windows-sandbox` on macOS only exercises
the non-Windows stubs; the Windows-targeted compile and clippy runs
provide the local signal, and GitHub Windows CI exercises the runtime
path
2026-03-17 19:08:50 -07:00
viyatb-oai
6fe8a05dcb
fix: honor active permission profiles in sandbox debug (#14293)
## Summary
- stop `codex sandbox` from forcing legacy `sandbox_mode` when active
`[permissions]` profiles are configured
- keep the legacy `read-only` / `workspace-write` fallback for legacy
configs and reject `--full-auto` for profile-based configs
- use split filesystem and network policies in the macOS/Linux debug
sandbox helpers and add regressions for the config-loading behavior


assuming "codex/docs/private/secret.txt" = "none"
```
codex -c 'default_permissions="limited-read-test"' sandbox macos -- <command> ...

codex sandbox macos -- cat codex/docs/private/secret.txt >/dev/null; echo EXIT:$?
cat: codex/docs/private/secret.txt: Operation not permitted
EXIT:1
```

---------

Co-authored-by: celia-oai <celia@openai.com>
2026-03-18 01:52:02 +00:00
pakrym-oai
83a60fdb94
Add FS abstraction and use in view_image (#14960)
Adds an environment crate and environment + file system abstraction.

Environment is a combination of attributes and services specific to
the environment the agent is connected to:
file system, process management, OS, default shell.

The goal is to move most of the agent logic that assumes an environment
to work through the environment abstraction.
2026-03-17 17:36:23 -07:00
Max Johnson
19b887128e
app-server: reject websocket requests with Origin headers (#14995)
Reject websocket requests that carry an `Origin` header
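
A minimal sketch of the check, assuming headers arrive as a name-to-value mapping; the function name is hypothetical. Browsers attach `Origin` to websocket handshakes, while non-browser clients typically do not, so presence alone is the signal.

```python
def should_reject_websocket(headers: dict[str, str]) -> bool:
    """Return True when the upgrade request carries an Origin header."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    return any(name.lower() == "origin" for name in headers)
```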
2026-03-18 00:24:53 +00:00
xl-openai
a5d3114e97
feat: Add product-aware plugin policies and clean up manifest naming (#14993)
- Add shared Product support to marketplace plugin policy and skill
policy (not enforced yet).
- Move marketplace installation/authentication under policy and model it
as MarketplacePluginPolicy.
- Rename plugin/marketplace local manifest types to separate raw serde
shapes from resolved in-memory models.
2026-03-17 17:01:34 -07:00
Shaqayeq
fc75d07504
Add Python SDK public API and examples (#14446)
## TL;DR
WIP, especially the examples.

Thin the Python SDK public surface so the wrapper layer returns
canonical app-server generated models directly.

- keeps `Codex` / `AsyncCodex` / `Thread` / `Turn` and input helpers,
but removes alias-only type layers and custom result models
- `metadata` now returns `InitializeResponse` and `run()` returns the
generated app-server `Turn`
- updates docs, examples, notebook, and tests to use canonical generated
types and regenerates `v2_all.py` against current schema
- keeps the pinned runtime-package integration flow and real integration
coverage

## Validation
- `PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests`
- `GH_TOKEN="$(gh auth token)" RUN_REAL_CODEX_TESTS=1
PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests -rs`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 16:05:56 -07:00
viyatb-oai
0d1539e74c
fix(linux-sandbox): prefer system /usr/bin/bwrap when available (#14963)
## Problem
Ubuntu/AppArmor hosts started failing in the default Linux sandbox path
after the switch to vendored/default bubblewrap in `0.115.0`.

The clearest report is in
[#14919](https://github.com/openai/codex/issues/14919), especially [this
investigation
comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751):
on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or
vendored `bwrap` binary fails with errors like `bwrap: setting up uid
map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR:
Operation not permitted`.

The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict`
profile, which grants `userns` access specifically to `/usr/bin/bwrap`.
Once Codex started using a vendored/internal bubblewrap path, that path
was no longer covered by the distro AppArmor exception, so sandbox
namespace setup could fail even when user namespaces were otherwise
enabled and `uidmap` was installed.

## What this PR changes
- prefer system `/usr/bin/bwrap` whenever it is available
- keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is
missing
- when `/usr/bin/bwrap` is missing, surface a Codex startup warning
through the app-server/TUI warning path instead of printing directly
from the sandbox helper with `eprintln!`
- use the same launcher decision for both the main sandbox execution
path and the `/proc` preflight path
- document the updated Linux bubblewrap behavior in the Linux sandbox
and core READMEs

## Why this fix
This still fixes the Ubuntu/AppArmor regression from
[#14919](https://github.com/openai/codex/issues/14919), but it keeps the
runtime rule simple and platform-agnostic: if the standard system
bubblewrap is installed, use it; otherwise fall back to the vendored
helper.

The warning now follows that same simple rule. If Codex cannot find
`/usr/bin/bwrap`, it tells the user that it is falling back to the
vendored helper, and it does so through the existing startup warning
plumbing that reaches the TUI and app-server instead of low-level
sandbox stderr.

## Testing
- `cargo test -p codex-linux-sandbox`
- `cargo test -p codex-app-server --lib`
- `cargo test -p codex-tui-app-server
tests::embedded_app_server_start_failure_is_returned`
- `cargo clippy -p codex-linux-sandbox --all-targets`
- `cargo clippy -p codex-app-server --all-targets`
- `cargo clippy -p codex-tui-app-server --all-targets`
2026-03-17 23:05:34 +00:00
Ahmed Ibrahim
98be562fd3
Unify realtime shutdown in core (#14902)
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 15:58:52 -07:00
Ahmed Ibrahim
c6ab4ee537
Gate realtime audio interruption logic to v2 (#14984)
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
2026-03-17 15:24:37 -07:00
xl-openai
1a9555eda9
Cleanup skills/remote/xxx endpoints. (#14977)
Remove skills/remote/xxx endpoints, as they are not in use for now.
2026-03-17 15:22:36 -07:00
Felipe Coury
43ee72a9b9
fix(tui): implement /mcp inventory for tui_app_server (#14931)
## Problem

The `/mcp` command did not work in the app-server TUI (remote mode). On
`main`, `add_mcp_output()` called `McpManager::effective_servers()`
in-process, which only sees locally configured servers, and then emitted
a generic stub message for the app-server to handle. In remote usage,
that left `/mcp` without a real inventory view.

## Solution

Implement `/mcp` for the app-server TUI by fetching MCP server inventory
directly from the app-server via the paginated `mcpServerStatus/list`
RPC and rendering the results into chat history.

The command now follows a three-phase lifecycle:

1. Loading: `ChatWidget::add_mcp_output()` inserts a transient
`McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This
gives immediate feedback that the command registered.
2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that
calls `fetch_all_mcp_server_statuses()` over an app-server request
handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded {
result }`.
3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell
and renders either `new_mcp_tools_output_from_statuses(...)` or an error
message.

This keeps the main app event loop responsive, so the TUI can repaint
before the remote RPC finishes.
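
The three phases can be sketched with `asyncio`. This is only an illustration of the flow, not the TUI code: `list_page` stands in for the paginated `mcpServerStatus/list` call, and the page fields `data` and `next_cursor` are assumed names.

```python
import asyncio


async def fetch_all_statuses(list_page):
    """Follow cursors until the server stops returning one."""
    statuses, cursor = [], None
    while True:
        page = await list_page(cursor)
        statuses.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:
            return statuses


async def run_mcp_command(list_page, history):
    # Phase 1 (loading): show a transient placeholder right away.
    history.append("loading")
    # Phase 2 (fetch): await the paginated RPC; other tasks keep running.
    try:
        statuses = await fetch_all_statuses(list_page)
        result = f"{len(statuses)} MCP server(s)"
    except Exception as err:
        result = f"Failed to load MCP inventory: {err}"
    # Phase 3 (resolve): swap the placeholder for the inventory or the error.
    history.remove("loading")
    history.append(result)
```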

## Notes

- No `app-server` changes were required.
- The rendered inventory includes auth, tools, resources, and resource
templates, plus transport details when they are available from local
config for display enrichment.
- The app-server RPC does not expose authoritative `enabled` or
`disabled_reason` state for MCP servers, so the remote `/mcp` view
drops the `Status:` row instead of guessing from local config.
- RPC failures surface in history as `Failed to load MCP inventory:
...`.

## Tests

- `slash_mcp_requests_inventory_via_app_server`
- `mcp_inventory_maps_prefix_tool_names_by_server`
- `handle_mcp_inventory_result_clears_committed_loading_cell`
- `mcp_tools_output_from_statuses_renders_status_only_servers`
- `mcp_inventory_loading_snapshot`
2026-03-17 16:11:27 -06:00
Colin Young
0d2ff40a58
Add auth env observability (#14905)
CXC-410 Emit Env Var Status with `/feedback` report

Add more observability on top of #14611 

[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)

[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)


###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.
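
A small sketch of the presence-only contract, under stated assumptions: the field names below are illustrative and are not the actual `AuthEnvTelemetry` fields, and `OPENAI_API_KEY` is used only as an example of an auth-related variable.

```python
def collect_auth_env_telemetry(env, provider_env_key=None):
    """Summarize auth-related env state without recording any raw values."""
    return {
        # Presence flags only: True/False, never the variable's contents.
        "openai_api_key_env_present": bool(env.get("OPENAI_API_KEY")),
        "provider_env_key_present": bool(
            provider_env_key and env.get(provider_env_key)
        ),
        # User-controlled config text is bucketed, not echoed.
        "provider_env_key_name": "configured" if provider_env_key else None,
    }
```

Neither the token value nor the configured variable name appears anywhere in the returned mapping.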

###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.

###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.

###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.

###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.

###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 14:26:27 -07:00
pakrym-oai
ee756eb80f
Rename exec_wait tool to wait (#14983)
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`

Testing
- Not run (not requested)
2026-03-17 14:22:26 -07:00
iceweasel-oai
2cc4ee413f
temporarily disable private desktop until it works with elevated IPC path (#14986)
2026-03-17 21:09:57 +00:00
Ahmed Ibrahim
4d9d4b7b0f
Stabilize approval matrix write-file command (#14968)
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.

## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.

That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.

## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.
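
A sketch of the idea, not the test's actual code: the payload is embedded as a Python literal and the command is passed as an argv list, so no shell quoting or redirection is involved. The helper names are hypothetical, and `sys.executable` stands in for `python3`.

```python
import subprocess
import sys


def build_write_file_command(path: str, contents: str) -> list[str]:
    """Build an argv that writes `contents` to `path` and echoes it back."""
    script = (
        "from pathlib import Path; "
        f"p = Path({path!r}); "
        f"p.write_text({contents!r}, encoding='utf-8'); "
        "print(p.read_text(encoding='utf-8'), end='')"
    )
    return [sys.executable, "-c", script]


def run_write_file(path: str, contents: str) -> tuple[int, str]:
    argv = build_write_file_command(path, contents)
    # Log the exact command so a failure can be traced to its input.
    print("command:", argv, file=sys.stderr)
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout
```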

## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 13:52:36 -07:00
Ahmed Ibrahim
23a44ddbe8
Stabilize permissions popup selection tests (#14966)
## What is flaky
The permissions popup tests in the TUI are flaky, especially on Windows.
They assume the popup opens on a specific row and that a fixed number of
`Up` or `Down` keypresses will land on a specific preset. They also
match popup text too loosely, so a non-selected row can satisfy the
assertion.

## Why it was flaky
These tests were asserting incidental rendering details rather than the
actual selected permission preset. On Windows, the initial selection can
differ from non-Windows runs. Some tests also searched the entire popup
for text like `Guardian Approvals` or `(current)`, which can match a row
that is visible but not selected. Once the popup order or current preset
shifted slightly, a test could fail even though the UI behavior was
still correct.

## How this PR fixes it
This PR adds helpers that identify the selected popup row and selected
preset name directly. The tests now assert the current selection by
name, navigate to concrete target presets instead of assuming a fixed
number of keypresses, and explicitly set the reviewer state in the cases
that require `Guardian Approvals` to be current.
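
Target-based navigation can be sketched like this. `Popup` is a toy stand-in for the real widget, the wrap-around on `Down` is an assumption, and every preset name other than `Guardian Approvals` is hypothetical.

```python
class Popup:
    """Toy popup: a list of preset names plus a selected index."""

    def __init__(self, rows, selected=0):
        self.rows = list(rows)
        self.selected = selected

    def press_down(self):
        # Assumed behavior: selection wraps from the last row to the first.
        self.selected = (self.selected + 1) % len(self.rows)

    def selected_name(self):
        return self.rows[self.selected]


def select_preset(popup, target):
    """Press Down until `target` is selected, whatever the starting row."""
    for _ in range(len(popup.rows)):
        if popup.selected_name() == target:
            return
        popup.press_down()
    raise AssertionError(f"preset {target!r} not found in popup")
```

Because the loop stops on the selected name rather than after a fixed count, the same helper works when the initial selection differs between platforms.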

## Why this fix fixes the flakiness
The assertions now track semantic state, not fragile text placement.
Navigation is target-based instead of order-based, so
Windows/non-Windows row differences and harmless popup layout changes no
longer break the tests. That removes the scheduler- and
platform-sensitive assumptions that made the popup suite intermittent.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:45:44 +00:00
Ahmed Ibrahim
b02388672f
Stabilize Windows cmd-based shell test harnesses (#14958)
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
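
A hedged sketch of building such a nested read, not the harness code. PowerShell documents `-EncodedCommand` as taking a Base64 string of UTF-16LE text, so the sketch encodes the script that way; the UTF-8 part is assumed here to refer to how `ReadAllText` decodes the file. The function name is hypothetical.

```python
import base64


def powershell_read_file_argv(path: str) -> list[str]:
    """Build an argv that reads `path` without going through cmd quoting."""
    # Single quotes are doubled inside PowerShell single-quoted strings.
    escaped = path.replace("'", "''")
    script = (
        "$ProgressPreference = 'SilentlyContinue'; "
        f"[System.IO.File]::ReadAllText('{escaped}')"
    )
    # -EncodedCommand expects Base64 over the UTF-16LE bytes of the script.
    encoded = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    return ["powershell.exe", "-NonInteractive", "-EncodedCommand", encoded]
```

Since the encoded argument contains only Base64 characters, the outer `cmd /c` layer has nothing to re-quote.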

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:21:46 +00:00
Matthew Zeng
683c37ce75
[plugins] Support plugin installation elicitation. (#14896)
It now supports:

- Connectors that belong to installed and enabled plugins but are not
themselves installed yet.
- Plugins that are on the allowlist but are not installed yet.
2026-03-17 13:19:28 -07:00
Eric Traut
49e7dda2df
Add device-code onboarding and ChatGPT token refresh to app-server TUI (#14952)
## Summary
- add device-code ChatGPT sign-in to `tui_app_server` onboarding and
reuse the existing `chatgptAuthTokens` login path
- fall back to browser login when device-code auth is unavailable on the
server
- treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state
during onboarding
- add a local ChatGPT auth loader for handing local tokens to the app
server and serving refresh requests
- handle `account/chatgptAuthTokens/refresh` instead of marking it
unsupported, including workspace/account mismatch checks
- add focused coverage for onboarding success, existing auth handling,
local auth loading, and refresh request behavior

## Testing
- `cargo test -p codex-tui-app-server`
- `just fix -p codex-tui-app-server`
2026-03-17 14:12:12 -06:00
iceweasel-oai
95bdea93d2
use framed IPC for elevated command runner (#14846)
## Summary
This is PR 2 of the Windows sandbox runner split.

PR 1 introduced the framed IPC runner foundation and related Windows
sandbox infrastructure without changing the active elevated one-shot
execution path. This PR switches that elevated one-shot path over to the
new runner IPC transport and removes the old request-file bootstrap that
PR 1 intentionally left in place.

After this change, ordinary elevated Windows sandbox commands still
behave as one-shot executions, but they now run as the simple case of
the same helper/IPC transport that later unified_exec work will build
on.

## Why this is needed for unified_exec
Windows elevated sandboxed execution crosses a user boundary: the CLI
launches a helper as the sandbox user and has to manage command
execution from outside that security context. For one-shot commands, the
old request-file/bootstrap flow was sufficient. For unified_exec, it is
not.

Unified_exec needs a long-lived bidirectional channel so the parent can:
- send a spawn request
- receive structured spawn success/failure
- stream stdout and stderr incrementally
- eventually support stdin writes, termination, and other session
lifecycle events

This PR does not add long-lived sessions yet. It converts the existing
elevated one-shot path to use the same framed IPC transport so that PR 3
can add unified_exec session semantics on top of a transport that is
already exercised by normal elevated command execution.
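
To illustrate what a framed transport means here, a minimal sketch follows. The wire format is an assumption (a 4-byte little-endian length prefix followed by a JSON payload); the actual runner framing is not described in this message and may differ.

```python
import io
import json
import struct


def write_frame(stream, message: dict) -> None:
    """Write one message as <u32 length><JSON payload> (assumed format)."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def read_frame(stream):
    """Read one framed message; return None on a clean end of stream."""
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("truncated frame header")
    (length,) = struct.unpack("<I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated frame payload")
    return json.loads(payload)
```

With explicit message boundaries, a one-shot command is just a short conversation (spawn request, ready, output, exit) over the same channel a long-lived session would use.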

## Scope
This PR:
- updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner
with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`,
and collect framed `Output`/`Exit` messages
- removes the old `--request-file=...` execution path from
`windows-sandbox-rs/src/elevated/command_runner_win.rs`
- keeps the public behavior one-shot: no session reuse or interactive
unified_exec behavior is introduced here

This PR does not:
- add Windows unified_exec session support
- add background terminal reuse
- add PTY session lifecycle management

## Why Windows needs this and Linux/macOS do not
On Linux and macOS, the existing sandbox/process model composes much
more directly with long-lived process control. The parent can generally
spawn and own the child process (or PTY) directly inside the sandbox
model we already use.

Windows elevated sandboxing is different. The parent is not directly
managing the sandboxed process in the same way; it launches across a
different user/security context. That means long-lived control requires
an explicit helper process plus IPC for spawn, output, exit, and later
stdin/session control.

So the extra machinery here is not because unified_exec is conceptually
different on Windows. It is because the elevated Windows sandbox
boundary requires a helper-mediated transport to support it cleanly.

## Validation
- `cargo test -p codex-windows-sandbox`
2026-03-17 11:38:44 -07:00
Keyan Zhang
904dbd414f
generate an internal json schema for RolloutLine (#14434)
### Why
i'm working on something that parses and analyzes codex rollout logs,
and i'd like to have a schema for generating a parser/validator.

`codex app-server generate-internal-json-schema` writes a
`RolloutLine.json` file

while doing this, i noticed we have a writer <> reader mismatch issue on
`FunctionCallOutputPayload` and reasoning item ID -- added some schemars
annotations to fix those

### Test

```
$ just codex app-server generate-internal-json-schema --out ./foo
```

generates a `RolloutLine.json` file, which i validated against jsonl
files on disk

`just codex app-server --help` doesn't expose the
`generate-internal-json-schema` option by default, but you can do `just
codex app-server generate-internal-json-schema --help` if you know the
command

everything else still works

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 11:19:42 -07:00
Ahmed Ibrahim
0d531c05f2
Fix code mode yield startup race (#14959)
2026-03-17 11:09:12 -07:00
jif-oai
d484bb57d9
feat: add suffix to shell snapshot name (#14938)
https://github.com/openai/codex/issues/14906
2026-03-17 17:59:27 +00:00
Ahmed Ibrahim
f26ad3c92c
Fix fuzzy search notification buffering in app-server tests (#14955)
## What is flaky
`codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently
loses the expected `fuzzyFileSearch/sessionUpdated` and
`fuzzyFileSearch/sessionCompleted` notifications when multiple
fuzzy-search sessions are active and CI delivers notifications out of
order.

## Why it was flaky
The wait helpers were keyed only by JSON-RPC method name.

- `wait_for_session_updated` consumed the next
`fuzzyFileSearch/sessionUpdated` notification even when it belonged to a
different search session.
- `wait_for_session_completed` did the same for
`fuzzyFileSearch/sessionCompleted`.
- Once an unmatched notification was read, it was dropped permanently
instead of buffered.
- That meant a valid completion for the target search could arrive
slightly early, be consumed by the wrong waiter, and disappear before
the test started waiting for it.

The result depended on notification ordering and runner scheduling
instead of on the actual product behavior.

## How this PR fixes it
- Add a buffered notification reader in
`codex-rs/app-server/tests/common/mcp_process.rs`.
- Match fuzzy-search notifications on the identifying payload fields
instead of matching only on method name.
- Preserve unmatched notifications in the in-process queue so later
waiters can still consume them.
- Include pending notification methods in timeout failures to make
future diagnosis concrete.
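
The buffering idea above can be sketched roughly as follows. This is not the code in `mcp_process.rs`: `Notification`, `BufferedReader`, `wait_for`, and `pending_methods` are hypothetical names, and a plain iterator stands in for the JSON-RPC stream.

```rust
use std::collections::VecDeque;

/// Hypothetical notification shape: a method name plus an identifying field.
#[derive(Debug, Clone, PartialEq)]
struct Notification {
    method: String,
    session_id: String,
}

/// Reads from `incoming`, keeping unmatched notifications for later waiters.
struct BufferedReader<I: Iterator<Item = Notification>> {
    incoming: I,
    pending: VecDeque<Notification>,
}

impl<I: Iterator<Item = Notification>> BufferedReader<I> {
    fn new(incoming: I) -> Self {
        BufferedReader { incoming, pending: VecDeque::new() }
    }

    /// Return the first notification matching `pred`, checking the buffer
    /// first. Anything read that does not match is buffered, not dropped.
    fn wait_for<F: Fn(&Notification) -> bool>(&mut self, pred: F) -> Option<Notification> {
        if let Some(pos) = self.pending.iter().position(|n| pred(n)) {
            return self.pending.remove(pos);
        }
        while let Some(n) = self.incoming.next() {
            if pred(&n) {
                return Some(n);
            }
            self.pending.push_back(n);
        }
        None
    }

    /// Methods of buffered notifications, useful in timeout messages.
    fn pending_methods(&self) -> Vec<&str> {
        self.pending.iter().map(|n| n.method.as_str()).collect()
    }
}
```

With this shape, a completion for session `b` that arrives before the one for session `a` stays in `pending` until a waiter asks for it.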

## Why this fix fixes the flakiness
The test now behaves like a real consumer of an out-of-order event
stream: notifications for other sessions stay buffered until the correct
waiter asks for them. Reordering no longer loses the target event, so
the test result is determined by whether the server emitted the right
notifications, not by which one happened to be read first.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 10:52:16 -07:00
Felipe Coury
78e8ee4591
fix(tui): restore remote resume and fork history (#14930)
## Problem

When the TUI connects to a **remote** app-server (via WebSocket), resume
and fork operations lost all conversation history.
`AppServerStartedThread` carried only the `SessionConfigured` event, not
the full `Thread` snapshot. After resume or fork, the chat transcript
was empty — prior turns were silently discarded.

A secondary issue: `primary_session_configured` was not cleared on
reset, causing stale session state after reconnection.

## Approach: TUI-side only, zero app-server changes

The app-server **already returns** the full `Thread` object (with
populated `turns: Vec<Turn>`) in its `ThreadStartResponse`,
`ThreadResumeResponse`, and `ThreadForkResponse`. The data was always
there — the TUI was simply throwing it away. The old
`AppServerStartedThread` struct only kept the `SessionConfiguredEvent`,
discarding the rich turn history that the server had already provided.

This PR fixes the problem entirely within `tui_app_server` (3 files
changed, 0 changes to `app-server`, `app-server-protocol`, or any other
crate). Rather than modifying the server to send history in a different
format or adding a new endpoint, the fix preserves the existing `Thread`
snapshot and replays it through the TUI's standard event pipeline —
making restored sessions indistinguishable from live ones.

## Solution

Add a **thread snapshot replay** path. When the server hands back a
`Thread` object (on start, resume, or fork),
`restore_started_app_server_thread` converts its historical turns into
the same core `Event` sequence the TUI already processes for live
interactions, then replays them into the event store so the chat widget
renders them.

Key changes:
- **`AppServerStartedThread` now carries the full `Thread`** —
`started_thread_from_{start,resume,fork}_response` clone the thread into
the struct alongside the existing `SessionConfiguredEvent`.
- **`thread_snapshot_events()`** walks the thread's turns and items,
producing `TurnStarted` → `ItemCompleted`* →
`TurnComplete`/`TurnAborted` event sequences that the TUI already knows
how to render.
- **`restore_started_app_server_thread()`** pushes the session event +
history events into the thread channel's store, activates the channel,
and replays the snapshot — used for initial startup, resume, and fork.
- **`primary_session_configured` cleared on reset** to prevent stale
session state after reconnection.
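
For readers unfamiliar with the replay shape, here is a rough sketch of the turn-to-event conversion. The `Turn`, `TurnStatus`, and `Event` types below are simplified stand-ins invented for illustration, and `snapshot_events` is a hypothetical analogue of `thread_snapshot_events()`, not its actual signature.

```rust
/// Simplified stand-in for a historical turn's final state.
#[derive(Debug, Clone, PartialEq)]
enum TurnStatus {
    Completed,
    Interrupted,
}

/// Simplified stand-in for a turn in a thread snapshot.
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    id: String,
    items: Vec<String>,
    status: TurnStatus,
}

/// Simplified stand-in for the core events the TUI renders.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    TurnStarted { turn_id: String },
    ItemCompleted { turn_id: String, item: String },
    TurnComplete { turn_id: String },
    TurnAborted { turn_id: String },
}

/// Emit TurnStarted, one ItemCompleted per item, then a terminal event
/// chosen from the turn's status.
fn snapshot_events(turns: &[Turn]) -> Vec<Event> {
    let mut events = Vec::new();
    for turn in turns {
        events.push(Event::TurnStarted { turn_id: turn.id.clone() });
        for item in &turn.items {
            events.push(Event::ItemCompleted {
                turn_id: turn.id.clone(),
                item: item.clone(),
            });
        }
        events.push(match turn.status {
            TurnStatus::Completed => Event::TurnComplete { turn_id: turn.id.clone() },
            TurnStatus::Interrupted => Event::TurnAborted { turn_id: turn.id.clone() },
        });
    }
    events
}
```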

## Tradeoffs

- **`Thread` is cloned into `AppServerStartedThread`**: The full thread
snapshot (including all historical turns) is cloned at startup. For
long-lived threads this could be large, but it's a one-time cost and
avoids lifetime gymnastics with the response.

## Tests

- `restore_started_app_server_thread_replays_remote_history` —
end-to-end: constructs a `Thread` with one completed turn, restores it,
and asserts user/agent messages appear in the transcript.
- `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies
`thread_snapshot_events` produces the correct event sequence for
completed and interrupted turns.

## Test plan

- [ ] Verify `cargo check -p codex-tui-app-server` passes
- [ ] Verify `cargo test -p codex-tui-app-server` passes
- [ ] Manual: connect to a remote app-server, resume an existing thread,
confirm history renders in the chat widget
- [ ] Manual: fork a thread via remote, confirm prior turns appear
2026-03-17 11:16:08 -06:00
Shijie Rao
8e258eb3f5
Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859)
### Summary
The goal is to use the latest turn's model and reasoning effort on
thread/resume if no override is provided in the thread/resume call.
This is part 1, which writes the model and reasoning effort for a
thread to the sqlite db; a follow-up PR will consume the
two new fields on thread/resume.

[part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
and this one can be merged independently.
2026-03-17 10:14:34 -07:00
Owen Lin
6ea041032b
fix(core): prevent hanging turn/start due to websocket warming issues (#14838)
## Description

This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.

Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_connect_timeout_ms` in config.toml
- `turn/started` is sent immediately on `turn/start` even if the
websocket is still connecting
- `turn/interrupt` can be used to cancel a turn that is still waiting on
the websocket warmup
- the turn task will wait for the full 15s websocket warming timeout
before falling back
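
As a rough illustration of the bounded wait, the sketch below uses a thread and `recv_timeout` from the standard library. The real code path is async and is not shown here; `Transport`, `connect_with_timeout`, and the `Fallback` variant are hypothetical names, and what the fallback actually does is left unspecified.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical outcome of the startup attempt.
#[derive(Debug, PartialEq)]
enum Transport {
    Websocket,
    Fallback,
}

/// Run `connect` on a worker thread and wait at most `timeout` for it.
/// A failed, hung, or panicked connect all resolve to `Fallback`.
fn connect_with_timeout<F>(connect: F, timeout: Duration) -> Transport
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if the caller timed out.
        let _ = tx.send(connect());
    });
    match rx.recv_timeout(timeout) {
        Ok(true) => Transport::Websocket,
        _ => Transport::Fallback,
    }
}
```

The point of the bound is that the caller always gets an answer within `timeout`, so lifecycle events are never stuck behind a hung warmup.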

## Why

The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.

## Configuring the websocket startup timeout
Can set it in config.toml like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
2026-03-17 10:07:46 -07:00
jif-oai
e8add54e5d
feat: show effective model in spawn agent event (#14944)
Show effective model after the full config layering for the sub agent
2026-03-17 16:58:58 +00:00
daveaitel-openai
ef36d39199
Fix agent jobs finalization race and reduce status polling churn (#14843)
## Summary
- make `report_agent_job_result` atomically transition an item from
running to completed while storing `result_json`
- remove brittle finalization grace-sleep logic and make finished-item
cleanup idempotent
- replace blind fixed-interval waiting with status-subscription-based
waiting for active worker threads
- add state runtime tests for atomic completion and late-report
rejection
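
The atomic transition can be pictured with an in-memory sketch like the one below. It only illustrates the check-and-set shape: `JobState`, `ItemStatus`, and `report_result` are hypothetical, and a `Mutex`-guarded map stands in for whatever storage the state runtime actually uses.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical item states; completion carries the stored result.
#[derive(Debug, Clone, PartialEq)]
enum ItemStatus {
    Running,
    Completed { result_json: String },
}

struct JobState {
    items: Mutex<HashMap<String, ItemStatus>>,
}

impl JobState {
    fn with_running_items(ids: &[&str]) -> Self {
        let items = ids
            .iter()
            .map(|id| (id.to_string(), ItemStatus::Running))
            .collect();
        JobState { items: Mutex::new(items) }
    }

    /// Transition running -> completed and store the result under one lock.
    /// Returns false for unknown items and for late (duplicate) reports.
    fn report_result(&self, item_id: &str, result_json: &str) -> bool {
        let mut items = self.items.lock().unwrap();
        if let Some(status) = items.get_mut(item_id) {
            if *status == ItemStatus::Running {
                *status = ItemStatus::Completed {
                    result_json: result_json.to_string(),
                };
                return true;
            }
        }
        false
    }

    fn status(&self, item_id: &str) -> Option<ItemStatus> {
        self.items.lock().unwrap().get(item_id).cloned()
    }
}
```

Because the status check and the write happen under the same lock, no sleep-based grace period is needed to decide whether a report arrived in time.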

## Why
This addresses the race and polling concerns in #13948 by removing
timing-based correctness assumptions and reducing unnecessary status
polling churn.

## Validation
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
- `cd codex-rs && cargo test`
- fails in an unrelated app-server tracing test:
`message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
timed out waiting for response

## Notes
- This PR supersedes #14129 with the same agent-jobs fix on a clean
branch from `main`.
- The earlier PR branch was stacked on unrelated history, which made the
review diff include unrelated commits.

Fixes #13948
2026-03-17 10:40:14 -04:00
jif-oai
4ed19b0766
feat: rename to get more explicit close agent (#14935)
https://github.com/openai/codex/issues/14907
2026-03-17 14:37:20 +00:00
jif-oai
31648563c8
feat: centralize package manager version (#14920)
2026-03-17 12:03:07 +00:00
viyatb-oai
603b6493a9
fix(linux-sandbox): ignore missing writable roots (#14890)
## Summary
- skip nonexistent `workspace-write` writable roots in the Linux
bubblewrap mount builder instead of aborting sandbox startup
- keep existing writable roots mounted normally so mixed Windows/WSL
configs continue to work
- add unit and Linux integration regression coverage for the
missing-root case
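
A small sketch of the skip-missing-roots behavior, assuming the mount builder produces a flat argument list. `existing_writable_roots` and `bind_args` are hypothetical helpers, not the functions in the Linux sandbox crate; `--bind SRC DEST` is bubblewrap's read-write bind option.

```rust
use std::path::PathBuf;

/// Keep only the writable roots that exist on this machine.
fn existing_writable_roots(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots.iter().filter(|p| p.exists()).cloned().collect()
}

/// Build `--bind <root> <root>` arguments for each existing root,
/// silently skipping roots that are missing.
fn bind_args(roots: &[PathBuf]) -> Vec<String> {
    let mut args = Vec::new();
    for root in existing_writable_roots(roots) {
        let path = root.to_string_lossy().into_owned();
        args.push("--bind".to_string());
        args.push(path.clone());
        args.push(path);
    }
    args
}
```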

## Context
This addresses regression A from #14875. Regression B will be handled in
a separate PR.

The old bubblewrap integration added `ensure_mount_targets_exist` as a
preflight guard because bubblewrap bind targets must exist, and failing
early let Codex return a clearer error than a lower-level mount failure.

That policy turned out to be too strict once bubblewrap became the
default Linux sandbox: shared Windows/WSL or mixed-platform configs can
legitimately contain a well-formed writable root that does not exist on
the current machine. This PR keeps bubblewrap's existing-target
requirement, but changes Codex to skip missing writable roots instead of
treating them as fatal configuration errors.
2026-03-17 00:21:00 -07:00
Eric Traut
d37dcca7e0
Revert tui code so it does not rely on in-process app server (#14899)
PR https://github.com/openai/codex/pull/14512 added an in-process app
server and started to wire up the tui to use it. We were originally
planning to modify the `tui` code in place, converting it to use the app
server a bit at a time using a hybrid adapter. We've since decided to
create an entirely new parallel `tui_app_server` implementation and do
the conversion all at once but retain the existing `tui` while we work
the bugs out of the new implementation.

This PR undoes the changes to the `tui` made in PR #14512 and
restores the old initialization to its previous state. This allows us to
modify the `tui_app_server` without the risk of regressing the old `tui`
code. For example, we can start to remove support for all legacy core
events, like the ones that PR https://github.com/openai/codex/pull/14892
needed to ignore.

Testing:
* I manually verified that the old `tui` starts and shuts down without a
problem.
2026-03-17 00:56:32 -06:00
Eric Traut
57f865c069
Fix tui_app_server: ignore duplicate legacy stream events (#14892)
The in-process app-server currently emits both typed
`ServerNotification`s and legacy `codex/event/*` notifications for the
same live turn updates. `tui_app_server` was consuming both paths, so
message deltas and completed items could be enqueued twice and rendered
as duplicated output in the transcript.

Ignore legacy notifications for event types that already have typed (app
server) notification handling, while keeping legacy fallback behavior
for events that still only arrive on the old path. This preserves
compatibility without duplicating streamed commentary or final agent
output.

We will remove all of the legacy event handlers over time; they're here
only during the short window where we're moving the tui to use the app
server.
2026-03-17 00:50:25 -06:00
viyatb-oai
db7e02c739
fix: canonicalize symlinked Linux sandbox cwd (#14849)
## Problem
On Linux, Codex can be launched from a workspace path that is a symlink
(for example, a symlinked checkout or a symlinked parent directory).

Our sandbox policy intentionally canonicalizes writable/readable roots
to the real filesystem path before building the bubblewrap mounts. That
part is correct and needed for safety.

The remaining bug was that bubblewrap could still inherit the helper
process's logical cwd, which might be the symlinked alias instead of the
mounted canonical path. In that case, the sandbox starts in a cwd that
does not exist inside the sandbox namespace even though the real
workspace is mounted. This can cause sandboxed commands to fail in
symlinked workspaces.

## Fix
This PR keeps the sandbox policy behavior the same, but separates two
concepts that were previously conflated:

- the canonical cwd used to define sandbox mounts and permissions
- the caller's logical cwd used when launching the command

On the Linux bubblewrap path, we now thread the logical command cwd
through the helper explicitly and only add `--chdir <canonical path>`
when the logical cwd differs from the mounted canonical path.

That means:
- permissions are still computed from canonical paths
- bubblewrap starts the command from a cwd that definitely exists inside
the sandbox
- we do not widen filesystem access or undo the earlier symlink
hardening
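
The rule above reduces to a small comparison, sketched here under the assumption that both paths are already known to the caller. `chdir_args` is a hypothetical helper, not the builder's real API; `--chdir DIR` is bubblewrap's option for setting the working directory.

```rust
use std::path::Path;

/// Return `--chdir <canonical>` only when the caller's logical cwd is not
/// already the canonical path that is mounted in the sandbox.
fn chdir_args(logical_cwd: &Path, canonical_cwd: &Path) -> Vec<String> {
    if logical_cwd == canonical_cwd {
        Vec::new()
    } else {
        vec![
            "--chdir".to_string(),
            canonical_cwd.to_string_lossy().into_owned(),
        ]
    }
}
```

For example, a symlinked checkout at `/home/me/link` that resolves to `/srv/repo` would yield `--chdir /srv/repo`, while a non-symlinked cwd yields no extra arguments.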

## Why This Is Safe
This is a narrow Linux-only launch fix, not a policy change.

- Writable/readable root canonicalization stays intact.
- Protected metadata carveouts still operate on canonical roots.
- We only override bubblewrap's inherited cwd when the logical path
would otherwise point at a symlink alias that is not mounted in the
sandbox.

## Tests
- kept the existing protocol/core regression coverage for symlink
canonicalization
- added regression coverage for symlinked cwd handling in the Linux
bubblewrap builder/helper path

Local validation:
- `just fmt`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
normalize_additional_permissions_canonicalizes_symlinked_write_paths`
- `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
--tests -- -D warnings`
- `cargo build --bin codex`

## Context
This is related to #14694. The earlier writable-root symlink fix
addressed the mount/permission side; this PR fixes the remaining
symlinked-cwd launch mismatch in the Linux sandbox path.
2026-03-16 22:39:18 -07:00
Ahmed Ibrahim
32e4a5d5d9
[stack 4/4] Reduce realtime self-interruptions during playback (#14827)
## Stack Position
4/4. Top-of-stack sibling built on #14830.

## Base
- #14830

## Sibling
- #14829

## Scope
- Gate low-level mic chunks while speaker playback is active, while
still allowing spoken barge-in.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 05:19:51 +00:00
Ahmed Ibrahim
79f476e47d
[stack 3/4] Add current thread context to realtime startup (#14829)
## Stack Position
3/4. Top-of-stack sibling built on #14830.

## Base
- #14830

## Sibling
- #14827

## Scope
- Extend the realtime startup context with a bounded summary of the
latest thread turns for continuity.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 05:11:05 +00:00
Michael Bolin
15ede607a0
fix: tighten up shell arg quoting in GitHub workflows (#14864)
Inspired by the work done over in
https://github.com/openai/codex-action/pull/74, this tightens up our use
of GitHub expressions as shell/environment variables.
2026-03-16 22:01:16 -07:00
Thibault Sottiaux
8e34caffcc
[codex] add Jason as a predefined subagent name (#14881)
This change adds Jason to codex-core's built-in subagent nickname pool
so spawned agents can pick it without any custom role configuration. The
default list was simply missing that predefined name (a grave mistake).
2026-03-16 22:01:14 -07:00
xl-openai
e5a28ba0c2
fix: align marketplace display name with existing interface conventions (#14886)
1. camelCase for displayName;
2. move displayName under interface.
2026-03-16 21:52:19 -07:00
Ahmed Ibrahim
fbd7f9b986
[stack 2/4] Align main realtime v2 wire and runtime flow (#14830)
## Stack Position
2/4. Built on top of #14828.

## Base
- #14828

## Unblocks
- #14829
- #14827

## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-16 21:38:07 -07:00
xl-openai
1d85fe79ed
feat: support remote_sync for plugin install/uninstall. (#14878)
- Added forceRemoteSync to plugin/install and plugin/uninstall.
- With forceRemoteSync=true, we update the remote plugin status first,
then apply the local change only if the backend call succeeds.
- Kept plugin/list(forceRemoteSync=true) as the main reconciliation
path, and for now it treats remote enabled=false as uninstall. We will
eventually migrate to plugin/installed for more precise state
handling.
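
The remote-first ordering can be sketched as below. `apply_plugin_change` and its closure parameters are hypothetical; the sketch only shows that the local change is skipped when the backend call fails.

```rust
/// Apply a plugin install/uninstall. With `force_remote_sync`, the remote
/// update runs first and the local change is applied only if it succeeds.
fn apply_plugin_change<R, L>(force_remote_sync: bool, remote: R, local: L) -> Result<(), String>
where
    R: FnOnce() -> Result<(), String>,
    L: FnOnce() -> Result<(), String>,
{
    if force_remote_sync {
        // A backend failure returns early, leaving local state untouched.
        remote()?;
    }
    local()
}
```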
2026-03-16 21:37:27 -07:00
xl-openai
49c2b66ece
Add marketplace display names to plugin/list (#14861)
Add display_name support to marketplace.json.
2026-03-16 19:04:40 -07:00
xl-openai
59533a2c26
skill-creator: default new skills to ~/.codex/skills (#14837)
### Motivation
- Prevent newly-created skills from being placed in unexpected locations
by prompting for an install path and defaulting to a discoverable
location so skills are usable immediately.
- Make the `skill-creator` instructions explicit about the recommended
default (`~/.codex/skills` / `$CODEX_HOME/skills`) so the agent and
users follow a consistent, discoverable convention.

### Description
- Updated `codex-rs/skills/src/assets/samples/skill-creator/SKILL.md` to
add a user prompt: "Where should I create this skill? If you do not have
a preference, I will place it in ~/.codex/skills so Codex can discover
it automatically.".
- Added guidance before running `init_skill.py` that if the user does
not specify a location, the agent should default to `~/.codex/skills`
(equivalently `$CODEX_HOME/skills`) for auto-discovery.
- Updated the `init_skill.py` examples in the same `SKILL.md` to use
`~/.codex/skills` as the recommended default while keeping one custom
path example.
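
The default location resolves roughly as in this sketch. `default_skills_dir` is a hypothetical helper written for illustration; it takes the `CODEX_HOME` value and the home directory as plain arguments rather than reading the environment, and treating an empty `CODEX_HOME` as unset is an assumption.

```rust
use std::path::PathBuf;

/// Resolve the default skills directory: `$CODEX_HOME/skills` when
/// CODEX_HOME is set and non-empty, otherwise `<home>/.codex/skills`.
fn default_skills_dir(codex_home: Option<&str>, home: &str) -> PathBuf {
    match codex_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir).join("skills"),
        _ => PathBuf::from(home).join(".codex").join("skills"),
    }
}
```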

### Testing
- Ran `cargo test -p codex-skills` and the crate's unit test suite
passed (`1 passed; 0 failed`).
- Verified relevant discovery behavior in code by checking
`codex-rs/utils/home-dir/src/lib.rs` (`find_codex_home` defaults to
`~/.codex`) and `codex-rs/core/src/skills/loader.rs` (user skill roots
include `$CODEX_HOME/skills`).

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b75a50bb008322a278e55eb0ddccd6)
2026-03-16 18:36:11 -07:00