Commit graph

4014 commits

Author SHA1 Message Date
Eric Traut
01df50cf42
Add thread/shellCommand to app server API surface (#14988)
This PR adds a new `thread/shellCommand` app server API so clients can
implement `!` shell commands. These commands are executed within the
sandbox, and the command text and output are visible to the model.

The internal implementation mirrors the current TUI `!` behavior.
- persist shell command execution as `CommandExecution` thread items,
including source and formatted output metadata
- bridge live and replayed app-server command execution events back into
the existing `tui_app_server` exec rendering path

This PR also wires `tui_app_server` to submit `!` commands through the
new API.
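
As a rough illustration, a client request to the new API might look like the sketch below. The method name comes from this PR; the parameter names and JSON-RPC framing here are assumptions, not the actual app server schema.

```
# Hypothetical sketch of a client-side request to the new API. The
# method name `thread/shellCommand` is from this PR; the parameter and
# envelope field names are assumptions.
import json

def shell_command_request(request_id: int, thread_id: str, command: str) -> str:
    """Build a JSON-RPC request asking the app server to run a `!` command."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "thread/shellCommand",
        "params": {
            "threadId": thread_id,  # assumed parameter name
            "command": command,     # the raw text the user typed after `!`
        },
    })

print(shell_command_request(1, "thread-123", "ls -la"))
```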
2026-03-18 23:42:40 -06:00
canvrno-oai
10eb3ec7fc
Simple directory mentions (#14970)
- Adds simple support for directory mentions in the TUI.
- Codex App/VS Code will require a minor change to recognize a directory
mention as such and change the link behavior.
- Directory mentions have a trailing slash to differentiate them from
extensionless files.


<img width="972" height="382" alt="image"
src="https://github.com/user-attachments/assets/8035b1eb-0978-465b-8d7a-4db2e5feca39"
/>
<img width="978" height="228" alt="image"
src="https://github.com/user-attachments/assets/af22cf0b-dd10-4440-9bee-a09915f6ba52"
/>
2026-03-19 05:24:09 +00:00
Andrei Eternal
42e932d7bf
[hooks] turn_id extension for Stop & UserPromptSubmit (#15118)
## Description

This adds an extension to the spec that exposes the `turn_id` to hook
scripts. It is a codex-specific mechanic that allows connecting hook
runs to particular turns.

## Testing

Hooks config / sample hooks to use: extract the archive below, rename
`codex` -> `.codex`, and place it in a repo or your home folder. It
includes a `config.toml` that enables hooks, a `hooks.json`, and sample
Python hooks:


[codex.zip](https://github.com/user-attachments/files/26102671/codex.zip)

example run (note the turn_ids change between turns):

```
› hello


• Running SessionStart hook: lighting the observatory

SessionStart hook (completed)
  warning: Hi, I'm a session start hook for wizard-tower (startup).
  hook context: Startup context: A wimboltine stonpet is an exotic cuisine from hyperspace

• Running UserPromptSubmit hook: lighting the observatory lanterns

UserPromptSubmit hook (completed)
  warning: wizard-tower UserPromptSubmit demo inspected: hello for turn: 019d036d-c7fa-72d2-b6fd-
78878bfe34e4
  hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' near the end.

• Aloha! Grateful to be here and ready to build with you. Show me what you want to tackle in wizard-
  tower, and we’ll surf the next wave together. observatory lanterns lit

• Running Stop hook: back to shore

Stop hook (completed)
  warning: Wizard Tower Stop hook reviewed the completed reply (170 chars) for turn: 019d036d-c7fa-
72d2-b6fd-78878bfe34e4


› what's a stonpet?


• Running UserPromptSubmit hook: lighting the observatory lanterns

UserPromptSubmit hook (completed)
  warning: wizard-tower UserPromptSubmit demo inspected: what's a stonpet? for turn: 019d036e-3164-
72c3-a170-98925564c4fc
  hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' near the end.

• A stonpet isn’t a standard real-world word, brah. In our shared context here, a wimboltine stonpet
  is an exotic cuisine from hyperspace, so “stonpet” sounds like the dish or food itself.

  If you want, we can totally invent the lore for it next. observatory lanterns lit

• Running Stop hook: back to shore

Stop hook (completed)
  warning: Wizard Tower Stop hook reviewed the completed reply (271 chars) for turn: 019d036e-3164-
72c3-a170-98925564c4fc
```
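
For reference, a hook script that consumes the new field could be as small as the sketch below. This is a minimal illustration, assuming the payload carries `turn_id` either at the top level or under `event`; the real payload shape is defined by the hooks spec, not this sketch.

```
#!/usr/bin/env python3
# Minimal hook sketch: echo the turn_id back as a system message. Where
# turn_id sits in the payload is an assumption for illustration.
import json
import sys

def main() -> int:
    payload = json.load(sys.stdin)
    turn_id = payload.get("turn_id") or (payload.get("event") or {}).get("turn_id")
    print(json.dumps({"systemMessage": f"hook fired for turn: {turn_id}"}))
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```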
2026-03-18 21:48:31 -07:00
nicholasclark-openai
b14689df3b
Forward session and turn headers to MCP HTTP requests (#15011)
## Summary
- forward request-scoped task headers through MCP tool metadata lookups
and tool calls
- apply those headers to streamable HTTP initialize, tools/list, and
tools/call requests
- update affected rmcp/core tests for the new request_headers plumbing

## Testing
- cargo test -p codex-rmcp-client
- cargo test -p codex-core (fails on pre-existing unrelated error in
core/src/auth_env_telemetry.rs: missing websocket_connect_timeout_ms in
ModelProviderInfo initializer)
- just fix -p codex-rmcp-client
- just fix -p codex-core (hits the same unrelated auth_env_telemetry.rs
error)
- just fmt

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 21:29:37 -07:00
Owen Lin
20f2a216df
feat(core, tracing): create turn spans over websockets (#14632)
## Description

Dependent on:
- [responsesapi] https://github.com/openai/openai/pull/760991 
- [codex-backend] https://github.com/openai/openai/pull/760985

`codex app-server -> codex-backend -> responsesapi` now reuses a
persistent websocket connection across many turns. This PR updates
tracing when using websockets so that each `response.create` websocket
request propagates the current tracing context, giving us a holistic
end-to-end trace for each turn.

Tracing is propagated via special keys (`ws_request_header_traceparent`,
`ws_request_header_tracestate`) set in the `client_metadata` param in
Responses API.

Currently, tracing on websockets is a bit broken because we only set
the tracing context at ws connection time, so it's detached from any
`turn/start` request.
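
Concretely, each request over the shared socket would carry something like the payload built below. The two metadata keys are the ones named above; the surrounding `response.create` message shape is an assumption for illustration.

```
# Sketch: attach the current trace context to a response.create request
# via the client_metadata keys named in this PR. The message envelope is
# illustrative.
import json

def response_create(input_items: list, traceparent: str, tracestate: str) -> str:
    return json.dumps({
        "type": "response.create",
        "input": input_items,
        "client_metadata": {
            "ws_request_header_traceparent": traceparent,
            "ws_request_header_tracestate": tracestate,
        },
    })
```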
2026-03-19 03:41:06 +00:00
pakrym-oai
903660edba
Remove stdio transport from exec server (#15119)
Summary
- delete the deprecated stdio transport plumbing from the exec server
stack
- add a basic `exec_server()` harness plus test utilities to start a
server, send requests, and await events
- refresh exec-server dependencies, configs, and documentation to
reflect the new flow

Testing
- Not run (not requested)

---------

Co-authored-by: starr-openai <starr@openai.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-19 01:00:35 +00:00
alexsong-oai
825d09373d
Support featured plugins (#15042) 2026-03-18 17:45:30 -07:00
starr-openai
81996fcde6
Add exec-server stub server and protocol docs (#15089)
Stacked PR 1/3.

This is the initialize-only exec-server stub slice: binary/client
scaffolding and protocol docs, without exec/filesystem implementation.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-19 00:30:05 +00:00
xl-openai
dcd5e08269
fix: harden plugin feature gating (#15104)
Resubmit https://github.com/openai/codex/pull/15020 with correct
content.

1. Use requirement-resolved config.features as the plugin gate.
2. Guard plugin/list, plugin/read, and related flows behind that gate.
3. Skip bad marketplace.json files instead of failing the whole list.
4. Simplify plugin state and caching.
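
The gate reduces to something like the sketch below (illustrative Python, not the codex-rs API; the feature-flag name is assumed):

```
# Sketch of the gating rule: plugin/list, plugin/read, and related flows
# check the requirement-resolved feature set before doing any work, and
# malformed marketplace.json entries are skipped rather than fatal.
def plugins_enabled(resolved_features: set) -> bool:
    return "plugins" in resolved_features  # assumed feature-flag name

def plugin_list(resolved_features: set, manifests: list) -> list:
    if not plugins_enabled(resolved_features):
        return []
    return [m for m in manifests if isinstance(m.get("name"), str)]
```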
2026-03-19 00:03:37 +00:00
pakrym-oai
56d0c6bf67
Add apply_patch code mode result (#15100)
It's empty!
2026-03-18 16:11:10 -07:00
pakrym-oai
3590e181fa
Add update_plan code mode result (#15103)
It's empty!
2026-03-18 16:10:51 -07:00
Ahmed Ibrahim
b306885bd8
don't add transcript for v2 realtime (#15111)
2026-03-18 15:54:13 -07:00
Shijie Rao
bb30432421
Feat: reuse persisted model and reasoning effort on thread resume (#14888)
## Summary

This PR makes `thread/resume` reuse persisted thread model metadata when
the caller does not explicitly override it.

Changes:
- read persisted thread metadata from SQLite during `thread/resume`
- reuse persisted `model` and `model_reasoning_effort` as resume-time
defaults
- fetch persisted metadata once and reuse it later in the resume
response path
- keep thread summary loading on the existing rollout path, while
reusing persisted metadata when available
- document the resume fallback behavior in the app-server README

## Why

Before this change, resuming a thread without explicit overrides derived
`model` and `model_reasoning_effort` from current config, which could
drift from the thread’s last persisted values. That meant a resumed
thread could report and run with different model settings than the ones
it previously used.

## Behavior

Precedence on `thread/resume` is now:
1. explicit resume overrides
2. persisted SQLite metadata for the thread
3. normal config resolution for the resumed cwd
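
In pseudocode terms, the resolution is just a first-non-null walk down that list (names here are illustrative):

```
# Sketch of the resume-time precedence: explicit override, then the
# thread's persisted SQLite metadata, then normal config resolution.
def resolve_resume_setting(override, persisted, config_default):
    if override is not None:
        return override
    if persisted is not None:
        return persisted
    return config_default
```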
2026-03-18 15:45:17 -07:00
Charley Cunningham
ebbbc52ce4
Align SQLite feedback logs with feedback formatter (#13494)
## Summary
- store a pre-rendered `feedback_log_body` in SQLite so `/feedback`
exports keep span prefixes and structured event fields
- render SQLite feedback exports with timestamps and level prefixes to
match the old in-memory feedback formatter, while preserving existing
trailing newlines
- count `feedback_log_body` in the SQLite retention budget so structured
or span-prefixed rows still prune correctly
- bound `/feedback` row loading in SQL with the retention estimate, then
apply exact whole-line truncation in Rust so uploads stay capped without
splitting lines

## Details
- add a `feedback_log_body` column to `logs` and backfill it from
`message` for existing rows
- capture span names plus formatted span and event fields at write time,
since SQLite does not retain enough structure to reconstruct the old
formatter later
- keep SQLite feedback queries scoped to the requested thread plus
same-process threadless rows
- restore a SQL-side cumulative `estimated_bytes` cap for feedback
export queries so over-retained partitions do not load every matching
row before truncation
- add focused formatting coverage for exported feedback lines and parity
coverage against `tracing_subscriber`
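
The SQL-side cap can be pictured as the sketch below: a running byte total computed in the query, with exact whole-line truncation left to application code. The schema and column names here are assumptions based on this description, not the real codex-state migrations.

```
# Illustrative sqlite3 sketch of the cumulative byte cap: stop loading
# rows once the running total of estimated bytes exceeds the retention
# budget. Table/column names are assumed for illustration.
import sqlite3

def load_feedback_rows(conn: sqlite3.Connection, thread_id: str, budget: int) -> list:
    rows = conn.execute(
        """
        SELECT feedback_log_body FROM (
            SELECT feedback_log_body,
                   SUM(LENGTH(feedback_log_body)) OVER (ORDER BY rowid) AS running
            FROM logs WHERE thread_id = ?
        ) WHERE running <= ?
        """,
        (thread_id, budget),
    ).fetchall()
    return [body for (body,) in rows]
```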

## Testing
- cargo test -p codex-state
- just fix -p codex-state
- just fmt

codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 22:44:31 +00:00
Ahmed Ibrahim
7b37a0350f
Add final message prefix to realtime handoff output (#15077)
- prefix realtime handoff output with the agent final message label for
both realtime v1 and v2
- update realtime websocket and core expectations to match
2026-03-18 15:19:49 -07:00
xl-openai
86982ca1f9
Revert "fix: harden plugin feature gating" (#15102)
Reverts openai/codex#15020

I messed up the commit in my PR and accidentally merged changes that
were still under review.
2026-03-18 15:19:29 -07:00
Eric Traut
e5de13644d
Add a startup deprecation warning for custom prompts (#15076)
## Summary
- detect custom prompts in `$CODEX_HOME/prompts` during TUI startup
- show a deprecation notice only when prompts are present, with guidance
to use `$skill-creator`
- add TUI tests and snapshot coverage for present, missing, and empty
prompts directories
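
The startup check amounts to a directory probe along these lines (a sketch, assuming `$CODEX_HOME` defaults to `~/.codex`):

```
# Sketch: warn only when $CODEX_HOME/prompts exists and is non-empty.
import os
from pathlib import Path

def has_custom_prompts() -> bool:
    codex_home = Path(os.environ.get("CODEX_HOME", "~/.codex")).expanduser()
    prompts = codex_home / "prompts"
    return prompts.is_dir() and any(prompts.iterdir())
```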

## Testing
- Manually tested
2026-03-18 15:21:30 -06:00
pakrym-oai
5cada46ddf
Return image URL from view_image tool (#15072)
Cleanup image semantics in code mode.

`view_image` now returns `{image_url: string, details?: string}`.

`image()` now accepts either a string parameter or
`{image_url: string, details?: string}`.
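
The accepted shapes normalize as in the sketch below (illustrative; the real handling lives in code mode):

```
# Sketch: `image()` accepts either a bare URL string or the same
# {image_url, details?} object that `view_image` now returns.
def normalize_image_arg(arg):
    if isinstance(arg, str):
        return {"image_url": arg}
    if isinstance(arg, dict) and "image_url" in arg:
        out = {"image_url": arg["image_url"]}
        if "details" in arg:
            out["details"] = arg["details"]
        return out
    raise TypeError("expected a URL string or {image_url, details?}")
```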
2026-03-18 13:58:20 -07:00
pakrym-oai
88e5382fc4
Propagate tool errors to code mode (#15075)
Clean up error flow to push the FunctionCallError all the way up to
dispatcher and allow code mode to surface as exception.
2026-03-18 13:57:55 -07:00
Felipe Coury
334164a6f7
feat(tui): restore composer history in app-server tui (#14945)
## Problem

The app-server TUI (`tui_app_server`) lacked composer history support.
Pressing Up/Down to recall previous prompts hit a stub that logged a
warning and displayed "Not available in app-server TUI yet." New
submissions were silently dropped from the shared history file, so
nothing persisted for future sessions.

## Mental model

Codex maintains a single, append-only history file
(`$CODEX_HOME/history.jsonl`) shared across all TUI processes on the
same machine. The legacy (in-process) TUI already reads/writes this file
through `codex_core::message_history`. The app-server TUI delegates most
operations to a separate process over RPC, but history is intentionally
*not* an RPC concern — it's a client-local file.

This PR makes the app-server TUI access the same history file directly,
bypassing the app-server process entirely. The composer's Up/Down
navigation and submit-time persistence now follow the same code paths as
the legacy TUI, with the only difference being *where* the call is
dispatched (locally in `App`, rather than inside `CodexThread`).
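
The file contract is simple enough to sketch (the record shape below is an assumption; `codex_core::message_history` owns the real format):

```
# Sketch of the client-local history contract: append-only JSONL, one
# record per submitted prompt, read back by offset for Up/Down recall.
import json
from pathlib import Path

HISTORY = Path.home() / ".codex" / "history.jsonl"

def append_entry(text: str) -> None:
    with HISTORY.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"text": text}) + "\n")

def lookup(offset: int):
    """Return the offset-th entry from the end, or None."""
    lines = HISTORY.read_text(encoding="utf-8").splitlines()
    if 0 < offset <= len(lines):
        return json.loads(lines[-offset])["text"]
    return None
```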

The branch is rebuilt directly on top of `upstream/main`, so it keeps
the existing app-server restore architecture intact.
`AppServerStartedThread` still restores transcript history from the
server `Thread` snapshot via `thread_snapshot_events`; this PR only adds
composer-history support.
## Non-goals

- Adding history support to the app-server protocol. History remains
client-local.
- Changing the on-disk format or location of `history.jsonl`.
- Surfacing history I/O errors to the user (failures are logged and
silently swallowed, matching the legacy TUI).

## Tradeoffs

| Decision | Why | Risk |
|----------|-----|------|
| Widen `message_history` from `pub(crate)` to `pub` | Avoids
duplicating file I/O logic; the module already has a clean, minimal API
surface. | Other workspace crates can now call these functions — the
contract is no longer crate-private. However, this is consistent with
recent precedent: `590cfa617` exposed `mention_syntax` for TUI
consumption, `752402c4f` exposed plugin APIs (`PluginsManager`), and
`14fcb6645`/`edacbf7b6` widened internal core APIs for other crates.
These were all narrow, intentional exposures of specific APIs — not
broad "make internals public" moves. `1af2a37ad` even went the other
direction, reducing broad re-exports to tighten boundaries. This change
follows the same pattern: a small, deliberate API surface (3 functions)
rather than a wholesale visibility change. |
| Intercept `AddToHistory` / `GetHistoryEntryRequest` in `App` before
RPC fallback | Keeps history ops out of the "unsupported op" error path
without changing app-server protocol. | This now routes through a single
`submit_thread_op` entry point, which is safer than the original
duplicated dispatch. The remaining risk is organizational: future
thread-op submission paths need to keep using that shared entry point. |
| `session_configured_from_thread_response` is now `async` | Needs
`await` on `history_metadata()` to populate real `history_log_id` /
`history_entry_count`. | Adds an async file-stat + full-file newline
scan to the session bootstrap path. The scan is bounded by
`history.max_bytes` and matches the legacy TUI's cost profile, but
startup latency still scales with file size. |

## Architecture

```
User presses Up                     User submits a prompt
       │                                    │
       ▼                                    ▼
ChatComposerHistory                 ChatWidget::do_submit_turn
  navigate_up()                       encode_history_mentions()
       │                                    │
       ▼                                    ▼
  AppEvent::CodexOp                  Op::AddToHistory { text }
  (GetHistoryEntryRequest)                  │
       │                                    ▼
       ▼                            App::try_handle_local_history_op
  App::try_handle_local_history_op    message_history::append_entry()
    spawn_blocking {                        │
      message_history::lookup()             ▼
    }                                $CODEX_HOME/history.jsonl
       │
       ▼
  AppEvent::ThreadEvent
  (GetHistoryEntryResponse)
       │
       ▼
  ChatComposerHistory::on_entry_response()
```

## Observability

- `tracing::warn` on `append_entry` failure (includes thread ID).
- `tracing::warn` on `spawn_blocking` lookup join error.
- `tracing::warn` from `message_history` internals on file-open, lock,
or parse failures.

## Tests

- `chat_composer_history::tests::navigation_with_async_fetch` — verifies
that Up emits `Op::GetHistoryEntryRequest` (was: checked for stub error
cell).
- `app::tests::history_lookup_response_is_routed_to_requesting_thread` —
verifies multi-thread composer recall routes the lookup result back to
the originating thread.
-
`app_server_session::tests::resume_response_relies_on_snapshot_replay_not_initial_messages`
— verifies app-server session restore still uses the upstream
thread-snapshot path.
-
`app_server_session::tests::session_configured_populates_history_metadata`
— verifies bootstrap sets nonzero `history_log_id` /
`history_entry_count` from the shared local history file.
2026-03-18 11:54:11 -06:00
xl-openai
580f32ad2a
fix: harden plugin feature gating (#15020)
1. Use requirement-resolved config.features as the plugin gate.
2. Guard plugin/list, plugin/read, and related flows behind that gate.
3. Skip bad marketplace.json files instead of failing the whole list.
4. Simplify plugin state and caching.
2026-03-18 10:11:43 -07:00
pakrym-oai
606d85055f
Add notify to code-mode (#14842)
Allows model to send an out-of-band notification.

The notification is injected as another tool call output for the same
call_id.
2026-03-18 09:37:13 -07:00
jif-oai
7ae99576a6
chore: disable memory read path for morpheus (#15059)
Because we don't want prompt collisions.
2026-03-18 15:42:56 +00:00
Eric Traut
347c6b12ec
Removed remaining core events from tui_app_server (#14942) 2026-03-18 09:35:05 -06:00
jif-oai
58ac2a8773
nit: disable live memory editing (#15058) 2026-03-18 14:49:57 +00:00
jif-oai
a265d6043e
feat: add memory citation to agent message (#14821)
Client side to come
2026-03-18 10:03:38 +00:00
jif-oai
0f9484dc8a
feat: adapt artifacts to new packaging and 2.5.6 (#14947) 2026-03-18 09:17:44 +00:00
Matthew Zeng
40a7d1d15b
[plugins] Support configuration tool suggest allowlist. (#15022)
- [x] Support configuration tool suggest allowlist.

Supports both plugins and connectors.
2026-03-17 23:58:27 -07:00
Dylan Hurd
84f4e7b39d
fix(subagents) share execpolicy by default (#13702)
## Summary
If a subagent requests approval, and the user persists that approval to
the execpolicy, it should (by default) propagate. We'll need to rethink
this a bit in light of upcoming Permissions changes, though I think this
is closer to the end state we'd want: execpolicy changes to one
permissions profile should be synced across threads.

## Testing
- [x] Added integration test

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-18 06:42:26 +00:00
Andrei Eternal
6fef421654
[hooks] userpromptsubmit - hook before user's prompt is executed (#14626)
- this allows blocking the user's prompt from executing, and also
prevents it from entering history
- handles the edge case where a hook can both block the user's prompt
AND add any number of additionalContext entries
- refactors some old code into common.rs where hooks overlap in
functionality
- refactors how additionalContext is added: it was previously appended
to user messages; we now use developer messages instead
- handles queued messages correctly

Sample hook for testing - if you write "[block-user-submit]" this hook
will stop the thread:

example run
```
› sup


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (completed)
  warning: wizard-tower UserPromptSubmit demo inspected: sup
  hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact
phrase 'observatory lanterns lit' exactly once near the end.

• Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory
  lanterns lit


› and [block-user-submit]


• Running UserPromptSubmit hook: reading the observatory notes

UserPromptSubmit hook (stopped)
  warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose.
  stop: Wizard Tower demo block: remove [block-user-submit] to continue.
```

.codex/config.toml
```
[features]
codex_hooks = true
```

.codex/hooks.json
```
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
            "timeoutSec": 10,
            "statusMessage": "reading the observatory notes"
          }
        ]
      }
    ]
  }
}
```

.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3

import json
import sys
from pathlib import Path


def prompt_from_payload(payload: dict) -> str:
    prompt = payload.get("prompt")
    if isinstance(prompt, str) and prompt.strip():
        return prompt.strip()

    event = payload.get("event")
    if isinstance(event, dict):
        user_prompt = event.get("user_prompt")
        if isinstance(user_prompt, str):
            return user_prompt.strip()

    return ""


def main() -> int:
    payload = json.load(sys.stdin)
    prompt = prompt_from_payload(payload)
    cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"

    if "[block-user-submit]" in prompt:
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                    ),
                    "decision": "block",
                    "reason": (
                        "Wizard Tower demo block: remove [block-user-submit] to continue."
                    ),
                }
            )
        )
        return 0

    prompt_preview = prompt or "(empty prompt)"
    if len(prompt_preview) > 80:
        prompt_preview = f"{prompt_preview[:77]}..."

    print(
        json.dumps(
            {
                "systemMessage": (
                    f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                ),
                "hookSpecificOutput": {
                    "hookEventName": "UserPromptSubmit",
                    "additionalContext": (
                        "Wizard Tower UserPromptSubmit demo fired. "
                        "For this reply only, include the exact phrase "
                        "'observatory lanterns lit' exactly once near the end."
                    ),
                },
            }
        )
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```
2026-03-17 22:09:22 -07:00
Charley Cunningham
226241f035
Use workspace requirements for guardian prompt override (#14727)
## Summary
- move `guardian_developer_instructions` from managed config into
workspace-managed `requirements.toml`
- have guardian continue using the override when present and otherwise
fall back to the bundled local guardian prompt
- keep the generalized prompt-quality improvements in the shared
guardian default prompt
- update requirements parsing, layering, schema, and tests for the new
source of truth

## Context
This replaces the earlier managed-config / MDM rollout plan.

The intended rollout path is workspace-managed requirements, including
cloud enterprise policies, rather than backend model metadata, Statsig,
or Jamf-managed config. That keeps the default/fallback behavior local
to `codex-rs` while allowing faster policy updates through the
enterprise requirements plane.

This is intentionally an admin-managed policy input, not a user
preference: the guardian prompt should come either from the bundled
`codex-rs` default or from enterprise-managed `requirements.toml`, and
normal user/project/session config should not override it.

## Updating The OpenAI Prompt
After this lands, the OpenAI-specific guardian prompt should be updated
through the workspace Policies UI at `/codex/settings/policies` rather
than through Jamf or codex-backend model metadata.

Operationally:
- open the workspace Policies editor as a Codex admin
- edit the default `requirements.toml` policy, or a higher-precedence
group-scoped override if we ever want different behavior for a subset of
users
- set `guardian_developer_instructions = """..."""` to the full
OpenAI-specific guardian prompt text
- save the policy; codex-backend stores the raw TOML and `codex-rs`
fetches the effective requirements file from `/wham/config/requirements`

When updating the OpenAI-specific prompt, keep it aligned with the
shared default guardian policy in `codex-rs` except for intentional
OpenAI-only additions.

## Testing
- `cargo check --tests -p codex-core -p codex-config -p
codex-cloud-requirements --message-format short`
- `cargo run -p codex-core --bin codex-write-config-schema`
- `cargo fmt`
- `git diff --check`

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 22:05:41 -07:00
Ahmed Ibrahim
3ce879c646
Handle realtime conversation end in the TUI (#14903)
- close live realtime sessions on errors, ctrl-c, and active meter
removal
- centralize TUI realtime cleanup and avoid duplicate follow-up close
info

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 21:04:58 -07:00
pakrym-oai
770616414a
Prefer websockets when providers support them (#13592)
Remove all flags and model settings.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 19:46:44 -07:00
viyatb-oai
d950543e65
feat: support restricted ReadOnlyAccess in elevated Windows sandbox (#14610)
## Summary
- support legacy `ReadOnlyAccess::Restricted` on Windows in the elevated
setup/runner backend
- keep the unelevated restricted-token backend on the legacy full-read
model only, and fail closed for restricted read-only policies there
- keep the legacy full-read Windows path unchanged while deriving
narrower read roots only for elevated restricted-read policies
- honor `include_platform_defaults` by adding backend-managed Windows
system roots only when requested, while always keeping helper roots and
the command `cwd` readable
- preserve `workspace-write` semantics by keeping writable roots
readable when restricted read access is in use in the elevated backend
- document the current Windows boundary: legacy `SandboxPolicy` is
supported on both backends, while richer split-only carveouts still fail
closed instead of running with weaker enforcement

## Testing
- `cargo test -p codex-windows-sandbox`
- `cargo check -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc`
- `cargo clippy -p codex-windows-sandbox --tests --target
x86_64-pc-windows-msvc -- -D warnings`
- `cargo test -p codex-core windows_restricted_token_`

## Notes
- local `cargo test -p codex-windows-sandbox` on macOS only exercises
the non-Windows stubs; the Windows-targeted compile and clippy runs
provide the local signal, and GitHub Windows CI exercises the runtime
path
2026-03-17 19:08:50 -07:00
viyatb-oai
6fe8a05dcb
fix: honor active permission profiles in sandbox debug (#14293)
## Summary
- stop `codex sandbox` from forcing legacy `sandbox_mode` when active
`[permissions]` profiles are configured
- keep the legacy `read-only` / `workspace-write` fallback for legacy
configs and reject `--full-auto` for profile-based configs
- use split filesystem and network policies in the macOS/Linux debug
sandbox helpers and add regressions for the config-loading behavior


assuming "codex/docs/private/secret.txt" = "none"
```
codex -c 'default_permissions="limited-read-test"' sandbox macos -- <command> ...

codex sandbox macos -- cat codex/docs/private/secret.txt >/dev/null; echo EXIT:$?
cat: codex/docs/private/secret.txt: Operation not permitted
EXIT:1
```

---------

Co-authored-by: celia-oai <celia@openai.com>
2026-03-18 01:52:02 +00:00
pakrym-oai
83a60fdb94
Add FS abstraction and use in view_image (#14960)
Adds an environment crate and environment + file system abstraction.

An Environment is a combination of attributes and services specific to
the environment the agent is connected to: file system, process
management, OS, and default shell.

The goal is to move most agent logic that assumes a particular
environment to work through the environment abstraction.
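
The shape of the abstraction, sketched in Python rather than the actual Rust crate:

```
# Sketch: an Environment bundles the services the agent would otherwise
# reach for directly, so tools like view_image resolve files through it
# instead of assuming the host OS. Names are illustrative.
from typing import Protocol

class FileSystem(Protocol):
    def read_bytes(self, path: str) -> bytes: ...

class Environment:
    def __init__(self, fs: FileSystem, os_name: str, default_shell: str):
        self.fs = fs                        # file access goes through here
        self.os_name = os_name              # e.g. "linux", "macos", "windows"
        self.default_shell = default_shell
```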
2026-03-17 17:36:23 -07:00
Max Johnson
19b887128e
app-server: reject websocket requests with Origin headers (#14995)
Reject websocket requests that carry an `Origin` header
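
Browser-initiated websocket handshakes always include an `Origin` header, so the check is effectively the sketch below (illustrative handler, not the app-server code):

```
# Sketch: refuse any websocket upgrade that carries an Origin header,
# which blocks cross-site pages from reaching a local app-server.
def accept_websocket(headers: dict) -> bool:
    return not any(k.lower() == "origin" for k in headers)
```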
2026-03-18 00:24:53 +00:00
xl-openai
a5d3114e97
feat: Add product-aware plugin policies and clean up manifest naming (#14993)
- Add shared Product support to marketplace plugin policy and skill
policy (not enforced yet).
- Move marketplace installation/authentication under policy and model it
as MarketplacePluginPolicy.
- Rename plugin/marketplace local manifest types to separate raw serde
shapes from resolved in-memory models.
2026-03-17 17:01:34 -07:00
viyatb-oai
0d1539e74c
fix(linux-sandbox): prefer system /usr/bin/bwrap when available (#14963)
## Problem
Ubuntu/AppArmor hosts started failing in the default Linux sandbox path
after the switch to vendored/default bubblewrap in `0.115.0`.

The clearest report is in
[#14919](https://github.com/openai/codex/issues/14919), especially [this
investigation
comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751):
on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or
vendored `bwrap` binary fails with errors like `bwrap: setting up uid
map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR:
Operation not permitted`.

The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict`
profile, which grants `userns` access specifically to `/usr/bin/bwrap`.
Once Codex started using a vendored/internal bubblewrap path, that path
was no longer covered by the distro AppArmor exception, so sandbox
namespace setup could fail even when user namespaces were otherwise
enabled and `uidmap` was installed.

## What this PR changes
- prefer system `/usr/bin/bwrap` whenever it is available
- keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is
missing
- when `/usr/bin/bwrap` is missing, surface a Codex startup warning
through the app-server/TUI warning path instead of printing directly
from the sandbox helper with `eprintln!`
- use the same launcher decision for both the main sandbox execution
path and the `/proc` preflight path
- document the updated Linux bubblewrap behavior in the Linux sandbox
and core READMEs
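
The launcher rule is small enough to state as code; a sketch under the assumptions above (the vendored path is whatever the build ships):

```
# Sketch: prefer the distro bwrap, which Ubuntu's AppArmor profile
# covers; otherwise fall back to the vendored copy and return a warning
# for the app-server/TUI startup path.
import os

SYSTEM_BWRAP = "/usr/bin/bwrap"

def choose_bwrap(vendored_path: str):
    if os.access(SYSTEM_BWRAP, os.X_OK):
        return SYSTEM_BWRAP, None
    warning = f"{SYSTEM_BWRAP} not found; falling back to vendored bubblewrap"
    return vendored_path, warning
```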

## Why this fix
This still fixes the Ubuntu/AppArmor regression from
[#14919](https://github.com/openai/codex/issues/14919), but it keeps the
runtime rule simple and platform-agnostic: if the standard system
bubblewrap is installed, use it; otherwise fall back to the vendored
helper.

The warning now follows that same simple rule. If Codex cannot find
`/usr/bin/bwrap`, it tells the user that it is falling back to the
vendored helper, and it does so through the existing startup warning
plumbing that reaches the TUI and app-server instead of low-level
sandbox stderr.

## Testing
- `cargo test -p codex-linux-sandbox`
- `cargo test -p codex-app-server --lib`
- `cargo test -p codex-tui-app-server
tests::embedded_app_server_start_failure_is_returned`
- `cargo clippy -p codex-linux-sandbox --all-targets`
- `cargo clippy -p codex-app-server --all-targets`
- `cargo clippy -p codex-tui-app-server --all-targets`
2026-03-17 23:05:34 +00:00
Ahmed Ibrahim
98be562fd3
Unify realtime shutdown in core (#14902)
- route realtime startup, input, and transport failures through a single
shutdown path
- emit one realtime error/closed lifecycle while clearing session state
once

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
2026-03-17 15:58:52 -07:00
Ahmed Ibrahim
c6ab4ee537
Gate realtime audio interruption logic to v2 (#14984)
- thread the realtime version into conversation start and app-server
notifications
- keep playback-aware mic gating and playback interruption behavior on
v2 only, leaving v1 on the legacy path
2026-03-17 15:24:37 -07:00
xl-openai
1a9555eda9
Cleanup skills/remote/xxx endpoints. (#14977)
Remove skills/remote/xxx endpoints as they are not in use for now.
2026-03-17 15:22:36 -07:00
Felipe Coury
43ee72a9b9
fix(tui): implement /mcp inventory for tui_app_server (#14931)
## Problem

The `/mcp` command did not work in the app-server TUI (remote mode). On
`main`, `add_mcp_output()` called `McpManager::effective_servers()`
in-process, which only sees locally configured servers, and then emitted
a generic stub message for the app-server to handle. In remote usage,
that left `/mcp` without a real inventory view.

## Solution

Implement `/mcp` for the app-server TUI by fetching MCP server inventory
directly from the app-server via the paginated `mcpServerStatus/list`
RPC and rendering the results into chat history.

The command now follows a three-phase lifecycle:

1. Loading: `ChatWidget::add_mcp_output()` inserts a transient
`McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This
gives immediate feedback that the command registered.
2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that
calls `fetch_all_mcp_server_statuses()` over an app-server request
handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded {
result }`.
3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell
and renders either `new_mcp_tools_output_from_statuses(...)` or an error
message.

This keeps the main app event loop responsive, so the TUI can repaint
before the remote RPC finishes.
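
Phase 2 is a standard cursor-drain loop, roughly as sketched below (the `rpc` handle and field names are placeholders):

```
# Sketch: page through mcpServerStatus/list until the server stops
# returning a cursor, accumulating every server status.
def fetch_all_mcp_server_statuses(rpc) -> list:
    statuses, cursor = [], None
    while True:
        page = rpc("mcpServerStatus/list", {"cursor": cursor})
        statuses.extend(page.get("items", []))
        cursor = page.get("nextCursor")
        if cursor is None:
            return statuses
```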

## Notes

- No `app-server` changes were required.
- The rendered inventory includes auth, tools, resources, and resource
templates, plus transport details when they are available from local
config for display enrichment.
- The app-server RPC does not expose authoritative `enabled` or
`disabled_reason` state for MCP servers, so the remote `/mcp` view omits
the `Status:` row rather than guessing from local config.
- RPC failures surface in history as `Failed to load MCP inventory:
...`.

## Tests

- `slash_mcp_requests_inventory_via_app_server`
- `mcp_inventory_maps_prefix_tool_names_by_server`
- `handle_mcp_inventory_result_clears_committed_loading_cell`
- `mcp_tools_output_from_statuses_renders_status_only_servers`
- `mcp_inventory_loading_snapshot`
2026-03-17 16:11:27 -06:00
Colin Young
0d2ff40a58
Add auth env observability (#14905)
CXC-410 Emit Env Var Status with `/feedback` report

Add more observability on top of #14611 

[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)

[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
<img width="1063" height="610" alt="image"
src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
/>


###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.
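
The presence-only contract looks roughly like the sketch below (the env var list and field names are illustrative, not the actual telemetry schema):

```
# Sketch: record set/unset for auth-related env overrides, and bucket
# the user-controlled provider env_key name to a fixed string so no raw
# config text is emitted.
import os

def collect_auth_env_telemetry(provider_env_key) -> dict:
    fields = {
        name: "set" if os.environ.get(name) else "unset"
        for name in ("OPENAI_API_KEY", "OPENAI_BASE_URL")  # illustrative
    }
    if provider_env_key is not None:
        fields["provider_env_key_name"] = "configured"  # bucketed, never raw
    return fields
```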

###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.

###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.

###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.

###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.

###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 14:26:27 -07:00
pakrym-oai
ee756eb80f
Rename exec_wait tool to wait (#14983)
Summary
- document that code mode only exposes `exec` and the renamed `wait`
tool
- update code mode tool spec and descriptions to match the new tool name
- rename tests and helper references from `exec_wait` to `wait`

Testing
- Not run (not requested)
2026-03-17 14:22:26 -07:00
iceweasel-oai
2cc4ee413f
temporarily disable private desktop until it works with elevated IPC path (#14986) 2026-03-17 21:09:57 +00:00
Ahmed Ibrahim
4d9d4b7b0f
Stabilize approval matrix write-file command (#14968)
## What is flaky
The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in
CI even though the approval logic is unchanged, because the test
delegates the file write and readback to shell parsing instead of
deterministic file I/O.

## Why it was flaky
The test generated a command shaped like `printf ... > file && cat
file`. That means the scenario depended on shell quoting, redirection,
newline handling, and encoding behavior in addition to the approval
system it was actually trying to validate. If the shell interpreted the
payload differently, the test would report an approval failure even
though the product logic was fine.

That also made failures hard to diagnose, because the test did not log
the exact generated command or the parsed result payload.

## How this PR fixes it
This PR replaces the shell-redirection path with a deterministic
`python3 -c` script that writes the file with `Path.write_text(...,
encoding='utf-8')` and then reads it back with the same UTF-8 path. It
also logs the generated command and the resulting exit code/stdout for
the approval scenario so any future failure is directly attributable.
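
The deterministic replacement looks roughly like the sketch below; passing the path and payload as argv avoids the quoting pitfalls entirely (command shape is illustrative, not the test's exact code):

```
# Sketch: one python3 -c invocation that writes and reads back with
# explicit UTF-8, replacing `printf ... > file && cat file`.
import subprocess
import sys

def write_and_read_command(path: str, payload: str) -> list:
    script = (
        "import sys; from pathlib import Path; "
        "p = Path(sys.argv[1]); p.write_text(sys.argv[2], encoding='utf-8'); "
        "sys.stdout.write(p.read_text(encoding='utf-8'))"
    )
    return [sys.executable, "-c", script, path, payload]

# subprocess.run(write_and_read_command("out.txt", "héllo\n"), check=True)
```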

## Why this fix fixes the flakiness
The scenario no longer depends on shell parsing and redirection
semantics. The file contents are produced and read through explicit
UTF-8 file I/O, so the approval test is measuring approval behavior
instead of shell behavior. The added diagnostics mean a future failure
will show the exact command/result pair instead of looking like a
generic intermittent mismatch.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 13:52:36 -07:00
Ahmed Ibrahim
23a44ddbe8
Stabilize permissions popup selection tests (#14966)
## What is flaky
The permissions popup tests in the TUI are flaky, especially on Windows.
They assume the popup opens on a specific row and that a fixed number of
`Up` or `Down` keypresses will land on a specific preset. They also
match popup text too loosely, so a non-selected row can satisfy the
assertion.

## Why it was flaky
These tests were asserting incidental rendering details rather than the
actual selected permission preset. On Windows, the initial selection can
differ from non-Windows runs. Some tests also searched the entire popup
for text like `Guardian Approvals` or `(current)`, which can match a row
that is visible but not selected. Once the popup order or current preset
shifted slightly, a test could fail even though the UI behavior was
still correct.

## How this PR fixes it
This PR adds helpers that identify the selected popup row and selected
preset name directly. The tests now assert the current selection by
name, navigate to concrete target presets instead of assuming a fixed
number of keypresses, and explicitly set the reviewer state in the cases
that require `Guardian Approvals` to be current.

## Why this fix fixes the flakiness
The assertions now track semantic state, not fragile text placement.
Navigation is target-based instead of order-based, so
Windows/non-Windows row differences and harmless popup layout changes no
longer break the tests. That removes the scheduler- and
platform-sensitive assumptions that made the popup suite intermittent.

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:45:44 +00:00
Ahmed Ibrahim
b02388672f
Stabilize Windows cmd-based shell test harnesses (#14958)
## What is flaky
The Windows shell-driven integration tests in `codex-rs/core` were
intermittently unstable, especially:

- `apply_patch_cli_can_use_shell_command_output_as_patch_input`
- `websocket_test_codex_shell_chain`
- `websocket_v2_test_codex_shell_chain`

## Why it was flaky
These tests were exercising real shell-tool flows through whichever
shell Codex selected on Windows, and the `apply_patch` test also nested
a PowerShell read inside `cmd /c`.

There were multiple independent sources of nondeterminism in that setup:

- The test harness depended on the model-selected Windows shell instead
of pinning the shell it actually meant to exercise.
- `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI
that could leave the read command wrapped as a literal string instead of
executing it.
- Even after getting the quoting right, PowerShell could emit CLIXML
progress records like module-initialization output onto stdout.
- The `apply_patch` test was building a patch directly from shell
stdout, so any quoting artifact or progress noise corrupted the patch
input.

So the failures were driven by shell startup and output-shape variance,
not by the `apply_patch` or websocket logic themselves.

## How this PR fixes it
- Add a test-only `user_shell_override` path so Windows integration
tests can pin `cmd.exe` explicitly.
- Use that override in the websocket shell-chain tests and in the
`apply_patch` harness.
- Change the nested Windows file read in
`apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8
PowerShell `-EncodedCommand` script.
- Run that nested PowerShell process with `-NonInteractive`, set
`$ProgressPreference = 'SilentlyContinue'`, and read the file with
`[System.IO.File]::ReadAllText(...)`.
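
`-EncodedCommand` takes base64 over the UTF-16LE bytes of the script, which is what makes it immune to `cmd.exe` quoting. A sketch of building such a command (the script body mirrors the steps above; the helper name is illustrative):

```
# Sketch: build a PowerShell -EncodedCommand invocation for a UTF-8
# file read, with progress output silenced.
import base64

def encoded_read_command(path: str) -> list:
    script = (
        "$ProgressPreference = 'SilentlyContinue'; "
        f"[System.IO.File]::ReadAllText('{path}')"
    )
    b64 = base64.b64encode(script.encode("utf-16-le")).decode("ascii")
    return ["powershell.exe", "-NonInteractive", "-EncodedCommand", b64]
```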

## Why this fix fixes the flakiness
The outer harness now runs under a deterministic shell, and the inner
PowerShell read no longer depends on fragile `cmd` quoting or on
progress output staying quiet by accident. The shell tool returns only
the file contents, so patch construction and websocket assertions depend
on stable test inputs instead of on runner-specific shell behavior.

---------

Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-17 20:21:46 +00:00
Matthew Zeng
683c37ce75
[plugins] Support plugin installation elicitation. (#14896)
It now supports elicitation for:

- Connectors that come from enabled plugins that are not yet installed
- Plugins that are on the allowlist but are not yet installed.
2026-03-17 13:19:28 -07:00