Commit graph

567 commits

Author SHA1 Message Date
Charley Cunningham
bc24017d64
Add Smart Approvals guardian review across core, app-server, and TUI (#13860)
## Summary
- add `approvals_reviewer = "user" | "guardian_subagent"` as the runtime
control for who reviews approval requests
- route Smart Approvals guardian review through core for command
execution, file changes, managed-network approvals, MCP approvals, and
delegated/subagent approval flows
- expose guardian review in app-server with temporary unstable
`item/autoApprovalReview/{started,completed}` notifications carrying
`targetItemId`, `review`, and `action`
- update the TUI so Smart Approvals can be enabled from `/experimental`,
aligned with the matching `/approvals` mode, and surfaced clearly while
reviews are pending or resolved

## Runtime model
This PR does not introduce a new `approval_policy`.

Instead:
- `approval_policy` still controls when approval is needed
- `approvals_reviewer` controls who reviewable approval requests are
routed to:
  - `user`
  - `guardian_subagent`

`guardian_subagent` is a carefully prompted reviewer subagent that
gathers relevant context and applies a risk-based decision framework
before approving or denying the request.

The `smart_approvals` feature flag is a rollout/UI gate. Core runtime
behavior keys off `approvals_reviewer`.

When Smart Approvals is enabled from the TUI, it also switches the
current `/approvals` settings to the matching Smart Approvals mode so
users immediately see guardian review in the active thread:
- `approval_policy = on-request`
- `approvals_reviewer = guardian_subagent`
- `sandbox_mode = workspace-write`

Users can still change `/approvals` afterward.

Config-load behavior stays intentionally narrow:
- plain `smart_approvals = true` in `config.toml` remains just the
rollout/UI gate and does not auto-set `approvals_reviewer`
- the deprecated `guardian_approval = true` alias migration does
backfill `approvals_reviewer = "guardian_subagent"` in the same scope
when that reviewer is not already configured there, so old configs
preserve their original guardian-enabled behavior

ARC remains a separate safety check. For MCP tool approvals, ARC
escalations now flow into the configured reviewer instead of always
bypassing guardian and forcing manual review.

## Config stability
The runtime reviewer override is stable, but the config-backed
app-server protocol shape is still settling.

- `thread/start`, `thread/resume`, and `turn/start` keep stable
`approvalsReviewer` overrides
- the config-backed `approvals_reviewer` exposure returned via
`config/read` (including profile-level config) is now marked
`[UNSTABLE]` / experimental in the app-server protocol until we are more
confident in that config surface

## App-server surface
This PR intentionally keeps the guardian app-server shape narrow and
temporary.

It adds generic unstable lifecycle notifications:
- `item/autoApprovalReview/started`
- `item/autoApprovalReview/completed`

with payloads of the form:
- `{ threadId, turnId, targetItemId, review, action? }`

`review` is currently:
- `{ status, riskScore?, riskLevel?, rationale? }`
- where `status` is one of `inProgress`, `approved`, `denied`, or
`aborted`

`action` carries the guardian action summary payload from core when
available. This lets clients render temporary standalone pending-review
UI, including parallel reviews, even when the underlying tool item has
not been emitted yet.

These notifications are explicitly documented as `[UNSTABLE]` and
expected to change soon.

This PR does **not** persist guardian review state onto `thread/read`
tool items. The intended follow-up is to attach guardian review state to
the reviewed tool item lifecycle instead, which would improve
consistency with manual approvals and allow thread history / reconnect
flows to replay guardian review state directly.

## TUI behavior
- `/experimental` exposes the rollout gate as `Smart Approvals`
- enabling it in the TUI enables the feature and switches the current
session to the matching Smart Approvals `/approvals` mode
- disabling it in the TUI clears the persisted `approvals_reviewer`
override when appropriate and returns the session to default manual
review when the effective reviewer changes
- `/approvals` still exposes the reviewer choice directly
- the TUI renders:
- pending guardian review state in the live status footer, including
parallel review aggregation
  - resolved approval/denial state in history

## Scope notes
This PR includes the supporting core/runtime work needed to make Smart
Approvals usable end-to-end:
- shell / unified-exec / apply_patch / managed-network / MCP guardian
review
- delegated/subagent approval routing into guardian review
- guardian review risk metadata and action summaries for app-server/TUI
- config/profile/TUI handling for `smart_approvals`, `guardian_approval`
alias migration, and `approvals_reviewer`
- a small internal cleanup of delegated approval forwarding to dedupe
fallback paths and simplify guardian-vs-parent approval waiting (no
intended behavior change)

Out of scope for this PR:
- redesigning the existing manual approval protocol shapes
- persisting guardian review state onto app-server `ThreadItem`s
- delegated MCP elicitation auto-review (the current delegated MCP
guardian shim only covers the legacy `RequestUserInput` path)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 15:27:00 -07:00
pakrym-oai
cb7d8f45a1
Normalize MCP tool names to code-mode safe form (#14605)
Code mode doesn't allow `-` in names and it's better if function names
and code-mode names are the same.
2026-03-13 14:50:16 -07:00
Ruslan Nigmatullin
f8f82bfc2b
app-server: add v2 filesystem APIs (#14245)
Add a protocol-level filesystem surface to the v2 app-server so Codex
clients can read and write files, inspect directories, and subscribe to
path changes without relying on host-specific helpers.

High-level changes:
- define the new v2 fs/readFile, fs/writeFile, fs/createDirectory,
fs/getMetadata, fs/readDirectory, fs/remove, fs/copy RPCs
- implement the app-server handlers, including absolute-path validation,
base64 file payloads, recursive copy/remove semantics
- document the API, regenerate protocol schemas/types, and add
end-to-end tests for filesystem operations, copy edge cases

Testing plan:
- validate protocol serialization and generated schema output for the
new fs request, response, and notification types
- run app-server integration coverage for file and directory CRUD paths,
metadata/readDirectory responses, copy failure modes, and absolute-path
validation
2026-03-13 14:42:20 -07:00
Owen Lin
014e19510d
feat(app-server, core): add more spans (#14479)
## Description

This PR expands tracing coverage across app-server thread startup, core
session initialization, and the Responses transport layer. It also gives
core dispatch spans stable operation-specific names so traces are easier
to follow than the old generic `submission_dispatch` spans.

Also use `fmt::Display` for types that we serialize in traces so we send
strings instead of rust types
2026-03-13 13:16:33 -07:00
canvrno-oai
914f7c7317
Override local apps settings with requirements.toml settings (#14304)
This PR changes app and connector enablement when `requirements.toml` is
present locally or via remote configuration.

For apps.* entries:
- `enabled = false` in `requirements.toml` overrides the user’s local
`config.toml` and forces the app to be disabled.
- `enabled = true` in `requirements.toml` does not re-enable an app the
user has disabled in config.toml.

This behavior applies whether or not the user has an explicit entry for
that app in `config.toml`. It also applies to cloud-managed policies and
configurations when the admin sets the override through
`requirements.toml`.

Scenarios tested and verified:
- Remote managed, user config (present) override
- Admin-defined policies & configurations include a connector override:
  `[apps.<appID>]
enabled = false`
- User's config.toml has the same connector configured with `enabled =
true`
  - TUI/App should show connector as disabled
  - Connector should be unavailable for use in the composer
  
- Remote managed, user config (absent) override
- Admin-defined policies & configurations include a connector override:
  `[apps.<appID>]
enabled = false`
  - User's config.toml has no entry for the the same connector
  - TUI/App should show connector as disabled
  - Connector should be unavailable for use in the composer
  
- Locally managed, user config (present) override
  - Local requirements.toml includes a connector override:
  `[apps.<appID>]
enabled = false`
- User's config.toml has the same connector configured with `enabled =
true`
  - TUI/App should show connector as disabled
  - Connector should be unavailable for use in the composer

- Locally managed, user config (absent) override
  - Local requirements.toml includes a connector override:
  `[apps.<appID>]
enabled = false`
  - User's config.toml has no entry for the the same connector
  - TUI/App should show connector as disabled
  - Connector should be unavailable for use in the composer




<img width="1446" height="753" alt="image"
src="https://github.com/user-attachments/assets/61c714ca-dcca-4952-8ad2-0afc16ff3835"
/>
<img width="595" height="233" alt="image"
src="https://github.com/user-attachments/assets/7c8ab147-8fd7-429a-89fb-591c21c15621"
/>
2026-03-13 12:40:24 -07:00
Ruslan Nigmatullin
50558e6507
app-server: Add platform os and family to init response (#14527)
This allows the client to pick os-specific behavior while interacting
with the app server, e.g. to use proper path separators.
2026-03-13 19:07:54 +00:00
Ahmed Ibrahim
3aabce9e0a
Unify realtime v1/v2 session config (#14606)
## Summary
- unify realtime websocket settings under `[realtime]` (`version` and
`type`)
- remove `realtime_conversation_v2` and select parser/session mode from
config

## Testing
- not run (per request)

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-13 11:35:38 -07:00
Eric Traut
9dba7337f2
Start TUI on embedded app server (#14512)
This PR is part of the effort to move the TUI on top of the app server.
In a previous PR, we introduced an in-process app server and moved
`exec` on top of it.

For the TUI, we want to do the migration in stages. The app server
doesn't currently expose all of the functionality required by the TUI,
so we're going to need to support a hybrid approach as we make the
transition.

This PR changes the TUI initialization to instantiate an in-process app
server and access its `AuthManager` and `ThreadManager` rather than
constructing its own copies. It also adds a placeholder TUI event
handler that will eventually translate app server events into TUI
events. App server notifications are accepted but ignored for now. It
also adds proper shutdown of the app server when the TUI terminates.
2026-03-13 12:04:41 -06:00
iceweasel-oai
6b3d82daca
Use a private desktop for Windows sandbox instead of Winsta0\Default (#14400)
## Summary
- launch Windows sandboxed children on a private desktop instead of
`Winsta0\Default`
- make private desktop the default while keeping
`windows.sandbox_private_desktop=false` as the escape hatch
- centralize process launch through the shared
`create_process_as_user(...)` path
- scope the private desktop ACL to the launching logon SID

## Why
Today sandboxed Windows commands run on the visible shared desktop. That
leaves an avoidable same-desktop attack surface for window interaction,
spoofing, and related UI/input issues. This change moves sandboxed
commands onto a dedicated per-launch desktop by default so the sandbox
no longer shares `Winsta0\Default` with the user session.

The implementation stays conservative on security with no silent
fallback back to `Winsta0\Default`

If private-desktop setup fails on a machine, users can still opt out
explicitly with `windows.sandbox_private_desktop=false`.

## Validation
- `cargo build -p codex-cli`
- elevated-path `codex exec` desktop-name probe returned
`CodexSandboxDesktop-*`
- elevated-path `codex exec` smoke sweep for shell commands, nested
`pwsh`, jobs, and hidden `notepad` launch
- unelevated-path full private-desktop compatibility sweep via `codex
exec` with `-c windows.sandbox=unelevated`
2026-03-13 10:13:39 -07:00
Jack Mousseau
7c7e267501
Simplify permissions available in request permissions tool (#14529) 2026-03-12 21:13:17 -07:00
alexsong-oai
650beb177e
Refactor cloud requirements error and surface in JSON-RPC error (#14504)
Refactors cloud requirements error handling to carry structured error
metadata and surfaces that metadata through JSON-RPC config-load
failures, including:
* adds typed CloudRequirementsLoadErrorCode values plus optional
statusCode
* marks thread/start, thread/resume, and thread/fork config failures
with structured cloud-requirements error data
2026-03-13 03:30:51 +00:00
alexsong-oai
1a363d5fcf
Add plugin usage telemetry (#14531)
adding metrics including: 
* plugin used
* plugin installed/uninstalled
* plugin enabled/disabled
2026-03-12 19:22:30 -07:00
xl-openai
1ea69e8d50
feat: add plugin/read. (#14445)
return more information for a specific plugin.
2026-03-12 16:52:21 -07:00
Jack Mousseau
b7dba72dbd
Rename reject approval policy to granular (#14516) 2026-03-12 16:38:04 -07:00
Anton Panasenko
651717323c
feat(search_tool): gate search_tool on model supports_search_tool field (#14502) 2026-03-12 16:03:50 -07:00
Jack Mousseau
a314c7d3ae
Decouple request permissions feature and tool (#14426) 2026-03-12 14:47:08 -07:00
Owen Lin
d3e6680531
fix turn_start_jsonrpc_span_parents_core_turn_spans flakiness (#14490)
This makes the test less flaky by checking the core invariant instead of
the full span chain.

Before, the test waited for several specific internal spans
(`submission_dispatch`, `session_task.turn`, `run_turn`) and asserted
their exact relationships. That was brittle because those spans are
exported asynchronously and are more of an implementation detail than
the thing we actually care about.

Now, the test only checks that:
- `turn/start` is on the expected remote trace with the expected remote
parent
- at least one representative core turn span on that same trace descends
from it

That keeps the sanity-check we want while making the test less sensitive
to timing and internal refactors.
2026-03-12 12:16:56 -07:00
jgershen-oai
3e96c867fe
use scopes_supported for OAuth when present on MCP servers (#14419)
Fixes [#8889](https://github.com/openai/codex/issues/8889).

## Summary
- Discover and use advertised MCP OAuth `scopes_supported` when no
explicit or configured scopes are present.
- Apply the same scope precedence across `mcp add`, `mcp login`, skill
dependency auto-login, and app-server MCP OAuth login.
- Keep discovered scopes ephemeral and non-persistent.
- Retry once without scopes for CLI and skill auto-login flows if the
OAuth provider rejects discovered scopes.

## Motivation
Some MCP servers advertise the scopes they expect clients to request
during OAuth, but Codex was ignoring that metadata and typically
starting OAuth with no scopes unless the user manually passed `--scopes`
or configured `server.scopes`.

That made compliant MCP servers harder to use out of the box and is the
behavior described in
[#8889](https://github.com/openai/codex/issues/8889).

This change also brings our behavior in line with the MCP authorization
spec's scope selection guidance:

https://modelcontextprotocol.io/specification/2025-11-25/basic/authorization#scope-selection-strategy

## Behavior
Scope selection now follows this order everywhere:
1. Explicit request scopes / CLI `--scopes`
2. Configured `server.scopes`
3. Discovered `scopes_supported`
4. Legacy empty-scope behavior

Compatibility notes:
- Existing working setups keep the same behavior because explicit and
configured scopes still win.
- Discovered scopes are never written back into config or token storage.
- If discovery is missing, malformed, or empty, behavior falls back to
the previous empty-scope path.
- App-server login gets the same precedence rules, but does not add a
transparent retry path in this change.

## Implementation
- Extend streamable HTTP OAuth discovery to parse and normalize
`scopes_supported`.
- Add a shared MCP scope resolver in `core` so all login entrypoints use
the same precedence rules.
- Preserve provider callback errors from the OAuth flow so CLI/skill
flows can safely distinguish provider rejections from other failures.
- Reuse discovered scopes from the existing OAuth support check where
possible instead of persisting new config.
2026-03-12 11:57:06 -07:00
gabec-openai
4fa7d6f444
Handle malformed agent role definitions nonfatally (#14488)
## Summary
- make malformed agent role definitions nonfatal during config loading
- drop invalid agent roles and record warnings in `startup_warnings`
- forward startup warnings through app-server `configWarning`
notifications

## Testing
- `cargo test -p codex-core agent_role_ -- --nocapture`
- `just fix -p codex-core`
- `just fmt`
- `cargo test -p codex-app-server config_warning -- --nocapture`

Co-authored-by: Codex <noreply@openai.com>
2026-03-12 11:20:31 -07:00
viyatb-oai
04892b4ceb
refactor: make bubblewrap the default Linux sandbox (#13996)
## Summary
- make bubblewrap the default Linux sandbox and keep
`use_legacy_landlock` as the only override
- remove `use_linux_sandbox_bwrap` from feature, config, schema, and
docs surfaces
- update Linux sandbox selection, CLI/config plumbing, and related
tests/docs to match the new default
- fold in the follow-up CI fixes for request-permissions responses and
Linux read-only sandbox error text
2026-03-11 23:31:18 -07:00
xl-openai
b5f927b973
feat: refactor on openai-curated plugins. (#14427)
- Curated repo sync now uses GitHub HTTP, not local git.
- Curated plugin cache/versioning now uses commit SHA instead of local.
- Startup sync now always repairs or refreshes curated plugin cache from
tmp (auto update to the lastest)
2026-03-11 23:18:58 -07:00
sayan-oai
917c2df201
chore: use AVAILABLE and ON_INSTALL as default plugin install and auth policies (#14407)
make `AVAILABLE` the default plugin installPolicy when unset in
`marketplace.json`. similarly, make `ON_INSTALL` the default authPolicy.

this means, when unset, plugins are available to be installed (but not
auto-installed), and the contained connectors will be authed at
install-time.

updated tests.
2026-03-11 20:33:17 -07:00
Owen Lin
5bc82c5b93
feat(app-server): propagate traces across tasks and core ops (#14387)
## Summary

This PR keeps app-server RPC request trace context alive for the full
lifetime of the work that request kicks off (e.g. for `thread/start`,
this is `app-server rpc handler -> tokio background task -> core op
submissions`). Previously we lose trace lineage once the request handler
returns or hands work off to background tasks.

This approach is especially relevant for `thread/start` and other RPC
handlers that run in a non-blocking way. In the near future we'll most
likely want to make all app-server handlers run in a non-blocking way by
default, and only queue operations that must operate in order (e.g.
thread RPCs per thread?), so we want to make sure tracing in app-server
just generally works.

Depends on https://github.com/openai/codex/pull/14300

**Before**
<img width="155" height="207" alt="image"
src="https://github.com/user-attachments/assets/c9487459-36f1-436c-beb7-fafeb40737af"
/>


**After**
<img width="299" height="337" alt="image"
src="https://github.com/user-attachments/assets/727392b2-d072-4427-9dc4-0502d8652dea"
/>

## What changed

- Keep request-scoped trace context around until we send the final
response or error, or the connection closes.
- Thread that trace context through detached `thread/start` work so
background startup stays attached to the originating request.
- Pass request trace context through to downstream core operations,
including:
  - thread creation
  - resume/fork flows
  - turn submission
  - review
  - interrupt
  - realtime conversation operations
- Add tracing tests that verify:
  - remote W3C trace context is preserved for `thread/start`
  - remote W3C trace context is preserved for `turn/start`
  - downstream core spans stay under the originating request span
  - request-scoped tracing state is cleaned up correctly
- Clean up shutdown behavior so detached background tasks and spawned
threads are drained before process exit.
2026-03-11 20:18:31 -07:00
Ahmed Ibrahim
bf5e997b31
Include spawn agent model metadata in app-server items (#14410)
- add model and reasoning effort to app-server collab spawn items and
notifications
- regenerate app-server protocol schemas for the new fields

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-11 19:25:21 -07:00
Anton Panasenko
77b0c75267
feat: search_tool migrate to bring you own tool of Responses API (#14274)
## Why

to support a new bring your own search tool in Responses
API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search)
we migrating our bm25 search tool to use official way to execute search
on client and communicate additional tools to the model.

## What
- replace the legacy `search_tool_bm25` flow with client-executed
`tool_search`
- add protocol, SSE, history, and normalization support for
`tool_search_call` and `tool_search_output`
- return namespaced Codex Apps search results and wire namespaced
follow-up tool calls back into MCP dispatch
2026-03-11 17:51:51 -07:00
Owen Lin
72631755e0
chore(app-server): stop emitting codex/event/ notifications (#14392)
## Description

This PR stops emitting legacy `codex/event/*` notifications from the
public app-server transports.

It's been a long time coming! app-server was still producing a raw
notification stream from core, alongside the typed app-server
notifications and server requests, for compatibility reasons. Now,
external clients should no longer be depending on those legacy
notifications, so this change removes them from the stdio and websocket
contract and updates the surrounding docs, examples, and tests to match.

### Caveat
I left the "in-process" version of app-server alone for now, since
`codex exec` was recently based on top of app-server via this in-process
form here: https://github.com/openai/codex/pull/14005

Seems like `codex exec` still consumes some legacy notifications
internally, so this branch only removes `codex/event/*` from app-server
over stdio and websockets.

## Follow-up

Once `codex exec` is fully migrated off `codex/event/*` notifications,
we'll be able to stop emitting them entirely entirely instead of just
filtering it at the external transport boundary.
2026-03-12 00:45:20 +00:00
sayan-oai
7b2cee53db chore: wire through plugin policies + category from marketplace.json (#14305)
wire plugin marketplace metadata through app-server endpoints:
- `plugin/list` has `installPolicy` and `authPolicy`
- `plugin/install` has plugin-level `authPolicy`

`plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when
installing.


added tests.
2026-03-11 12:33:10 -07:00
Owen Lin
fa1242c83b fix(otel): make HTTP trace export survive app-server runtimes (#14300)
## Summary

This PR fixes OTLP HTTP trace export in runtimes where the previous
exporter setup was unreliable, especially around app-server usage. It
also removes the old `codex_otel::otel_provider` compatibility shim and
switches remaining call sites over to the crate-root
`codex_otel::OtelProvider` export.

## What changed

- Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
- Add an async HTTP client path for trace export when we are already
inside a multi-thread Tokio runtime.
- Make provider shutdown flush traces before tearing down the tracer
provider.
- Add loopback coverage that verifies traces are actually sent to
`/v1/traces`:
  - outside Tokio
  - inside a multi-thread Tokio runtime
  - inside a current-thread Tokio runtime
- Remove the `codex_otel::otel_provider` shim and update remaining
imports.

## Why

I hit cases where spans were being created correctly but never made it
to the collector. The issue turned out to be in exporter/runtime
behavior rather than the span plumbing itself. This PR narrows that gap
and gives us regression coverage for the actual export path.
2026-03-11 12:33:10 -07:00
pakrym-oai
548583198a Allow bool web_search in ToolsToml (#14352)
Summary
- add a custom deserializer so `[tools].web_search` can be a bool
(treated as disabled) or a config object
- extend core and app-server tests to cover bool handling in TOML config

Testing
- Not run (not requested)
2026-03-11 12:33:10 -07:00
Celia Chen
c1a424691f chore: add a separate reject-policy flag for skill approvals (#14271)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
2026-03-11 12:33:09 -07:00
Leo Shimonaka
889b4796fc feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155)
Add additional macOS Sandbox Permissions levers for the following:

- Launch Services
- Contacts
- Reminders
2026-03-11 12:33:09 -07:00
joeytrasatti-openai
8ac27b2a16 Add ephemeral flag support to thread fork (#14248)
### Summary
This PR adds first-class ephemeral support to thread/fork, bringing it
in line with thread/start. The goal is to support one-off completions on
full forked threads without persisting them as normal user-visible
threads.

### Testing
2026-03-11 12:33:08 -07:00
Dylan Hurd
d5694529ca app-server: propagate nested experimental gating for AskForApproval::Reject (#14191)
## Summary
This change makes `AskForApproval::Reject` gate correctly anywhere it
appears inside otherwise-stable app-server protocol types.

Previously, experimental gating for `approval_policy: Reject` was
handled with request-specific logic in `ClientRequest` detection. That
covered a few request params types, but it did not generalize to other
nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or
`ConfigRequirements`.

This PR replaces that ad hoc handling with a generic nested experimental
propagation mechanism.

## Testing

seeing this when run app-server-test-client without experimental api
enabled:
```
 initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" }
> {
>   "id": "50244f6a-270a-425d-ace0-e9e98205bde7",
>   "method": "thread/start",
>   "params": {
>     "approvalPolicy": {
>       "reject": {
>         "mcp_elicitations": false,
>         "request_permissions": true,
>         "rules": false,
>         "sandbox_approval": true
>       }
>     },
>     "baseInstructions": null,
>     "config": null,
>     "cwd": null,
>     "developerInstructions": null,
>     "dynamicTools": null,
>     "ephemeral": null,
>     "experimentalRawEvents": false,
>     "mockExperimentalField": null,
>     "model": null,
>     "modelProvider": null,
>     "persistExtendedHistory": false,
>     "personality": null,
>     "sandbox": null,
>     "serviceName": null
>   }
> }
< {
<   "error": {
<     "code": -32600,
<     "message": "askForApproval.reject requires experimentalApi capability"
<   },
<   "id": "50244f6a-270a-425d-ace0-e9e98205bde7"
< }
[verified] thread/start rejected approvalPolicy=Reject without experimentalApi
```

---------

Co-authored-by: celia-oai <celia@openai.com>
2026-03-11 12:33:08 -07:00
xl-openai
d751e68f44 feat: Allow sync with remote plugin status. (#14176)
Add forceRemoteSync to plugin/list.
When it is set to True, we will sync the local plugin status with the
remote one (backend-api/plugins/list).
2026-03-11 12:33:08 -07:00
guinness-oai
4ac6042850 Mark incomplete resumed turns interrupted when idle (#14125)
Fixes a Codex app bug where quitting the app mid-run could leave the
reopened thread stuck in progress and non-interactable. On cold thread
resume, app-server could return an idle thread with a replayed turn
still marked in progress. This marks incomplete replayed turns as
interrupted unless the thread is actually active.
2026-03-11 12:33:07 -07:00
Eric Traut
f9cba5cb16 Log ChatGPT user ID for feedback tags (#13901)
There are some bug investigations that currently require us to ask users
for their user ID even though they've already uploaded logs and session
details via `/feedback`. This frustrates users and increases the time
for diagnosis.

This PR includes the ChatGPT user ID in the metadata uploaded for
`/feedback` (both the TUI and app-server).
2026-03-11 12:33:07 -07:00
Ahmed Ibrahim
f3f47cf455
Stabilize app-server notify initialize test (#13939)
## What changed
- This PR changes only the flaky test setup for
`turn_start_notify_payload_includes_initialize_client_name`.
- Instead of shelling out to `python3` to write the notify payload, the
test uses the first-party `codex-app-server-test-notify-capture` helper.
- The helper writes `notify.json` atomically and the test waits for the
file to exist before reading it.

## Why this fixes the flake
- The old test depended on an external Python interpreter being present
and behaving consistently on every CI runner.
- It also raced the file write: the test could observe the path before
the payload had been fully written, which produced partial reads and
intermittent assertion failures.
- Moving the write into a repo-owned helper removes the external
dependency, and atomic write-plus-wait makes the handoff deterministic.

## Scope
- Test-only change.
2026-03-09 23:41:58 -07:00
Ahmed Ibrahim
b39ae9501f
Stabilize websocket test server binding (#14002)
## Summary
- stop reserving a localhost port in the websocket tests before spawning
the server
- let the app-server bind `127.0.0.1:0` itself and read back the actual
bound websocket address from stderr
- update the websocket test helpers and callers to use the discovered
address

## Why this fixes the flake
The previous harness reserved a port in the test process, dropped it,
and then asked the server process to bind that same address. On busy
runners there is a race between releasing the reservation and the child
process rebinding it, which can produce sporadic startup failures.
Binding to port `0` inside the server removes that race entirely, and
waiting for the server to report the real bound address makes the tests
connect only after the listener is actually ready.
2026-03-09 23:39:56 -07:00
Ahmed Ibrahim
2e24be2134
Use realtime transcript for handoff context (#14132)
- collect input/output transcript deltas into active handoff transcript
state
- attach and clear that transcript on each handoff, and regenerate
schema/tests
2026-03-09 22:30:03 -07:00
Channing Conger
c6343e0649
Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296)
### Purpose
While trying to build out CLI-Tools for the agent to use under skills we
have found that those tools sometimes need to invoke a user elicitation.
These elicitations are handled out of band of the codex app-server but
need to indicate to the exec manager that the command running is not
going to progress on the usual timeout horizon.

### Example
Model calls universal exec:
`$ download-credit-card-history --start-date 2026-01-19 --end-date
2026-02-19 > credit_history.jsonl`

download-cred-card-history might hit a hosted/preauthenticated service
to fetch data. That service might decide that the request requires an
end user approval the access to the personal data. It should be able to
signal to the running thread that the command in question is blocked on
user elicitation. In that case we want the exec to continue, but the
timeout to not expire on the tool call, essentially freezing time until
the user approves or rejects the command at which point the tool would
signal the app-server to decrement the outstanding elicitation count.
Now timeouts would proceed as normal.

### What's Added

- New v2 RPC methods:
    - thread/increment_elicitation
    - thread/decrement_elicitation
- Protocol updates in:
    - codex-rs/app-server-protocol/src/protocol/common.rs
    - codex-rs/app-server-protocol/src/protocol/v2.rs
- App-server handlers wired in:
    - codex-rs/app-server/src/codex_message_processor.rs

### Behavior

- Counter starts at 0 per thread.
- increment atomically increases the counter.
- decrement atomically decreases the counter; decrement at 0 returns
invalid request.
- Transition rules:
- 0 -> 1: broadcast pause state, pausing all active stopwatches
immediately.
    - \>0 -> >0: remain paused.
    - 1 -> 0: broadcast unpause state, resuming stopwatches.
- Core thread/session logic:
    - codex-rs/core/src/codex_thread.rs
    - codex-rs/core/src/codex.rs
    - codex-rs/core/src/mcp_connection_manager.rs

### Exec-server stopwatch integration

- Added centralized stopwatch tracking/controller:
    - codex-rs/exec-server/src/posix/stopwatch_controller.rs
- Hooked pause/unpause broadcast handling + stopwatch registration:
    - codex-rs/exec-server/src/posix/mcp.rs
    - codex-rs/exec-server/src/posix/stopwatch.rs
    - codex-rs/exec-server/src/posix.rs
2026-03-09 22:29:26 -07:00
Matthew Zeng
566e4cee4b
[apps] Fix apps enablement condition. (#14011)
- [x] Fix apps enablement condition to check both the feature flag and
that the user is not an API key user.
2026-03-09 22:25:43 -07:00
xl-openai
0c33af7746
feat: support disabling bundled system skills (#13792)
Support disable bundled system skills with a config:

[skills.bundled]
enabled = false
2026-03-09 22:02:53 -07:00
pakrym-oai
d71e042694
Enforce single tool output type in codex handlers (#14157)
We'll need to associate output schema with each tool. Each tool can only
have on output type.
2026-03-09 21:49:44 -07:00
Andrei Eternal
244b2d53f4
start of hooks engine (#13276)
(Experimental)

This PR adds a first MVP for hooks, with SessionStart and Stop

The core design is:

- hooks live in a dedicated engine under codex-rs/hooks
- each hook type has its own event-specific file
- hook execution is synchronous and blocks normal turn progression while
running
- matching hooks run in parallel, then their results are aggregated into
a normalized HookRunSummary

On the AppServer side, hooks are exposed as operational metadata rather
than transcript-native items:

- new live notifications: hook/started, hook/completed
- persisted/replayed hook results live on Turn.hookRuns
- we intentionally did not add hook-specific ThreadItem variants

Hooks messages are not persisted, they remain ephemeral. The context
changes they add are (they get appended to the user's prompt)
2026-03-10 04:11:31 +00:00
viyatb-oai
b0cbc25a48
fix(protocol): preserve legacy workspace-write semantics (#13957)
## Summary
This is a fast follow to the initial `[permissions]` structure.

- keep the new split-policy carveout behavior for narrower non-write
entries under broader writable roots
- preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge
that drops only redundant nested readable roots when projecting from
`SandboxPolicy`
- route the legacy macOS seatbelt adapter through that same legacy
bridge so redundant nested readable roots do not become read-only
carveouts on macOS
- derive the legacy bridge for `command_exec` using the sandbox root cwd
rather than the request cwd so policy derivation matches later sandbox
enforcement
- add regression coverage for the legacy macOS nested-readable-root case

## Examples
### Legacy `workspace-write` on macOS
A legacy `workspace-write` policy can redundantly list a nested readable
root under an already-writable workspace root.

For example, legacy config can effectively mean:
- workspace root (`.` / `cwd`) is writable
- `docs/` is also listed in `readable_roots`

The new shared split-policy helper intentionally treats a narrower
non-write entry under a broader writable root as a carveout for real
`[permissions]` configs. Without this fast follow, the unchanged macOS
seatbelt legacy adapter could project that legacy shape into a
`FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout
under the writable workspace root. In practice, legacy callers on macOS
could unexpectedly lose write access inside `docs/`, even though that
path was writable before the `[permissions]` migration work.

This change fixes that by routing the legacy seatbelt path through the
cwd-aware legacy bridge, so:
- legacy `workspace-write` keeps `docs/` writable when `docs/` was only
a redundant readable root
- explicit `[permissions]` entries like `'.' = 'write'` and `'docs' =
'read'` still make `docs/` read-only, which is the new intended
split-policy behavior

### Legacy `command_exec` with a subdirectory cwd
`command_exec` can run a command from a request cwd that is narrower
than the sandbox root cwd.

For example:
- sandbox root cwd is `/repo`
- request cwd is `/repo/subdir`
- legacy policy is still `workspace-write` rooted at `/repo`

Before this fast follow, `command_exec` derived the legacy bridge using
the request cwd, but the sandbox was later built using the sandbox root
cwd. That mismatch could miss redundant legacy readable roots during
projection and accidentally reintroduce read-only carveouts for paths
that should still be writable under the legacy model.

This change fixes that by deriving the legacy bridge with the same
sandbox root cwd that sandbox enforcement later uses.

## Verification
- `just fmt`
- `cargo test -p codex-core
seatbelt_legacy_workspace_write_nested_readable_root_stays_writable`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D
warnings`
- `cargo clean`
2026-03-09 18:43:27 -07:00
Dylan Hurd
6da84efed8
feat(approvals) RejectConfig for request_permissions (#14118)
## Summary
We need to support allowing request_permissions calls when using
`Reject` policy

<img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
40 PM"
src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
/>

Note that this is a backwards-incompatible change for Reject policy. I'm
not sure if we need to add a default based on our current use/setup

## Testing
- [x] Added tests
- [x] Tested locally
2026-03-09 18:16:54 -07:00
Max Johnson
66e71cce11
codex-rs/app-server: add health endpoints for --listen websocket server (#13782)
Healthcheck endpoints for the websocket server

- serve `GET /readyz` and `GET /healthz` from the same listener used for
`--listen ws://...`
- switch the websocket listener over to `axum` upgrade handling instead
of manual socket parsing
- add websocket transport coverage for the health endpoints and document
the new behavior

Testing
- integration tests
- built and tested e2e

```
> curl -i http://127.0.0.1:9234/readyz
HTTP/1.1 200 OK
content-length: 0
date: Fri, 06 Mar 2026 19:20:23 GMT

>  curl -i http://127.0.0.1:9234/healthz
HTTP/1.1 200 OK
content-length: 0
date: Fri, 06 Mar 2026 19:20:24 GMT
```
2026-03-09 22:11:30 +00:00
Dylan Hurd
d241dc598c
feat(core) Persist request_permission data across turns (#14009)
## Summary
request_permissions flows should support persisting results for the
session.

Open Question: Still deciding if we need within-turn approvals - this
adds complexity but I could see it being useful

## Testing
- [x] Updated unit tests

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-09 14:36:38 -07:00
sayan-oai
6ad448b658
chore: plugin/uninstall endpoint (#14111)
add `plugin/uninstall` app-server endpoint to fully rm plugin from
plugins cache dir and rm entry from user config file.

plugin-enablement is session-scoped, so uninstalls are only picked up in
new sessions (like installs).

added tests.
2026-03-09 12:40:25 -07:00
xl-openai
c1f3ef16ec
fix(plugin): Also load curated plugins for TUI. (#14050)
Also run maybe_start_curated_repo_sync_for_config at TUI start time.
2026-03-09 11:05:02 -07:00