Commit graph

56 commits

Author SHA1 Message Date
Celia Chen
c1a424691f chore: add a separate reject-policy flag for skill approvals (#14271)
## Summary
- add `skill_approval` to `RejectConfig` and the app-server v2
`AskForApproval::Reject` payload so skill-script prompts can be
configured independently from sandbox and rule-based prompts
- update Unix shell escalation to reject prompts based on the actual
decision source, keeping prefix rules tied to `rules`, unmatched command
fallbacks tied to `sandbox_approval`, and skill scripts tied to
`skill_approval`
- regenerate the affected protocol/config schemas and expand
unit/integration coverage for the new flag and skill approval behavior
2026-03-11 12:33:09 -07:00
joeytrasatti-openai
8ac27b2a16 Add ephemeral flag support to thread fork (#14248)
### Summary
This PR adds first-class ephemeral support to thread/fork, bringing it
in line with thread/start. The goal is to support one-off completions on
full forked threads without persisting them as normal user-visible
threads.

### Testing
2026-03-11 12:33:08 -07:00
xl-openai
d751e68f44 feat: Allow sync with remote plugin status. (#14176)
Add forceRemoteSync to plugin/list.
When it is set to True, we will sync the local plugin status with the
remote one (backend-api/plugins/list).
2026-03-11 12:33:08 -07:00
Dylan Hurd
772259b01f
fix(core) default RejectConfig.request_permissions (#14165)
## Summary
Adds a default here so existing config deserializes

## Testing
- [x] Added a unit test
2026-03-10 04:56:23 +00:00
Dylan Hurd
6da84efed8
feat(approvals) RejectConfig for request_permissions (#14118)
## Summary
We need to support allowing request_permissions calls when using
`Reject` policy

<img width="1133" height="588" alt="Screenshot 2026-03-09 at 12 06
40 PM"
src="https://github.com/user-attachments/assets/a8df987f-c225-4866-b8ab-5590960daec5"
/>

Note that this is a backwards-incompatible change for Reject policy. I'm
not sure if we need to add a default based on our current use/setup

## Testing
- [x] Added tests
- [x] Tested locally
2026-03-09 18:16:54 -07:00
sayan-oai
6ad448b658
chore: plugin/uninstall endpoint (#14111)
add `plugin/uninstall` app-server endpoint to fully rm plugin from
plugins cache dir and rm entry from user config file.

plugin-enablement is session-scoped, so uninstalls are only picked up in
new sessions (like installs).

added tests.
2026-03-09 12:40:25 -07:00
Celia Chen
340f9c9ecb
app-server: include experimental skill metadata in exec approval requests (#13929)
## Summary

This change surfaces skill metadata on command approval requests so
app-server clients can tell when an approval came from a skill script
and identify the originating `SKILL.md`.

- add `skill_metadata` to exec approval events in the shared protocol
- thread skill metadata through core shell escalation and delegated
approval handling for skill-triggered approvals
- expose the field in app-server v2 as experimental `skillMetadata`
- regenerate the JSON/TypeScript schemas and cover the new field in
protocol, transport, core, and TUI tests

## Why

Skill-triggered approvals already carry skill context inside core, but
app-server clients could not see which skill caused the prompt. Sending
the skill metadata with the approval request makes it possible for
clients to present better approval UX and connect the prompt back to the
relevant skill definition.


## example event in app-server-v2
verified that we see this event when experimental api is on:
```
< {
<   "id": 11,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "additionalPermissions": {
<       "fileSystem": null,
<       "macos": {
<         "accessibility": false,
<         "automations": {
<           "bundle_ids": [
<             "com.apple.Notes"
<           ]
<         },
<         "calendar": false,
<         "preferences": "read_only"
<       },
<       "network": null
<     },
<     "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes",
<     "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy",
<     "skillMetadata": {
<       "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md"
<     },
<     "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69",
<     "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f"
<   }
< }`
```

& verified that this is the event when experimental api is off:
```
< {
<   "id": 13,
<   "method": "item/commandExecution/requestApproval",
<   "params": {
<     "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0",
<     "availableDecisions": [
<       "accept",
<       "acceptForSession",
<       "cancel"
<     ],
<     "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<     "commandActions": [
<       {
<         "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info",
<         "type": "unknown"
<       }
<     ],
<     "cwd": "/Users/celia/code/codex/codex-rs",
<     "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt",
<     "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4",
<     "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a"
<   }
< }
```
2026-03-08 18:07:46 -07:00
Matthew Zeng
a684a36091
[app-server] Support hot-reload user config when batch writing config. (#13839)
- [x] Support hot-reload user config when batch writing config.
2026-03-08 17:38:01 -07:00
iceweasel-oai
4b4f61d379
app-server: require absolute cwd for windowsSandbox/setupStart (#13833)
## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs

## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.

That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.

This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.

## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)
2026-03-06 22:47:08 -08:00
Owen Lin
90469d0a23
feat(app-server-protocol): address naming conflicts in json schema exporter (#13819)
This fixes a schema export bug where two different `WebSearchAction`
types were getting merged under the same name in the app-server v2 JSON
schema bundle.

The problem was that v2 thread items use the app-server API's
`WebSearchAction` with camelCase variants like `openPage`, while
`ThreadResumeParams.history` and
`RawResponseItemCompletedNotification.item` pull in the upstream
`ResponseItem` graph, which uses the Responses API snake_case shape like
`open_page`. During bundle generation we were flattening nested
definitions into the v2 namespace by plain name, so the later definition
could silently overwrite the earlier one.

That meant clients generating code from the bundled schema could end up
with the wrong `WebSearchAction` definition for v2 thread history. In
practice this shows up on web search items reconstructed from rollout
files with persisted extended history.

This change does two things:
- Gives the upstream Responses API schema a distinct JSON schema name:
`ResponsesApiWebSearchAction`
- Makes namespace-level schema definition collisions fail loudly instead
of silently overwriting
2026-03-07 01:33:46 +00:00
Ruslan Nigmatullin
e9bd8b20a1
app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
2026-03-06 17:30:17 -08:00
xl-openai
0243734300
feat: Add curated plugin marketplace + Metadata Cleanup. (#13712)
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
2026-03-06 19:39:35 -05:00
xl-openai
520ed724d2
support plugin/list. (#13540)
Introduce a plugin/list which reads from local marketplace.json.
Also update the signature for plugin/install.
2026-03-05 21:58:50 -05:00
xl-openai
1e877ccdd2
plugin: support local-based marketplace.json + install endpoint. (#13422)
Support marketplace.json that points to a local file, with
```
    "source":
    {
        "source": "local",
        "path": "./plugin-1"
    },
 ```
 
 Add a new plugin/install endpoint which add the plugin to the cache folder and enable it in config.toml.
2026-03-04 19:08:18 -05:00
Owen Lin
8dfd654196
feat(app-server-test-client): OTEL setup for tracing (#13493)
### Overview
This PR:
- Updates `app-server-test-client` to load OTEL settings from
`$CODEX_HOME/config.toml` and initializes its own OTEL provider.
- Add real client root spans to app-server test client traces.

This updates `codex-app-server-test-client` so its Datadog traces
reflect the full client-driven flow instead of a set of server spans
stitched together under a synthetic parent.

Before this change, the test client generated a fake `traceparent` once
and reused it for every JSON-RPC request. That kept the requests in one
trace, but there was no real client span at the top, so Datadog ended up
showing the sequence in a slightly misleading way, where all RPCs were
anchored under `initialize`.

Now the test client:
- loads OTEL settings from the normal Codex config path, including
`$CODEX_HOME/config.toml` and existing --config overrides
- initializes tracing the same way other Codex binaries do when trace
export is enabled
- creates a real client root span for each scripted command
- creates per-request client spans for JSON-RPC methods like
`initialize`, `thread/start`, and `turn/start`
- injects W3C trace context from the current client span into
request.trace instead of reusing a fabricated carrier

This gives us a cleaner trace shape in Datadog:
- one trace URL for the whole scripted flow
- a visible client root span
- proper client/server parent-child relationships for each app-server
request
2026-03-04 13:30:09 -08:00
iceweasel-oai
54a1c81d73
allow apps to specify cwd for sandbox setup. (#13484)
The electron app doesn't start up the app-server in a particular
workspace directory.
So sandbox setup happens in the app-installed directory instead of the
project workspace.

This allows the app do specify the workspace cwd so that the sandbox
setup actually sets up the ACLs instead of exiting fast and then having
the first shell command be slow.
2026-03-04 10:54:30 -08:00
Won Park
fa2306b303
image-gen-core (#13290)
Core tool-calling for image-gen, handles requesting and receiving logic
for images using response API
2026-03-03 23:11:28 -08:00
Val Kharitonov
4f6c4bb143
support 'flex' tier in app-server in addition to 'fast' (#13391) 2026-03-03 22:46:05 -08:00
Celia Chen
e6773f856c
Feat: Preserve network access on read-only sandbox policies (#13409)
## Summary

`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
  merging logic.

## What changed

- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
    deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
  - Preserved PermissionProfile.network when:
      - compiling skill permission profiles into sandbox policies
      - normalizing additional permissions
      - merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
  - Regenerated app-server schema fixtures for the new v2 wire shape.
2026-03-04 02:41:57 +00:00
Curtis 'Fjord' Hawthorne
b92146d48b
Add under-development original-resolution view_image support (#13050)
## Summary

Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.

When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.

## What changed

- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
  - Preserves original PNG/JPEG/WebP bytes.
  - Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
  - select original-resolution mode only when:
    - the feature flag is enabled, and
    - the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.

## Behavior

### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.

### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
-  `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 15:56:54 -08:00
joeytrasatti-openai
935754baa3
Add thread metadata update endpoint to app server (#13280)
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly

## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
    - `suite::grep_files::grep_files_tool_collects_matches`
    - `suite::grep_files::grep_files_tool_reports_empty_results`
2026-03-03 15:56:11 -08:00
pash-openai
07e532dcb9
app-server service tier plumbing (plus some cleanup) (#13334)
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )

- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers

cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
2026-03-03 02:35:09 -08:00
Curtis 'Fjord' Hawthorne
7e980d7db6
Support multimodal custom tool outputs (#12948)
## Summary

This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.

The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.

## What changed

- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts

## Behavior

Direct `view_image` calls still return a `function_call_output` with
image content.

When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results

So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.

## Compatibility

This is intended to be backward-compatible for resumed conversations.

Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.

Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
2026-02-26 18:17:46 -08:00
Owen Lin
21f7032dbb
feat(app-server): thread/unsubscribe API (#10954)
Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`

Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.

This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.

Closed threads will also be removed from `thread/loaded/list`.
2026-02-25 13:14:30 -08:00
Ahmed Ibrahim
947092283a
Add app-server v2 thread realtime API (#12715)
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-25 09:59:10 -08:00
alexsong-oai
6d6570d89d
Support external agent config detect and import (#12660)
Migration Behavior

* Config
  *  Migrates settings.json into config.toml
* Only adds fields when config.toml is missing, or when those fields are
missing from the existing file
  *  Supported mappings:
    env -> shell_environment_policy
     sandbox.enabled = true -> sandbox_mode = "workspace-write"

* Skills
  *  Copies home and repo .claude/skills into .agents/skills
  *  Existing skill directories are not overwritten
  *  SKILL.md content is rewritten from Claude-related terms to Codex

* AgentsMd
  *  Repo only
  *  Migrates CLAUDE.md into AGENTS.md
* Detect/import only proceed when AGENTS.md is missing or present but
empty
  *  Content is rewritten from Claude-related terms to Codex
2026-02-25 02:11:51 -08:00
jif-oai
f46b767b7e
feat: add search term to thread list (#12578)
Add `searchTerm` to `thread/list` that will search for a match in the
titles (the condition being `searchTerm` $$\in$$ `title`)
2026-02-25 09:59:41 +00:00
jif-oai
10c04e11b8
feat: add service name to app-server (#12319)
Add service name to the app-server so that the app can use it's own
service name

This is on thread level because later we might plan the app-server to
become a singleton on the computer
2026-02-25 09:51:42 +00:00
Michael Bolin
1779feb6a7
ignore v1 in JSON schema codegen (#12408)
## Why

The generated unnamespaced JSON envelope schemas (`ClientRequest` and
`ServerNotification`) still contained both v1 and v2 variants, which
pulled legacy v1/core types and v2 types into the same `definitions`
graph. That caused `schemars` to produce numeric suffix names (for
example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).

This PR moves JSON codegen toward v2-only output while preserving the
unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
tolerance by removing the v1/internal-only variants that caused the
collisions in those envelope schemas.

## What Changed

- In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
excludes v1 schema artifacts (`v1/*`) while continuing to emit
unnamespaced/root JSON schemas and the JSON bundle.
- Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
`InitializeParams` and `InitializeResponse` are still emitted.
- Added JSON-only post-processing for the mixed envelope schemas before
collision checks run:
- `ClientRequest`: strips v1 request variants from the generated `oneOf`
using the temporary `V1_CLIENT_REQUEST_METHODS` list
- `ServerNotification`: strips v1 notifications plus the internal-only
`rawResponseItem/completed` notification using the temporary
`EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
- Added a temporary local-definition pruning pass for those envelope
schemas so now-unreferenced v1/core definitions are removed from
`definitions` after method filtering.
- Updated the variant-title naming heuristic for single-property literal
object variants to use the literal value (when available), avoiding
collisions like multiple `state`-only variants all deriving the same
title.
- Collision handling remains fail-fast (no numeric suffix fallback map
in this PR path).

## Verification

- `just write-app-server-schema`

---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
* __->__ #12408
* #12406
2026-02-20 21:36:12 -08:00
pakrym-oai
1bb7989b20
Add ability to attach extra files to feedback (#12370)
Allow clients to provide extra files.
2026-02-20 22:26:14 +00:00
Michael Bolin
425fff7ad6
feat: add Reject approval policy with granular prompt rejection controls (#12087)
## Why

We need a way to auto-reject specific approval prompt categories without
switching all approvals off.

The goal is to let users independently control:
- sandbox escalation approvals,
- execpolicy `prompt` rule approvals,
- MCP elicitation prompts.

## What changed

- Added a new primary approval mode in `protocol/src/protocol.rs`:

```rust
pub enum AskForApproval {
    // ...
    Reject(RejectConfig),
    // ...
}

pub struct RejectConfig {
    pub sandbox_approval: bool,
    pub rules: bool,
    pub mcp_elicitations: bool,
}
```

- Wired `RejectConfig` semantics through approval paths in `core`:
  - `core/src/exec_policy.rs`
    - rejects rule-driven prompts when `rules = true`
    - rejects sandbox/escalation prompts when `sandbox_approval = true`
- preserves rule priority when both rule and sandbox prompt conditions
are present
  - `core/src/tools/sandboxing.rs`
- applies `sandbox_approval` to default exec approval decisions and
sandbox-failure retry gating
  - `core/src/safety.rs`
- keeps `Reject { all false }` behavior aligned with `OnRequest` for
patch safety
    - rejects out-of-root patch approvals when `sandbox_approval = true`
  - `core/src/mcp_connection_manager.rs`
    - auto-declines MCP elicitations when `mcp_elicitations = true`

- Ensured approval policy used by MCP elicitation flow stays in sync
with constrained session policy updates.

- Updated app-server v2 conversions and generated schema/TypeScript
artifacts for the new `Reject` shape.

## Verification

Added focused unit coverage for the new behavior in:
- `core/src/exec_policy.rs`
- `core/src/tools/sandboxing.rs`
- `core/src/mcp_connection_manager.rs`
- `core/src/safety.rs`
- `core/src/tools/runtimes/apply_patch.rs`

Key cases covered include rule-vs-sandbox prompt precedence, MCP
auto-decline behavior, and patch/sandbox retry behavior under
`RejectConfig`.
2026-02-19 11:41:49 -08:00
iceweasel-oai
292542616a
app-server support for Windows sandbox setup. (#12025)
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.

The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
2026-02-18 13:03:16 -08:00
Ruslan Nigmatullin
31cbebd3c2
app-server: Emit thread archive/unarchive notifications (#12030)
* Add v2 server notifications `thread/archived` and `thread/unarchived`
with a `threadId` payload.
* Wire new events into `thread/archive` and `thread/unarchive` success
paths.
* Update app-server protocol/schema/docs accordingly.

Testing:
- Updated archive/unarchive end-to-end tests to verify both
notifications are emitted with the expected thread id payload.
2026-02-17 14:53:58 -08:00
xl-openai
314029ffa3
Add remote skill scope/product_surface/enabled params and cleanup (#11801)
skills/remote/list: params=hazelnutScope, productSurface, enabled;
returns=data: { id, name, description }[]
skills/remote/export: params=hazelnutId; returns={ id, path }
2026-02-17 11:05:22 -08:00
sayan-oai
6b466df146
fix: send unfiltered models over model/list (#11793)
### What
to unblock filtering models in VSCE, change `model/list` app-server
endpoint to send all models + visibility field `showInPicker` so
filtering can be done in VSCE if desired.

### Tests
Updated tests.
2026-02-13 16:26:32 -08:00
acrognale-oai
ebe359b876
Add cwd as an optional field to thread/list (#11651)
Add's the ability to filter app-server thread/list by cwd
2026-02-13 02:05:04 +00:00
Dylan Hurd
4668feb43a
chore(core) Deprecate approval_policy: on-failure (#11631)
## Summary
In an effort to start simplifying our sandbox setup, we're announcing
this approval_policy as deprecated. In general, it performs worse than
`on-request`, and we're focusing on making fewer sandbox configurations
perform much better.

## Testing
- [x] Tested locally
- [x] Existing tests pass
2026-02-12 13:23:30 -08:00
Michael Bolin
abbd74e2be
feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.

It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.

## What

- Added `ReadOnlyAccess` in protocol with:
  - `Restricted { include_platform_defaults, readable_roots }`
  - `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
  - `ReadOnly { access: ReadOnlyAccess }`
  - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.

## Compatibility / rollout

- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
2026-02-11 18:31:14 -08:00
jif-oai
a364dd8b56
feat: opt-out of events in the app-server (#11319)
Add `optOutNotificationMethods` in the app-server to opt-out events
based on exact method matching
2026-02-10 18:04:52 +00:00
Matthew Zeng
005e040f97
[apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
- [x] Add thread_id param to optionally load thread config for apps
feature check
2026-02-09 23:10:26 -08:00
Owen Lin
53741013ab
fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
2026-02-09 20:48:58 -08:00
xl-openai
a33ee46e3b
feat: extend skills/list to support additional roots. (#10835)
Add an optional perCwdExtraUserRoots
2026-02-09 13:30:38 -08:00
Matthew Zeng
45b7763c3f
[apps] Improve app loading. (#10994)
There are two concepts of apps that we load in the harness:

- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.

Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.

In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.

- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
2026-02-08 15:24:56 -08:00
sayan-oai
5d2702f6b8
fix(tui): conditionally restore status indicator using message phase (#10947)
TLDR: use new message phase field emitted by preamble-supported models
to determine whether an AgentMessage is mid-turn commentary. if so,
restore the status indicator afterwards to indicate the turn has not
completed.

### Problem
`commit_tick` hides the status indicator while streaming assistant text.
For preamble-capable models, that text can be commentary mid-turn, so
hiding was correct during streaming but restore timing mattered:
- restoring too aggressively caused jitter/flashing
- not restoring caused indicator to stay hidden before subsequent work
(tool calls, web search, etc.)

### Fix
- Add optional `phase` to `AgentMessageItem` and propagate it from
`ResponseItem::Message`
- Keep indicator hidden during streamed commit ticks, restore only when:
  - assistant item completes as `phase=commentary`, and
  - stream queues are idle + task is still running.
- Treat `phase=None` as final-answer behavior (no restore) to keep
existing behavior for non-preamble models

### Tests
Add/update tests for:
- no idle-tick restore without commentary completion
- commentary completion restoring status before tool begin
- snapshot coverage for preamble/status behavior

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
2026-02-07 02:39:52 +00:00
Owen Lin
1751116ec6
chore(app-server): add experimental annotation to relevant fields (#10928)
These fields had always been documented as experimental/unstable with
docstrings, but now let's actually use the `experimental` annotation to
be more explicit.

- thread/start.experimentalRawEvents
- thread/resume.history
- thread/resume.path
- thread/fork.path
- turn/start.collaborationMode
- account/login/start.chatgptAuthTokens
2026-02-06 20:48:04 +00:00
Owen Lin
0d8b2b74c4
feat(app-server): turn/steer API (#10821)
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.

## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.

Having a dedicated `turn/steer` API eliminates all those quirks.

`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.
2026-02-06 00:35:04 +00:00
Matthew Zeng
7e81f63698
[app-server] Add a method to list experimental features. (#10721)
- [x] Add a method to list experimental features.
2026-02-05 20:04:01 +00:00
Owen Lin
5ea107a088
feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567)
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397

Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image

So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.

## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).

Old shape:
`{ "output": "dynamic-ok", "success": true }`

New shape:
```
{
    "contentItems": [
      { "type": "inputText", "text": "dynamic-ok" },
      { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
    ]
  "success": true
}
```
2026-02-04 16:12:47 -08:00
Ahmed Ibrahim
f9c38f531c
add none personality option (#10688)
- add none personality enum value and empty placeholder behavior\n- add
docs/schema updates and e2e coverage
2026-02-04 15:40:33 -08:00
Ahmed Ibrahim
38a47700b5
Add thread/compact v2 (#10445)
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
2026-02-03 18:15:55 -08:00