Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`
Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.
This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.
Closed threads will also be removed from `thread/loaded/list`.
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.
---------
Co-authored-by: Codex <noreply@openai.com>
Migration Behavior
* Config
* Migrates settings.json into config.toml
* Only adds fields when config.toml is missing, or when those fields are
missing from the existing file
* Supported mappings:
env -> shell_environment_policy
sandbox.enabled = true -> sandbox_mode = "workspace-write"
* Skills
* Copies home and repo .claude/skills into .agents/skills
* Existing skill directories are not overwritten
* SKILL.md content is rewritten from Claude-related terms to Codex
* AgentsMd
* Repo only
* Migrates CLAUDE.md into AGENTS.md
* Detect/import only proceed when AGENTS.md is missing or present but
empty
* Content is rewritten from Claude-related terms to Codex
Add service name to the app-server so that the app can use it's own
service name
This is on thread level because later we might plan the app-server to
become a singleton on the computer
## Why
The generated unnamespaced JSON envelope schemas (`ClientRequest` and
`ServerNotification`) still contained both v1 and v2 variants, which
pulled legacy v1/core types and v2 types into the same `definitions`
graph. That caused `schemars` to produce numeric suffix names (for
example `AskForApproval2`, `ByteRange2`, `MessagePhase2`).
This PR moves JSON codegen toward v2-only output while preserving the
unnamespaced envelope artifacts, and avoids reintroducing numeric-suffix
tolerance by removing the v1/internal-only variants that caused the
collisions in those envelope schemas.
## What Changed
- In `codex-rs/app-server-protocol/src/export.rs`, JSON generation now
excludes v1 schema artifacts (`v1/*`) while continuing to emit
unnamespaced/root JSON schemas and the JSON bundle.
- Added a narrow JSON v1 allowlist (`JSON_V1_ALLOWLIST`) so
`InitializeParams` and `InitializeResponse` are still emitted.
- Added JSON-only post-processing for the mixed envelope schemas before
collision checks run:
- `ClientRequest`: strips v1 request variants from the generated `oneOf`
using the temporary `V1_CLIENT_REQUEST_METHODS` list
- `ServerNotification`: strips v1 notifications plus the internal-only
`rawResponseItem/completed` notification using the temporary
`EXCLUDED_SERVER_NOTIFICATION_METHODS_FOR_JSON` list
- Added a temporary local-definition pruning pass for those envelope
schemas so now-unreferenced v1/core definitions are removed from
`definitions` after method filtering.
- Updated the variant-title naming heuristic for single-property literal
object variants to use the literal value (when available), avoiding
collisions like multiple `state`-only variants all deriving the same
title.
- Collision handling remains fail-fast (no numeric suffix fallback map
in this PR path).
## Verification
- `just write-app-server-schema`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12408).
* __->__ #12408
* #12406
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.
The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
* Add v2 server notifications `thread/archived` and `thread/unarchived`
with a `threadId` payload.
* Wire new events into `thread/archive` and `thread/unarchive` success
paths.
* Update app-server protocol/schema/docs accordingly.
Testing:
- Updated archive/unarchive end-to-end tests to verify both
notifications are emitted with the expected thread id payload.
### What
to unblock filtering models in VSCE, change `model/list` app-server
endpoint to send all models + visibility field `showInPicker` so
filtering can be done in VSCE if desired.
### Tests
Updated tests.
## Summary
In an effort to start simplifying our sandbox setup, we're announcing
this approval_policy as deprecated. In general, it performs worse than
`on-request`, and we're focusing on making fewer sandbox configurations
perform much better.
## Testing
- [x] Tested locally
- [x] Existing tests pass
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.
It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.
## What
- Added `ReadOnlyAccess` in protocol with:
- `Restricted { include_platform_defaults, readable_roots }`
- `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
- `ReadOnly { access: ReadOnlyAccess }`
- `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.
## Compatibility / rollout
- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
…ount_id and chatgpt_plan_type
### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012
Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).
The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
There are two concepts of apps that we load in the harness:
- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.
Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.
In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.
- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
TLDR: use new message phase field emitted by preamble-supported models
to determine whether an AgentMessage is mid-turn commentary. if so,
restore the status indicator afterwards to indicate the turn has not
completed.
### Problem
`commit_tick` hides the status indicator while streaming assistant text.
For preamble-capable models, that text can be commentary mid-turn, so
hiding was correct during streaming but restore timing mattered:
- restoring too aggressively caused jitter/flashing
- not restoring caused indicator to stay hidden before subsequent work
(tool calls, web search, etc.)
### Fix
- Add optional `phase` to `AgentMessageItem` and propagate it from
`ResponseItem::Message`
- Keep indicator hidden during streamed commit ticks, restore only when:
- assistant item completes as `phase=commentary`, and
- stream queues are idle + task is still running.
- Treat `phase=None` as final-answer behavior (no restore) to keep
existing behavior for non-preamble models
### Tests
Add/update tests for:
- no idle-tick restore without commentary completion
- commentary completion restoring status before tool begin
- snapshot coverage for preamble/status behavior
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
These fields had always been documented as experimental/unstable with
docstrings, but now let's actually use the `experimental` annotation to
be more explicit.
- thread/start.experimentalRawEvents
- thread/resume.history
- thread/resume.path
- thread/fork.path
- turn/start.collaborationMode
- account/login/start.chatgptAuthTokens
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.
## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.
Having a dedicated `turn/steer` API eliminates all those quirks.
`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397
Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.
## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).
Old shape:
`{ "output": "dynamic-ok", "success": true }`
New shape:
```
{
"contentItems": [
{ "type": "inputText", "text": "dynamic-ok" },
{ "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
]
"success": true
}
```
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
## Summary
This PR simplifies collaboration modes to the visible set `default |
plan`, while preserving backward compatibility for older partners that
may still send legacy mode
names.
Specifically:
- Renames the old Code behavior to **Default**.
- Keeps **Plan** as-is.
- Removes **Custom** mode behavior (fallbacks now resolve to Default).
- Keeps `PairProgramming` and `Execute` internally for compatibility
plumbing, while removing them from schema/API and UI visibility.
- Adds legacy input aliasing so older clients can still send old mode
names.
## What Changed
1. Mode enum and compatibility
- `ModeKind` now uses `Plan` + `Default` as active/public modes.
- `ModeKind::Default` deserialization accepts legacy values:
- `code`
- `pair_programming`
- `execute`
- `custom`
- `PairProgramming` and `Execute` variants remain in code but are hidden
from protocol/schema generation.
- `Custom` variant is removed; previous custom fallbacks now map to
`Default`.
2. Collaboration presets and templates
- Built-in presets now return only:
- `Plan`
- `Default`
- Template rename:
- `core/templates/collaboration_mode/code.md` -> `default.md`
- `execute.md` and `pair_programming.md` remain on disk but are not
surfaced in visible preset lists.
3. TUI updates
- Updated user-facing naming and prompts from “Code” to “Default”.
- Updated mode-cycle and indicator behavior to reflect only visible
`Plan` and `Default`.
- Updated corresponding tests and snapshots.
4. request_user_input behavior
- `request_user_input` remains allowed only in `Plan` mode.
- Rejection messaging now consistently treats non-plan modes as
`Default`.
5. Schemas
- Regenerated config and app-server schemas.
- Public schema types now advertise mode values as:
- `plan`
- `default`
## Backward Compatibility Notes
- Incoming legacy mode names (`code`, `pair_programming`, `execute`,
`custom`) are accepted and coerced to `default`.
- Outgoing/public schema surfaces intentionally expose only `plan |
default`.
- This allows tolerant ingestion of older partner payloads while
standardizing new integrations on the reduced mode set.
## Codex author
`codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
### What
add wiring for `phase` field on `ResponseItem::Message` to lay
groundwork for differentiating model preambles and final messages.
currently optional.
follows pattern in #9698.
updated schemas with `just write-app-server-schema` so we can see type
changes.
### Tests
Updated existing tests for SSE parsing and hydrating from history
## Problem being solved
- We need a single, reliable way to mark app-server API surface as
experimental so that:
1. the runtime can reject experimental usage unless the client opts in
2. generated TS/JSON schemas can exclude experimental methods/fields for
stable clients.
Right now that’s easy to drift or miss when done ad-hoc.
## How to declare experimental methods and fields
- **Experimental method**: add `#[experimental("method/name")]` to the
`ClientRequest` variant in `client_request_definitions!`.
- **Experimental field**: on the params struct, derive `ExperimentalApi`
and annotate the field with `#[experimental("method/name.field")]` + set
`inspect_params: true` for the method variant so
`ClientRequest::experimental_reason()` inspects params for experimental
fields.
## How the macro solves it
- The new derive macro lives in
`codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
`#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
attributes.
- **Structs**:
- Generates `ExperimentalApi::experimental_reason(&self)` that checks
only annotated fields.
- The “presence” check is type-aware:
- `Option<T>`: `is_some_and(...)` recursively checks inner.
- `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
- `bool`: must be `true`.
- Other types: considered present (returns `true`).
- Registers each experimental field in an `inventory` with `(type_name,
serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
that type. Field names are converted from `snake_case` to `camelCase`
for schema/TS filtering.
- **Enums**:
- Generates an exhaustive `match` returning `Some(reason)` for annotated
variants and `None` otherwise (no wildcard arm).
- **Wiring**:
- Runtime gating uses `ExperimentalApi::experimental_reason()` in
`codex-rs/app-server/src/message_processor.rs` to reject requests unless
`InitializeParams.capabilities.experimental_api == true`.
- Schema/TS export filters use the inventory list and
`EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
strip experimental methods/fields when `experimental_api` is false.
Similar to what @sayan-oai did in openai/codex#8956 for
`config.schema.json`, this PR updates the repo so that it includes the
output of `codex app-server generate-json-schema` and `codex app-server
generate-ts` and adds a test to verify it is in sync with the current
code.
Motivation:
- This makes any schema changes introduced by a PR transparent during
code review.
- In particular, this should help us catch PRs that would introduce a
non-backwards-compatible change to the app schema (eventually, this
should also be enforced by tooling).
- Once https://github.com/openai/codex/pull/10231 is in to formalize the
notion of "experimental" fields, we can work on ensuring the
non-experimental bits are backwards-compatible.
`codex-rs/app-server-protocol/tests/schema_fixtures.rs` was added as the
test and `just write-app-server-schema` can be use to generate the
vendored schema files.
Incidentally, when I run:
```
rg _ codex-rs/app-server-protocol/schema/typescript/v2
```
I see a number of `snake_case` names that should be `camelCase`.