Add `thread_id` and `turn_id` to `item/started`, `item/completed`, and
`error` notifications. Otherwise the client will have a hard time
knowing which thread & turn (if multiple threads are running in
parallel) a new item/error is for.
Also add `thread_id` to `turn/started` and `turn/completed`.
Add a `Declined` status for when we request an approval from the user
and the user declines. This allows us to distinguish from commands that
actually ran, but failed.
This behaves similarly to apply_patch / FileChange, which does the same
thing.
This PR adds the API V2 version of the apply_patch approval flow, which
centers around `ThreadItem::FileChange`.
This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only)
and related events (`item/started`, `item/completed` for
`ThreadItem::FileChange`, which are emitted in both V1 and V2) through
the app-server
protocol. The new approval RPC is only sent when the user initiates a
turn with the new `turn/start` API so we don't break backwards
compatibility with VSCE.
Similar to https://github.com/openai/codex/pull/6758, the approach I
took was to make as few changes to the Codex core as possible,
leveraging existing `EventMsg` core events, and translating those in
app-server. I did have to add a few additional fields to
`EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those
were fairly lightweight.
However, the `EventMsg`s emitted by core are the following:
```
1) Auto-approved (no request for approval)
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
2) Approved by user
- EventMsg::ApplyPatchApprovalRequest
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
3) Declined by user
- EventMsg::ApplyPatchApprovalRequest
- EventMsg::PatchApplyBegin
- EventMsg::PatchApplyEnd
```
For a request triggering an approval, this would result in:
```
item/fileChange/requestApproval
item/started
item/completed
```
which is different from the `ThreadItem::CommandExecution` flow
introduced in https://github.com/openai/codex/pull/6758, which does the
below and is preferable:
```
item/started
item/commandExecution/requestApproval
item/completed
```
To fix this, we leverage `TurnSummaryStore` on codex_message_processor
to store a little bit of state, allowing us to fire `item/started` and
`item/fileChange/requestApproval` whenever we receive the underlying
`EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the
`EventMsg::PatchApplyBegin` later.
This is much less invasive than modifying the order of EventMsg within
core (I tried).
The resulting payloads:
```
{
"method": "item/started",
"params": {
"item": {
"changes": [
{
"diff": "Hello from Codex!\n",
"kind": "add",
"path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
}
],
"id": "call_Nxnwj7B3YXigfV6Mwh03d686",
"status": "inProgress",
"type": "fileChange"
}
}
}
```
```
{
"id": 0,
"method": "item/fileChange/requestApproval",
"params": {
"grantRoot": null,
"itemId": "call_Nxnwj7B3YXigfV6Mwh03d686",
"reason": null,
"threadId": "019a9e11-8295-7883-a283-779e06502c6f",
"turnId": "1"
}
}
```
```
{
"id": 0,
"result": {
"decision": "accept"
}
}
```
```
{
"method": "item/completed",
"params": {
"item": {
"changes": [
{
"diff": "Hello from Codex!\n",
"kind": "add",
"path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt"
}
],
"id": "call_Nxnwj7B3YXigfV6Mwh03d686",
"status": "completed",
"type": "fileChange"
}
}
}
```
Expand the rate-limit cache/TUI: store credit snapshots alongside
primary and secondary windows, render “Credits” when the backend reports
they exist (unlimited vs rounded integer balances)
This PR allows clients to render historical messages when resuming a
thread via `thread/resume` by reading from the list of `EventMsg`
payloads loaded from the rollout, and then transforming them into Turns
and ThreadItems to be returned on the `Thread` object.
This is implemented by leveraging `SessionConfiguredNotification` which
returns this list of `EventMsg` objects when resuming a conversation,
and then applying a stateful `ThreadHistoryBuilder` that parses from
this EventMsg log and transforms it into Turns and ThreadItems.
Note that we only persist a subset of `EventMsg`s in a rollout as
defined in `policy.rs`, so we lose fidelity whenever we resume a thread
compared to when we streamed the thread's turns originally. However,
this behavior is at parity with the legacy API.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This adds the following fields to `ThreadStartResponse` and
`ThreadResumeResponse`:
```rust
pub model: String,
pub model_provider: String,
pub cwd: PathBuf,
pub approval_policy: AskForApproval,
pub sandbox: SandboxPolicy,
pub reasoning_effort: Option<ReasoningEffort>,
```
This is important because these fields are optional in
`ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
able to determine what values were ultimately used to start/resume the
conversation. (Though note that any of these could be changed later
between turns in the conversation.)
Though to get this information reliably, it must be read from the
internal `SessionConfiguredEvent` that is created in response to the
start of a conversation. Because `SessionConfiguredEvent` (as defined in
`codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
number of them had to be added as part of this PR.
Because `SessionConfiguredEvent` is referenced in many tests, test
instances of `SessionConfiguredEvent` had to be updated, as well, which
is why this PR touches so many files.
similar to logic in
`codex/codex-rs/exec/src/event_processor_with_jsonl_output.rs`.
translation of v1 -> v2 events:
`codex/event/task_complete` -> `turn/completed`
`codex/event/turn_aborted` -> `turn/completed` with `interrupted` status
`codex/event/error` -> `turn/completed` with `error` status
this PR also makes `items` field in `Turn` optional. For now, we only
populate it when we resume a thread, and leave it as None for all other
places until we properly rewrite core to keep track of items.
tested using the codex app server client. example new event:
```
< {
< "method": "turn/completed",
< "params": {
< "turn": {
< "id": "0",
< "items": [],
< "status": "interrupted"
< }
< }
< }
```
## Summary
- update documentation, example configs, and automation defaults to
reference gpt-5.1 / gpt-5.1-codex
- bump the CLI and core configuration defaults, model presets, and error
messaging to the new models while keeping the model-family/tool coverage
for legacy slugs
- refresh tests, fixtures, and TUI snapshots so they expect the upgraded
defaults
## Testing
- `cargo test -p codex-core
config::tests::test_precedence_fixture_with_gpt5_profile`
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
This PR adds the API V2 version of the command‑execution approval flow
for the shell tool.
This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
only) and related events (`item/started`, `item/completed`, and
`item/commandExecution/delta`, which are emitted in both V1 and V2)
through the app-server
protocol. The new approval RPC is only sent when the user initiates a
turn with the new `turn/start` API so we don't break backwards
compatibility with VSCE.
The approach I took was to make as few changes to the Codex core as
possible, leveraging existing `EventMsg` core events, and translating
those in app-server. I did have to add additional fields to
`EventMsg::ExecCommandEndEvent` to capture the command's input so that
app-server can statelessly transform these events to a
`ThreadItem::CommandExecution` item for the `item/completed` event.
Once we stabilize the API and it's complete enough for our partners, we
can work on migrating the core to be aware of command execution items as
a first-class concept.
**Note**: We'll need followup work to make sure these APIs work for the
unified exec tool, but will wait til that's stable and landed before
doing a pass on app-server.
Example payloads below:
```
{
"method": "item/started",
"params": {
"item": {
"aggregatedOutput": null,
"command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
"cwd": "/Users/owen/repos/codex/codex-rs",
"durationMs": null,
"exitCode": null,
"id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"status": "inProgress",
"type": "commandExecution"
}
}
}
```
```
{
"id": 0,
"method": "item/commandExecution/requestApproval",
"params": {
"itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"reason": "Need to create file in /tmp which is outside workspace sandbox",
"risk": null,
"threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
"turnId": "1"
}
}
```
```
{
"id": 0,
"result": {
"acceptSettings": {
"forSession": false
},
"decision": "accept"
}
}
```
```
{
"params": {
"item": {
"aggregatedOutput": null,
"command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
"cwd": "/Users/owen/repos/codex/codex-rs",
"durationMs": 224,
"exitCode": 0,
"id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"status": "completed",
"type": "commandExecution"
}
}
}
```
A partner is consuming our generated JSON schema bundle for app-server
and identified a few issues:
- not all polymorphic / one-of types have a type descriminator
- `"$ref": "#/definitions/v2/SandboxPolicy"` is missing
- "Option<>" is an invalid schema name, and also unnecessary
This PR:
- adds the type descriminator to the various types that are missing it
except for `SessionSource` and `SubAgentSource` because they are
serialized to disk (adding this would break backwards compat for
resume), and they should not be necessary to consume for an integration
with app-server.
- removes the special handling in `export.rs` of various types like
SandboxPolicy, which turned out to be unnecessary and incorrect
- filters out `Option<>` which was auto-generated for request params
that don't need a body
For context, we currently pull in wayyy more types than we need through
the `EventMsg` god object which we are **not** planning to expose in API
v2 (this is how I suspect `SessionSource` and `SubAgentSource` are being
pulled in). But until we have all the necessary v2 notifications in
place that will allow us to remove `EventMsg`, we will keep exporting it
for now.
This updates `thread/resume` to be at parity with v1's
`ResumeConversationParams`. Turns out history is useful for codex cloud
and path is useful for the VSCode extension. And config overrides are
always useful.
Add the following fields to Thread:
```
pub preview: String,
pub model_provider: String,
pub created_at: i64,
```
Will prob need another PR once this lands:
https://github.com/openai/codex/pull/6337
This PR does two things:
1. add a new function in core that maps the core-internal plan type to
the external plan type;
2. implement account/read that get account status (v2 of
`getAuthStatus`).
Implements:
```
turn/start
turn/interrupt
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `turn/start` replaces both `SendUserMessage` (no turn overrides) and
`SendUserTurn` (can override model, approval policy, etc.)
This PR implements `account/login/start` and `account/login/completed`.
Instead of having separate endpoints for login with chatgpt and api, we
have a single enum handling different login methods. For sync auth
methods like sign in with api key, we still send a `completed`
notification back to be compatible with the async login flow.
I'm seeing two tests fail intermittently in CI. This PR attempts to
address (or at least mitigate) the flakiness.
* summarize_context_three_requests_and_instructions - The test snapshots
server.received_requests() immediately after observing TaskComplete.
Because the OpenAI /v1/responses call is streamed, the HTTP request can
still be draining when that event fires, so wiremock occasionally
reports only two captured requests. Fix is to wait for async activity to
complete.
* archive_conversation_moves_rollout_into_archived_directory - times out
on a slow CI run. Mitigation is to increase timeout value from 10s to
20s.
Implements:
```
thread/list
thread/start
thread/resume
thread/archive
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `thread/start` and `thread/resume` automatically attaches a
conversation listener internally, so clients don't have to make a
separate `AddConversationListener` call like they do today.
For consistency, also updated `model/list` and `feedback/upload` (naming
conventions, list API params).
**Typescript and JSON schema exports**
While working on Thread/Turn/Items type definitions, I realize we will
run into name conflicts between v1 and v2 APIs (e.g. `RateLimitWindow`
which won't be reusable since v1 uses `RateLimitWindow` from `protocol/`
which uses snake_case, but we want to expose camelCase everywhere, so
we'll define a V2 version of that struct that serializes as camelCase).
To set us up for a clean and isolated v2 API, generate types into a
`v2/` namespace for both typescript and JSON schema.
- TypeScript: v2 types emit under `out_dir/v2/*.ts`, and root index.ts
now re-exports them via `export * as v2 from "./v2"`;.
- JSON Schemas: v2 definitions bundle under `#/definitions/v2/*` rather
than the root.
The location for the original types (v1 and types pulled from
`protocol/` and other core crates) haven't changed and are still at the
root. This is for backwards compatibility: no breaking changes to
existing usages of v1 APIs and types.
**Notifications**
While working on export.rs, I:
- refactored server/client notifications with macros (like we already do
for methods) so they also get exported (I noticed they weren't being
exported at all).
- removed the hardcoded list of types to export as JSON schema by
leveraging the existing macros instead
- and took a stab at API V2 notifications. These aren't wired up yet,
and I expect to iterate on these this week.
V2 for `account/updated` and `account/logout` for app server. correspond
to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs
will make other v2 endpoints call `account/updated` instead of
`authStatusChange` too.
We currently have nested enums when sending raw response items in the
app-server protocol. This makes downstream schemas confusing because we
need to embed `type`-discriminated enums within each other.
This PR adds a small wrapper around the response item so we can keep the
schemas separate
This PR adds an option to app server to allow conversation summaries to
be fetched from just the conversation id rather than rollout path for
convenience at the cost of some latency to discover the rollout path.
This convenience is non-trivial as it allows app servers to simply
maintain conversation ids rather than rollout paths and the associated
platform (Windows) handling associated with storing and encoding them
correctly.
There's a lot of visual noise in app-server's integration tests due to
the number of `.expect("<some_msg>")` lines which are largely redundant
/ not very useful. Clean them up by using `anyhow::Result` + `?`
consistently.
Replaces the existing pattern of:
```
let codex_home = TempDir::new().expect("create temp dir");
create_config_toml(codex_home.path()).expect("write config.toml");
let mut mcp = McpProcess::new(codex_home.path())
.await
.expect("spawn mcp process");
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize())
.await
.expect("initialize timeout")
.expect("initialize request");
```
With:
```
let codex_home = TempDir::new()?;
create_config_toml(codex_home.path())?;
let mut mcp = McpProcess::new(codex_home.path()).await?;
timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??;
```
This PR is a follow-up to #5591. It allows users to choose which auth
storage mode they want by using the new
`cli_auth_credentials_store_mode` config.
This PR introduces a new `Auth Storage` abstraction layer that takes
care of read, write, and load of auth tokens based on the
AuthCredentialsStoreMode. It is similar to how we handle MCP client
oauth
[here](https://github.com/openai/codex/blob/main/codex-rs/rmcp-client/src/oauth.rs).
Instead of reading and writing directly from disk for auth tokens, Codex
CLI workflows now should instead use this auth storage using the public
helper functions.
This PR is just a refactor of the current code so the behavior stays the
same. We will add support for keyring and hybrid mode in follow-up PRs.
I have read the CLA Document and I hereby sign the CLA
Because conversations that use the Responses API can have encrypted
reasoning messages, trying to resume a conversation with a different
provider could lead to confusing "failed to decrypt" errors. (This is
reproducible by starting a conversation using ChatGPT login and resuming
it as a conversation that uses OpenAI models via Azure.)
This changes `ListConversationsParams` to take a `model_providers:
Option<Vec<String>>` and adds `model_provider` on each
`ConversationSummary` it returns so these cases can be disambiguated.
Note this ended up making changes to
`codex-rs/core/src/rollout/tests.rs` because it had a number of cases
where it expected `Some` for the value of `next_cursor`, but the list of
rollouts was complete, so according to this docstring:
bcd64c7e72/codex-rs/app-server-protocol/src/protocol.rs (L334-L337)
If there are no more items to return, then `next_cursor` should be
`None`. This PR updates that logic.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658).
* #5803
* #5793
* __->__ #5658
This PR fixes a test that is sporadically failing in CI.
The problem is that two unit tests (the older `login_and_cancel_chatgpt`
and a recently added
`login_chatgpt_includes_forced_workspace_query_param`) exercise code
paths that start the login server. The server binds to a hard-coded
localhost port number, so attempts to start more than one server at the
same time will fail. If these two tests happen to run concurrently, one
of them will fail.
To fix this, I've added a simple mutex. We can use this same mutex for
future tests that use the same pattern.
This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.
The feature works by taking a failed command and passing it back to the
model and asking it to summarize the command, give it a risk level (low,
medium, high) and a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so the context in the
existing thread doesn't influence the answer. If the call to the model
fails or takes longer than 5 seconds, it falls back to the current
behavior.
For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.
Here is a screen shot of the approval prompt showing the risk assessment
and summary.
<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
1. Adds AgentMessage, Reasoning, WebSearch items.
2. Switches the ResponseItem parsing to use new items and then also emit
3. Removes user-item kind and filters out "special" (environment) user
items when returning to clients.
Adds a `GET account/rateLimits/read` API to app-server. This calls the
codex backend to fetch the user's current rate limits.
This would be helpful in checking rate limits without having to send a
message.
For calling the codex backend usage API, I generated the types and
manually copied the relevant ones into `codex-backend-openapi-types`.
It'll be nice to extend our internal openapi generator to support Rust
so we don't have to run these manual steps.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.