Expand the rate-limit cache/TUI: store credit snapshots alongside
primary and secondary windows, render “Credits” when the backend reports
they exist (unlimited vs rounded integer balances)
This PR allows clients to render historical messages when resuming a
thread via `thread/resume` by reading from the list of `EventMsg`
payloads loaded from the rollout, and then transforming them into Turns
and ThreadItems to be returned on the `Thread` object.
This is implemented by leveraging `SessionConfiguredNotification` which
returns this list of `EventMsg` objects when resuming a conversation,
and then applying a stateful `ThreadHistoryBuilder` that parses from
this EventMsg log and transforms it into Turns and ThreadItems.
Note that we only persist a subset of `EventMsg`s in a rollout as
defined in `policy.rs`, so we lose fidelity whenever we resume a thread
compared to when we streamed the thread's turns originally. However,
this behavior is at parity with the legacy API.
## Summary
On app-server startup, detect whether the experimental sandbox is
enabled, and send a notification .
**Note**
New conversations will not respect the feature because we [ignore cli
overrides in
NewConversation](a75321a64c/codex-rs/app-server/src/codex_message_processor.rs (L1237-L1252)).
However, this should be okay, since we don't actually use config for
this, we use a [global
variable](87cce88f48/codex-rs/core/src/safety.rs (L105-L110)).
We should carefully unwind this setup at some point.
## Testing
- [ ] In progress: testing locally
---------
Co-authored-by: jif-oai <jif@openai.com>
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
This adds the following fields to `ThreadStartResponse` and
`ThreadResumeResponse`:
```rust
pub model: String,
pub model_provider: String,
pub cwd: PathBuf,
pub approval_policy: AskForApproval,
pub sandbox: SandboxPolicy,
pub reasoning_effort: Option<ReasoningEffort>,
```
This is important because these fields are optional in
`ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be
able to determine what values were ultimately used to start/resume the
conversation. (Though note that any of these could be changed later
between turns in the conversation.)
Though to get this information reliably, it must be read from the
internal `SessionConfiguredEvent` that is created in response to the
start of a conversation. Because `SessionConfiguredEvent` (as defined in
`codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a
number of them had to be added as part of this PR.
Because `SessionConfiguredEvent` is referenced in many tests, test
instances of `SessionConfiguredEvent` had to be updated, as well, which
is why this PR touches so many files.
similar to logic in
`codex/codex-rs/exec/src/event_processor_with_jsonl_output.rs`.
translation of v1 -> v2 events:
`codex/event/task_complete` -> `turn/completed`
`codex/event/turn_aborted` -> `turn/completed` with `interrupted` status
`codex/event/error` -> `turn/completed` with `error` status
this PR also makes `items` field in `Turn` optional. For now, we only
populate it when we resume a thread, and leave it as None for all other
places until we properly rewrite core to keep track of items.
tested using the codex app server client. example new event:
```
< {
< "method": "turn/completed",
< "params": {
< "turn": {
< "id": "0",
< "items": [],
< "status": "interrupted"
< }
< }
< }
```
## Summary
- update documentation, example configs, and automation defaults to
reference gpt-5.1 / gpt-5.1-codex
- bump the CLI and core configuration defaults, model presets, and error
messaging to the new models while keeping the model-family/tool coverage
for legacy slugs
- refresh tests, fixtures, and TUI snapshots so they expect the upgraded
defaults
## Testing
- `cargo test -p codex-core
config::tests::test_precedence_fixture_with_gpt5_profile`
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)
This PR adds the API V2 version of the command‑execution approval flow
for the shell tool.
This PR wires the new RPC (`item/commandExecution/requestApproval`, V2
only) and related events (`item/started`, `item/completed`, and
`item/commandExecution/delta`, which are emitted in both V1 and V2)
through the app-server
protocol. The new approval RPC is only sent when the user initiates a
turn with the new `turn/start` API so we don't break backwards
compatibility with VSCE.
The approach I took was to make as few changes to the Codex core as
possible, leveraging existing `EventMsg` core events, and translating
those in app-server. I did have to add additional fields to
`EventMsg::ExecCommandEndEvent` to capture the command's input so that
app-server can statelessly transform these events to a
`ThreadItem::CommandExecution` item for the `item/completed` event.
Once we stabilize the API and it's complete enough for our partners, we
can work on migrating the core to be aware of command execution items as
a first-class concept.
**Note**: We'll need followup work to make sure these APIs work for the
unified exec tool, but will wait til that's stable and landed before
doing a pass on app-server.
Example payloads below:
```
{
"method": "item/started",
"params": {
"item": {
"aggregatedOutput": null,
"command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
"cwd": "/Users/owen/repos/codex/codex-rs",
"durationMs": null,
"exitCode": null,
"id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"status": "inProgress",
"type": "commandExecution"
}
}
}
```
```
{
"id": 0,
"method": "item/commandExecution/requestApproval",
"params": {
"itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"reason": "Need to create file in /tmp which is outside workspace sandbox",
"risk": null,
"threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a",
"turnId": "1"
}
}
```
```
{
"id": 0,
"result": {
"acceptSettings": {
"forSession": false
},
"decision": "accept"
}
}
```
```
{
"params": {
"item": {
"aggregatedOutput": null,
"command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'",
"cwd": "/Users/owen/repos/codex/codex-rs",
"durationMs": 224,
"exitCode": 0,
"id": "call_lNWWsbXl1e47qNaYjFRs0dyU",
"parsedCmd": [
{
"cmd": "touch /tmp/should-trigger-approval",
"type": "unknown"
}
],
"status": "completed",
"type": "commandExecution"
}
}
}
```
A partner is consuming our generated JSON schema bundle for app-server
and identified a few issues:
- not all polymorphic / one-of types have a type descriminator
- `"$ref": "#/definitions/v2/SandboxPolicy"` is missing
- "Option<>" is an invalid schema name, and also unnecessary
This PR:
- adds the type descriminator to the various types that are missing it
except for `SessionSource` and `SubAgentSource` because they are
serialized to disk (adding this would break backwards compat for
resume), and they should not be necessary to consume for an integration
with app-server.
- removes the special handling in `export.rs` of various types like
SandboxPolicy, which turned out to be unnecessary and incorrect
- filters out `Option<>` which was auto-generated for request params
that don't need a body
For context, we currently pull in wayyy more types than we need through
the `EventMsg` god object which we are **not** planning to expose in API
v2 (this is how I suspect `SessionSource` and `SubAgentSource` are being
pulled in). But until we have all the necessary v2 notifications in
place that will allow us to remove `EventMsg`, we will keep exporting it
for now.
core event to app server event mapping:
1. `codex/event/reasoning_content_delta` ->
`item/reasoning/summaryTextDelta`.
2. `codex/event/reasoning_raw_content_delta` ->
`item/reasoning/textDelta`
3. `codex/event/agent_message_content_delta` →
`item/agentMessage/delta`.
4. `codex/event/agent_reasoning_section_break` ->
`item/reasoning/summaryPartAdded`.
Also added a change in core to pass down content index, summary index
and item id from events.
Tested with the `git checkout owen/app_server_test_client && cargo run
-p codex-app-server-test-client -- send-message-v2 "hello"` and verified
that new events are emitted correctly.
This updates `thread/resume` to be at parity with v1's
`ResumeConversationParams`. Turns out history is useful for codex cloud
and path is useful for the VSCode extension. And config overrides are
always useful.
This one should be quite straightforward, as it's just a translation of
TurnItem events we already emit to ThreadItem that app-server exposes to
customers.
To test, cp my change to owen/app_server_test_client and do the
following:
```
cargo build -p codex-cli
RUST_LOG=codex_app_server=info CODEX_BIN=target/debug/codex cargo run -p codex-app-server-test-client -- send-message-v2 "hello"
```
example event before (still kept there for backward compatibility):
```
{
< "method": "codex/event/item_completed",
< "params": {
< "conversationId": "019a74cc-fad9-7ab3-83a3-f42827b7b074",
< "id": "0",
< "msg": {
< "item": {
< "Reasoning": {
< "id": "rs_03d183492e07e20a016913a936eb8c81a1a7671a103fee8afc",
< "raw_content": [],
< "summary_text": [
< "Hey! What would you like to work on? I can explore the repo, run specific tests, or implement a change. Let's keep it short and straightforward. There's no need for a lengthy introduction or elaborate planning, just a friendly greeting and an open offer to help. I want to make sure the user feels welcomed and understood right from the start. It's all about keeping the tone friendly and concise!"
< ]
< }
< },
< "thread_id": "019a74cc-fad9-7ab3-83a3-f42827b7b074",
< "turn_id": "0",
< "type": "item_completed"
< }
< }
< }
```
after (v2):
```
< {
< "method": "item/completed",
< "params": {
< "item": {
< "id": "rs_03d183492e07e20a016913a936eb8c81a1a7671a103fee8afc",
< "text": "Hey! What would you like to work on? I can explore the repo, run specific tests, or implement a change. Let's keep it short and straightforward. There's no need for a lengthy introduction or elaborate planning, just a friendly greeting and an open offer to help. I want to make sure the user feels welcomed and understood right from the start. It's all about keeping the tone friendly and concise!",
< "type": "reasoning"
< }
< }
< }
```
Add a `codex generate-json-schema` command for generating a JSON schema
bundle of app-server types, analogous to the existing `codex
generate-ts` command for Typescript.
Add the following fields to Thread:
```
pub preview: String,
pub model_provider: String,
pub created_at: i64,
```
Will prob need another PR once this lands:
https://github.com/openai/codex/pull/6337
This PR does two things:
1. add a new function in core that maps the core-internal plan type to
the external plan type;
2. implement account/read that get account status (v2 of
`getAuthStatus`).
Implements:
```
turn/start
turn/interrupt
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `turn/start` replaces both `SendUserMessage` (no turn overrides) and
`SendUserTurn` (can override model, approval policy, etc.)
This PR implements `account/login/start` and `account/login/completed`.
Instead of having separate endpoints for login with chatgpt and api, we
have a single enum handling different login methods. For sync auth
methods like sign in with api key, we still send a `completed`
notification back to be compatible with the async login flow.
I'm seeing two tests fail intermittently in CI. This PR attempts to
address (or at least mitigate) the flakiness.
* summarize_context_three_requests_and_instructions - The test snapshots
server.received_requests() immediately after observing TaskComplete.
Because the OpenAI /v1/responses call is streamed, the HTTP request can
still be draining when that event fires, so wiremock occasionally
reports only two captured requests. Fix is to wait for async activity to
complete.
* archive_conversation_moves_rollout_into_archived_directory - times out
on a slow CI run. Mitigation is to increase timeout value from 10s to
20s.
Implements:
```
thread/list
thread/start
thread/resume
thread/archive
```
along with their integration tests. These are relatively light wrappers
around the existing core logic, and changes to core logic are minimal.
However, an improvement made for developer ergonomics:
- `thread/start` and `thread/resume` automatically attaches a
conversation listener internally, so clients don't have to make a
separate `AddConversationListener` call like they do today.
For consistency, also updated `model/list` and `feedback/upload` (naming
conventions, list API params).
**Typescript and JSON schema exports**
While working on Thread/Turn/Items type definitions, I realize we will
run into name conflicts between v1 and v2 APIs (e.g. `RateLimitWindow`
which won't be reusable since v1 uses `RateLimitWindow` from `protocol/`
which uses snake_case, but we want to expose camelCase everywhere, so
we'll define a V2 version of that struct that serializes as camelCase).
To set us up for a clean and isolated v2 API, generate types into a
`v2/` namespace for both typescript and JSON schema.
- TypeScript: v2 types emit under `out_dir/v2/*.ts`, and root index.ts
now re-exports them via `export * as v2 from "./v2"`;.
- JSON Schemas: v2 definitions bundle under `#/definitions/v2/*` rather
than the root.
The location for the original types (v1 and types pulled from
`protocol/` and other core crates) haven't changed and are still at the
root. This is for backwards compatibility: no breaking changes to
existing usages of v1 APIs and types.
**Notifications**
While working on export.rs, I:
- refactored server/client notifications with macros (like we already do
for methods) so they also get exported (I noticed they weren't being
exported at all).
- removed the hardcoded list of types to export as JSON schema by
leveraging the existing macros instead
- and took a stab at API V2 notifications. These aren't wired up yet,
and I expect to iterate on these this week.
V2 for `account/updated` and `account/logout` for app server. correspond
to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs
will make other v2 endpoints call `account/updated` instead of
`authStatusChange` too.
Addresses issue https://github.com/openai/codex/issues/3582 where an
"archive conversation" command in the extension fails on Windows.
The problem is that the `archive_conversation` api server call is not
canonicalizing the path to the rollout path when performing its check to
verify that the rollout path is in the sessions directory. This causes
it to fail 100% of the time on Windows.
Testing: I was able to repro the error on Windows 100% prior to this
change. After the change, I'm no longer able to repro.
There's still some debate about whether we want to expose
`tools.view_image` or `feature.view_image` so those are left unchanged
for now, but this old `include_view_image_tool` config is good-to-go.
Also updated the doc to reflect that `view_image` tool is now by default
true.
We had this annotation everywhere in app-server APIs which made it so
that fields get serialized as `field?: T`, meaning if the field as
`None` we would omit the field in the payload. Removing this annotation
changes it so that we return `field: T | null` instead, which makes
codex app-server's API more aligned with the convention of public OpenAI
APIs like Responses.
Separately, remove the `#[ts(optional_fields = nullable)]` annotations
that were recently added which made all the TS types become `field?: T |
null` which is not great since clients need to handle undefined and
null.
I think generally it'll be best to have optional types be either:
- `field: T | null` (preferred, aligned with public OpenAI APIs)
- `field?: T` where we have to, such as types generated from the MCP
schema:
https://github.com/modelcontextprotocol/modelcontextprotocol/blob/main/schema/2025-06-18/schema.ts
(see changes to `mcp-types/`)
I updated @etraut-openai's unit test to check that all generated TS
types are one or the other, not both (so will error if we have a type
that has `field?: T | null`). I don't think there's currently a good use
case for that - but we can always revisit.
The goal is to have a single place where we actually write files
In a follow-up PR, will move everything config related in a dedicated
module and move the helpers in a dedicated file
We currently have nested enums when sending raw response items in the
app-server protocol. This makes downstream schemas confusing because we
need to embed `type`-discriminated enums within each other.
This PR adds a small wrapper around the response item so we can keep the
schemas separate