## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes
## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.
## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
Hardens PTY Python REPL test and make MCP test startup deterministic
**Summary**
- `utils/pty/src/tests.rs`
- Added a REPL readiness handshake (`wait_for_python_repl_ready`) that
repeatedly sends a marker and waits for it in PTY output before sending
test commands.
- Updated `pty_python_repl_emits_output_and_exits` to:
- wait for readiness first,
- preserve startup output,
- append output collected through process exit.
- Reduces Windows/ConPTY flakiness from early stdin writes racing REPL
startup.
- `mcp-server/tests/suite/codex_tool.rs`
- Avoid remote model refresh during MCP test startup, reducing
timeout-prone nondeterminism.
We started working with MCP in Codex before
https://crates.io/crates/rmcp was mature, so we had our own crate for
MCP types that was generated from the MCP schema:
8b95d3e082/codex-rs/mcp-types/README.md
Now that `rmcp` is more mature, it makes more sense to use their MCP
types in Rust, as they handle details (like the `_meta` field) that our
custom version ignored. Though one advantage that our custom types had
is that our generated types implemented `JsonSchema` and `ts_rs::TS`,
whereas the types in `rmcp` do not. As such, part of the work of this PR
is leveraging the adapters between `rmcp` types and the serializable
types that are API for us (app server and MCP) introduced in #10356.
Note this PR results in a number of changes to
`codex-rs/app-server-protocol/schema`, which merit special attention
during review. We must ensure that these changes are still
backwards-compatible, which is possible because we have:
```diff
- export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, };
+ export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, };
```
so `ContentBlock` has been replaced with the more general `JsonValue`.
Note that `ContentBlock` was defined as:
```typescript
export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource;
```
so the deletion of those individual variants should not be a cause of
great concern.
Similarly, we have the following change in
`codex-rs/app-server-protocol/schema/typescript/Tool.ts`:
```
- export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, };
+ export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, };
```
so:
- `annotations?: ToolAnnotations` ➡️ `JsonValue`
- `inputSchema: ToolInputSchema` ➡️ `JsonValue`
- `outputSchema?: ToolOutputSchema` ➡️ `JsonValue`
and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349).
* #10357
* __->__ #10349
* #10356
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.
Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).
If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:
- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:
```
failures:
suite::compact::manual_compact_twice_preserves_latest_user_messages
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```
- `just build-for-release` to build release binaries for all
platforms/architectures remotely
To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)
## CI
In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.
The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:
```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```
Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:
- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.
Release builds will also continue to be done by Cargo for now.
Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504
---------
Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.
The feature works by taking a failed command and passing it back to the
model and asking it to summarize the command, give it a risk level (low,
medium, high) and a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so the context in the
existing thread doesn't influence the answer. If the call to the model
fails or takes longer than 5 seconds, it falls back to the current
behavior.
For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.
Here is a screen shot of the approval prompt showing the risk assessment
and summary.
<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent`
in the core protocol (`protocol/src/protocol.rs`), which is also what
this field is named on `ExecCommandBeginEvent`. Honestly, I don't love
the name (it sounds like a single command, but it is actually a list of
them), but I don't want to get distracted by a naming discussion right
now.
This also adds `parsed_cmd` to `ExecCommandApprovalParams` in
`codex-rs/app-server-protocol/src/protocol.rs`, so it will be available
via `codex app-server`, as well.
For consistency, I also updated `ExecApprovalElicitRequestParams` in
`codex-rs/mcp-server/src/exec_approval.rs` to include this field under
the name `codex_parsed_cmd`, as that struct already has a number of
special `codex_*` fields. Note this is the code for when Codex is used
as an MCP _server_ and therefore has to conform to the official spec for
an MCP elicitation type.
This is a very large PR with some non-backwards-compatible changes.
Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish
server that had two overlapping responsibilities:
- Running an MCP server, providing some basic tool calls.
- Running the app server used to power experiences such as the VS Code
extension.
This PR aims to separate these into distinct concepts:
- `codex mcp-server` for the MCP server
- `codex app-server` for the "application server"
Note `codex mcp` still exists because it already has its own subcommands
for MCP management (`list`, `add`, etc.)
The MCP logic continues to live in `codex-rs/mcp-server` whereas the
refactored app server logic is in the new `codex-rs/app-server` folder.
Note that most of the existing integration tests in
`codex-rs/mcp-server/tests/suite` were actually for the app server, so
all the tests have been moved with the exception of
`codex-rs/mcp-server/tests/suite/mod.rs`.
Because this is already a large diff, I tried not to change more than I
had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses
the name `McpProcess` for now, but I will do some mechanical renamings
to things like `AppServer` in subsequent PRs.
While `mcp-server` and `app-server` share some overlapping functionality
(like reading streams of JSONL and dispatching based on message types)
and some differences (completely different message types), I ended up
doing a bit of copypasta between the two crates, as both have somewhat
similar `message_processor.rs` and `outgoing_message.rs` files for now,
though I expect them to diverge more in the near future.
One material change is that of the initialize handshake for `codex
app-server`, as we no longer use the MCP types for that handshake.
Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an
`Initialize` variant to `ClientRequest`, which takes the `ClientInfo`
object we need to update the `USER_AGENT_SUFFIX` in
`codex-rs/app-server/src/message_processor.rs`.
One other material change is in
`codex-rs/app-server/src/codex_message_processor.rs` where I eliminated
a use of the `send_event_as_notification()` method I am generally trying
to deprecate (because it blindly maps an `EventMsg` into a
`JSONNotification`) in favor of `send_server_notification()`, which
takes a `ServerNotification`, as that is intended to be a custom enum of
all notification types supported by the app server. So to make this
update, I had to introduce a new variant of `ServerNotification`,
`SessionConfigured`, which is a non-backwards compatible change with the
old `codex mcp`, and clients will have to be updated after the next
release that contains this PR. Note that
`codex-rs/app-server/tests/suite/list_resume.rs` also had to be update
to reflect this change.
I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility
crate to avoid some of the copying between `mcp-server` and
`app-server`.
This changes the reqwest client used in tests to be sandbox-friendly,
and skips a bunch of other tests that don't work inside the
sandbox/without network.
This PR addresses an edge-case bug that appears in the VS Code extension
in the following situation:
1. Log in using ChatGPT (using either the CLI or extension). This will
create an `auth.json` file.
2. Manually modify `config.toml` to specify a custom provider.
3. Start a fresh copy of the VS Code extension.
The profile menu in the VS Code extension will indicate that you are
logged in using ChatGPT even though you're not.
This is caused by the `get_auth_status` method returning an
`auth_method: 'chatgpt'` when a custom provider is configured and it
doesn't use OpenAI auth (i.e. `requires_openai_auth` is false). The
method should always return `auth_method: None` if
`requires_openai_auth` is false.
The same bug also causes the NUX (new user experience) screen to be
displayed in the VSCE in this situation.
It turns out that we want slightly different behavior for the
`SetDefaultModel` RPC because some models do not work with reasoning
(like GPT-4.1), so we should be able to explicitly clear this value.
Verified in `codex-rs/mcp-server/tests/suite/set_default_model.rs`.
This adds `SetDefaultModel`, which takes `model` and `reasoning_effort`
as optional fields. If set, the field will overwrite what is in the
user's `config.toml`.
This reuses logic that was added to support the `/model` command in the
TUI: https://github.com/openai/codex/pull/2799.
`ClientRequest::NewConversation` picks up the reasoning level from the user's defaults in `config.toml`, so it should be reported in `NewConversationResponse`.
This PR does the following:
* Adds the ability to paste or type an API key.
* Removes the `preferred_auth_method` config option. The last login
method is always persisted in auth.json, so this isn't needed.
* If OPENAI_API_KEY env variable is defined, the value is used to
prepopulate the new UI. The env variable is otherwise ignored by the
CLI.
* Adds a new MCP server entry point "login_api_key" so we can implement
this same API key behavior for the VS Code extension.
<img width="473" height="140" alt="Screenshot 2025-09-04 at 3 51 04 PM"
src="https://github.com/user-attachments/assets/c11bbd5b-8a4d-4d71-90fd-34130460f9d9"
/>
<img width="726" height="254" alt="Screenshot 2025-09-04 at 3 51 32 PM"
src="https://github.com/user-attachments/assets/6cc76b34-309a-4387-acbc-15ee5c756db9"
/>
This adds a simple endpoint that provides the email address encoded in
`$CODEX_HOME/auth.json`.
As noted, for now, we do not hit the server to verify this is the user's
true email address.
https://github.com/openai/codex/pull/3395 updated `mcp-types/src/lib.rs`
by hand, but that file is generated code that is produced by
`mcp-types/generate_mcp_types.py`. Unfortunately, we do not have
anything in CI to verify this right now, but I will address that in a
subsequent PR.
#3395 ended up introducing a change that added a required field when
deserializing `InitializeResult`, breaking Codex when used as an MCP
client, so the quick fix in #3436 was to make the new field `Optional`
with `skip_serializing_if = "Option::is_none"`, but that did not address
the problem that `mcp-types/generate_mcp_types.py` and
`mcp-types/src/lib.rs` are out of sync.
This PR gets things back to where they are in sync. It removes the
custom `mcp_types::McpClientInfo` type that was added to
`mcp-types/src/lib.rs` and forces us to use the generated
`mcp_types::Implementation` type. Though this PR also updates
`generate_mcp_types.py` to generate the additional `user_agent:
Optional<String>` field on `Implementation` so that we can continue to
specify it when Codex operates as an MCP server.
However, this also requires us to specify `user_agent: None` when Codex
operates as an MCP client.
We may want to introduce our own `InitializeResult` type that is
specific to when we run as a server to avoid this in the future, but my
immediate goal is just to get things back in sync.
This PR improves two existing auth-related tests. They were failing when
run in an environment where an `OPENAI_API_KEY` env variable was
defined. The change makes them more resilient.
The previous config approach had a few issues:
1. It is part of the config but not designed to be used externally
2. It had to be wired through many places (look at the +/- on this PR
3. It wasn't guaranteed to be set consistently everywhere because we
don't have a super well defined way that configs stack. For example, the
extension would configure during newConversation but anything that
happened outside of that (like login) wouldn't get it.
This env var approach is cleaner and also creates one less thing we have
to deal with when coming up with a better holistic story around configs.
One downside is that I removed the unit test testing for the override
because I don't want to deal with setting the global env or spawning
child processes and figuring out how to introspect their originator
header. The new code is sufficiently simple and I tested it e2e that I
feel as if this is still worth it.
Adds support for `ArchiveConversation` in the JSON-RPC server that takes
a `(ConversationId, PathBuf)` pair and:
- verifies the `ConversationId` corresponds to the rollout id at the
`PathBuf`
- if so, invokes
`ConversationManager.remove_conversation(ConversationId)`
- if the `CodexConversation` was in memory, send `Shutdown` and wait for
`ShutdownComplete` with a timeout
- moves the `.jsonl` file to `$CODEX_HOME/archived_sessions`
---------
Co-authored-by: Gabriel Peal <gabriel@openai.com>
Adding the `rollout_path` to the `NewConversationResponse` makes it so a
client can perform subsequent operations on a `(ConversationId,
PathBuf)` pair. #3353 will introduce support for `ArchiveConversation`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/3352).
* #3353
* __->__ #3352
This PR does multiple things that are necessary for conversation resume
to work from the extension. I wanted to make sure everything worked so
these changes wound up in one PR:
1. Generate more ts types
2. Resume rollout history files rather than create a new one every time
it is resumed so you don't see a duplicate conversation in history for
every resume. Chatted with @aibrahim-oai to verify this
3. Return conversation_id in conversation summaries
4. [Cleanup] Use serde and strong types for a lot of the rollout file
parsing
We're trying to migrate from `session_id: Uuid` to `conversation_id:
ConversationId`. Not only does this give us more type safety but it
unifies our terminology across Codex and with the implementation of
session resuming, a conversation (which can span multiple sessions) is
more appropriate.
I started this impl on https://github.com/openai/codex/pull/3219 as part
of getting resume working in the extension but it's big enough that it
should be broken out.
When item ids are sent to Responses API it will load them from the
database ignoring the provided values. This adds extra latency.
Not having the mode to store requests also allows us to simplify the
code.
## Breaking change
The `disable_response_storage` configuration option is removed.
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
## Summary
Follow-up to #3056
This PR updates the mcp-server interface for reading the config settings
saved by the user. At risk of introducing _another_ Config struct, I
think it makes sense to avoid tying our protocol to ConfigToml, as its
become a bit unwieldy. GetConfigTomlResponse was a de-facto struct for
this already - better to make it explicit, in my opinion.
This is technically a breaking change of the mcp-server protocol, but
given the previous interface was introduced so recently in #2725, and we
have not yet even started to call it, I propose proceeding with the
breaking change - but am open to preserving the old endpoint.
## Testing
- [x] Added additional integration test coverage