core-agent-ide

Author	SHA1	Message	Date
pakrym-oai	dee03da508	Move environment abstraction into exec server (#15125 ) The idea is that codex-exec exposes an Environment struct with services on it. Each of those is a trait. Depending on construction parameters passed to Environment they are either backed by local or remote server but core doesn't see these differences.	2026-03-19 08:31:14 -07:00
pakrym-oai	903660edba	Remove stdio transport from exec server (#15119 ) Summary - delete the deprecated stdio transport plumbing from the exec server stack - add a basic `exec_server()` harness plus test utilities to start a server, send requests, and await events - refresh exec-server dependencies, configs, and documentation to reflect the new flow Testing - Not run (not requested) --------- Co-authored-by: starr-openai <starr@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-19 01:00:35 +00:00
starr-openai	81996fcde6	Add exec-server stub server and protocol docs (#15089 ) Stacked PR 1/3. This is the initialize-only exec-server stub slice: binary/client scaffolding and protocol docs, without exec/filesystem implementation. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 00:30:05 +00:00
Michael Bolin	38f84b6b29	refactor: delete exec-server and move execve wrapper into shell-escalation (#12632 ) ## Why We already plan to remove the shell-tool MCP path, and doing that cleanup first makes the follow-on `shell-escalation` work much simpler. This change removes the last remaining reason to keep `codex-rs/exec-server` around by moving the `codex-execve-wrapper` binary and shared shell test fixtures to the crates/tests that now own that functionality. ## What Changed ### Delete `codex-rs/exec-server` - Remove the `exec-server` crate, including the MCP server binary, MCP-specific modules, and its test support/test suite - Remove `exec-server` from the `codex-rs` workspace and update `Cargo.lock` ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation` - Move the wrapper implementation into `shell-escalation` (`src/unix/execve_wrapper.rs`) - Add the `codex-execve-wrapper` binary entrypoint under `shell-escalation/src/bin/` - Update `shell-escalation` exports/module layout so the wrapper entrypoint is hosted there - Move the wrapper README content from `exec-server` to `shell-escalation/README.md` ### Move shared shell test fixtures to `app-server` - Move the DotSlash `bash`/`zsh` test fixtures from `exec-server/tests/suite/` to `app-server/tests/suite/` - Update `app-server` zsh-fork tests to reference the new fixture paths ### Keep `shell-tool-mcp` as a shell-assets package - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm artifact contains only patched Bash/Zsh payloads (no Rust binaries) - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`, and docs to reflect the shell-assets-only package shape - `shell-tool-mcp-ci.yml` does not need changes because it is already JS-only ## Verification - `cargo shear` - `cargo clippy -p codex-shell-escalation --tests` - `just clippy`	2026-02-23 20:10:22 -08:00
Michael Bolin	5221575f23	refactor: normalize unix module layout for exec-server and shell-escalation (#12556 ) ## Why Shell execution refactoring in `exec-server` had become split between duplicated code paths, which blocked a clean introduction of the new reusable shell escalation flow. This commit creates a dedicated foundation crate so later shell tooling changes can share one implementation. ## What changed - Added the `codex-shell-escalation` crate and moved the core escalation pieces (`mcp` protocol/socket/session flow, policy glue) that were previously in `exec-server` into it. - Normalized `exec-server` Unix structure under a dedicated `unix` module layout and kept non-Unix builds narrow. - Wired crate/build metadata so `shell-escalation` is a first-class workspace dependency for follow-on integration work. ## Verification - Built and linted the stack at this commit point with `just clippy`. [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12556). * #12584 * #12583 * __->__ #12556	2026-02-23 09:28:17 -08:00
Michael Bolin	956f2f439e	refactor: decouple MCP policy construction from escalate server (#12555 ) ## Why The current escalate path in `codex-rs/exec-server` still had policy creation coupled to MCP details, which makes it hard to reuse the shell execution flow outside the MCP server. This change is part of a broader goal to split MCP-specific behavior from shared escalation execution so other handlers (for example a future `ShellCommandHandler`) can reuse it without depending on MCP request context types. ## What changed - Added a new `EscalationPolicyFactory` abstraction in `mcp.rs`: - `crate`-relative path: `codex-rs/exec-server/src/posix/mcp.rs` - https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L87-L107 - Made `run_escalate_server` in `mcp.rs` accept a policy factory instead of constructing `McpEscalationPolicy` directly. - https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L178-L201 - Introduced `McpEscalationPolicyFactory` that stores MCP-only state (`RequestContext`, `preserve_program_paths`) and implements the new trait. - https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L100-L117 - Updated `shell()` to pass a `McpEscalationPolicyFactory` instance into `run_escalate_server`, so the server remains the MCP-specific wiring layer. - https://github.com/openai/codex/blob/main/codex-rs/exec-server/src/posix/mcp.rs#L163-L170 ## Verification - Build and test execution was not re-run in this pass; changes are limited to `mcp.rs` and preserve the existing escalation flow semantics by only extracting policy construction behind a factory. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12555). * #12556 * __->__ #12555	2026-02-23 00:31:29 -08:00
Michael Bolin	e8949f4507	test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518 ) ## Why The zsh integration tests were still brittle in two ways: - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so they often did not exercise the patched zsh fork that `shell-tool-mcp` ships - once the tests consistently used the vendored zsh fork, they exposed real Linux-specific zsh-fork issues in CI In particular, the Linux failures were not just test noise: - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode could receive malformed arguments - the `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` test uses the zsh exec bridge (which talks to the parent over a Unix socket), but Linux restricted sandbox seccomp denies `connect(2)`, causing timeouts on `ubuntu-24.04` x86/arm This PR makes the zsh tests consistently run against the intended vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI signal is meaningful. ## What Changed - Added a single shared test-only DotSlash file for the patched zsh fork at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing `bash` test resource). - Updated both app-server and exec-server zsh tests to use that shared DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH` dependency). - Updated the app-server zsh-fork test helper to resolve the shared DotSlash zsh and avoid silently falling back to host zsh. - Kept the app-server zsh-fork tests configured via `config.toml`, using a test wrapper path where needed to force `zsh -df` (and rewrite `-lc` to `-c`) for the subcommand-decline test. - Hardened the app-server subcommand-decline zsh-fork test for CI variability: - tolerate an extra `/responses` POST with a no-op mock response - tolerate non-target approval ordering while remaining strict on the two `/usr/bin/true` approvals and decline behavior - use `DangerFullAccess` on Linux for this one test because it validates zsh approval flow, not Linux sandbox socket restrictions - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox` arg0 dispatch continues to work. - Moved `maybe_run_zsh_exec_wrapper_mode()` under `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode handling coexists correctly with arg0-dispatched helper modes. - Consolidated duplicated `dotslash -- fetch` resolution logic into shared test support (`core/tests/common/lib.rs`). - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh differences by: - resolving an absolute `git` path - running `git init --quiet .` - asserting success / `.git` creation instead of relying on banner text ## Verification - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture` - `cargo test -p codex-exec-server accept_elicitation -- --nocapture` - `bazel test //codex-rs/exec-server:exec-server-all-test --test_output=streamed --test_arg=--nocapture --test_arg=accept_elicitation_for_prompt_rule_with_zsh` - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 - x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm - aarch64-unknown-linux-gnu` passed in [run 22291424358](https://github.com/openai/codex/actions/runs/22291424358)	2026-02-22 19:39:56 -08:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
viyatb-oai	e8afaed502	Refactor network approvals to host/protocol/port scope (#12140 ) ## Summary Simplify network approvals by removing per-attempt proxy correlation and moving to session-level approval dedupe keyed by (host, protocol, port). Instead of encoding attempt IDs into proxy credentials/URLs, we now treat approvals as a destination policy decision. - Concurrent calls to the same destination share one approval prompt. - Different destinations (or same host on different ports) get separate prompts. - Allow once approves the current queued request group only. - Allow for session caches that (host, protocol, port) and auto-allows future matching requests. - Never policy continues to deny without prompting. Example: - 3 calls: - a.com (line 443) - b.com (line 443) - a.com (line 443) => 2 prompts total (a, b), second a waits on the first decision. - a.com:80 is treated separately from a.com line 443 ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-core tools::network_approval::tests` - `cargo test -p codex-core` (unit tests pass; existing integration-suite failures remain in this environment)	2026-02-20 10:39:55 -08:00
viyatb-oai	b527ee2890	feat(core): add structured network approval plumbing and policy decision model (#11672 ) ### Description #### Summary Introduces the core plumbing required for structured network approvals #### What changed - Added structured network policy decision modeling in core. - Added approval payload/context types needed for network approval semantics. - Wired shell/unified-exec runtime plumbing to consume structured decisions. - Updated related core error/event surfaces for structured handling. - Updated protocol plumbing used by core approval flow. - Included small CLI debug sandbox compatibility updates needed by this layer. #### Why establishes the minimal backend foundation for network approvals without yet changing high-level orchestration or TUI behavior. #### Notes - Behavior remains constrained by existing requirements/config gating. - Follow-up PRs in the stack handle orchestration, UX, and app-server integration. --------- Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>	2026-02-14 04:18:12 +00:00
Jeremy Rose	9cf7a07281	feat(shell-tool-mcp): add patched zsh build pipeline (#11668 ) ## Summary - add `shell-tool-mcp/patches/zsh-exec-wrapper.patch` against upstream zsh `77045ef899e53b9598bebc5a41db93a548a40ca6` - add `zsh-linux` and `zsh-darwin` jobs to `.github/workflows/shell-tool-mcp.yml` - stage zsh binaries under `artifacts/vendor/<target>/zsh/<variant>/zsh` - include zsh artifact jobs in `package.needs` - mark staged zsh binaries executable during packaging ## Notes - zsh source is cloned from `https://git.code.sf.net/p/zsh/code` - workflow pins zsh commit `77045ef899e53b9598bebc5a41db93a548a40ca6` - zsh build runs `./Util/preconfig` before `./configure` ## Validation - parsed workflow YAML locally (`yaml-ok`) - validated zsh patch applies cleanly with `git apply --check` on a fresh zsh clone	2026-02-13 01:34:48 +00:00
Gabriel Peal	bd3ce98190	Bump rmcp to 0.15 (#11539 ) https://github.com/modelcontextprotocol/rust-sdk/pull/598 in 0.14 broke some MCP oauth (like Linear) and https://github.com/modelcontextprotocol/rust-sdk/pull/641 fixed it in 0.15	2026-02-11 22:04:17 -08:00
Michael Bolin	abbd74e2be	feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387 ) `SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.	2026-02-11 18:31:14 -08:00
Michael Bolin	383b45279e	feat: include NetworkConfig through ExecParams (#11105 ) This PR adds the following field to `Config`: ```rust pub network: Option<NetworkProxy>, ``` Though for the moment, it will always be initialized as `None` (this will be addressed in a subsequent PR). This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.	2026-02-09 03:32:17 +00:00
Matthew Zeng	9f1009540b	Upgrade rmcp to 0.14 (#10718 ) - [x] Upgrade rmcp to 0.14	2026-02-08 15:07:53 -08:00
viyatb-oai	ae4de43ccc	feat(linux-sandbox): add bwrap support (#9938 ) ## Summary This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The curent Linux sandbox path relies on in-process restrictions (including Landlock). Bubblewrap gives us a more uniform filesystem isolation model, especially explicit writable roots with the option to make some directories read-only and granular network controls. This is behind a feature flag so we can validate behavior safely before making it the default. - Added temporary rollout flag: - `features.use_linux_sandbox_bwrap` - Preserved existing default path when the flag is off. - In Bubblewrap mode: - Added internal retry without /proc when /proc mount is not permitted by the host/container.	2026-02-04 11:13:17 -08:00
gt-oai	a046481ad9	Wire up cloud reqs in exec, app-server (#10241 ) We're fetching cloud requirements in TUI in https://github.com/openai/codex/pull/10167. This adds the same fetching in exec and app-server binaries also.	2026-01-30 23:53:41 +00:00
gt-oai	e85d019daa	Fetch Requirements from cloud (#10167 ) Load requirements from Codex Backend. It only does this for enterprise customers signed in with ChatGPT. Todo in follow-up PRs: * Add to app-server and exec too * Switch from fail-open to fail-closed on failure	2026-01-30 12:03:29 +00:00
iceweasel-oai	c40ad65bd8	remove sandbox globals. (#9797 ) Threads sandbox updates through OverrideTurnContext for active turn Passes computed sandbox type into safety/exec	2026-01-27 11:04:23 -08:00
zbarsky-openai	2a06d64bc9	feat: add support for building with Bazel (#8875 ) This PR configures Codex CLI so it can be built with [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc` includes configuration so that remote builds can be done using [BuildBuddy](https://www.buildbuddy.io). If you are familiar with Bazel, things should work as you expect, e.g., run `bazel test //... --keep-going` to run all the tests in the repo, but we have also added some new aliases in the `justfile` for convenience: - `just bazel-test` to run tests locally - `just bazel-remote-test` to run tests remotely (currently, the remote build is for x86_64 Linux regardless of your host platform). Note we are currently seeing the following test failures in the remote build, so we still need to figure out what is happening here: ``` failures: suite::compact::manual_compact_twice_preserves_latest_user_messages suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view ``` - `just build-for-release` to build release binaries for all platforms/architectures remotely To setup remote execution: - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI employees should also request org access at https://openai.buildbuddy.io/join/ with their `@openai.com` email address.) - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to `~/.bazelrc` (add the line `build --remote_header=x-buildbuddy-api-key=YOUR_KEY`) - Use `--config=remote` in your `bazel` invocations (or add `common --config=remote` to your `~/.bazelrc`, or use the `just` commands) ## CI In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners (we are working on supporting Windows, but that is not ready yet). Note that the failures we are seeing in `just bazel-remote-test` do not occur on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml` is green right now. The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so that macOS CI jobs build _remotely_ on Linux hosts (using the `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the root `BUILD.bazel`) using cross-compilation to build the macOS artifacts. Then these artifacts are downloaded locally to GitHub's macOS runner so the tests can be executed natively. This is the relevant config that enables this: ``` common:macos --config=remote common:macos --strategy=remote common:macos --strategy=TestRunner=darwin-sandbox,local ``` Because of the remote caching benefits we get from BuildBuddy, these new CI jobs can be extremely fast! For example, consider these two jobs that ran all the tests on Linux x86_64: - Bazel 1m37s https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875 - Cargo 9m20s https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875 For now, we will continue to run both the Bazel and Cargo jobs for PRs, but once we add support for Windows and running Clippy, we should be able to cutover to using Bazel exclusively for PRs, which should still speed things up considerably. We will probably continue to run the Cargo jobs post-merge for commits that land on `main` as a sanity check. Release builds will also continue to be done by Cargo for now. Earlier attempt at this PR: https://github.com/openai/codex/pull/8832 Earlier attempt to add support for Buck2, now abandoned: https://github.com/openai/codex/pull/8504 --------- Co-authored-by: David Zbarsky <dzbarsky@gmail.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-01-09 11:09:43 -08:00
Michael Bolin	a70f5b0b3c	fix: correct login shell mismatch in the accept_elicitation_for_prompt_rule() test (#8931 ) Because the path to `git` is used to construct `elicitations_to_accept`, we need to ensure that we resolve which `git` to use the same way our Bash process will: `c9c6560685/codex-rs/exec-server/tests/suite/accept_elicitation.rs (L59-L69)` This fixes an issue when running the test on macOS using Bazel (https://github.com/openai/codex/pull/8875) where the login shell chose `/opt/homebrew/bin/git` whereas the non-login shell chose `/usr/bin/git`.	2026-01-08 12:37:38 -08:00
Michael Bolin	35fd69a9f0	fix: make the find_resource! macro responsible for the absolutize() call (#8884 ) https://github.com/openai/codex/pull/8879 introduced the `find_resource!` macro, but now that I am about to use it in more places, I realize that it should take care of this normalization case for callers. Note the `use $crate::path_absolutize::Absolutize;` line is there so that users of `find_resource!` do not have to explicitly include `path-absolutize` to their own `Cargo.toml`.	2026-01-07 23:03:43 -08:00
Michael Bolin	f6b563ec64	feat: introduce find_resource! macro that works with Cargo or Bazel (#8879 ) To support Bazelification in https://github.com/openai/codex/pull/8875, this PR introduces a new `find_resource!` macro that we use in place of our existing logic in tests that looks for resources relative to the compile-time `CARGO_MANIFEST_DIR` env var. To make this work, we plan to add the following to all `rust_library()` and `rust_test()` Bazel rules in the project: ``` rustc_env = { "BAZEL_PACKAGE": native.package_name(), }, ``` Our new `find_resource!` macro reads this value via `option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code using `find_resource!`_ is injected into the code expanded from the macro. (If `find_resource()` were a function, then `option_env!("BAZEL_PACKAGE")` would always be `codex-rs/utils/cargo-bin`, which is not what we want.) Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR` environment variable is set at runtime, indicating that the test is being run by Bazel. In this case, we have to concatenate the runtime `RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the path to the resource. In testing this change, I discovered one funky edge case in `codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_ (but not canonicalize!) the result from `find_resource!` because the path contains a `common/..` component that does not exist on disk when the test is run under Bazel, so it must be semantically normalized using the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate before it is passed to `dotslash fetch`. Because this new behavior may be non-obvious, this PR also updates `AGENTS.md` to make humans/Codex aware that this API is preferred.	2026-01-07 18:06:08 -08:00
Michael Bolin	54b290ec1d	fix: update resource path resolution logic so it works with Bazel (#8861 ) The Bazelification work in-flight over at https://github.com/openai/codex/pull/8832 needs this fix so that Bazel can find the path to the DotSlash file for `bash`. With this change, the following almost works: ``` bazel test --test_output=errors //codex-rs/exec-server:exec-server-all-test ``` That is, now the `list_tools` test passes, but `accept_elicitation_for_prompt_rule` still fails because it runs Seatbelt itself, so it needs to be run outside Bazel's local sandboxing.	2026-01-07 22:33:05 +00:00
Michael Bolin	e61bae12e3	feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496 ) This PR introduces a `codex-utils-cargo-bin` utility crate that wraps/replaces our use of `assert_cmd::Command` and `escargot::CargoBuild`. As you can infer from the introduction of `buck_project_root()` in this PR, I am attempting to make it possible to build Codex under [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to achieve faster incremental local builds (largely due to Buck2's [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/) build strategy, as well as benefits from its local build daemon) as well as faster CI builds if we invest in remote execution and caching. See https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages for more details about the performance advantages of Buck2. Buck2 enforces stronger requirements in terms of build and test isolation. It discourages assumptions about absolute paths (which is key to enabling remote execution). Because the `CARGO_BIN_EXE_` environment variables that Cargo provides are absolute paths (which `assert_cmd::Command` reads), this is a problem for Buck2, which is why we need this `codex-utils-cargo-bin` utility. My WIP-Buck2 setup sets the `CARGO_BIN_EXE_` environment variables passed to a `rust_test()` build rule as relative paths. `codex-utils-cargo-bin` will resolve these values to absolute paths, when necessary. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496). * #8498 * __->__ #8496	2025-12-23 19:29:32 -08:00
Michael Bolin	277babba79	feat: load ExecPolicyManager from ConfigLayerStack (#8453 ) https://github.com/openai/codex/pull/8354 added support for in-repo `.config/` files, so this PR updates the logic for loading `.rules` files to load `.rules` files from all relevant layers. The main change to the business logic is `load_exec_policy()` in `codex-rs/core/src/exec_policy.rs`. Note this adds a `config_folder()` method to `ConfigLayerSource` that returns `Option<AbsolutePathBuf>` so that it is straightforward to iterate over the sources and get the associated config folder, if any.	2025-12-22 17:24:17 -08:00
Michael Bolin	46baedd7cb	fix: change codex/sandbox-state/update from a notification to a request (#8142 ) Historically, `accept_elicitation_for_prompt_rule()` was flaky because we were using a notification to update the sandbox followed by a `shell` tool request that we expected to be subject to the new sandbox config, but because [rmcp](https://crates.io/crates/rmcp) MCP servers delegate each incoming message to a new Tokio task, messages are not guaranteed to be processed in order, so sometimes the `shell` tool call would run before the notification was processed. Prior to this PR, we relied on a generous `sleep()` between the notification and the request to reduce the change of the test flaking out. This PR implements a proper fix, which is to use a _request_ instead of a notification for the sandbox update so that we can wait for the response to the sandbox request before sending the request to the `shell` tool call. Previously, `rmcp` did not support custom requests, but I fixed that in https://github.com/modelcontextprotocol/rust-sdk/pull/590, which made it into the `0.12.0` release (see #8288). This PR updates `shell-tool-mcp` to expect `"codex/sandbox-state/update"` as a _request_ instead of a notification and sends the appropriate ack. Note this behavior is tied to our custom `codex/sandbox-state` capability, which Codex honors as an MCP client, which is why `core/src/mcp_connection_manager.rs` had to be updated as part of this PR, as well. This PR also updates the docs at `shell-tool-mcp/README.md`.	2025-12-18 15:32:01 -08:00
Michael Bolin	53f53173a8	chore: upgrade rmcp crate from 0.10.0 to 0.12.0 (#8288 ) Version `0.12.0` includes https://github.com/modelcontextprotocol/rust-sdk/pull/590, which I will use in https://github.com/openai/codex/pull/8142. Changes: - `rmcp::model::CustomClientNotification` was renamed to `rmcp::model::CustomNotification` - a bunch of types have a `meta` field now, but it is `Option`, so I added `meta: None` to a bunch of things	2025-12-18 14:28:46 -08:00
Eric Traut	d9554c8191	Fixes mcp elicitation test that fails for me when run locally (#8020 )	2025-12-15 16:23:04 -08:00
Jeremy Rose	2c6995ca4d	exec-server: additional context for errors (#7935 ) Add a .context() on some exec-server errors for debugging CI flakes. Also, "login": false in the test to make the test not affected by user profile.	2025-12-15 11:40:40 -08:00
Michael Bolin	a2c86e5d88	docs: update the docs for @openai/codex-shell-tool-mcp (#7962 ) The existing version of `shell-tool-mcp/README.md` was not written in a way that was meant to be consumed by end-users. This is now fixed. Added `codex-rs/exec-server/README.md` for the more technical bits.	2025-12-13 09:44:26 -08:00
Michael Bolin	e0d7ac51d3	fix: policy/.codexpolicy -> rules/.rules (#7888 ) We decided that `.rules` is a more fitting (and concise) file extension than `.codexpolicy`, so we are changing the file extension for the "execpolicy" effort. We are also changing the subfolder of `$CODEX_HOME` from `policy` to `rules` to match. This PR updates the in-repo docs and we will update the public docs once the next CLI release goes out. Locally, I created `~/.codex/rules/default.rules` with the following contents: ``` prefix_rule(pattern=["gh", "pr", "view"]) ``` And then I asked Codex to run: ``` gh pr view 7888 --json title,body,comments ``` and it was able to!	2025-12-11 14:46:00 -08:00
Michael Bolin	038767af69	fix: add a hopefully-temporary sleep to reduce test flakiness (#7848 ) Let's see if this `sleep()` call is good enough to fix the test flakiness we currently see in CI. It will take me some time to upstream a proper fix, and I would prefer not to disable this test in the interim.	2025-12-11 00:51:33 +00:00
Michael Bolin	87f5b69b24	fix: ensure accept_elicitation_for_prompt_rule() test passes locally (#7832 ) When I originally introduced `accept_elicitation_for_prompt_rule()` in https://github.com/openai/codex/pull/7617, it worked for me locally because I had run `codex-rs/exec-server/tests/suite/bash` once myself, which had the side-effect of installing the corresponding DotSlash artifact. In CI, I added explicit logic to do this as part of `.github/workflows/rust-ci.yml`, which meant the test also passed in CI, but this logic should have been done as part of the test so that it would work locally for devs who had not installed the DotSlash artifact for `codex-rs/exec-server/tests/suite/bash` before. This PR updates the test to do this (and deletes the setup logic from `rust-ci.yml`), creating a new `DOTSLASH_CACHE` in a temp directory so that this is handled independently for each test. While here, also added a check to ensure that the `codex` binary has been built prior to running the test, as we have to ensure it is symlinked as `codex-linux-sandbox` on Linux in order for the integration test to work on that platform.	2025-12-10 15:17:13 -08:00
zhao-oai	e0fb3ca1db	refactoring with_escalated_permissions to use SandboxPermissions instead (#7750 ) helpful in the future if we want more granularity for requesting escalated permissions: e.g when running in readonly sandbox, model can request to escalate to a sandbox that allows writes	2025-12-10 17:18:48 +00:00
Michael Bolin	a7e3e37da8	fix: allow sendmsg(2) and recvmsg(2) syscalls in our Linux sandbox (#7779 ) This changes our default Landlock policy to allow `sendmsg(2)` and `recvmsg(2)` syscalls. We believe these were originally denied out of an abundance of caution, but given that `send(2)` nor `recv(2)` are allowed today [which provide comparable capability to the `*msg` equivalents], we do not believe allowing them grants any privileges beyond what we already allow. Rather than using the syscall as the security boundary, preventing access to the potentially hazardous file descriptor in the first place seems like the right layer of defense. In particular, this makes it possible for `shell-tool-mcp` to run on Linux when using a read-only sandbox for the Bash process, as demonstrated by `accept_elicitation_for_prompt_rule()` now succeeding in CI.	2025-12-09 09:24:01 -08:00
Michael Bolin	3c3d3d1adc	fix: add integration tests for codex-exec-mcp-server with execpolicy (#7617 ) This PR introduces integration tests that run [codex-shell-tool-mcp](https://www.npmjs.com/package/@openai/codex-shell-tool-mcp) as a user would. Note that this requires running our fork of Bash, so we introduce a [DotSlash](https://dotslash-cli.com/) file for `bash` so that we can run the integration tests on multiple platforms without having to check the binaries into the repository. (As noted in the DotSlash file, it is slightly more heavyweight than necessary, which may be worth addressing as disk space in CI is limited: https://github.com/openai/codex/pull/7678.) To start, this PR adds two tests: - `list_tools()` makes the `list_tools` request to the MCP server and verifies we get the expected response - `accept_elicitation_for_prompt_rule()` defines a `prefix_rule()` with `decision="prompt"` and verifies the elicitation flow works as expected Though the `accept_elicitation_for_prompt_rule()` test only works on Linux, as this PR reveals that there are currently issues when running the Bash fork in a read-only sandbox on Linux. This will have to be fixed in a follow-up PR. Incidentally, getting this test run to correctly on macOS also requires a recent fix we made to `brew` that hasn't hit a mainline release yet, so getting CI green in this PR required https://github.com/openai/codex/pull/7680.	2025-12-07 06:39:38 +00:00
Michael Bolin	82090803d9	fix: exec-server stream was erroring for large requests (#7654 ) Previous to this change, large `EscalateRequest` payloads exceeded the kernel send buffer, causing our single `sendmsg(2)` call (with attached FDs) to be split and retried without proper control handling; this led to `EINVAL`/broken pipe in the `handle_escalate_session_respects_run_in_sandbox_decision()` test when using an `env` with large contents. Before: `AsyncSocket::send_with_fds()` called `send_json_message()`, which called `send_message_bytes()`, which made one `socket.sendmsg()` call followed by additional `socket.send()` calls, as necessary: `2e4a402521/codex-rs/exec-server/src/posix/socket.rs (L198-L209)` After: `AsyncSocket::send_with_fds()` now calls `send_stream_frame()`, which calls `send_stream_chunk()` one or more times. Each call to `send_stream_chunk()` calls `socket.sendmsg()`. In the previous implementation, the subsequent `socket.send()` writes had no control information associated with them, whereas in the new `send_stream_chunk()` implementation, a fresh `MsgHdr` (using `with_control()`, as appropriate) is created for `socket.sendmsg()` each time. Additionally, with this PR, stream sending attaches `SCM_RIGHTS` only on the first chunk, and omits control data when there are no FDs, allowing oversized payloads to deliver correctly while preserving FD limits and error checks.	2025-12-06 10:16:47 -08:00
zhao-oai	b1c918d8f7	feat: exec policy integration in shell mcp (#7609 ) adding execpolicy support into the `posix` mcp Co-authored-by: Michael Bolin <mbolin@openai.com>	2025-12-04 21:55:54 -08:00
Michael Bolin	6b5b9a687e	feat: support --version flag for @openai/codex-shell-tool-mcp (#7504 ) I find it helpful to easily verify which version is running. Tested: ```shell ~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --help Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.19s Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --help` Usage: codex-exec-mcp-server [OPTIONS] Options: --execve <EXECVE_WRAPPER> Executable to delegate execve(2) calls to in Bash --bash <BASH_PATH> Path to Bash that has been patched to support execve() wrapping -h, --help Print help -V, --version Print version ~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --version Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --version` codex-exec-server 0.0.0 ```	2025-12-02 23:43:25 +00:00
Josh McKinney	ec49b56874	chore: add cargo-deny configuration (#7119 ) - add GitHub workflow running cargo-deny on push/PR - document cargo-deny allowlist with workspace-dep notes and advisory ignores - align workspace crates to inherit version/edition/license for consistent checks	2025-11-24 12:22:18 -08:00
Michael Bolin	c6f68c9df8	feat: declare server capability in shell-tool-mcp (#7112 ) This introduces a new feature to Codex when it operates as an MCP _client_ where if an MCP _server_ replies that it has an entry named `"codex/sandbox-state"` in its _server capabilities_, then Codex will send it an MCP notification with the following structure: ```json { "method": "codex/sandbox-state/update", "params": { "sandboxPolicy": { "type": "workspace-write", "network-access": false, "exclude-tmpdir-env-var": false "exclude-slash-tmp": false }, "codexLinuxSandboxExe": null, "sandboxCwd": "/Users/mbolin/code/codex2" } } ``` or with whatever values are appropriate for the initial `sandboxPolicy`. NOTE: Codex _should_ continue to send the MCP server notifications of the same format if these things change over the lifetime of the thread, but that isn't wired up yet. The result is that `shell-tool-mcp` can consume these values so that when it calls `codex_core::exec::process_exec_tool_call()` in `codex-rs/exec-server/src/posix/escalate_server.rs`, it is now sure to call it with the correct values (whereas previously we relied on hardcoded values). While I would argue this is a supported use case within the MCP protocol, the `rmcp` crate that we are using today does not support custom notifications. As such, I had to patch it and I submitted it for review, so hopefully it will be accepted in some form: https://github.com/modelcontextprotocol/rust-sdk/pull/556 To test out this change from end-to-end: - I ran `cargo build` in `~/code/codex2/codex-rs/exec-server` - I built the fork of Bash in `~/code/bash/bash` - I added the following to my `~/.codex/config.toml`: ```toml # Use with `codex --disable shell_tool`. [mcp_servers.execshell] args = ["--bash", "/Users/mbolin/code/bash/bash"] command = "/Users/mbolin/code/codex2/codex-rs/target/debug/codex-exec-mcp-server" ``` - From `~/code/codex2/codex-rs`, I ran `just codex --disable shell_tool` - When the TUI started up, I verified that the sandbox mode is `workspace-write` - I ran `/mcp` to verify that the shell tool from the MCP is there: <img width="1387" height="1400" alt="image" src="https://github.com/user-attachments/assets/1a8addcc-5005-4e16-b59f-95cfd06fd4ab" /> - Then I asked it: > what is the output of `gh issue list` because this should be auto-approved with our existing dummy policy: `af63e6eccc/codex-rs/exec-server/src/posix.rs (L157-L164)` And it worked: <img width="1387" height="1400" alt="image" src="https://github.com/user-attachments/assets/7568d2f7-80da-4d68-86d0-c265a6f5e6c1" />	2025-11-21 16:11:01 -08:00
Michael Bolin	67975ed33a	refactor: inline sandbox type lookup in process_exec_tool_call (#7122 ) `process_exec_tool_call()` was taking `SandboxType` as a param, but in practice, the only place it was constructed was in `codex_message_processor.rs` where it was derived from the other `sandbox_policy` param, so this PR inlines the logic that decides the `SandboxType` into `process_exec_tool_call()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122). * #7112 * __->__ #7122	2025-11-21 22:53:05 +00:00
Michael Bolin	e8ef6d3c16	feat: support login as an option on shell-tool-mcp (#7120 ) The unified exec tool has a `login` option that defaults to `true`: `3bdcbc7292/codex-rs/core/src/tools/handlers/unified_exec.rs (L35-L36)` This updates the `ExecParams` for `shell-tool-mcp` to support the same parameter. Note it is declared as `Option<bool>` to ensure it is marked optional in the generated JSON schema.	2025-11-21 22:14:41 +00:00
Michael Bolin	8e5f38c0f0	feat: waiting for an elicitation should not count against a shell tool timeout (#6973 ) Previously, we were running into an issue where we would run the `shell` tool call with a timeout of 10s, but it fired an elicitation asking for user approval, the time the user took to respond to the elicitation was counted agains the 10s timeout, so the `shell` tool call would fail with a timeout error unless the user is very fast! This PR addresses this issue by introducing a "stopwatch" abstraction that is used to manage the timeout. The idea is: - `Stopwatch::new()` is called with the _real_ timeout of the `shell` tool call. - `process_exec_tool_call()` is called with the `Cancellation` variant of `ExecExpiration` because it should not manage its own timeout in this case - the `Stopwatch` expiration is wired up to the `cancel_rx` passed to `process_exec_tool_call()` - when an elicitation for the `shell` tool call is received, the `Stopwatch` pauses - because it is possible for multiple elicitations to arrive concurrently, it keeps track of the number of "active pauses" and does not resume until that counter goes down to zero I verified that I can test the MCP server using `@modelcontextprotocol/inspector` and specify `git status` as the `command` with a timeout of 500ms and that the elicitation pops up and I have all the time in the world to respond whereas previous to this PR, that would not have been possible. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973). * #7005 * __->__ #6973 * #6972	2025-11-20 16:45:38 -08:00
Michael Bolin	f56d1dc8fc	feat: update process_exec_tool_call() to take a cancellation token (#6972 ) This updates `ExecParams` so that instead of taking `timeout_ms: Option<u64>`, it now takes a more general cancellation mechanism, `ExecExpiration`, which is an enum that includes a `Cancellation(tokio_util::sync::CancellationToken)` variant. If the cancellation token is fired, then `process_exec_tool_call()` returns in the same way as if a timeout was exceeded. This is necessary so that in #6973, we can manage the timeout logic external to the `process_exec_tool_call()` because we want to "suspend" the timeout when an elicitation from a human user is pending. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972). * #7005 * #6973 * __->__ #6972	2025-11-20 16:29:57 -08:00
Michael Bolin	54e6e4ac32	fix: when displaying execv, show file instead of arg0 (#6966 ) After merging https://github.com/openai/codex/pull/6958, I realized that the `command` I was displaying was not quite right. Since we know it, we should show the _exact_ program being executed (the first arg to `execve(3)`) rather than `arg0` to be more precise. Below is the same command I used to test https://github.com/openai/codex/pull/6958, but now you can see it shows `/Users/mbolin/.openai/bin/git` instead of just `git`. <img width="1526" height="1444" alt="image" src="https://github.com/user-attachments/assets/428128d1-c658-456e-a64e-fc6a0009cb34" />	2025-11-19 22:42:58 -08:00
Michael Bolin	e8af41de8a	fix: clean up elicitation used by exec-server (#6958 ) Using appropriate message/title fields, I think this looks better now: <img width="3370" height="3208" alt="image" src="https://github.com/user-attachments/assets/e9bbf906-4ba8-4563-affc-62cdc6c97342" /> Though note that in the current version of the Inspector (`0.17.2`), you cannot hit Submit until you fill out the field. I believe this is a bug in the Inspector, as it does not properly handle the case when all fields are optional. I put up a fix: https://github.com/modelcontextprotocol/inspector/pull/926	2025-11-20 04:59:17 +00:00
zhao-oai	fb9849e1e3	migrating execpolicy -> execpolicy-legacy and execpolicy2 -> execpolicy (#6956 )	2025-11-19 19:14:10 -08:00
Michael Bolin	13d378f2ce	chore: refactor exec-server to prepare it for standalone MCP use (#6944 ) This PR reorganizes things slightly so that: - Instead of a single multitool executable, `codex-exec-server`, we now have two executables: - `codex-exec-mcp-server` to launch the MCP server - `codex-execve-wrapper` is the `execve(2)` wrapper to use with the `BASH_EXEC_WRAPPER` environment variable - `BASH_EXEC_WRAPPER` must be a single executable: it cannot be a command string composed of an executable with args (i.e., it no longer adds the `escalate` subcommand, as before) - `codex-exec-mcp-server` takes `--bash` and `--execve` as options. Though if `--execve` is not specified, the MCP server will check the directory containing `std::env::current_exe()` and attempt to use the file named `codex-execve-wrapper` within it. In development, this works out since these executables are side-by-side in the `target/debug` folder. With respect to testing, this also fixes an important bug in `dummy_exec_policy()`, as I was using `ends_with()` as if it applied to a `String`, but in this case, it is used with a `&Path`, so the semantics are slightly different. Putting this all together, I was able to test this by running the following: ``` ~/code/codex/codex-rs$ npx @modelcontextprotocol/inspector \ ./target/debug/codex-exec-mcp-server --bash ~/code/bash/bash ``` If I try to run `git status` in `/Users/mbolin/code/codex` via the `shell` tool from the MCP server: <img width="1589" height="1335" alt="image" src="https://github.com/user-attachments/assets/9db6aea8-7fbc-4675-8b1f-ec446685d6c4" /> then I get prompted with the following elicitation, as expected: <img width="1589" height="1335" alt="image" src="https://github.com/user-attachments/assets/21b68fe0-494d-4562-9bad-0ddc55fc846d" /> Though a current limitation is that the `shell` tool defaults to a timeout of 10s, which means I only have 10s to respond to the elicitation. Ideally, the time spent waiting for a response from a human should not count against the timeout for the command execution. I will address this in a subsequent PR. --- Note `~/code/bash/bash` was created by doing: ``` cd ~/code git clone https://github.com/bminor/bash cd bash git checkout a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b <apply the patch below> ./configure make ``` The patch: ``` diff --git a/execute_cmd.c b/execute_cmd.c index 070f5119..d20ad2b9 100644 --- a/execute_cmd.c +++ b/execute_cmd.c @@ -6129,6 +6129,19 @@ shell_execve (char command, char args, char env) char sample[HASH_BANG_BUFSIZ]; size_t larray; + char exec_wrapper = getenv("BASH_EXEC_WRAPPER"); + if (exec_wrapper && exec_wrapper && !whitespace (exec_wrapper)) + { + char orig_command = command; + + larray = strvec_len (args); + + memmove (args + 2, args, (++larray) sizeof (char *)); + args[0] = exec_wrapper; + args[1] = orig_command; + command = exec_wrapper; + } + ```	2025-11-19 16:38:14 -08:00

1 2

52 commits