I find it helpful to easily verify which version is running.
Tested:
```shell
~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --help
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.19s
Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --help`
Usage: codex-exec-mcp-server [OPTIONS]
Options:
--execve <EXECVE_WRAPPER> Executable to delegate execve(2) calls to in Bash
--bash <BASH_PATH> Path to Bash that has been patched to support execve() wrapping
-h, --help Print help
-V, --version Print version
~/code/codex3/codex-rs/exec-server$ cargo run --bin codex-exec-mcp-server -- --version
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `/Users/mbolin/code/codex3/codex-rs/target/debug/codex-exec-mcp-server --version`
codex-exec-server 0.0.0
```
This introduces a new feature to Codex when it operates as an MCP
_client_ where if an MCP _server_ replies that it has an entry named
`"codex/sandbox-state"` in its _server capabilities_, then Codex will
send it an MCP notification with the following structure:
```json
{
"method": "codex/sandbox-state/update",
"params": {
"sandboxPolicy": {
"type": "workspace-write",
"network-access": false,
"exclude-tmpdir-env-var": false
"exclude-slash-tmp": false
},
"codexLinuxSandboxExe": null,
"sandboxCwd": "/Users/mbolin/code/codex2"
}
}
```
or with whatever values are appropriate for the initial `sandboxPolicy`.
**NOTE:** Codex _should_ continue to send the MCP server notifications
of the same format if these things change over the lifetime of the
thread, but that isn't wired up yet.
The result is that `shell-tool-mcp` can consume these values so that
when it calls `codex_core::exec::process_exec_tool_call()` in
`codex-rs/exec-server/src/posix/escalate_server.rs`, it is now sure to
call it with the correct values (whereas previously we relied on
hardcoded values).
While I would argue this is a supported use case within the MCP
protocol, the `rmcp` crate that we are using today does not support
custom notifications. As such, I had to patch it and I submitted it for
review, so hopefully it will be accepted in some form:
https://github.com/modelcontextprotocol/rust-sdk/pull/556
To test out this change from end-to-end:
- I ran `cargo build` in `~/code/codex2/codex-rs/exec-server`
- I built the fork of Bash in `~/code/bash/bash`
- I added the following to my `~/.codex/config.toml`:
```toml
# Use with `codex --disable shell_tool`.
[mcp_servers.execshell]
args = ["--bash", "/Users/mbolin/code/bash/bash"]
command = "/Users/mbolin/code/codex2/codex-rs/target/debug/codex-exec-mcp-server"
```
- From `~/code/codex2/codex-rs`, I ran `just codex --disable shell_tool`
- When the TUI started up, I verified that the sandbox mode is
`workspace-write`
- I ran `/mcp` to verify that the shell tool from the MCP is there:
<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/1a8addcc-5005-4e16-b59f-95cfd06fd4ab"
/>
- Then I asked it:
> what is the output of `gh issue list`
because this should be auto-approved with our existing dummy policy:
af63e6eccc/codex-rs/exec-server/src/posix.rs (L157-L164)
And it worked:
<img width="1387" height="1400" alt="image"
src="https://github.com/user-attachments/assets/7568d2f7-80da-4d68-86d0-c265a6f5e6c1"
/>
`process_exec_tool_call()` was taking `SandboxType` as a param, but in
practice, the only place it was constructed was in
`codex_message_processor.rs` where it was derived from the other
`sandbox_policy` param, so this PR inlines the logic that decides the
`SandboxType` into `process_exec_tool_call()`.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/7122).
* #7112
* __->__ #7122
The unified exec tool has a `login` option that defaults to `true`:
3bdcbc7292/codex-rs/core/src/tools/handlers/unified_exec.rs (L35-L36)
This updates the `ExecParams` for `shell-tool-mcp` to support the same
parameter. Note it is declared as `Option<bool>` to ensure it is marked
optional in the generated JSON schema.
Previously, we were running into an issue where we would run the `shell`
tool call with a timeout of 10s, but it fired an elicitation asking for
user approval, the time the user took to respond to the elicitation was
counted agains the 10s timeout, so the `shell` tool call would fail with
a timeout error unless the user is very fast!
This PR addresses this issue by introducing a "stopwatch" abstraction
that is used to manage the timeout. The idea is:
- `Stopwatch::new()` is called with the _real_ timeout of the `shell`
tool call.
- `process_exec_tool_call()` is called with the `Cancellation` variant
of `ExecExpiration` because it should not manage its own timeout in this
case
- the `Stopwatch` expiration is wired up to the `cancel_rx` passed to
`process_exec_tool_call()`
- when an elicitation for the `shell` tool call is received, the
`Stopwatch` pauses
- because it is possible for multiple elicitations to arrive
concurrently, it keeps track of the number of "active pauses" and does
not resume until that counter goes down to zero
I verified that I can test the MCP server using
`@modelcontextprotocol/inspector` and specify `git status` as the
`command` with a timeout of 500ms and that the elicitation pops up and I
have all the time in the world to respond whereas previous to this PR,
that would not have been possible.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6973).
* #7005
* __->__ #6973
* #6972
This updates `ExecParams` so that instead of taking `timeout_ms:
Option<u64>`, it now takes a more general cancellation mechanism,
`ExecExpiration`, which is an enum that includes a
`Cancellation(tokio_util::sync::CancellationToken)` variant.
If the cancellation token is fired, then `process_exec_tool_call()`
returns in the same way as if a timeout was exceeded.
This is necessary so that in #6973, we can manage the timeout logic
external to the `process_exec_tool_call()` because we want to "suspend"
the timeout when an elicitation from a human user is pending.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/6972).
* #7005
* #6973
* __->__ #6972
This PR reorganizes things slightly so that:
- Instead of a single multitool executable, `codex-exec-server`, we now
have two executables:
- `codex-exec-mcp-server` to launch the MCP server
- `codex-execve-wrapper` is the `execve(2)` wrapper to use with the
`BASH_EXEC_WRAPPER` environment variable
- `BASH_EXEC_WRAPPER` must be a single executable: it cannot be a
command string composed of an executable with args (i.e., it no longer
adds the `escalate` subcommand, as before)
- `codex-exec-mcp-server` takes `--bash` and `--execve` as options.
Though if `--execve` is not specified, the MCP server will check the
directory containing `std::env::current_exe()` and attempt to use the
file named `codex-execve-wrapper` within it. In development, this works
out since these executables are side-by-side in the `target/debug`
folder.
With respect to testing, this also fixes an important bug in
`dummy_exec_policy()`, as I was using `ends_with()` as if it applied to
a `String`, but in this case, it is used with a `&Path`, so the
semantics are slightly different.
Putting this all together, I was able to test this by running the
following:
```
~/code/codex/codex-rs$ npx @modelcontextprotocol/inspector \
./target/debug/codex-exec-mcp-server --bash ~/code/bash/bash
```
If I try to run `git status` in `/Users/mbolin/code/codex` via the
`shell` tool from the MCP server:
<img width="1589" height="1335" alt="image"
src="https://github.com/user-attachments/assets/9db6aea8-7fbc-4675-8b1f-ec446685d6c4"
/>
then I get prompted with the following elicitation, as expected:
<img width="1589" height="1335" alt="image"
src="https://github.com/user-attachments/assets/21b68fe0-494d-4562-9bad-0ddc55fc846d"
/>
Though a current limitation is that the `shell` tool defaults to a
timeout of 10s, which means I only have 10s to respond to the
elicitation. Ideally, the time spent waiting for a response from a human
should not count against the timeout for the command execution. I will
address this in a subsequent PR.
---
Note `~/code/bash/bash` was created by doing:
```
cd ~/code
git clone https://github.com/bminor/bash
cd bash
git checkout a8a1c2fac029404d3f42cd39f5a20f24b6e4fe4b
<apply the patch below>
./configure
make
```
The patch:
```
diff --git a/execute_cmd.c b/execute_cmd.c
index 070f5119..d20ad2b9 100644
--- a/execute_cmd.c
+++ b/execute_cmd.c
@@ -6129,6 +6129,19 @@ shell_execve (char *command, char **args, char **env)
char sample[HASH_BANG_BUFSIZ];
size_t larray;
+ char* exec_wrapper = getenv("BASH_EXEC_WRAPPER");
+ if (exec_wrapper && *exec_wrapper && !whitespace (*exec_wrapper))
+ {
+ char *orig_command = command;
+
+ larray = strvec_len (args);
+
+ memmove (args + 2, args, (++larray) * sizeof (char *));
+ args[0] = exec_wrapper;
+ args[1] = orig_command;
+ command = exec_wrapper;
+ }
+
```
This PR introduces an extra layer of abstraction to prepare us for the
migration to execpolicy2:
- introduces a new trait, `EscalationPolicy`, whose `determine_action()`
method is responsible for producing the `EscalateAction`
- the existing `ExecPolicy` typedef is changed to return an intermediate
`ExecPolicyOutcome` instead of `EscalateAction`
- the default implementation of `EscalationPolicy`,
`McpEscalationPolicy`, composes `ExecPolicy`
- the `ExecPolicyOutcome` includes `codex_execpolicy2::Decision`, which
has a `Prompt` variant
- when `McpEscalationPolicy` gets `Decision::Prompt` back from
`ExecPolicy`, it prompts the user via an MCP elicitation and maps the
result into an `ElicitationAction`
- now that the end user can reply to an elicitation with `Decline` or
`Cancel`, we introduce a new variant, `EscalateAction::Deny`, which the
client handles by returning exit code `1` without running anything
Note the way the elicitation is created is still not quite right, but I
will fix that once we have things running end-to-end for real in a
follow-up PR.