2025-05-16 14:38:08 -07:00
use std::collections::HashMap;
2026-01-12 15:12:59 -08:00
use std::path::Path;
2025-05-16 14:38:08 -07:00
2025-10-27 16:58:10 +00:00
use codex_utils_image::load_and_resize_to_fit;
fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151)
The output of an MCP server tool call can be one of several types, but
to date, we treated all outputs as text by showing the serialized JSON
as the "tool output" in Codex:
https://github.com/openai/codex/blob/25a9949c49194d5a64de54a11bcc5b4724ac9bd5/codex-rs/mcp-types/src/lib.rs#L96-L101
This PR adds support for the `ImageContent` variant so we can now
display an image output from an MCP tool call.
In making this change, we introduce a new
`ResponseInputItem::McpToolCallOutput` variant so that we can work with
the `mcp_types::CallToolResult` directly when the function call is made
to an MCP server.
Arguably the more significant change, though, is the introduction of
`HistoryCell::CompletedMcpToolCallWithImageOutput`, a cell that
uses `ratatui_image` to render an image into the terminal. To support
this, we introduce `ImageRenderCache`, cache a
`ratatui_image::picker::Picker`, and add `ensure_image_cache()` to cache the
appropriately scaled image data and dimensions for the current
terminal size.
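The cache-invalidation idea can be sketched roughly like this (field names and types here are assumptions for illustration, not the actual `ImageRenderCache` from the PR):

```rust
// Hypothetical sketch of the terminal-size-keyed cache described above;
// the real ImageRenderCache in the PR may be structured differently.
struct ImageRenderCache {
    /// Terminal area (columns, rows) the cached data was computed for.
    term_area: (u16, u16),
    /// Dimensions of the scaled image, in pixels.
    dimensions: (u32, u32),
    /// Scaled RGBA pixel data ready to hand to the image renderer.
    scaled_rgba: Vec<u8>,
}

/// Returns true when the cached data can be reused for the given terminal
/// size, so the image only has to be re-scaled when the terminal is resized.
fn cache_is_valid(cache: Option<&ImageRenderCache>, term_area: (u16, u16)) -> bool {
    matches!(cache, Some(c) if c.term_area == term_area)
}
```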
To test, I created a minimal `package.json`:
```json
{
  "name": "kitty-mcp",
  "version": "1.0.0",
  "type": "module",
  "description": "MCP that returns image of kitty",
  "main": "index.js",
  "dependencies": {
    "@modelcontextprotocol/sdk": "^1.12.0"
  }
}
```
with the following `index.js` to define the MCP server:
```js
#!/usr/bin/env node
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { readFile } from "node:fs/promises";
import { join } from "node:path";

const IMAGE_URI = "image://Ada.png";

const server = new McpServer({
  name: "Demo",
  version: "1.0.0",
});

server.tool(
  "get-cat-image",
  "If you need a cat image, this tool will provide one.",
  async () => ({
    content: [
      { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" },
    ],
  })
);

server.resource("Ada the Cat", IMAGE_URI, async (uri) => {
  const base64Image = await getAdaPngBase64();
  return {
    contents: [
      {
        uri: uri.href,
        mimeType: "image/png",
        blob: base64Image,
      },
    ],
  };
});

async function getAdaPngBase64() {
  const __dirname = new URL(".", import.meta.url).pathname;
  // From https://github.com/benjajaja/ratatui-image/blob/9705ce2c59ec669abbce2924cbfd1f5ae22c9860/assets/Ada.png
  const filePath = join(__dirname, "Ada.png");
  const imageData = await readFile(filePath);
  const base64Image = imageData.toString("base64");
  return base64Image;
}

const transport = new StdioServerTransport();
await server.connect(transport);
```
With the local changes from this PR, I added the following to my
`config.toml`:
```toml
[mcp_servers.kitty]
command = "node"
args = ["/Users/mbolin/code/kitty-mcp/index.js"]
```
Running the TUI from source:
```shell
cargo run --bin codex -- --model o3 'I need a picture of a cat'
```
I get:
<img width="732" alt="image"
src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869"
/>
Now, that said, I have only tested in iTerm, and there is definitely some
funny business with getting an accurate character-to-pixel ratio
(sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10
rows to render instead of 4), so there is still work to be done here.
2025-05-28 19:03:17 -07:00
use mcp_types::CallToolResult;
[MCP] Render MCP tool call result images to the model (#5600)
It's pretty amazing we have gotten here without the ability for the
model to see image content from MCP tool calls.
This PR builds off of #4391 and fixes #4819. I would like @KKcorps to get
adequate credit here, but I also want to get this fix in ASAP; I gave
him a week to update it and haven't gotten a response, so I'm going to
take it across the finish line.
This test highlights how absurd the current situation is. I asked the
model to read this image using the Chrome MCP
<img width="2378" height="674" alt="image"
src="https://github.com/user-attachments/assets/9ef52608-72a2-4423-9f5e-7ae36b2b56e0"
/>
After this change, it correctly outputs:
> Captured the page: image shows a dark terminal-style UI labeled
`OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and
working directory `/codex/codex-rs`
(and more)
Before this change, it said:
> Took the full-page screenshot you asked for. It shows a long,
horizontally repeating pattern of stylized people in orange, light-blue,
and mustard clothing, holding hands in alternating poses against a white
background. No text or other graphics, just rows of flat illustration
stretching off to the right.
Without this change, the Figma, Playwright, Chrome, and other visual MCP
servers are pretty much entirely useless.
I tested this change with the OpenAI Responses API as well as a third-party
completions API.
2025-10-27 14:55:57 -07:00
use mcp_types::ContentBlock;
feat: initial import of Rust implementation of Codex CLI in codex-rs/ (#629)
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implementation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.
2025-04-24 13:31:40 -07:00
use serde::Deserialize;
2025-07-23 10:37:45 -07:00
use serde::Deserializer;
2025-04-24 13:31:40 -07:00
use serde::Serialize;
2025-05-07 08:37:48 -07:00
use serde::ser::Serializer;
2025-09-08 14:54:47 -07:00
use ts_rs::TS;
2025-04-24 13:31:40 -07:00
2026-01-17 17:31:14 -08:00
use crate::config_types::CollaborationMode;
2026-01-12 15:12:59 -08:00
use crate::config_types::SandboxMode;
use crate::protocol::AskForApproval;
2026-01-17 17:31:14 -08:00
use crate::protocol::COLLABORATION_MODE_CLOSE_TAG;
use crate::protocol::COLLABORATION_MODE_OPEN_TAG;
2026-01-12 15:12:59 -08:00
use crate::protocol::NetworkAccess;
use crate::protocol::SandboxPolicy;
use crate::protocol::WritableRoot;
2025-10-20 13:34:44 -07:00
use crate::user_input::UserInput;
2026-01-28 01:43:17 -07:00
use codex_execpolicy::Policy;
2025-10-29 12:11:44 +00:00
use codex_git::GhostCommit;
2025-10-27 16:58:10 +00:00
use codex_utils_image::error::ImageProcessingError;
2025-10-20 11:45:11 -07:00
use schemars::JsonSchema;
2025-04-24 13:31:40 -07:00
2025-12-10 09:18:48 -08:00
/// Controls whether a command should use the session sandbox or bypass it.
#[derive(
    Debug, Clone, Copy, Default, Eq, Hash, PartialEq, Serialize, Deserialize, JsonSchema, TS,
)]
#[serde(rename_all = "snake_case")]
pub enum SandboxPermissions {
    /// Run with the configured sandbox.
    #[default]
    UseDefault,
    /// Request to run outside the sandbox.
    RequireEscalated,
}

impl SandboxPermissions {
    pub fn requires_escalated_permissions(self) -> bool {
        matches!(self, SandboxPermissions::RequireEscalated)
    }
}
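As a quick illustration of the intended use, a self-contained sketch (this trims the derive list down to what the example needs; the real type also derives the serde and schema traits):

```rust
// Local, trimmed-down copy of SandboxPermissions so this example stands alone.
#[derive(Clone, Copy, Default, PartialEq, Eq, Debug)]
pub enum SandboxPermissions {
    /// Run with the configured sandbox (the default).
    #[default]
    UseDefault,
    /// Request to run outside the sandbox.
    RequireEscalated,
}

impl SandboxPermissions {
    /// True only for the escalated variant.
    pub fn requires_escalated_permissions(self) -> bool {
        matches!(self, SandboxPermissions::RequireEscalated)
    }
}
```

With `rename_all = "snake_case"` on the real type, the two variants travel over the wire as the strings `"use_default"` and `"require_escalated"`.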
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-04-24 13:31:40 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ResponseInputItem {
    Message {
        role: String,
        content: Vec<ContentItem>,
    },
    FunctionCallOutput {
        call_id: String,
        output: FunctionCallOutputPayload,
    },
2025-05-28 19:03:17 -07:00
    McpToolCallOutput {
        call_id: String,
        result: Result<CallToolResult, String>,
    },
2025-08-22 13:42:34 -07:00
    CustomToolCallOutput {
        call_id: String,
        output: String,
    },
2025-04-24 13:31:40 -07:00
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-04-24 13:31:40 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ContentItem {
    InputText { text: String },
    InputImage { image_url: String },
    OutputText { text: String },
}
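Given the internally tagged representation above (`tag = "type"`, `rename_all = "snake_case"`), each variant serializes to a flat object whose `type` field is the snake_cased variant name. The field values below are placeholders:

```json
{ "type": "input_text", "text": "hello" }
{ "type": "input_image", "image_url": "data:image/png;base64,..." }
{ "type": "output_text", "text": "hi" }
```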
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-04-24 13:31:40 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ResponseItem {
    Message {
2025-10-30 11:18:53 -07:00
        #[serde(default, skip_serializing)]
        #[ts(skip)]
2025-07-23 10:37:45 -07:00
        id: Option<String>,
2025-04-24 13:31:40 -07:00
        role: String,
        content: Vec<ContentItem>,
2026-01-22 09:27:48 -08:00
        // Do not use directly; not available consistently across all providers.
        #[serde(default, skip_serializing_if = "Option::is_none")]
        #[ts(optional)]
        end_turn: Option<bool>,
2025-04-24 13:31:40 -07:00
    },
2025-05-10 21:43:27 -07:00
    Reasoning {
2025-09-09 14:47:06 -07:00
        #[serde(default, skip_serializing)]
2025-10-30 11:18:53 -07:00
        #[ts(skip)]
2025-05-10 21:43:27 -07:00
        id: String,
        summary: Vec<ReasoningItemReasoningSummary>,
2025-08-13 18:39:58 -07:00
        #[serde(default, skip_serializing_if = "should_serialize_reasoning_content")]
2025-10-30 11:18:53 -07:00
        #[ts(optional)]
2025-08-05 01:56:13 -07:00
        content: Option<Vec<ReasoningItemContent>>,
2025-07-23 10:37:45 -07:00
        encrypted_content: Option<String>,
2025-05-10 21:43:27 -07:00
    },
2025-05-16 14:38:08 -07:00
    LocalShellCall {
        /// Set when using the chat completions API.
2025-10-30 11:18:53 -07:00
        #[serde(default, skip_serializing)]
        #[ts(skip)]
2025-05-16 14:38:08 -07:00
        id: Option<String>,
        /// Set when using the Responses API.
        call_id: Option<String>,
        status: LocalShellStatus,
        action: LocalShellAction,
    },
2025-04-24 13:31:40 -07:00
    FunctionCall {
2025-10-30 11:18:53 -07:00
        #[serde(default, skip_serializing)]
        #[ts(skip)]
2025-07-23 10:37:45 -07:00
        id: Option<String>,
2025-04-24 13:31:40 -07:00
        name: String,
        // The Responses API returns the function call arguments as a *string* that contains
        // JSON, not as an already-parsed object. We keep it as a raw string here and let
        // Session::handle_function_call parse it into a Value. This exactly matches the
        // Chat Completions + Responses API behavior.
        arguments: String,
        call_id: String,
    },
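For example, a `function_call` item arrives with `arguments` as an escaped JSON string rather than a nested object. The tool name and values here are made up for illustration:

```json
{
  "type": "function_call",
  "name": "do_something",
  "arguments": "{\"path\":\"README.md\"}",
  "call_id": "call_123"
}
```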
    // NOTE: The input schema for `function_call_output` objects that clients send to the
    // OpenAI /v1/responses endpoint is NOT the same shape as the objects the server returns on the
    // SSE stream. When *sending* we must wrap the string output inside an object that includes a
2025-10-27 14:55:57 -07:00
    // required `success` boolean. To ensure we serialize exactly the expected shape we introduce
    // a dedicated payload struct and flatten it here.
feat: initial import of Rust implementation of Codex CLI in codex-rs/ (#629)
As stated in `codex-rs/README.md`:
Today, Codex CLI is written in TypeScript and requires Node.js 22+ to
run it. For a number of users, this runtime requirement inhibits
adoption: they would be better served by a standalone executable. As
maintainers, we want Codex to run efficiently in a wide range of
environments with minimal overhead. We also want to take advantage of
operating system-specific APIs to provide better sandboxing, where
possible.
To that end, we are moving forward with a Rust implementation of Codex
CLI contained in this folder, which has the following benefits:
- The CLI compiles to small, standalone, platform-specific binaries.
- Can make direct, native calls to
[seccomp](https://man7.org/linux/man-pages/man2/seccomp.2.html) and
[landlock](https://man7.org/linux/man-pages/man7/landlock.7.html) in
order to support sandboxing on Linux.
- No runtime garbage collection, resulting in lower memory consumption
and better, more predictable performance.
Currently, the Rust implementation is materially behind the TypeScript
implementation in functionality, so continue to use the TypeScript
implementation for the time being. We will publish native executables via
GitHub Releases as soon as we feel the Rust version is usable.
2025-04-24 13:31:40 -07:00
FunctionCallOutput {
call_id: String,
output: FunctionCallOutputPayload,
},
2025-08-22 13:42:34 -07:00
CustomToolCall {
2025-10-30 11:18:53 -07:00
#[serde(default, skip_serializing)]
#[ts(skip)]
2025-08-22 13:42:34 -07:00
id: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
2025-10-30 11:18:53 -07:00
#[ts(optional)]
2025-08-22 13:42:34 -07:00
status: Option<String>,
call_id: String,
name: String,
input: String,
},
CustomToolCallOutput {
call_id: String,
output: String,
},
2025-08-28 19:24:38 -07:00
// Emitted by the Responses API when the agent triggers a web search.
// Example payload (from SSE `response.output_item.done`):
// {
// "id":"ws_...",
// "type":"web_search_call",
// "status":"completed",
// "action": {"type":"search","query":"weather: San Francisco, CA"}
// }
WebSearchCall {
2025-10-30 11:18:53 -07:00
#[serde(default, skip_serializing)]
#[ts(skip)]
2025-08-28 19:24:38 -07:00
id: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
2025-10-30 11:18:53 -07:00
#[ts(optional)]
2025-08-28 19:24:38 -07:00
status: Option<String>,
2026-01-26 19:33:48 -08:00
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
action: Option<WebSearchAction>,
2025-08-28 19:24:38 -07:00
},
2025-10-27 10:09:10 +00:00
// Generated by the harness but treated exactly like a model response.
GhostSnapshot {
2025-10-27 10:55:29 +00:00
ghost_commit: GhostCommit,
2025-10-27 10:09:10 +00:00
},
2025-12-12 10:05:02 -08:00
#[serde(alias = "compaction_summary")]
Compaction {
2025-11-18 16:51:16 +00:00
encrypted_content: String,
},
2025-04-24 13:31:40 -07:00
#[serde(other)]
Other,
}
2026-01-19 21:59:36 -08:00
pub const BASE_INSTRUCTIONS_DEFAULT: &str = include_str!("prompts/base_instructions/default.md");
/// Base instructions for the model in a thread. Corresponds to the `instructions` field in the Responses API.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
#[serde(rename = "base_instructions", rename_all = "snake_case")]
pub struct BaseInstructions {
pub text: String,
}
impl Default for BaseInstructions {
fn default() -> Self {
Self {
text: BASE_INSTRUCTIONS_DEFAULT.to_string(),
}
}
}
2026-01-12 15:12:59 -08:00
/// Developer-provided guidance that is injected into a turn as a developer role
/// message.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
#[serde(rename = "developer_instructions", rename_all = "snake_case")]
pub struct DeveloperInstructions {
text: String,
}
const APPROVAL_POLICY_NEVER: &str = include_str!("prompts/permissions/approval_policy/never.md");
const APPROVAL_POLICY_UNLESS_TRUSTED: &str =
include_str!("prompts/permissions/approval_policy/unless_trusted.md");
const APPROVAL_POLICY_ON_FAILURE: &str =
include_str!("prompts/permissions/approval_policy/on_failure.md");
const APPROVAL_POLICY_ON_REQUEST: &str =
include_str!("prompts/permissions/approval_policy/on_request.md");
2026-01-28 01:43:17 -07:00
const APPROVAL_POLICY_ON_REQUEST_RULE: &str =
include_str!("prompts/permissions/approval_policy/on_request_rule.md");
2026-01-12 15:12:59 -08:00
const SANDBOX_MODE_DANGER_FULL_ACCESS: &str =
include_str!("prompts/permissions/sandbox_mode/danger_full_access.md");
const SANDBOX_MODE_WORKSPACE_WRITE: &str =
include_str!("prompts/permissions/sandbox_mode/workspace_write.md");
const SANDBOX_MODE_READ_ONLY: &str = include_str!("prompts/permissions/sandbox_mode/read_only.md");
impl DeveloperInstructions {
pub fn new<T: Into<String>>(text: T) -> Self {
Self { text: text.into() }
}
2026-01-28 01:43:17 -07:00
pub fn from(
approval_policy: AskForApproval,
exec_policy: &Policy,
request_rule_enabled: bool,
) -> DeveloperInstructions {
let text = match approval_policy {
AskForApproval::Never => APPROVAL_POLICY_NEVER.to_string(),
AskForApproval::UnlessTrusted => APPROVAL_POLICY_UNLESS_TRUSTED.to_string(),
AskForApproval::OnFailure => APPROVAL_POLICY_ON_FAILURE.to_string(),
AskForApproval::OnRequest => {
if !request_rule_enabled {
APPROVAL_POLICY_ON_REQUEST.to_string()
} else {
2026-02-01 18:26:15 -08:00
let command_prefixes =
format_allow_prefixes(exec_policy.get_allowed_prefixes());
2026-01-28 01:43:17 -07:00
match command_prefixes {
Some(prefixes) => {
2026-02-01 18:26:15 -08:00
format!(
"{APPROVAL_POLICY_ON_REQUEST_RULE}\nApproved command prefixes:\n{prefixes}"
)
2026-01-28 01:43:17 -07:00
}
None => APPROVAL_POLICY_ON_REQUEST_RULE.to_string(),
}
}
}
};
DeveloperInstructions::new(text)
}
2026-01-12 15:12:59 -08:00
pub fn into_text(self) -> String {
self.text
}
pub fn concat(self, other: impl Into<DeveloperInstructions>) -> Self {
let mut text = self.text;
2026-01-28 01:43:17 -07:00
if !text.ends_with('\n') {
text.push('\n');
}
2026-01-12 15:12:59 -08:00
text.push_str(&other.into().text);
Self { text }
}
2026-01-22 12:04:23 -08:00
pub fn personality_spec_message(spec: String) -> Self {
let message = format!(
"<personality_spec> The user has requested a new communication style. Future messages should adhere to the following personality:\n{spec}</personality_spec>"
);
DeveloperInstructions::new(message)
}
2026-01-12 15:12:59 -08:00
pub fn from_policy(
sandbox_policy: &SandboxPolicy,
approval_policy: AskForApproval,
2026-01-28 01:43:17 -07:00
exec_policy: &Policy,
request_rule_enabled: bool,
2026-01-12 15:12:59 -08:00
cwd: &Path,
) -> Self {
let network_access = if sandbox_policy.has_full_network_access() {
NetworkAccess::Enabled
} else {
NetworkAccess::Restricted
};
let (sandbox_mode, writable_roots) = match sandbox_policy {
SandboxPolicy::DangerFullAccess => (SandboxMode::DangerFullAccess, None),
SandboxPolicy::ReadOnly => (SandboxMode::ReadOnly, None),
SandboxPolicy::ExternalSandbox { .. } => (SandboxMode::DangerFullAccess, None),
SandboxPolicy::WorkspaceWrite { .. } => {
let roots = sandbox_policy.get_writable_roots_with_cwd(cwd);
(SandboxMode::WorkspaceWrite, Some(roots))
}
};
DeveloperInstructions::from_permissions_with_network(
sandbox_mode,
network_access,
approval_policy,
2026-01-28 01:43:17 -07:00
exec_policy,
request_rule_enabled,
2026-01-12 15:12:59 -08:00
writable_roots,
)
}
2026-01-17 17:31:14 -08:00
/// Returns developer instructions from a collaboration mode if they exist and are non-empty.
pub fn from_collaboration_mode(collaboration_mode: &CollaborationMode) -> Option<Self> {
2026-01-23 17:00:23 -08:00
collaboration_mode
.settings
2026-01-17 17:31:14 -08:00
.developer_instructions
.as_ref()
.filter(|instructions| !instructions.is_empty())
.map(|instructions| {
DeveloperInstructions::new(format!(
"{COLLABORATION_MODE_OPEN_TAG}{instructions}{COLLABORATION_MODE_CLOSE_TAG}"
))
})
}
2026-01-12 15:12:59 -08:00
fn from_permissions_with_network(
sandbox_mode: SandboxMode,
network_access: NetworkAccess,
approval_policy: AskForApproval,
2026-01-28 01:43:17 -07:00
exec_policy: &Policy,
request_rule_enabled: bool,
2026-01-12 15:12:59 -08:00
writable_roots: Option<Vec<WritableRoot>>,
) -> Self {
let start_tag = DeveloperInstructions::new("<permissions instructions>");
let end_tag = DeveloperInstructions::new("</permissions instructions>");
start_tag
.concat(DeveloperInstructions::sandbox_text(
sandbox_mode,
network_access,
))
2026-01-28 01:43:17 -07:00
.concat(DeveloperInstructions::from(
approval_policy,
exec_policy,
request_rule_enabled,
))
2026-01-12 15:12:59 -08:00
.concat(DeveloperInstructions::from_writable_roots(writable_roots))
.concat(end_tag)
}
fn from_writable_roots(writable_roots: Option<Vec<WritableRoot>>) -> Self {
let Some(roots) = writable_roots else {
return DeveloperInstructions::new("");
};
if roots.is_empty() {
return DeveloperInstructions::new("");
}
let roots_list: Vec<String> = roots
.iter()
.map(|r| format!("`{}`", r.root.to_string_lossy()))
.collect();
let text = if roots_list.len() == 1 {
format!("The writable root is {}.", roots_list[0])
} else {
format!("The writable roots are {}.", roots_list.join(", "))
};
DeveloperInstructions::new(text)
}
fn sandbox_text(mode: SandboxMode, network_access: NetworkAccess) -> DeveloperInstructions {
let template = match mode {
SandboxMode::DangerFullAccess => SANDBOX_MODE_DANGER_FULL_ACCESS.trim_end(),
SandboxMode::WorkspaceWrite => SANDBOX_MODE_WORKSPACE_WRITE.trim_end(),
SandboxMode::ReadOnly => SANDBOX_MODE_READ_ONLY.trim_end(),
};
let text = template.replace("{network_access}", &network_access.to_string());
DeveloperInstructions::new(text)
}
}
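The sandbox prompt templates above carry a literal `{network_access}` placeholder that `sandbox_text` fills in with `str::replace` at runtime. A tiny standalone sketch (the template text here is invented for illustration):

```rust
fn main() {
    // Stand-in for a prompt template loaded via include_str!.
    let template = "Sandbox: workspace-write. Network access: {network_access}.";
    // Substitute the placeholder the same way sandbox_text does.
    let text = template.replace("{network_access}", "restricted");
    assert_eq!(text, "Sandbox: workspace-write. Network access: restricted.");
}
```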
2026-02-01 18:26:15 -08:00
const MAX_RENDERED_PREFIXES: usize = 100;
const MAX_ALLOW_PREFIX_TEXT_BYTES: usize = 5000;
const TRUNCATED_MARKER: &str = "...\n[Some commands were truncated]";
pub fn format_allow_prefixes(prefixes: Vec<Vec<String>>) -> Option<String> {
let mut truncated = false;
if prefixes.len() > MAX_RENDERED_PREFIXES {
truncated = true;
}
let mut prefixes = prefixes;
prefixes.sort_by(|a, b| {
a.len()
.cmp(&b.len())
.then_with(|| prefix_combined_str_len(a).cmp(&prefix_combined_str_len(b)))
.then_with(|| a.cmp(b))
});
let full_text = prefixes
2026-01-28 01:43:17 -07:00
.into_iter()
2026-02-01 18:26:15 -08:00
.take(MAX_RENDERED_PREFIXES)
.map(|prefix| format!("- {}", render_command_prefix(&prefix)))
.collect::<Vec<_>>()
.join("\n");
// Truncate at a character boundary so we never split a multi-byte UTF-8 character.
let mut output = full_text;
let byte_idx = output
.char_indices()
.nth(MAX_ALLOW_PREFIX_TEXT_BYTES)
.map(|(i, _)| i);
if let Some(byte_idx) = byte_idx {
truncated = true;
output = output[..byte_idx].to_string();
2026-01-28 01:43:17 -07:00
}
2026-02-01 18:26:15 -08:00
if truncated {
Some(format!("{output} {TRUNCATED_MARKER}"))
} else {
Some(output)
}
}
fn prefix_combined_str_len(prefix: &[String]) -> usize {
prefix.iter().map(String::len).sum()
2026-01-28 01:43:17 -07:00
}
fn render_command_prefix(prefix: &[String]) -> String {
let tokens = prefix
.iter()
.map(|token| serde_json::to_string(token).unwrap_or_else(|_| format!("{token:?}")))
.collect::<Vec<_>>()
.join(", ");
format!("[{tokens}]")
}
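The truncation step above relies on `char_indices` yielding byte offsets that always fall on character boundaries, so the slice can never split a multi-byte character. A standalone sketch of that pattern with a tiny limit:

```rust
// Cut a string to at most `max_chars` characters without panicking on
// multi-byte UTF-8: nth() returns the byte offset of the (max_chars+1)-th
// character, which is a valid slice boundary.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}

fn main() {
    assert_eq!(truncate_chars("héllo", 2), "hé"); // 'é' is 2 bytes, no panic
    assert_eq!(truncate_chars("hi", 10), "hi");
}
```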
2026-01-12 15:12:59 -08:00
impl From<DeveloperInstructions> for ResponseItem {
fn from(di: DeveloperInstructions) -> Self {
ResponseItem::Message {
id: None,
role: "developer".to_string(),
content: vec![ContentItem::InputText {
text: di.into_text(),
}],
2026-01-22 09:27:48 -08:00
end_turn: None,
2026-01-12 15:12:59 -08:00
}
}
}
impl From<SandboxMode> for DeveloperInstructions {
fn from(mode: SandboxMode) -> Self {
let network_access = match mode {
SandboxMode::DangerFullAccess => NetworkAccess::Enabled,
SandboxMode::WorkspaceWrite | SandboxMode::ReadOnly => NetworkAccess::Restricted,
};
DeveloperInstructions::sandbox_text(mode, network_access)
}
}
2025-08-13 18:39:58 -07:00
fn should_serialize_reasoning_content(content: &Option<Vec<ReasoningItemContent>>) -> bool {
match content {
Some(content) => !content
.iter()
.any(|c| matches!(c, ReasoningItemContent::ReasoningText { .. })),
None => false,
}
}
2025-10-27 16:58:10 +00:00
fn local_image_error_placeholder(
path: &std::path::Path,
error: impl std::fmt::Display,
) -> ContentItem {
ContentItem::InputText {
text: format!(
"Codex could not read the local image at `{}`: {}",
path.display(),
error
),
}
}
Label attached images so agent can understand in-message labels (#8950)
Agent wouldn't "see" attached images and would instead try to use the
view_file tool:
<img width="1516" height="504" alt="image"
src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
/>
In this PR, we wrap image content items in XML tags with the name of
each image (now just a numbered name like `[Image #1]`), so that the
model can understand inline image references (based on name). We also
put the image content items above the user message, which the model seems
to prefer (maybe it's more used to definitions being before references).
We also tweak the view_file tool description, which seemed to help a bit.
Results on a simple eval set of images:
Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>
After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>
```json
[
{
"id": "single_describe",
"prompt": "Describe the attached image in one sentence.",
"images": ["image_a.png"]
},
{
"id": "single_color",
"prompt": "What is the dominant color in the image? Answer with a single color word.",
"images": ["image_b.png"]
},
{
"id": "orientation_check",
"prompt": "Is the image portrait or landscape? Answer in one sentence.",
"images": ["image_c.png"]
},
{
"id": "detail_request",
"prompt": "Look closely at the image and call out any small details you notice.",
"images": ["image_d.png"]
},
{
"id": "two_images_compare",
"prompt": "I attached two images. Are they the same or different? Briefly explain.",
"images": ["image_a.png", "image_b.png"]
},
{
"id": "two_images_captions",
"prompt": "Provide a short caption for each image (Image 1, Image 2).",
"images": ["image_c.png", "image_d.png"]
},
{
"id": "multi_image_rank",
"prompt": "Rank the attached images from most colorful to least colorful.",
"images": ["image_a.png", "image_b.png", "image_c.png"]
},
{
"id": "multi_image_choice",
"prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
"images": ["image_b.png", "image_d.png"]
}
]
```
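The wrapping described above can be sketched as a standalone helper (tag shape `<image name=[Image #N]>…</image>` as shown in the PR; the helper names here are illustrative):

```rust
// Build the numbered label used to reference an image inline.
fn local_image_label(n: usize) -> String {
    format!("[Image #{n}]")
}

// Surround an image item with its named open/close tags so the model can
// connect inline references like "[Image #1]" to the right attachment.
fn wrap_image_items(image_data_url: String, n: usize) -> Vec<String> {
    vec![
        format!("<image name={}>", local_image_label(n)),
        image_data_url,
        "</image>".to_string(),
    ]
}

fn main() {
    let items = wrap_image_items("data:image/png;base64,...".to_string(), 1);
    assert_eq!(items[0], "<image name=[Image #1]>");
    assert_eq!(items[2], "</image>");
}
```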
2026-01-09 21:33:45 -08:00
pub const VIEW_IMAGE_TOOL_NAME: &str = "view_image";
const IMAGE_OPEN_TAG: &str = "<image>";
const IMAGE_CLOSE_TAG: &str = "</image>";
const LOCAL_IMAGE_OPEN_TAG_PREFIX: &str = "<image name=";
const LOCAL_IMAGE_OPEN_TAG_SUFFIX: &str = ">";
const LOCAL_IMAGE_CLOSE_TAG: &str = IMAGE_CLOSE_TAG;
pub fn image_open_tag_text() -> String {
IMAGE_OPEN_TAG.to_string()
}
pub fn image_close_tag_text() -> String {
IMAGE_CLOSE_TAG.to_string()
}
pub fn local_image_label_text(label_number: usize) -> String {
format!("[Image #{label_number}]")
}
pub fn local_image_open_tag_text(label_number: usize) -> String {
let label = local_image_label_text(label_number);
format!("{LOCAL_IMAGE_OPEN_TAG_PREFIX}{label}{LOCAL_IMAGE_OPEN_TAG_SUFFIX}")
}
pub fn is_local_image_open_tag_text(text: &str) -> bool {
text.strip_prefix(LOCAL_IMAGE_OPEN_TAG_PREFIX)
.is_some_and(|rest| rest.ends_with(LOCAL_IMAGE_OPEN_TAG_SUFFIX))
}
pub fn is_local_image_close_tag_text(text: &str) -> bool {
is_image_close_tag_text(text)
}
pub fn is_image_open_tag_text(text: &str) -> bool {
text == IMAGE_OPEN_TAG
}
pub fn is_image_close_tag_text(text: &str) -> bool {
text == IMAGE_CLOSE_TAG
}
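The open-tag predicate above can be exercised standalone (constants inlined for illustration): a string counts as a local image open tag if it starts with `<image name=` and ends with `>`.

```rust
// Same strip_prefix + is_some_and shape as the predicate above, with the
// tag prefix/suffix constants inlined.
fn is_local_image_open_tag(text: &str) -> bool {
    text.strip_prefix("<image name=")
        .is_some_and(|rest| rest.ends_with('>'))
}

fn main() {
    assert!(is_local_image_open_tag("<image name=[Image #1]>"));
    assert!(!is_local_image_open_tag("<image>")); // plain tag has no name
}
```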
2025-11-17 17:10:53 +01:00
fn invalid_image_error_placeholder(
path: &std::path::Path,
error: impl std::fmt::Display,
) -> ContentItem {
ContentItem::InputText {
text: format!(
"Image located at `{}` is invalid: {}",
path.display(),
error
),
}
}
2025-12-10 02:28:41 +08:00
fn unsupported_image_error_placeholder(path: &std::path::Path, mime: &str) -> ContentItem {
ContentItem::InputText {
text: format!(
"Codex cannot attach image at `{}`: unsupported image format `{}`.",
path.display(),
mime
),
}
}
2026-01-09 21:33:45 -08:00
pub fn local_image_content_items_with_label_number(
path: &std::path::Path,
label_number: Option<usize>,
) -> Vec<ContentItem> {
match load_and_resize_to_fit(path) {
Ok(image) => {
let mut items = Vec::with_capacity(3);
if let Some(label_number) = label_number {
items.push(ContentItem::InputText {
text: local_image_open_tag_text(label_number),
});
}
items.push(ContentItem::InputImage {
image_url: image.into_data_url(),
});
if label_number.is_some() {
items.push(ContentItem::InputText {
text: LOCAL_IMAGE_CLOSE_TAG.to_string(),
});
}
items
}
Err(err) => {
if matches!(&err, ImageProcessingError::Read { .. }) {
vec![local_image_error_placeholder(path, &err)]
} else if err.is_invalid_image() {
vec![invalid_image_error_placeholder(path, &err)]
} else {
let Some(mime_guess) = mime_guess::from_path(path).first() else {
return vec![local_image_error_placeholder(
path,
"unsupported MIME type (unknown)",
)];
};
let mime = mime_guess.essence_str().to_owned();
if !mime.starts_with("image/") {
return vec![local_image_error_placeholder(
path,
format!("unsupported MIME type `{mime}`"),
)];
}
vec![unsupported_image_error_placeholder(path, &mime)]
}
}
}
}
2025-04-25 12:08:18 -07:00
impl From<ResponseInputItem> for ResponseItem {
fn from(item: ResponseInputItem) -> Self {
match item {
2025-07-23 10:37:45 -07:00
ResponseInputItem::Message { role, content } => Self::Message {
role,
content,
id: None,
2026-01-22 09:27:48 -08:00
end_turn: None,
2025-07-23 10:37:45 -07:00
},
2025-04-25 12:08:18 -07:00
ResponseInputItem::FunctionCallOutput { call_id, output } => {
Self::FunctionCallOutput { call_id, output }
}
[MCP] Render MCP tool call result images to the model (#5600)
It's pretty amazing we have gotten here without the ability for the
model to see image content from MCP tool calls.
This PR builds off of 4391 and fixes #4819. I would like @KKcorps to get
adequate credit here, but I also want to get this fix in ASAP; I gave him
a week to update it and haven't gotten a response, so I'm going to take
it across the finish line.
This test highlights how absurd the current situation is. I asked the
model to read this image using the Chrome MCP.
<img width="2378" height="674" alt="image"
src="https://github.com/user-attachments/assets/9ef52608-72a2-4423-9f5e-7ae36b2b56e0"
/>
After this change, it correctly outputs:
> Captured the page: image shows a dark terminal-style UI labeled
`OpenAI Codex (v0.0.0)` with prompt `model: gpt-5-codex medium` and
working directory `/codex/codex-rs`
(and more)
Before this change, it said:
> Took the full-page screenshot you asked for. It shows a long,
horizontally repeating pattern of stylized people in orange, light-blue,
and mustard clothing, holding hands in alternating poses against a white
background. No text or other graphics-just rows of flat illustration
stretching off to the right.
Without this change, the Figma, Playwright, Chrome, and other visual MCP
servers are pretty much entirely useless.
I tested this change with the OpenAI Responses API as well as a
third-party completions API.
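The Ok/Err mapping this introduces can be sketched standalone: a failed MCP tool call surfaces to the model as a function-call output with `success = Some(false)` and the error rendered into the content. Types here are simplified stand-ins, and the `Some(true)` on the success path is an assumption for illustration:

```rust
// Stand-in for the MCP tool-call error type.
#[derive(Debug)]
struct ToolError(String);

// Stand-in for the payload carried by a function_call_output item.
struct FunctionCallOutputPayload {
    content: String,
    success: Option<bool>,
}

fn to_payload(result: Result<String, ToolError>) -> FunctionCallOutputPayload {
    match result {
        Ok(content) => FunctionCallOutputPayload { content, success: Some(true) },
        Err(err) => FunctionCallOutputPayload {
            // Debug-format the error into the content, as the conversion does.
            content: format!("err: {err:?}"),
            success: Some(false),
        },
    }
}

fn main() {
    let payload = to_payload(Err(ToolError("timeout".to_string())));
    assert_eq!(payload.success, Some(false));
    assert!(payload.content.starts_with("err:"));
}
```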
2025-10-27 14:55:57 -07:00
ResponseInputItem::McpToolCallOutput { call_id, result } => {
let output = match result {
Ok(result) => FunctionCallOutputPayload::from(&result),
Err(tool_call_err) => FunctionCallOutputPayload {
content: format!("err: {tool_call_err:?}"),
success: Some(false),
..Default::default()
},
};
Self::FunctionCallOutput { call_id, output }
}
2025-08-22 13:42:34 -07:00
ResponseInputItem::CustomToolCallOutput { call_id, output } => {
Self::CustomToolCallOutput { call_id, output }
}
2025-04-25 12:08:18 -07:00
}
}
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-05-16 14:38:08 -07:00
#[serde(rename_all = "snake_case")]
pub enum LocalShellStatus {
Completed,
InProgress,
Incomplete,
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-05-16 14:38:08 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum LocalShellAction {
Exec(LocalShellExecAction),
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-05-16 14:38:08 -07:00
pub struct LocalShellExecAction {
pub command: Vec<String>,
pub timeout_ms: Option<u64>,
pub working_directory: Option<String>,
pub env: Option<HashMap<String, String>>,
pub user: Option<String>,
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-08-28 19:24:38 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum WebSearchAction {
Search {
2025-11-20 20:45:28 -08:00
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
query: Option<String>,
2026-01-30 16:37:56 -08:00
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
queries: Option<Vec<String>>,
2025-11-20 20:45:28 -08:00
},
OpenPage {
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
url: Option<String>,
},
FindInPage {
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
url: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pattern: Option<String>,
2025-08-28 19:24:38 -07:00
},
2025-11-20 20:45:28 -08:00
2025-08-28 19:24:38 -07:00
#[serde(other)]
Other,
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-05-10 21:43:27 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ReasoningItemReasoningSummary {
SummaryText { text: String },
}
2025-10-20 11:45:11 -07:00
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
2025-08-05 01:56:13 -07:00
#[serde(tag = "type", rename_all = "snake_case")]
pub enum ReasoningItemContent {
ReasoningText { text: String },
2025-08-13 18:39:58 -07:00
Text { text: String },
2025-08-05 01:56:13 -07:00
}
2025-10-20 13:34:44 -07:00
impl From<Vec<UserInput>> for ResponseInputItem {
fn from(items: Vec<UserInput>) -> Self {
2026-01-09 21:33:45 -08:00
let mut image_index = 0;
2025-04-24 13:31:40 -07:00
Self::Message {
role: "user".to_string(),
content: items
.into_iter()
2026-01-09 21:33:45 -08:00
.flat_map(|c| match c {
    UserInput::Text { text, .. } => vec![ContentItem::InputText { text }],
    UserInput::Image { image_url } => vec![
        ContentItem::InputText {
            text: image_open_tag_text(),
        },
        ContentItem::InputImage { image_url },
        ContentItem::InputText {
            text: image_close_tag_text(),
        },
    ],
    UserInput::LocalImage { path } => {
        image_index += 1;
        local_image_content_items_with_label_number(&path, Some(image_index))
    }
    UserInput::Skill { .. } | UserInput::Mention { .. } => Vec::new(), // Tool bodies are injected later in core
})
.collect::<Vec<ContentItem>>(),
}
}
}
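The `UserInput::Image` arm above brackets each image between two text items so the model can associate an inline label with the image payload that follows it. A std-only sketch of that ordering; the tag strings here are hypothetical stand-ins for whatever `image_open_tag_text()` and `image_close_tag_text()` actually produce:

```rust
// Sketch: wrap an image URL between open/close text markers, mirroring the
// InputText / InputImage / InputText ordering above. The tag strings are
// assumptions, not the real output of image_open_tag_text()/_close_tag_text().
fn wrap_image_item(label: usize, image_url: &str) -> Vec<String> {
    vec![
        format!("<image name=\"[Image #{label}]\">"), // open-tag text item
        image_url.to_string(),                        // the image item itself
        "</image>".to_string(),                       // close-tag text item
    ]
}
```

Putting the label text before the image mirrors the definition-before-reference order the surrounding code establishes.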
/// If the `name` of a `ResponseItem::FunctionCall` is either `container.exec`
/// or `shell`, the `arguments` field should deserialize to this struct.
#[derive(Deserialize, Debug, Clone, PartialEq, JsonSchema, TS)]
pub struct ShellToolCallParams {
    pub command: Vec<String>,
    pub workdir: Option<String>,
    /// This is the maximum time in milliseconds that the command is allowed to run.
    #[serde(alias = "timeout")]
    pub timeout_ms: Option<u64>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    #[ts(optional)]
    pub sandbox_permissions: Option<SandboxPermissions>,
    /// Suggests a command prefix to persist for future sessions
    #[serde(default, skip_serializing_if = "Option::is_none")]
    #[ts(optional)]
    pub prefix_rule: Option<Vec<String>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub justification: Option<String>,
}

/// If the `name` of a `ResponseItem::FunctionCall` is `shell_command`, the
/// `arguments` field should deserialize to this struct.
#[derive(Deserialize, Debug, Clone, PartialEq, JsonSchema, TS)]
pub struct ShellCommandToolCallParams {
    pub command: String,
    pub workdir: Option<String>,
    /// Whether to run the shell with login shell semantics
    #[serde(skip_serializing_if = "Option::is_none")]
    pub login: Option<bool>,
    /// This is the maximum time in milliseconds that the command is allowed to run.
    #[serde(alias = "timeout")]
    pub timeout_ms: Option<u64>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    #[ts(optional)]
    pub sandbox_permissions: Option<SandboxPermissions>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    #[ts(optional)]
    pub prefix_rule: Option<Vec<String>>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub justification: Option<String>,
}
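Unlike `ShellToolCallParams`, `command` here is a single string, so it must be lowered to an argv before exec. A minimal std-only sketch of that lowering, assuming a bash default shell (the real argv is derived from the user's configured shell), where `login` selects login-shell semantics:

```rust
// Sketch: turn a shell_command-style `command: String` into exec args.
// Assumes bash; `login` toggles `-lc` (login shell) vs `-c`.
fn exec_args(command: &str, login: bool) -> Vec<String> {
    let flag = if login { "-lc" } else { "-c" };
    vec!["bash".to_string(), flag.to_string(), command.to_string()]
}
```

The single-string shape spares the model from having to produce a correctly quoted argv per platform.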
/// Responses API compatible content items that can be returned by a tool call.
/// This is a subset of `ContentItem` with the types we support as function call outputs.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, JsonSchema, TS)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum FunctionCallOutputContentItem {
    // Do not rename, these are serialized and used directly in the responses API.
    InputText { text: String },
    // Do not rename, these are serialized and used directly in the responses API.
    InputImage { image_url: String },
}
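With `tag = "type"` and `rename_all = "snake_case"`, these variants serialize as `{"type": "input_text", ...}` objects. A std-only sketch of that wire shape (hand-rolled formatting purely for illustration; the real type goes through serde):

```rust
// Hand-rolled illustration of the tagged snake_case wire format produced by
// #[serde(tag = "type", rename_all = "snake_case")].
enum Item {
    InputText { text: String },
    InputImage { image_url: String },
}

fn to_wire_json(item: &Item) -> String {
    match item {
        Item::InputText { text } => {
            format!("{{\"type\":\"input_text\",\"text\":\"{text}\"}}")
        }
        Item::InputImage { image_url } => {
            format!("{{\"type\":\"input_image\",\"image_url\":\"{image_url}\"}}")
        }
    }
}
```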

/// The payload we send back to OpenAI when reporting a tool call result.
///
/// `content` preserves the historical plain-string payload so downstream
/// integrations (tests, logging, etc.) can keep treating tool output as
/// `String`. When an MCP server returns richer data we additionally populate
/// `content_items` with the structured form that the Responses/Chat
/// Completions APIs understand.
#[derive(Debug, Default, Clone, PartialEq, JsonSchema, TS)]
pub struct FunctionCallOutputPayload {
    pub content: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub content_items: Option<Vec<FunctionCallOutputContentItem>>,
    pub success: Option<bool>,
}
#[derive(Deserialize)]
#[serde(untagged)]
enum FunctionCallOutputPayloadSerde {
    Text(String),
    Items(Vec<FunctionCallOutputContentItem>),
}
// The Responses API accepts two different output shapes: a plain string, or a
// list of content items. Serialize the structured items when we have them;
// otherwise fall back to the historical plain-string `content`.
impl Serialize for FunctionCallOutputPayload {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        if let Some(items) = &self.content_items {
            items.serialize(serializer)
        } else {
            serializer.serialize_str(&self.content)
        }
}
}
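The branch above means the wire payload has two shapes: a JSON array of items when `content_items` is populated, and a bare JSON string otherwise. A std-only sketch of that choice, using a hypothetical simplified payload (the real type serializes through serde):

```rust
// Simplified stand-in for FunctionCallOutputPayload:
// content_items holds pre-serialized item JSON fragments.
struct Payload {
    content: String,
    content_items: Option<Vec<String>>,
}

// Mirror the Serialize impl's choice: the items array wins over the string.
fn to_wire(p: &Payload) -> String {
    match &p.content_items {
        Some(items) => format!("[{}]", items.join(",")),
        None => format!("\"{}\"", p.content),
    }
}
```

Keeping the plain-string form as the default preserves compatibility with integrations that treat tool output as a `String`.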
impl<'de> Deserialize<'de> for FunctionCallOutputPayload {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        match FunctionCallOutputPayloadSerde::deserialize(deserializer)? {
            FunctionCallOutputPayloadSerde::Text(content) => Ok(FunctionCallOutputPayload {
                content,
                ..Default::default()
            }),
            FunctionCallOutputPayloadSerde::Items(items) => {
                let content = serde_json::to_string(&items).map_err(serde::de::Error::custom)?;
                Ok(FunctionCallOutputPayload {
                    content,
                    content_items: Some(items),
                    success: None,
                })
            }
        }
    }
}
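The untagged `FunctionCallOutputPayloadSerde` enum lets serde accept either the historical plain-string output or the newer item arrays. A std-only sketch of the same dispatch, keyed off the first JSON character (the real code relies on serde's untagged machinery rather than manual inspection):

```rust
#[derive(Debug, PartialEq)]
enum RawPayload {
    Text(String),  // a bare JSON string
    Items(String), // raw JSON of an items array
}

// Classify a raw JSON value the way the untagged enum would:
// strings become Text, arrays become Items, anything else is rejected.
fn classify(raw: &str) -> Option<RawPayload> {
    let trimmed = raw.trim();
    match trimmed.chars().next()? {
        '"' => Some(RawPayload::Text(trimmed.trim_matches('"').to_string())),
        '[' => Some(RawPayload::Items(trimmed.to_string())),
        _ => None, // e.g. a bare object fails, as it matches neither variant
    }
}
```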
impl From<&CallToolResult> for FunctionCallOutputPayload {
    fn from(call_tool_result: &CallToolResult) -> Self {
        let CallToolResult {
            content,
            structured_content,
            is_error,
        } = call_tool_result;
        let is_success = is_error != &Some(true);
        if let Some(structured_content) = structured_content
            && !structured_content.is_null()
        {
            match serde_json::to_string(structured_content) {
                Ok(serialized_structured_content) => {
                    return FunctionCallOutputPayload {
                        content: serialized_structured_content,
                        success: Some(is_success),
                        ..Default::default()
                    };
                }
                Err(err) => {
                    return FunctionCallOutputPayload {
                        content: err.to_string(),
                        success: Some(false),
                        ..Default::default()
                    };
                }
            }
        }
        let serialized_content = match serde_json::to_string(content) {
            Ok(serialized_content) => serialized_content,
            Err(err) => {
                return FunctionCallOutputPayload {
                    content: err.to_string(),
                    success: Some(false),
                    ..Default::default()
                };
            }
        };
        let content_items = convert_content_blocks_to_items(content);
        FunctionCallOutputPayload {
            content: serialized_content,
            content_items,
            success: Some(is_success),
        }
    }
}

fn convert_content_blocks_to_items(
    blocks: &[ContentBlock],
) -> Option<Vec<FunctionCallOutputContentItem>> {
    let mut saw_image = false;
    let mut items = Vec::with_capacity(blocks.len());
    tracing::warn!("Blocks: {:?}", blocks);
    for block in blocks {
        match block {
            ContentBlock::TextContent(text) => {
                items.push(FunctionCallOutputContentItem::InputText {
                    text: text.text.clone(),
                });
            }
            ContentBlock::ImageContent(image) => {
                saw_image = true;
                // Just in case the content doesn't include a data URL, add it.
                let image_url = if image.data.starts_with("data:") {
                    image.data.clone()
                } else {
                    format!("data:{};base64,{}", image.mime_type, image.data)
                };
                items.push(FunctionCallOutputContentItem::InputImage { image_url });
            }
            // TODO: render audio, resource, and embedded resource content to the model.
            _ => return None,
        }
    }

    if saw_image { Some(items) } else { None }
}
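The data-URL fallback in the image branch can be sketched in isolation. This is a standalone illustration, not the crate's API: `to_data_url` is a hypothetical helper mirroring the `starts_with("data:")` check above.

```rust
// Hypothetical stand-in for the inline fallback above: pass data URLs
// through unchanged, otherwise wrap raw base64 with the declared MIME type.
fn to_data_url(mime_type: &str, data: &str) -> String {
    if data.starts_with("data:") {
        data.to_string()
    } else {
        format!("data:{mime_type};base64,{data}")
    }
}

fn main() {
    // Raw base64 payloads get wrapped with the declared MIME type.
    assert_eq!(
        to_data_url("image/png", "BASE64"),
        "data:image/png;base64,BASE64"
    );
    // Payloads that already carry a data URL pass through unchanged.
    assert_eq!(
        to_data_url("image/png", "data:image/png;base64,BASE64"),
        "data:image/png;base64,BASE64"
    );
}
```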
// Implement Display so callers can treat the payload like a plain string when logging or doing
// trivial substring checks in tests (existing tests call `.contains()` on the output). Display
// returns the raw `content` field.
impl std::fmt::Display for FunctionCallOutputPayload {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.content)
    }
}

impl std::ops::Deref for FunctionCallOutputPayload {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        &self.content
    }
}
// (Moved event mapping logic into codex-core to avoid coupling protocol to UI-facing events.)
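The Display + Deref pattern above can be shown with a minimal standalone type. `Payload` here is a stand-in for illustration, not the crate's `FunctionCallOutputPayload`:

```rust
// Minimal stand-in demonstrating the Display + Deref combination:
// the wrapper logs and substring-checks like a plain &str.
struct Payload {
    content: String,
}

impl std::fmt::Display for Payload {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(&self.content)
    }
}

impl std::ops::Deref for Payload {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        &self.content
    }
}

fn main() {
    let payload = Payload {
        content: "exit 0; all tests passed".into(),
    };
    // Deref<Target = str> exposes `str` methods directly on the payload.
    assert!(payload.contains("tests passed"));
    // Display returns the raw `content` field.
    assert_eq!(payload.to_string(), "exit 0; all tests passed");
}
```

Deref to `str` keeps call sites (and tests that call `.contains()`) oblivious to the wrapper type.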
#[cfg(test)]
mod tests {
    use super::*;

    use crate::config_types::SandboxMode;
    use crate::protocol::AskForApproval;
    use anyhow::Result;
    use codex_execpolicy::Policy;
    use mcp_types::ImageContent;
    use mcp_types::TextContent;
    use pretty_assertions::assert_eq;
    use std::path::PathBuf;
    use tempfile::tempdir;
    #[test]
    fn converts_sandbox_mode_into_developer_instructions() {
        let workspace_write: DeveloperInstructions = SandboxMode::WorkspaceWrite.into();
        assert_eq!(
            workspace_write,
            DeveloperInstructions::new(
                "Filesystem sandboxing defines which files can be read or written. `sandbox_mode` is `workspace-write`: The sandbox permits reading files, and editing files in `cwd` and `writable_roots`. Editing files in other directories requires approval. Network access is restricted."
            )
        );

        let read_only: DeveloperInstructions = SandboxMode::ReadOnly.into();
        assert_eq!(
            read_only,
            DeveloperInstructions::new(
                "Filesystem sandboxing defines which files can be read or written. `sandbox_mode` is `read-only`: The sandbox only permits reading files. Network access is restricted."
            )
        );
    }
    #[test]
    fn builds_permissions_with_network_access_override() {
        let instructions = DeveloperInstructions::from_permissions_with_network(
            SandboxMode::WorkspaceWrite,
            NetworkAccess::Enabled,
            AskForApproval::OnRequest,
            &Policy::empty(),
            false,
            None,
        );
        let text = instructions.into_text();
        assert!(
            text.contains("Network access is enabled."),
            "expected network access to be enabled in message"
        );
        assert!(
            text.contains("`approval_policy` is `on-request`"),
            "expected approval guidance to be included"
        );
    }
    #[test]
    fn builds_permissions_from_policy() {
        let policy = SandboxPolicy::WorkspaceWrite {
            writable_roots: vec![],
            network_access: true,
            exclude_tmpdir_env_var: false,
            exclude_slash_tmp: false,
        };
        let instructions = DeveloperInstructions::from_policy(
            &policy,
            AskForApproval::UnlessTrusted,
            &Policy::empty(),
            false,
            &PathBuf::from("/tmp"),
        );
        let text = instructions.into_text();
        assert!(text.contains("Network access is enabled."));
        assert!(text.contains("`approval_policy` is `unless-trusted`"));
    }
    #[test]
    fn includes_request_rule_instructions_when_enabled() {
        let mut exec_policy = Policy::empty();
        exec_policy
            .add_prefix_rule(
                &["git".to_string(), "pull".to_string()],
                codex_execpolicy::Decision::Allow,
            )
            .expect("add rule");
        let instructions = DeveloperInstructions::from_permissions_with_network(
            SandboxMode::WorkspaceWrite,
            NetworkAccess::Enabled,
            AskForApproval::OnRequest,
            &exec_policy,
            true,
            None,
        );
        let text = instructions.into_text();
        assert!(text.contains("prefix_rule"));
        assert!(text.contains("Approved command prefixes"));
        assert!(text.contains(r#"["git", "pull"]"#));
    }
    #[test]
    fn render_command_prefix_list_sorts_by_len_then_total_len_then_alphabetical() {
        let prefixes = vec![
            vec!["b".to_string(), "zz".to_string()],
            vec!["aa".to_string()],
            vec!["b".to_string()],
            vec!["a".to_string(), "b".to_string(), "c".to_string()],
            vec!["a".to_string()],
            vec!["b".to_string(), "a".to_string()],
        ];
        let output = format_allow_prefixes(prefixes).expect("rendered list");
        assert_eq!(
            output,
            r#"- ["a"]
- ["b"]
- ["aa"]
- ["b", "a"]
- ["b", "zz"]
- ["a", "b", "c"]"#
                .to_string(),
        );
    }
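The three-level ordering the test above expects (component count, then total character length, then lexicographic) can be sketched standalone. `sort_prefixes` is an illustrative stand-in, not the crate's helper:

```rust
// Illustrative comparator: fewer components first, then shorter total
// length, then lexicographic order as the final tie-breaker.
fn sort_prefixes(mut prefixes: Vec<Vec<String>>) -> Vec<Vec<String>> {
    prefixes.sort_by(|a, b| {
        let total = |p: &Vec<String>| p.iter().map(String::len).sum::<usize>();
        a.len()
            .cmp(&b.len())
            .then(total(a).cmp(&total(b)))
            .then(a.cmp(b))
    });
    prefixes
}

fn main() {
    let prefixes = vec![
        vec!["b".to_string(), "zz".to_string()],
        vec!["aa".to_string()],
        vec!["b".to_string()],
        vec!["a".to_string()],
    ];
    let sorted = sort_prefixes(prefixes);
    assert_eq!(
        sorted,
        vec![
            vec!["a".to_string()],
            vec!["b".to_string()],
            vec!["aa".to_string()],
            vec!["b".to_string(), "zz".to_string()],
        ]
    );
}
```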
    #[test]
    fn render_command_prefix_list_limits_output_to_max_prefixes() {
        let prefixes = (0..(MAX_RENDERED_PREFIXES + 5))
            .map(|i| vec![format!("{i:03}")])
            .collect::<Vec<_>>();
        let output = format_allow_prefixes(prefixes).expect("rendered list");
        assert!(output.ends_with(TRUNCATED_MARKER));
        eprintln!("output: {output}");
        assert_eq!(output.lines().count(), MAX_RENDERED_PREFIXES + 1);
    }
    #[test]
    fn format_allow_prefixes_limits_output() {
        let mut exec_policy = Policy::empty();
        for i in 0..200 {
            exec_policy
                .add_prefix_rule(
                    &[format!("tool-{i:03}"), "x".repeat(500)],
                    codex_execpolicy::Decision::Allow,
                )
                .expect("add rule");
        }
        let output =
            format_allow_prefixes(exec_policy.get_allowed_prefixes()).expect("formatted prefixes");
        assert!(
            output.len() <= MAX_ALLOW_PREFIX_TEXT_BYTES + TRUNCATED_MARKER.len(),
            "output length exceeds expected limit: {output}",
        );
    }
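The truncation contract the two tests above check can be sketched in isolation: cap the rendered entries at a maximum count and append a marker when entries are dropped. The constant values and marker text below are stand-ins, not the crate's `MAX_RENDERED_PREFIXES` or `TRUNCATED_MARKER`:

```rust
// Stand-in limits for illustration only.
const MAX_RENDERED: usize = 3;
const MARKER: &str = "- ...";

// Keep at most MAX_RENDERED lines; signal dropped entries with a marker line.
fn render_limited(lines: Vec<String>) -> String {
    if lines.len() <= MAX_RENDERED {
        lines.join("\n")
    } else {
        let mut shown: Vec<String> = lines.into_iter().take(MAX_RENDERED).collect();
        shown.push(MARKER.to_string());
        shown.join("\n")
    }
}

fn main() {
    let lines: Vec<String> = (0..5).map(|i| format!("- tool-{i}")).collect();
    let output = render_limited(lines);
    // MAX_RENDERED entries survive, plus exactly one marker line.
    assert_eq!(output.lines().count(), MAX_RENDERED + 1);
    assert!(output.ends_with(MARKER));
}
```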
    #[test]
    fn serializes_success_as_plain_string() -> Result<()> {
        let item = ResponseInputItem::FunctionCallOutput {
            call_id: "call1".into(),
            output: FunctionCallOutputPayload {
                content: "ok".into(),
                ..Default::default()
            },
        };

        let json = serde_json::to_string(&item)?;
        let v: serde_json::Value = serde_json::from_str(&json)?;

        // Success case -> output should be a plain string.
        assert_eq!(v.get("output").unwrap().as_str().unwrap(), "ok");

        Ok(())
    }
    #[test]
    fn serializes_failure_as_string() -> Result<()> {
        let item = ResponseInputItem::FunctionCallOutput {
            call_id: "call1".into(),
            output: FunctionCallOutputPayload {
                content: "bad".into(),
                success: Some(false),
                ..Default::default()
            },
        };

        let json = serde_json::to_string(&item)?;
        let v: serde_json::Value = serde_json::from_str(&json)?;

        assert_eq!(v.get("output").unwrap().as_str().unwrap(), "bad");

        Ok(())
    }
    #[test]
    fn serializes_image_outputs_as_array() -> Result<()> {
        let call_tool_result = CallToolResult {
            content: vec![
                ContentBlock::TextContent(TextContent {
                    annotations: None,
                    text: "caption".into(),
                    r#type: "text".into(),
                }),
                ContentBlock::ImageContent(ImageContent {
                    annotations: None,
                    data: "BASE64".into(),
                    mime_type: "image/png".into(),
                    r#type: "image".into(),
                }),
            ],
            is_error: None,
            structured_content: None,
        };
        let payload = FunctionCallOutputPayload::from(&call_tool_result);
        assert_eq!(payload.success, Some(true));

        let items = payload.content_items.clone().expect("content items");
        assert_eq!(
            items,
            vec![
                FunctionCallOutputContentItem::InputText {
                    text: "caption".into(),
                },
                FunctionCallOutputContentItem::InputImage {
                    image_url: "data:image/png;base64,BASE64".into(),
                },
            ]
        );

        let item = ResponseInputItem::FunctionCallOutput {
            call_id: "call1".into(),
            output: payload,
        };
        let json = serde_json::to_string(&item)?;
        let v: serde_json::Value = serde_json::from_str(&json)?;
        let output = v.get("output").expect("output field");
        assert!(output.is_array(), "expected array output");
        Ok(())
    }
    #[test]
    fn deserializes_array_payload_into_items() -> Result<()> {
        let json = r#"[
            {"type": "input_text", "text": "note"},
            {"type": "input_image", "image_url": "data:image/png;base64,XYZ"}
        ]"#;
        let payload: FunctionCallOutputPayload = serde_json::from_str(json)?;
        assert_eq!(payload.success, None);

        let expected_items = vec![
            FunctionCallOutputContentItem::InputText {
                text: "note".into(),
            },
            FunctionCallOutputContentItem::InputImage {
                image_url: "data:image/png;base64,XYZ".into(),
            },
        ];
        assert_eq!(payload.content_items, Some(expected_items.clone()));

        let expected_content = serde_json::to_string(&expected_items)?;
        assert_eq!(payload.content, expected_content);
        Ok(())
    }

    #[test]
    fn deserializes_compaction_alias() -> Result<()> {
        let json = r#"{"type":"compaction_summary","encrypted_content":"abc"}"#;
        let item: ResponseItem = serde_json::from_str(json)?;
        assert_eq!(
            item,
            ResponseItem::Compaction {
                encrypted_content: "abc".into(),
            }
        );
        Ok(())
    }

    #[test]
    fn roundtrips_web_search_call_actions() -> Result<()> {
        let cases = vec![
            (
                r#"{
                    "type": "web_search_call",
                    "status": "completed",
                    "action": {
                        "type": "search",
                        "query": "weather seattle",
                        "queries": ["weather seattle", "seattle weather now"]
                    }
                }"#,
                None,
                Some(WebSearchAction::Search {
                    query: Some("weather seattle".into()),
                    queries: Some(vec!["weather seattle".into(), "seattle weather now".into()]),
                }),
                Some("completed".into()),
                true,
            ),
            (
                r#"{
                    "type": "web_search_call",
                    "status": "open",
                    "action": {
                        "type": "open_page",
                        "url": "https://example.com"
                    }
                }"#,
                None,
                Some(WebSearchAction::OpenPage {
                    url: Some("https://example.com".into()),
                }),
                Some("open".into()),
                true,
            ),
            (
                r#"{
                    "type": "web_search_call",
                    "status": "in_progress",
                    "action": {
                        "type": "find_in_page",
                        "url": "https://example.com/docs",
                        "pattern": "installation"
                    }
                }"#,
                None,
                Some(WebSearchAction::FindInPage {
                    url: Some("https://example.com/docs".into()),
                    pattern: Some("installation".into()),
                }),
                Some("in_progress".into()),
                true,
            ),
            (
                r#"{
                    "type": "web_search_call",
                    "status": "in_progress",
                    "id": "ws_partial"
                }"#,
                Some("ws_partial".into()),
                None,
                Some("in_progress".into()),
                false,
            ),
        ];
        for (json_literal, expected_id, expected_action, expected_status, expect_roundtrip) in cases
        {
            let parsed: ResponseItem = serde_json::from_str(json_literal)?;
            let expected = ResponseItem::WebSearchCall {
                id: expected_id.clone(),
                status: expected_status.clone(),
                action: expected_action.clone(),
            };
            assert_eq!(parsed, expected);
            let serialized = serde_json::to_value(&parsed)?;
            let mut expected_serialized: serde_json::Value = serde_json::from_str(json_literal)?;
            if !expect_roundtrip && let Some(obj) = expected_serialized.as_object_mut() {
                obj.remove("id");
            }
            assert_eq!(serialized, expected_serialized);
        }
        Ok(())
    }

    #[test]
    fn deserialize_shell_tool_call_params() -> Result<()> {
        let json = r#"{
            "command": ["ls", "-l"],
            "workdir": "/tmp",
            "timeout": 1000
        }"#;
        let params: ShellToolCallParams = serde_json::from_str(json)?;
        assert_eq!(
            ShellToolCallParams {
                command: vec!["ls".to_string(), "-l".to_string()],
                workdir: Some("/tmp".to_string()),
                timeout_ms: Some(1000),
                sandbox_permissions: None,
                prefix_rule: None,
                justification: None,
            },
            params
        );
        Ok(())
    }
    #[test]
    fn wraps_image_user_input_with_tags() -> Result<()> {
        let image_url = "data:image/png;base64,abc".to_string();
        let item = ResponseInputItem::from(vec![UserInput::Image {
            image_url: image_url.clone(),
        }]);
        match item {
            ResponseInputItem::Message { content, .. } => {
                let expected = vec![
                    ContentItem::InputText {
                        text: image_open_tag_text(),
                    },
                    ContentItem::InputImage { image_url },
                    ContentItem::InputText {
                        text: image_close_tag_text(),
                    },
                ];
                assert_eq!(content, expected);
            }
            other => panic!("expected message response but got {other:?}"),
        }
        Ok(())
    }

    #[test]
    fn local_image_read_error_adds_placeholder() -> Result<()> {
        let dir = tempdir()?;
        let missing_path = dir.path().join("missing-image.png");
        let item = ResponseInputItem::from(vec![UserInput::LocalImage {
            path: missing_path.clone(),
        }]);
        match item {
            ResponseInputItem::Message { content, .. } => {
                assert_eq!(content.len(), 1);
                match &content[0] {
                    ContentItem::InputText { text } => {
                        let display_path = missing_path.display().to_string();
                        assert!(
                            text.contains(&display_path),
                            "placeholder should mention missing path: {text}"
                        );
                        assert!(
                            text.contains("could not read"),
                            "placeholder should mention read issue: {text}"
                        );
                    }
                    other => panic!("expected placeholder text but found {other:?}"),
                }
            }
            other => panic!("expected message response but got {other:?}"),
        }
        Ok(())
    }

    #[test]
    fn local_image_non_image_adds_placeholder() -> Result<()> {
        let dir = tempdir()?;
        let json_path = dir.path().join("example.json");
        std::fs::write(&json_path, br#"{"hello":"world"}"#)?;
        let item = ResponseInputItem::from(vec![UserInput::LocalImage {
            path: json_path.clone(),
        }]);
        match item {
            ResponseInputItem::Message { content, .. } => {
                assert_eq!(content.len(), 1);
                match &content[0] {
                    ContentItem::InputText { text } => {
                        assert!(
                            text.contains("unsupported MIME type `application/json`"),
                            "placeholder should mention unsupported MIME: {text}"
                        );
                        assert!(
                            text.contains(&json_path.display().to_string()),
                            "placeholder should mention path: {text}"
                        );
                    }
                    other => panic!("expected placeholder text but found {other:?}"),
                }
            }
            other => panic!("expected message response but got {other:?}"),
        }
        Ok(())
    }

    #[test]
    fn local_image_unsupported_image_format_adds_placeholder() -> Result<()> {
        let dir = tempdir()?;
        let svg_path = dir.path().join("example.svg");
        std::fs::write(
            &svg_path,
            br#"<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"></svg>"#,
        )?;
        let item = ResponseInputItem::from(vec![UserInput::LocalImage {
            path: svg_path.clone(),
        }]);
        match item {
            ResponseInputItem::Message { content, .. } => {
                assert_eq!(content.len(), 1);
                let expected = format!(
                    "Codex cannot attach image at `{}`: unsupported image format `image/svg+xml`.",
                    svg_path.display()
                );
                match &content[0] {
                    ContentItem::InputText { text } => assert_eq!(text, &expected),
                    other => panic!("expected placeholder text but found {other:?}"),
                }
            }
            other => panic!("expected message response but got {other:?}"),
        }
        Ok(())
    }
}