core-agent-ide/codex-rs/protocol/src/protocol.rs

2702 lines
90 KiB
Rust
Raw Normal View History

//! Defines the protocol for a Codex session between a client and an agent.
//!
//! Uses a SQ (Submission Queue) / EQ (Event Queue) pattern to asynchronously communicate
//! between user and agent.
use std::collections::HashMap;
use std::ffi::OsStr;
use std::fmt;
use std::path::Path;
use std::path::PathBuf;
use std::str::FromStr;
use std::time::Duration;
use crate::ThreadId;
use crate::approvals::ElicitationRequestEvent;
use crate::config_types::CollaborationMode;
Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - **Normal text** → `AgentMessageContentDelta` - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 10:59:30 -08:00
use crate::config_types::ModeKind;
use crate::config_types::Personality;
use crate::config_types::ReasoningSummary as ReasoningSummaryConfig;
use crate::config_types::WindowsSandboxLevel;
use crate::custom_prompts::CustomPrompt;
use crate::dynamic_tools::DynamicToolCallRequest;
use crate::dynamic_tools::DynamicToolResponse;
use crate::dynamic_tools::DynamicToolSpec;
use crate::items::TurnItem;
use crate::message_history::HistoryEntry;
fix(core) Preserve base_instructions in SessionMeta (#9427) ## Summary This PR consolidates base_instructions onto SessionMeta / SessionConfiguration, so we ensure `base_instructions` is set once per session and should be (mostly) immutable, unless: - overridden by config on resume / fork - sub-agent tasks, like review or collab In a future PR, we should convert all references to `base_instructions` to consistently used the typed struct, so it's less likely that we put other strings there. See #9423. However, this PR is already quite complex, so I'm deferring that to a follow-up. ## Testing - [x] Added a resume test to assert that instructions are preserved. In particular, `resume_switches_models_preserves_base_instructions` fails against main. Existing test coverage thats assert base instructions are preserved across multiple requests in a session: - Manual compact keeps baseline instructions: core/tests/suite/compact.rs:199 - Auto-compact keeps baseline instructions: core/tests/suite/compact.rs:1142 - Prompt caching reuses the same instructions across two requests: core/tests/suite/prompt_caching.rs:150 and core/tests/suite/prompt_caching.rs:157 - Prompt caching with explicit expected string across two requests: core/tests/suite/prompt_caching.rs:213 and core/tests/suite/prompt_caching.rs:222 - Resume with model switch keeps original instructions: core/tests/suite/resume.rs:136 - Compact/resume/fork uses request 0 instructions for later expected payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
use crate::models::BaseInstructions;
use crate::models::ContentItem;
use crate::models::ResponseItem;
use crate::models::WebSearchAction;
use crate::num_format::format_with_separators;
use crate::openai_models::ReasoningEffort as ReasoningEffortConfig;
use crate::parse_command::ParsedCommand;
use crate::plan_tool::UpdatePlanArgs;
Feat: request user input tool (#9472) ### Summary * Add `requestUserInput` tool that the model can use for gather feedback/asking question mid turn. ### Tool input schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput input", "type": "object", "additionalProperties": false, "required": ["questions"], "properties": { "questions": { "type": "array", "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.", "minItems": 1, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["id", "header", "question"], "properties": { "id": { "type": "string", "description": "Stable identifier for mapping answers (snake_case)." }, "header": { "type": "string", "description": "Short header label shown in the UI (12 or fewer chars)." }, "question": { "type": "string", "description": "Single-sentence prompt shown to the user." }, "options": { "type": "array", "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.", "minItems": 2, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["value", "label", "description"], "properties": { "value": { "type": "string", "description": "Machine-readable value (snake_case)." }, "label": { "type": "string", "description": "User-facing label (1-5 words)." }, "description": { "type": "string", "description": "One short sentence explaining impact/tradeoff if selected." } } } } } } } } } ``` ### Tool output schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput output", "type": "object", "additionalProperties": false, "required": ["answers"], "properties": { "answers": { "type": "object", "description": "Map of question id to user answer.", "additionalProperties": { "type": "object", "additionalProperties": false, "required": ["selected"], "properties": { "selected": { "type": "array", "items": { "type": "string" } }, "other": { "type": ["string", "null"] } } } } } } ```
2026-01-19 10:17:30 -08:00
use crate::request_user_input::RequestUserInputResponse;
use crate::user_input::UserInput;
use codex_utils_absolute_path::AbsolutePathBuf;
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
use mcp_types::CallToolResult;
use mcp_types::RequestId;
use mcp_types::Resource as McpResource;
use mcp_types::ResourceTemplate as McpResourceTemplate;
use mcp_types::Tool as McpTool;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use serde_json::Value;
use serde_with::serde_as;
use strum_macros::Display;
use tracing::error;
2025-08-18 13:08:53 -07:00
use ts_rs::TS;
pub use crate::approvals::ApplyPatchApprovalRequestEvent;
pub use crate::approvals::ElicitationAction;
pub use crate::approvals::ExecApprovalRequestEvent;
pub use crate::approvals::ExecPolicyAmendment;
Feat: request user input tool (#9472) ### Summary * Add `requestUserInput` tool that the model can use for gather feedback/asking question mid turn. ### Tool input schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput input", "type": "object", "additionalProperties": false, "required": ["questions"], "properties": { "questions": { "type": "array", "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.", "minItems": 1, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["id", "header", "question"], "properties": { "id": { "type": "string", "description": "Stable identifier for mapping answers (snake_case)." }, "header": { "type": "string", "description": "Short header label shown in the UI (12 or fewer chars)." }, "question": { "type": "string", "description": "Single-sentence prompt shown to the user." }, "options": { "type": "array", "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.", "minItems": 2, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["value", "label", "description"], "properties": { "value": { "type": "string", "description": "Machine-readable value (snake_case)." }, "label": { "type": "string", "description": "User-facing label (1-5 words)." }, "description": { "type": "string", "description": "One short sentence explaining impact/tradeoff if selected." } } } } } } } } } ``` ### Tool output schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput output", "type": "object", "additionalProperties": false, "required": ["answers"], "properties": { "answers": { "type": "object", "description": "Map of question id to user answer.", "additionalProperties": { "type": "object", "additionalProperties": false, "required": ["selected"], "properties": { "selected": { "type": "array", "items": { "type": "string" } }, "other": { "type": ["string", "null"] } } } } } } ```
2026-01-19 10:17:30 -08:00
pub use crate::request_user_input::RequestUserInputEvent;
/// Open/close tags for special user-input blocks. Used across crates to avoid
/// duplicated hardcoded strings.
pub const USER_INSTRUCTIONS_OPEN_TAG: &str = "<user_instructions>";
pub const USER_INSTRUCTIONS_CLOSE_TAG: &str = "</user_instructions>";
pub const ENVIRONMENT_CONTEXT_OPEN_TAG: &str = "<environment_context>";
pub const ENVIRONMENT_CONTEXT_CLOSE_TAG: &str = "</environment_context>";
pub const COLLABORATION_MODE_OPEN_TAG: &str = "<collaboration_mode>";
pub const COLLABORATION_MODE_CLOSE_TAG: &str = "</collaboration_mode>";
pub const USER_MESSAGE_BEGIN: &str = "## My request for Codex:";
/// Submission Queue Entry - requests from user
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
pub struct Submission {
/// Unique id for this Submission to correlate with Events
pub id: String,
/// Payload
pub op: Op,
}
/// Config payload for refreshing MCP servers.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema)]
pub struct McpServerRefreshConfig {
pub mcp_servers: Value,
pub mcp_oauth_credentials_store_mode: Value,
}
/// Submission operation
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema)]
#[serde(tag = "type", rename_all = "snake_case")]
feat: support the chat completions API in the Rust CLI (#862) This is a substantial PR to add support for the chat completions API, which in turn makes it possible to use non-OpenAI model providers (just like in the TypeScript CLI): * It moves a number of structs from `client.rs` to `client_common.rs` so they can be shared. * It introduces support for the chat completions API in `chat_completions.rs`. * It updates `ModelProviderInfo` so that `env_key` is `Option<String>` instead of `String` (for e.g., ollama) and adds a `wire_api` field * It updates `client.rs` to choose between `stream_responses()` and `stream_chat_completions()` based on the `wire_api` for the `ModelProviderInfo` * It updates the `exec` and TUI CLIs to no longer fail if the `OPENAI_API_KEY` environment variable is not set * It updates the TUI so that `EventMsg::Error` is displayed more prominently when it occurs, particularly now that it is important to alert users to the `CodexErr::EnvVar` variant. * `CodexErr::EnvVar` was updated to include an optional `instructions` field so we can preserve the behavior where we direct users to https://platform.openai.com if `OPENAI_API_KEY` is not set. * Cleaned up the "welcome message" in the TUI to ensure the model provider is displayed. * Updated the docs in `codex-rs/README.md`. To exercise the chat completions API from OpenAI models, I added the following to my `config.toml`: ```toml model = "gpt-4o" model_provider = "openai-chat-completions" [model_providers.openai-chat-completions] name = "OpenAI using Chat Completions" base_url = "https://api.openai.com/v1" env_key = "OPENAI_API_KEY" wire_api = "chat" ``` Though to test a non-OpenAI provider, I installed ollama with mistral locally on my Mac because ChatGPT said that would be a good match for my hardware: ```shell brew install ollama ollama serve ollama pull mistral ``` Then I added the following to my `~/.codex/config.toml`: ```toml model = "mistral" model_provider = "ollama" ``` Note this code could certainly use more test coverage, but I want to get this in so folks can start playing with it. For reference, I believe https://github.com/openai/codex/pull/247 was roughly the comparable PR on the TypeScript side.
2025-05-08 21:46:06 -07:00
#[allow(clippy::large_enum_variant)]
#[non_exhaustive]
pub enum Op {
/// Abort current task.
/// This server sends [`EventMsg::TurnAborted`] in response.
Interrupt,
/// Legacy user input.
///
/// Prefer [`Op::UserTurn`] so the caller provides full turn context
/// (cwd/approval/sandbox/model/etc.) for each turn.
UserInput {
/// User input items, see `InputItem`
items: Vec<UserInput>,
feat: expose outputSchema to user_turn/turn_start app_server API (#8377) What changed - Added `outputSchema` support to the app-server APIs, mirroring `codex exec --output-schema` behavior. - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final assistant message for that turn. - V2 `turn/start` now accepts `outputSchema` and constrains the final assistant message for that turn (explicitly per-turn only). Core behavior - `Op::UserTurn` already supported `final_output_json_schema`; now V1 `sendUserTurn` forwards `outputSchema` into that field. - `Op::UserInput` now carries `final_output_json_schema` for per-turn settings updates; core maps it into `SessionSettingsUpdate.final_output_json_schema` so it applies to the created turn context. - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext` (it’s applied only for the current turn). Other overrides (cwd/model/etc) keep their existing persistent behavior. API / docs - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema: Option<serde_json::Value>` to `SendUserTurnParams` (serialized as `outputSchema`). - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema: Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`). - `codex-rs/app-server/README.md`: document `outputSchema` for `turn/start` and clarify it applies only to the current turn. - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1 `sendUserTurn` and v2 `turn/start`. Tests added/updated - New app-server integration tests asserting `outputSchema` is forwarded into outbound `/responses` requests as `text.format`: - `codex-rs/app-server/tests/suite/output_schema.rs` - `codex-rs/app-server/tests/suite/v2/output_schema.rs` - Added per-turn semantics tests (schema does not leak to the next turn): - `send_user_turn_output_schema_is_per_turn_v1` - `turn_start_output_schema_is_per_turn_v2` - Added protocol wire-compat tests for the merged op: - serialize omits `final_output_json_schema` when `None` - deserialize works when field is missing - serialize includes `final_output_json_schema` when `Some(schema)` Call site updates (high level) - Updated all `Op::UserInput { .. }` constructions to include `final_output_json_schema`: - `codex-rs/app-server/src/codex_message_processor.rs` - `codex-rs/core/src/codex_delegate.rs` - `codex-rs/mcp-server/src/codex_tool_runner.rs` - `codex-rs/tui/src/chatwidget.rs` - `codex-rs/tui2/src/chatwidget.rs` - plus impacted core tests. Validation - `just fmt` - `cargo test -p codex-core` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` - `cargo test -p codex-tui` - `cargo test -p codex-tui2` - `cargo test -p codex-protocol` - `cargo clippy --all-features --tests --profile dev --fix -- -D warnings`
2026-01-05 10:27:00 -08:00
/// Optional JSON Schema used to constrain the final assistant message for this turn.
#[serde(skip_serializing_if = "Option::is_none")]
final_output_json_schema: Option<Value>,
},
/// Similar to [`Op::UserInput`], but contains additional context required
/// for a turn of a [`crate::codex_thread::CodexThread`].
UserTurn {
/// User input items, see `InputItem`
items: Vec<UserInput>,
/// `cwd` to use with the [`SandboxPolicy`] and potentially tool calls
/// such as `local_shell`.
cwd: PathBuf,
/// Policy to use for command approval.
approval_policy: AskForApproval,
/// Policy to use for tool calls such as `local_shell`.
sandbox_policy: SandboxPolicy,
/// Must be a valid model slug for the configured client session
/// associated with this conversation.
model: String,
/// Will only be honored if the model is configured to use reasoning.
#[serde(skip_serializing_if = "Option::is_none")]
effort: Option<ReasoningEffortConfig>,
/// Will only be honored if the model is configured to use reasoning.
summary: ReasoningSummaryConfig,
// The JSON schema to use for the final assistant message
final_output_json_schema: Option<Value>,
/// EXPERIMENTAL - set a pre-set collaboration mode.
/// Takes precedence over model, effort, and developer instructions if set.
#[serde(skip_serializing_if = "Option::is_none")]
collaboration_mode: Option<CollaborationMode>,
/// Optional personality override for this turn.
#[serde(skip_serializing_if = "Option::is_none")]
personality: Option<Personality>,
},
/// Override parts of the persistent turn context for subsequent turns.
///
/// All fields are optional; when omitted, the existing value is preserved.
/// This does not enqueue any input it only updates defaults used for
/// turns that rely on persistent session-level context (for example,
/// [`Op::UserInput`]).
OverrideTurnContext {
/// Updated `cwd` for sandbox/tool calls.
#[serde(skip_serializing_if = "Option::is_none")]
cwd: Option<PathBuf>,
/// Updated command approval policy.
#[serde(skip_serializing_if = "Option::is_none")]
approval_policy: Option<AskForApproval>,
/// Updated sandbox policy for tool calls.
#[serde(skip_serializing_if = "Option::is_none")]
sandbox_policy: Option<SandboxPolicy>,
/// Updated Windows sandbox mode for tool execution.
#[serde(skip_serializing_if = "Option::is_none")]
windows_sandbox_level: Option<WindowsSandboxLevel>,
/// Updated model slug. When set, the model info is derived
/// automatically.
#[serde(skip_serializing_if = "Option::is_none")]
model: Option<String>,
/// Updated reasoning effort (honored only for reasoning-capable models).
///
/// Use `Some(Some(_))` to set a specific effort, `Some(None)` to clear
/// the effort, or `None` to leave the existing value unchanged.
#[serde(skip_serializing_if = "Option::is_none")]
effort: Option<Option<ReasoningEffortConfig>>,
/// Updated reasoning summary preference (honored only for reasoning-capable models).
#[serde(skip_serializing_if = "Option::is_none")]
summary: Option<ReasoningSummaryConfig>,
/// EXPERIMENTAL - set a pre-set collaboration mode.
/// Takes precedence over model, effort, and developer instructions if set.
#[serde(skip_serializing_if = "Option::is_none")]
collaboration_mode: Option<CollaborationMode>,
/// Updated personality preference.
#[serde(skip_serializing_if = "Option::is_none")]
personality: Option<Personality>,
},
/// Approve a command execution
ExecApproval {
/// The id of the submission we are approving
id: String,
/// The user's decision in response to the request.
decision: ReviewDecision,
},
/// Approve a code patch
PatchApproval {
/// The id of the submission we are approving
id: String,
/// The user's decision in response to the request.
decision: ReviewDecision,
},
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Resolve an MCP elicitation request.
ResolveElicitation {
/// Name of the MCP server that issued the request.
server_name: String,
/// Request identifier from the MCP server.
request_id: RequestId,
/// User's decision for the request.
decision: ElicitationAction,
},
Feat: request user input tool (#9472) ### Summary * Add `requestUserInput` tool that the model can use for gather feedback/asking question mid turn. ### Tool input schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput input", "type": "object", "additionalProperties": false, "required": ["questions"], "properties": { "questions": { "type": "array", "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.", "minItems": 1, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["id", "header", "question"], "properties": { "id": { "type": "string", "description": "Stable identifier for mapping answers (snake_case)." }, "header": { "type": "string", "description": "Short header label shown in the UI (12 or fewer chars)." }, "question": { "type": "string", "description": "Single-sentence prompt shown to the user." }, "options": { "type": "array", "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.", "minItems": 2, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["value", "label", "description"], "properties": { "value": { "type": "string", "description": "Machine-readable value (snake_case)." }, "label": { "type": "string", "description": "User-facing label (1-5 words)." }, "description": { "type": "string", "description": "One short sentence explaining impact/tradeoff if selected." } } } } } } } } } ``` ### Tool output schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput output", "type": "object", "additionalProperties": false, "required": ["answers"], "properties": { "answers": { "type": "object", "description": "Map of question id to user answer.", "additionalProperties": { "type": "object", "additionalProperties": false, "required": ["selected"], "properties": { "selected": { "type": "array", "items": { "type": "string" } }, "other": { "type": ["string", "null"] } } } } } } ```
2026-01-19 10:17:30 -08:00
/// Resolve a request_user_input tool call.
#[serde(rename = "user_input_answer", alias = "request_user_input_response")]
UserInputAnswer {
/// Turn id for the in-flight request.
id: String,
/// User-provided answers.
response: RequestUserInputResponse,
},
/// Resolve a dynamic tool call request.
DynamicToolResponse {
/// Call id for the in-flight request.
id: String,
/// Tool output payload.
response: DynamicToolResponse,
},
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Append an entry to the persistent cross-session message history.
///
/// Note the entry is not guaranteed to be logged if the user has
/// history disabled, it matches the list of "sensitive" patterns, etc.
AddToHistory {
/// The message text to be stored.
text: String,
},
/// Request a single history entry identified by `log_id` + `offset`.
GetHistoryEntryRequest { offset: usize, log_id: u64 },
/// Request the list of MCP tools available across all configured servers.
/// Reply is delivered via `EventMsg::McpListToolsResponse`.
ListMcpTools,
/// Request MCP servers to reinitialize and refresh cached tool lists.
RefreshMcpServers { config: McpServerRefreshConfig },
/// Request the list of available custom prompts.
ListCustomPrompts,
/// Request the list of skills for the provided `cwd` values or the session default.
ListSkills {
/// Working directories to scope repo skills discovery.
///
/// When empty, the session default working directory is used.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
cwds: Vec<PathBuf>,
/// When true, recompute skills even if a cached result exists.
#[serde(default, skip_serializing_if = "std::ops::Not::not")]
force_reload: bool,
},
/// Request the agent to summarize the current conversation context.
/// The agent will use its existing context (either conversation history or previous response id)
/// to generate a summary which will be returned as an AgentMessage event.
Compact,
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Set a user-facing thread name in the persisted rollout metadata.
/// This is a local-only operation handled by codex-core; it does not
/// involve the model.
SetThreadName { name: String },
2025-10-27 10:55:29 +00:00
/// Request Codex to undo a turn (turn are stacked so it is the same effect as CMD + Z).
Undo,
feat(app-server): thread/rollback API (#8454) Add `thread/rollback` to app-server to support IDEs undo-ing the last N turns of a thread. For context, an IDE partner will be supporting an "undo" capability where the IDE (the app-server client) will be responsible for reverting the local changes made during the last turn. To support this well, we also need a way to drop the last turn (or more generally, the last N turns) from the agent's context. This is what `thread/rollback` does. **Core idea**: A Thread rollback is represented as a persisted event message (EventMsg::ThreadRollback) in the rollout JSONL file, not by rewriting history. On resume, both the model's context (core replay) and the UI turn list (app-server v2's thread history builder) apply these markers so the pruned history is consistent across live conversations and `thread/resume`. Implementation notes: - Rollback only affects agent context and appends to the rollout file; clients are responsible for reverting files on disk. - If a thread rollback is currently in progress, subsequent `thread/rollback` calls are rejected. - Because we use `CodexConversation::submit` and codex core tracks active turns, returning an error on concurrent rollbacks is communicated via an `EventMsg::Error` with a new variant `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and sends the BAD_REQUEST RPC response. Tests cover thread rollbacks in both core and app-server, including when `num_turns` > existing turns (which clears all turns). **Note**: this explicitly does **not** behave like `/undo` which we just removed from the CLI, which does the opposite of what `thread/rollback` does. `/undo` reverts local changes via ghost commits/snapshots and does not modify the agent's context / conversation history.
2026-01-06 13:23:48 -08:00
/// Request Codex to drop the last N user turns from in-memory context.
///
/// This does not attempt to revert local filesystem changes. Clients are
/// responsible for undoing any edits on disk.
ThreadRollback { num_turns: u32 },
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Request a code review from the agent.
Review { review_request: ReviewRequest },
/// Request to shut down codex instance.
Shutdown,
feature: Add "!cmd" user shell execution (#2471) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). **Summary of changes** - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. **End-to-end flow** **TUI handling** 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. **Core submission loop** 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. **Task execution** 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. **Shell tool runtime** 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. **Completion** 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. **Conversation context** - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. **Demo video** https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
2025-10-29 00:31:20 -07:00
/// Execute a user-initiated one-off shell command (triggered by "!cmd").
///
/// The command string is executed using the user's default shell and may
/// include shell syntax (pipes, redirects, etc.). Output is streamed via
2026-01-09 17:31:17 +00:00
/// `ExecCommand*` events and the UI regains control upon `TurnComplete`.
feature: Add "!cmd" user shell execution (#2471) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). **Summary of changes** - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. **End-to-end flow** **TUI handling** 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. **Core submission loop** 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. **Task execution** 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. **Shell tool runtime** 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. **Completion** 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. **Conversation context** - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. **Demo video** https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
2025-10-29 00:31:20 -07:00
RunUserShellCommand {
/// The raw command string after '!'
command: String,
},
/// Request the list of available models.
ListModels,
}
/// Determines the conditions under which the user is consulted to approve
/// running the command proposed by Codex.
#[derive(
Debug,
Clone,
Copy,
Default,
PartialEq,
Eq,
Hash,
Serialize,
Deserialize,
Display,
JsonSchema,
TS,
)]
#[serde(rename_all = "kebab-case")]
#[strum(serialize_all = "kebab-case")]
pub enum AskForApproval {
/// Under this policy, only "known safe" commands—as determined by
/// `is_safe_command()`—that **only read files** are autoapproved.
/// Everything else will ask the user to approve.
#[serde(rename = "untrusted")]
#[strum(serialize = "untrusted")]
UnlessTrusted,
/// *All* commands are autoapproved, but they are expected to run inside a
/// sandbox where network access is disabled and writes are confined to a
/// specific set of paths. If the command fails, it will be escalated to
/// the user to approve execution without a sandbox.
OnFailure,
/// The model decides when to ask the user for approval.
#[default]
OnRequest,
/// Never ask the user to approve commands. Failures are immediately returned
/// to the model, and never escalated to the user for approval.
Never,
}
/// Represents whether outbound network access is available to the agent.
#[derive(
Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Display, Default, JsonSchema, TS,
)]
#[serde(rename_all = "kebab-case")]
#[strum(serialize_all = "kebab-case")]
pub enum NetworkAccess {
#[default]
Restricted,
Enabled,
}
impl NetworkAccess {
pub fn is_enabled(self) -> bool {
matches!(self, NetworkAccess::Enabled)
}
}
/// Determines execution restrictions for model shell commands.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Display, JsonSchema, TS)]
#[strum(serialize_all = "kebab-case")]
#[serde(tag = "type", rename_all = "kebab-case")]
pub enum SandboxPolicy {
/// No restrictions whatsoever. Use with caution.
#[serde(rename = "danger-full-access")]
DangerFullAccess,
/// Read-only access to the entire file-system.
#[serde(rename = "read-only")]
ReadOnly,
/// Indicates the process is already in an external sandbox. Allows full
/// disk access while honoring the provided network setting.
#[serde(rename = "external-sandbox")]
ExternalSandbox {
/// Whether the external sandbox permits outbound network traffic.
#[serde(default)]
network_access: NetworkAccess,
},
/// Same as `ReadOnly` but additionally grants write access to the current
/// working directory ("workspace").
#[serde(rename = "workspace-write")]
WorkspaceWrite {
/// Additional folders (beyond cwd and possibly TMPDIR) that should be
/// writable from within the sandbox.
#[serde(default, skip_serializing_if = "Vec::is_empty")]
writable_roots: Vec<AbsolutePathBuf>,
/// When set to `true`, outbound network access is allowed. `false` by
/// default.
#[serde(default)]
network_access: bool,
/// When set to `true`, will NOT include the per-user `TMPDIR`
/// environment variable among the default writable roots. Defaults to
/// `false`.
#[serde(default)]
exclude_tmpdir_env_var: bool,
/// When set to `true`, will NOT include the `/tmp` among the default
/// writable roots on UNIX. Defaults to `false`.
#[serde(default)]
exclude_slash_tmp: bool,
},
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
/// A writable root path accompanied by a list of subpaths that should remain
/// readonly even when the root is writable. This is primarily used to ensure
/// that folders containing files that could be modified to escalate the
/// privileges of the agent (e.g. `.codex`, `.git`, notably `.git/hooks`) under
/// a writable root are not modified by the agent.
#[derive(Debug, Clone, PartialEq, Eq, JsonSchema)]
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
pub struct WritableRoot {
pub root: AbsolutePathBuf,
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
/// By construction, these subpaths are all under `root`.
pub read_only_subpaths: Vec<AbsolutePathBuf>,
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
}
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
impl WritableRoot {
pub fn is_path_writable(&self, path: &Path) -> bool {
fix: tighten up checks against writable folders for SandboxPolicy (#2338) I was looking at the implementation of `Session::get_writable_roots()`, which did not seem right, as it was a copy of writable roots, which is not guaranteed to be in sync with the `sandbox_policy` field. I looked at who was calling `get_writable_roots()` and its only call site was `apply_patch()` in `codex-rs/core/src/apply_patch.rs`, which took the roots and forwarded them to `assess_patch_safety()` in `safety.rs`. I updated `assess_patch_safety()` to take `sandbox_policy: &SandboxPolicy` instead of `writable_roots: &[PathBuf]` (and replaced `Session::get_writable_roots()` with `Session::get_sandbox_policy()`). Within `safety.rs`, it was fairly easy to update `is_write_patch_constrained_to_writable_paths()` to work with `SandboxPolicy`, and in particular, it is far more accurate because, for better or worse, `SandboxPolicy::get_writable_roots_with_cwd()` _returns an empty vec_ for `SandboxPolicy::DangerFullAccess`, suggesting that _nothing_ is writable when in reality _everything_ is writable. With this PR, `is_write_patch_constrained_to_writable_paths()` now does the right thing for each variant of `SandboxPolicy`. I thought this would be the end of the story, but it turned out that `test_writable_roots_constraint()` in `safety.rs` needed to be updated, as well. In particular, the test was writing to `std::env::current_dir()` instead of a `TempDir`, which I suspect was a holdover from earlier when `SandboxPolicy::WorkspaceWrite` would always make `TMPDIR` writable on macOS, which made it hard to write tests to verify `SandboxPolicy` in `TMPDIR`. Fortunately, we now have `exclude_tmpdir_env_var` as an option on `SandboxPolicy::WorkspaceWrite`, so I was able to update the test to preserve the existing behavior, but to no longer write to `std::env::current_dir()`. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/2338). * #2345 * #2329 * #2343 * #2340 * __->__ #2338
2025-08-15 09:06:15 -07:00
// Check if the path is under the root.
if !path.starts_with(&self.root) {
return false;
}
// Check if the path is under any of the read-only subpaths.
for subpath in &self.read_only_subpaths {
if path.starts_with(subpath) {
return false;
}
}
true
}
}
impl FromStr for SandboxPolicy {
type Err = serde_json::Error;
fn from_str(s: &str) -> Result<Self, Self::Err> {
serde_json::from_str(s)
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
impl SandboxPolicy {
/// Returns a policy with read-only disk access and no network.
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
pub fn new_read_only_policy() -> Self {
SandboxPolicy::ReadOnly
}
/// Returns a policy that can read the entire disk, but can only write to
/// the current working directory and the per-user tmp dir on macOS. It does
/// not allow network access.
pub fn new_workspace_write_policy() -> Self {
SandboxPolicy::WorkspaceWrite {
writable_roots: vec![],
network_access: false,
exclude_tmpdir_env_var: false,
exclude_slash_tmp: false,
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
/// Always returns `true`; restricting read access is not supported.
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
pub fn has_full_disk_read_access(&self) -> bool {
true
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
pub fn has_full_disk_write_access(&self) -> bool {
match self {
SandboxPolicy::DangerFullAccess => true,
SandboxPolicy::ExternalSandbox { .. } => true,
SandboxPolicy::ReadOnly => false,
SandboxPolicy::WorkspaceWrite { .. } => false,
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
pub fn has_full_network_access(&self) -> bool {
match self {
SandboxPolicy::DangerFullAccess => true,
SandboxPolicy::ExternalSandbox { network_access } => network_access.is_enabled(),
SandboxPolicy::ReadOnly => false,
SandboxPolicy::WorkspaceWrite { network_access, .. } => *network_access,
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
/// Returns the list of writable roots (tailored to the current working
/// directory) together with subpaths that should remain readonly under
/// each writable root.
pub fn get_writable_roots_with_cwd(&self, cwd: &Path) -> Vec<WritableRoot> {
match self {
SandboxPolicy::DangerFullAccess => Vec::new(),
SandboxPolicy::ExternalSandbox { .. } => Vec::new(),
SandboxPolicy::ReadOnly => Vec::new(),
SandboxPolicy::WorkspaceWrite {
writable_roots,
exclude_tmpdir_env_var,
exclude_slash_tmp,
network_access: _,
} => {
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
// Start from explicitly configured writable roots.
let mut roots: Vec<AbsolutePathBuf> = writable_roots.clone();
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
// Always include defaults: cwd, /tmp (if present on Unix), and
// on macOS, the per-user TMPDIR unless explicitly excluded.
// TODO(mbolin): cwd param should be AbsolutePathBuf.
let cwd_absolute = AbsolutePathBuf::from_absolute_path(cwd);
match cwd_absolute {
Ok(cwd) => {
roots.push(cwd);
}
Err(e) => {
error!(
"Ignoring invalid cwd {:?} for sandbox writable root: {}",
cwd, e
);
}
}
// Include /tmp on Unix unless explicitly excluded.
if cfg!(unix) && !exclude_slash_tmp {
#[allow(clippy::expect_used)]
let slash_tmp =
AbsolutePathBuf::from_absolute_path("/tmp").expect("/tmp is absolute");
if slash_tmp.as_path().is_dir() {
roots.push(slash_tmp);
}
}
// Include $TMPDIR unless explicitly excluded. On macOS, TMPDIR
// is per-user, so writes to TMPDIR should not be readable by
// other users on the system.
//
// By comparison, TMPDIR is not guaranteed to be defined on
// Linux or Windows, but supporting it here gives users a way to
// provide the model with their own temporary directory without
// having to hardcode it in the config.
if !exclude_tmpdir_env_var
&& let Some(tmpdir) = std::env::var_os("TMPDIR")
&& !tmpdir.is_empty()
{
match AbsolutePathBuf::from_absolute_path(PathBuf::from(&tmpdir)) {
Ok(tmpdir_path) => {
roots.push(tmpdir_path);
}
Err(e) => {
error!(
"Ignoring invalid TMPDIR value {tmpdir:?} for sandbox writable root: {e}",
);
}
}
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
// For each root, compute subpaths that should remain read-only.
roots
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
.into_iter()
.map(|writable_root| {
let mut subpaths: Vec<AbsolutePathBuf> = Vec::new();
#[allow(clippy::expect_used)]
let top_level_git = writable_root
.join(".git")
.expect(".git is a valid relative path");
// This applies to typical repos (directory .git), worktrees/submodules
// (file .git with gitdir pointer), and bare repos when the gitdir is the
// writable root itself.
let top_level_git_is_file = top_level_git.as_path().is_file();
let top_level_git_is_dir = top_level_git.as_path().is_dir();
if top_level_git_is_dir || top_level_git_is_file {
if top_level_git_is_file
&& is_git_pointer_file(&top_level_git)
&& let Some(gitdir) = resolve_gitdir_from_file(&top_level_git)
&& !subpaths
.iter()
.any(|subpath| subpath.as_path() == gitdir.as_path())
{
subpaths.push(gitdir);
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
subpaths.push(top_level_git);
}
#[allow(clippy::expect_used)]
let top_level_codex = writable_root
.join(".codex")
.expect(".codex is a valid relative path");
if top_level_codex.as_path().is_dir() {
subpaths.push(top_level_codex);
}
feat: make .git read-only within a writable root when using Seatbelt (#1765) To make `--full-auto` safer, this PR updates the Seatbelt policy so that a `SandboxPolicy` with a `writable_root` that contains a `.git/` _directory_ will make `.git/` _read-only_ (though as a follow-up, we should also consider the case where `.git` is a _file_ with a `gitdir: /path/to/actual/repo/.git` entry that should also be protected). The two major changes in this PR: - Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a `Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot` can specify a list of read-only subpaths. - Updating `create_seatbelt_command_args()` to honor the read-only subpaths in `WritableRoot`. The logic to update the policy is a fairly straightforward update to `create_seatbelt_command_args()`, but perhaps the more interesting part of this PR is the introduction of an integration test in `tests/sandbox.rs`. Leveraging the new API in #1785, we test `SandboxPolicy` under various conditions, including ones where `$TMPDIR` is not readable, which is critical for verifying the new behavior. To ensure that Codex can run its own tests, e.g.: ``` just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only ``` I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being used. Adding a comparable change for Landlock will be done in a subsequent PR.
2025-08-01 16:11:24 -07:00
WritableRoot {
root: writable_root,
read_only_subpaths: subpaths,
}
})
.collect()
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
}
}
}
}
fix: overhaul SandboxPolicy and config loading in Rust (#732) Previous to this PR, `SandboxPolicy` was a bit difficult to work with: https://github.com/openai/codex/blob/237f8a11e11fdcc793a09e787e48215676d9b95b/codex-rs/core/src/protocol.rs#L98-L108 Specifically: * It was an `enum` and therefore options were mutually exclusive as opposed to additive. * It defined things in terms of what the agent _could not_ do as opposed to what they _could_ do. This made things hard to support because we would prefer to build up a sandbox config by starting with something extremely restrictive and only granting permissions for things the user as explicitly allowed. This PR changes things substantially by redefining the policy in terms of two concepts: * A `SandboxPermission` enum that defines permissions that can be granted to the agent/sandbox. * A `SandboxPolicy` that internally stores a `Vec<SandboxPermission>`, but externally exposes a simpler API that can be used to configure Seatbelt/Landlock. Previous to this PR, we supported a `--sandbox` flag that effectively mapped to an enum value in `SandboxPolicy`. Though now that `SandboxPolicy` is a wrapper around `Vec<SandboxPermission>`, the single `--sandbox` flag no longer makes sense. While I could have turned it into a flag that the user can specify multiple times, I think the current values to use with such a flag are long and potentially messy, so for the moment, I have dropped support for `--sandbox` altogether and we can bring it back once we have figured out the naming thing. Since `--sandbox` is gone, users now have to specify `--full-auto` to get a sandbox that allows writes in `cwd`. Admittedly, there is no clean way to specify the equivalent of `--full-auto` in your `config.toml` right now, so we will have to revisit that, as well. Because `Config` presents a `SandboxPolicy` field and `SandboxPolicy` changed considerably, I had to overhaul how config loading works, as well. There are now two distinct concepts, `ConfigToml` and `Config`: * `ConfigToml` is the deserialization of `~/.codex/config.toml`. As one might expect, every field is `Optional` and it is `#[derive(Deserialize, Default)]`. Consistent use of `Optional` makes it clear what the user has specified explicitly. * `Config` is the "normalized config" and is produced by merging `ConfigToml` with `ConfigOverrides`. Where `ConfigToml` contains a raw `Option<Vec<SandboxPermission>>`, `Config` presents only the final `SandboxPolicy`. The changes to `core/src/exec.rs` and `core/src/linux.rs` merit extra special attention to ensure we are faithfully mapping the `SandboxPolicy` to the Seatbelt and Landlock configs, respectively. Also, take note that `core/src/seatbelt_readonly_policy.sbpl` has been renamed to `codex-rs/core/src/seatbelt_base_policy.sbpl` and that `(allow file-read*)` has been removed from the `.sbpl` file as now this is added to the policy in `core/src/exec.rs` when `sandbox_policy.has_full_disk_read_access()` is `true`.
2025-04-29 15:01:16 -07:00
fn is_git_pointer_file(path: &AbsolutePathBuf) -> bool {
path.as_path().is_file() && path.as_path().file_name() == Some(OsStr::new(".git"))
}
fn resolve_gitdir_from_file(dot_git: &AbsolutePathBuf) -> Option<AbsolutePathBuf> {
let contents = match std::fs::read_to_string(dot_git.as_path()) {
Ok(contents) => contents,
Err(err) => {
error!(
"Failed to read {path} for gitdir pointer: {err}",
path = dot_git.as_path().display()
);
return None;
}
};
let trimmed = contents.trim();
let (_, gitdir_raw) = match trimmed.split_once(':') {
Some(parts) => parts,
None => {
error!(
"Expected {path} to contain a gitdir pointer, but it did not match `gitdir: <path>`.",
path = dot_git.as_path().display()
);
return None;
}
};
let gitdir_raw = gitdir_raw.trim();
if gitdir_raw.is_empty() {
error!(
"Expected {path} to contain a gitdir pointer, but it was empty.",
path = dot_git.as_path().display()
);
return None;
}
let base = match dot_git.as_path().parent() {
Some(base) => base,
None => {
error!(
"Unable to resolve parent directory for {path}.",
path = dot_git.as_path().display()
);
return None;
}
};
let gitdir_path = match AbsolutePathBuf::resolve_path_against_base(gitdir_raw, base) {
Ok(path) => path,
Err(err) => {
error!(
"Failed to resolve gitdir path {gitdir_raw} from {path}: {err}",
path = dot_git.as_path().display()
);
return None;
}
};
if !gitdir_path.as_path().exists() {
error!(
"Resolved gitdir path {path} does not exist.",
path = gitdir_path.as_path().display()
);
return None;
}
Some(gitdir_path)
}
/// Event Queue Entry - events from agent
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct Event {
/// Submission `id` that this event is correlated with.
pub id: String,
/// Payload
pub msg: EventMsg,
}
/// Response event from the agent
2025-09-14 17:34:33 -07:00
/// NOTE: Make sure none of these values have optional types, as it will mess up the extension code-gen.
#[derive(Debug, Clone, Deserialize, Serialize, Display, JsonSchema, TS)]
#[serde(tag = "type", rename_all = "snake_case")]
#[ts(tag = "type")]
#[strum(serialize_all = "snake_case")]
pub enum EventMsg {
/// Error while executing a submission
Error(ErrorEvent),
/// Warning issued while processing a submission. Unlike `Error`, this
2026-01-09 17:31:17 +00:00
/// indicates the turn continued but the user should still be notified.
Warning(WarningEvent),
2025-11-25 16:12:14 +00:00
/// Conversation history was compacted (either automatically or manually).
ContextCompacted(ContextCompactedEvent),
feat(app-server): thread/rollback API (#8454) Add `thread/rollback` to app-server to support IDEs undo-ing the last N turns of a thread. For context, an IDE partner will be supporting an "undo" capability where the IDE (the app-server client) will be responsible for reverting the local changes made during the last turn. To support this well, we also need a way to drop the last turn (or more generally, the last N turns) from the agent's context. This is what `thread/rollback` does. **Core idea**: A Thread rollback is represented as a persisted event message (EventMsg::ThreadRollback) in the rollout JSONL file, not by rewriting history. On resume, both the model's context (core replay) and the UI turn list (app-server v2's thread history builder) apply these markers so the pruned history is consistent across live conversations and `thread/resume`. Implementation notes: - Rollback only affects agent context and appends to the rollout file; clients are responsible for reverting files on disk. - If a thread rollback is currently in progress, subsequent `thread/rollback` calls are rejected. - Because we use `CodexConversation::submit` and codex core tracks active turns, returning an error on concurrent rollbacks is communicated via an `EventMsg::Error` with a new variant `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and sends the BAD_REQUEST RPC response. Tests cover thread rollbacks in both core and app-server, including when `num_turns` > existing turns (which clears all turns). **Note**: this explicitly does **not** behave like `/undo` which we just removed from the CLI, which does the opposite of what `thread/rollback` does. `/undo` reverts local changes via ghost commits/snapshots and does not modify the agent's context / conversation history.
2026-01-06 13:23:48 -08:00
/// Conversation history was rolled back by dropping the last N user turns.
ThreadRolledBack(ThreadRolledBackEvent),
2026-01-09 17:31:17 +00:00
/// Agent has started a turn.
/// v1 wire format uses `task_started`; accept `turn_started` for v2 interop.
#[serde(rename = "task_started", alias = "turn_started")]
TurnStarted(TurnStartedEvent),
2026-01-09 17:31:17 +00:00
/// Agent has completed all actions.
/// v1 wire format uses `task_complete`; accept `turn_complete` for v2 interop.
#[serde(rename = "task_complete", alias = "turn_complete")]
TurnComplete(TurnCompleteEvent),
/// Usage update for the current session, including totals and last turn.
/// Optional means unknown — UIs should not display when `None`.
TokenCount(TokenCountEvent),
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
/// Agent text output message
AgentMessage(AgentMessageEvent),
/// User/system input message (what was sent to the model)
UserMessage(UserMessageEvent),
/// Agent text output delta message
AgentMessageDelta(AgentMessageDeltaEvent),
/// Reasoning event from agent.
AgentReasoning(AgentReasoningEvent),
/// Agent reasoning delta event from agent.
AgentReasoningDelta(AgentReasoningDeltaEvent),
/// Raw chain-of-thought from agent.
AgentReasoningRawContent(AgentReasoningRawContentEvent),
/// Agent reasoning content delta event from agent.
AgentReasoningRawContentDelta(AgentReasoningRawContentDeltaEvent),
/// Signaled when the model begins a new reasoning summary section (e.g., a new titled block).
AgentReasoningSectionBreak(AgentReasoningSectionBreakEvent),
/// Ack the client's configure message.
SessionConfigured(SessionConfiguredEvent),
/// Updated session metadata (e.g., thread name changes).
ThreadNameUpdated(ThreadNameUpdatedEvent),
/// Incremental MCP startup progress updates.
McpStartupUpdate(McpStartupUpdateEvent),
/// Aggregate MCP startup completion summary.
McpStartupComplete(McpStartupCompleteEvent),
McpToolCallBegin(McpToolCallBeginEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
McpToolCallEnd(McpToolCallEndEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
WebSearchBegin(WebSearchBeginEvent),
WebSearchEnd(WebSearchEndEvent),
/// Notification that the server is about to execute a command.
ExecCommandBegin(ExecCommandBeginEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
/// Incremental chunk of output from a running command.
ExecCommandOutputDelta(ExecCommandOutputDeltaEvent),
/// Terminal interaction for an in-progress command (stdin sent and stdout observed).
TerminalInteraction(TerminalInteractionEvent),
ExecCommandEnd(ExecCommandEndEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
/// Notification that the agent attached a local image via the view_image tool.
ViewImageToolCall(ViewImageToolCallEvent),
ExecApprovalRequest(ExecApprovalRequestEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
Feat: request user input tool (#9472) ### Summary * Add `requestUserInput` tool that the model can use for gather feedback/asking question mid turn. ### Tool input schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput input", "type": "object", "additionalProperties": false, "required": ["questions"], "properties": { "questions": { "type": "array", "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.", "minItems": 1, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["id", "header", "question"], "properties": { "id": { "type": "string", "description": "Stable identifier for mapping answers (snake_case)." }, "header": { "type": "string", "description": "Short header label shown in the UI (12 or fewer chars)." }, "question": { "type": "string", "description": "Single-sentence prompt shown to the user." }, "options": { "type": "array", "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.", "minItems": 2, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["value", "label", "description"], "properties": { "value": { "type": "string", "description": "Machine-readable value (snake_case)." }, "label": { "type": "string", "description": "User-facing label (1-5 words)." }, "description": { "type": "string", "description": "One short sentence explaining impact/tradeoff if selected." } } } } } } } } } ``` ### Tool output schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput output", "type": "object", "additionalProperties": false, "required": ["answers"], "properties": { "answers": { "type": "object", "description": "Map of question id to user answer.", "additionalProperties": { "type": "object", "additionalProperties": false, "required": ["selected"], "properties": { "selected": { "type": "array", "items": { "type": "string" } }, "other": { "type": ["string", "null"] } } } } } } ```
2026-01-19 10:17:30 -08:00
RequestUserInput(RequestUserInputEvent),
DynamicToolCallRequest(DynamicToolCallRequest),
ElicitationRequest(ElicitationRequestEvent),
ApplyPatchApprovalRequest(ApplyPatchApprovalRequestEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
/// Notification advising the user that something they are using has been
/// deprecated and should be phased out.
DeprecationNotice(DeprecationNoticeEvent),
BackgroundEvent(BackgroundEventEvent),
feat: support mcp_servers in config.toml (#829) This adds initial support for MCP servers in the style of Claude Desktop and Cursor. Note this PR is the bare minimum to get things working end to end: all configured MCP servers are launched every time Codex is run, there is no recovery for MCP servers that crash, etc. (Also, I took some shortcuts to change some fields of `Session` to be `pub(crate)`, which also means there are circular deps between `codex.rs` and `mcp_tool_call.rs`, but I will clean that up in a subsequent PR.) `codex-rs/README.md` is updated as part of this PR to explain how to use this feature. There is a bit of plumbing to route the new settings from `Config` to the business logic in `codex.rs`. The most significant chunks for new code are in `mcp_connection_manager.rs` (which defines the `McpConnectionManager` struct) and `mcp_tool_call.rs`, which is responsible for tool calls. This PR also introduces new `McpToolCallBegin` and `McpToolCallEnd` event types to the protocol, but does not add any handlers for them. (See https://github.com/openai/codex/pull/836 for initial usage.) To test, I added the following to my `~/.codex/config.toml`: ```toml # Local build of https://github.com/hideya/mcp-server-weather-js [mcp_servers.weather] command = "/Users/mbolin/code/mcp-server-weather-js/dist/index.js" args = [] ``` And then I ran the following: ``` codex-rs$ cargo run --bin codex exec 'what is the weather in san francisco' [2025-05-06T22:40:05] Task started: 1 [2025-05-06T22:40:18] Agent message: Here’s the latest National Weather Service forecast for San Francisco (downtown, near 37.77° N, 122.42° W): This Afternoon (Tue): • Sunny, high near 69 °F • West-southwest wind around 12 mph Tonight: • Partly cloudy, low around 52 °F • SW wind 7–10 mph ... ``` Note that Codex itself is not able to make network calls, so it would not normally be able to get live weather information like this. However, the weather MCP is [currently] not run under the Codex sandbox, so it is able to hit `api.weather.gov` and fetch current weather information. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/829). * #836 * __->__ #829
2025-05-06 15:47:59 -07:00
2025-10-27 10:55:29 +00:00
UndoStarted(UndoStartedEvent),
UndoCompleted(UndoCompletedEvent),
2025-08-21 01:15:24 -07:00
/// Notification that a model stream experienced an error or disconnect
/// and the system is handling it (e.g., retrying with backoff).
StreamError(StreamErrorEvent),
/// Notification that the agent is about to apply a code patch. Mirrors
/// `ExecCommandBegin` so frontends can show progress indicators.
PatchApplyBegin(PatchApplyBeginEvent),
/// Notification that a patch application has finished.
PatchApplyEnd(PatchApplyEndEvent),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
TurnDiff(TurnDiffEvent),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Response to GetHistoryEntryRequest.
GetHistoryEntryResponse(GetHistoryEntryResponseEvent),
/// List of MCP tools available to the agent.
McpListToolsResponse(McpListToolsResponseEvent),
/// List of custom prompts available to the agent.
ListCustomPromptsResponse(ListCustomPromptsResponseEvent),
/// List of skills available to the agent.
ListSkillsResponse(ListSkillsResponseEvent),
/// Notification that skill data may have been updated and clients may want to reload.
SkillsUpdateAvailable,
PlanUpdate(UpdatePlanArgs),
TurnAborted(TurnAbortedEvent),
/// Notification that the agent is shutting down.
ShutdownComplete,
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
/// Entered review mode.
EnteredReviewMode(ReviewRequest),
/// Exited review mode with an optional final result to apply.
2025-09-14 17:34:33 -07:00
ExitedReviewMode(ExitedReviewModeEvent),
RawResponseItem(RawResponseItemEvent),
ItemStarted(ItemStartedEvent),
ItemCompleted(ItemCompletedEvent),
AgentMessageContentDelta(AgentMessageContentDeltaEvent),
Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - **Normal text** → `AgentMessageContentDelta` - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 10:59:30 -08:00
PlanDelta(PlanDeltaEvent),
ReasoningContentDelta(ReasoningContentDeltaEvent),
ReasoningRawContentDelta(ReasoningRawContentDeltaEvent),
feat: emit events around collab tools (#9095) Emit the following events around the collab tools. On the `app-server` this will be under `item/started` and `item/completed` ``` #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the newly spawned agent, if it was created. pub new_thread_id: Option<ThreadId>, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, /// Last known status of the new agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingBeginEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingEndEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Last known status of the receiver agent reported to the sender agent before /// the close. pub status: AgentStatus, } ```
2026-01-14 17:55:57 +00:00
/// Collab interaction: agent spawn begin.
CollabAgentSpawnBegin(CollabAgentSpawnBeginEvent),
/// Collab interaction: agent spawn end.
CollabAgentSpawnEnd(CollabAgentSpawnEndEvent),
/// Collab interaction: agent interaction begin.
CollabAgentInteractionBegin(CollabAgentInteractionBeginEvent),
/// Collab interaction: agent interaction end.
CollabAgentInteractionEnd(CollabAgentInteractionEndEvent),
/// Collab interaction: waiting begin.
CollabWaitingBegin(CollabWaitingBeginEvent),
/// Collab interaction: waiting end.
CollabWaitingEnd(CollabWaitingEndEvent),
/// Collab interaction: close begin.
CollabCloseBegin(CollabCloseBeginEvent),
/// Collab interaction: close end.
CollabCloseEnd(CollabCloseEndEvent),
}
impl From<CollabAgentSpawnBeginEvent> for EventMsg {
fn from(event: CollabAgentSpawnBeginEvent) -> Self {
EventMsg::CollabAgentSpawnBegin(event)
}
}
impl From<CollabAgentSpawnEndEvent> for EventMsg {
fn from(event: CollabAgentSpawnEndEvent) -> Self {
EventMsg::CollabAgentSpawnEnd(event)
}
}
impl From<CollabAgentInteractionBeginEvent> for EventMsg {
fn from(event: CollabAgentInteractionBeginEvent) -> Self {
EventMsg::CollabAgentInteractionBegin(event)
}
}
impl From<CollabAgentInteractionEndEvent> for EventMsg {
fn from(event: CollabAgentInteractionEndEvent) -> Self {
EventMsg::CollabAgentInteractionEnd(event)
}
}
impl From<CollabWaitingBeginEvent> for EventMsg {
fn from(event: CollabWaitingBeginEvent) -> Self {
EventMsg::CollabWaitingBegin(event)
}
}
impl From<CollabWaitingEndEvent> for EventMsg {
fn from(event: CollabWaitingEndEvent) -> Self {
EventMsg::CollabWaitingEnd(event)
}
}
impl From<CollabCloseBeginEvent> for EventMsg {
fn from(event: CollabCloseBeginEvent) -> Self {
EventMsg::CollabCloseBegin(event)
}
}
impl From<CollabCloseEndEvent> for EventMsg {
fn from(event: CollabCloseEndEvent) -> Self {
EventMsg::CollabCloseEnd(event)
}
}
/// Agent lifecycle status, derived from emitted events.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, Eq, JsonSchema, TS, Default)]
#[serde(rename_all = "snake_case")]
#[ts(rename_all = "snake_case")]
pub enum AgentStatus {
/// Agent is waiting for initialization.
#[default]
PendingInit,
/// Agent is currently running.
Running,
/// Agent is done. Contains the final assistant message.
Completed(Option<String>),
/// Agent encountered an error.
Errored(String),
/// Agent has been shutdown.
Shutdown,
/// Agent is not found.
NotFound,
}
/// Codex errors that we expose to clients.
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
#[ts(rename_all = "snake_case")]
pub enum CodexErrorInfo {
ContextWindowExceeded,
UsageLimitExceeded,
ModelCap {
model: String,
reset_after_seconds: Option<u64>,
},
HttpConnectionFailed {
http_status_code: Option<u16>,
},
/// Failed to connect to the response SSE stream.
ResponseStreamConnectionFailed {
http_status_code: Option<u16>,
},
InternalServerError,
Unauthorized,
BadRequest,
SandboxError,
/// The response SSE stream disconnected in the middle of a turnbefore completion.
ResponseStreamDisconnected {
http_status_code: Option<u16>,
},
/// Reached the retry limit for responses.
ResponseTooManyFailedAttempts {
http_status_code: Option<u16>,
},
feat(app-server): thread/rollback API (#8454) Add `thread/rollback` to app-server to support IDEs undo-ing the last N turns of a thread. For context, an IDE partner will be supporting an "undo" capability where the IDE (the app-server client) will be responsible for reverting the local changes made during the last turn. To support this well, we also need a way to drop the last turn (or more generally, the last N turns) from the agent's context. This is what `thread/rollback` does. **Core idea**: A Thread rollback is represented as a persisted event message (EventMsg::ThreadRollback) in the rollout JSONL file, not by rewriting history. On resume, both the model's context (core replay) and the UI turn list (app-server v2's thread history builder) apply these markers so the pruned history is consistent across live conversations and `thread/resume`. Implementation notes: - Rollback only affects agent context and appends to the rollout file; clients are responsible for reverting files on disk. - If a thread rollback is currently in progress, subsequent `thread/rollback` calls are rejected. - Because we use `CodexConversation::submit` and codex core tracks active turns, returning an error on concurrent rollbacks is communicated via an `EventMsg::Error` with a new variant `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and sends the BAD_REQUEST RPC response. Tests cover thread rollbacks in both core and app-server, including when `num_turns` > existing turns (which clears all turns). **Note**: this explicitly does **not** behave like `/undo` which we just removed from the CLI, which does the opposite of what `thread/rollback` does. `/undo` reverts local changes via ghost commits/snapshots and does not modify the agent's context / conversation history.
2026-01-06 13:23:48 -08:00
ThreadRollbackFailed,
Other,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct RawResponseItemEvent {
pub item: ResponseItem,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ItemStartedEvent {
pub thread_id: ThreadId,
pub turn_id: String,
pub item: TurnItem,
}
impl HasLegacyEvent for ItemStartedEvent {
fn as_legacy_events(&self, _: bool) -> Vec<EventMsg> {
match &self.item {
TurnItem::WebSearch(item) => vec![EventMsg::WebSearchBegin(WebSearchBeginEvent {
call_id: item.id.clone(),
})],
_ => Vec::new(),
}
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ItemCompletedEvent {
pub thread_id: ThreadId,
pub turn_id: String,
pub item: TurnItem,
2025-09-14 17:34:33 -07:00
}
pub trait HasLegacyEvent {
fn as_legacy_events(&self, show_raw_agent_reasoning: bool) -> Vec<EventMsg>;
}
impl HasLegacyEvent for ItemCompletedEvent {
fn as_legacy_events(&self, show_raw_agent_reasoning: bool) -> Vec<EventMsg> {
self.item.as_legacy_events(show_raw_agent_reasoning)
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct AgentMessageContentDeltaEvent {
pub thread_id: String,
pub turn_id: String,
pub item_id: String,
pub delta: String,
}
impl HasLegacyEvent for AgentMessageContentDeltaEvent {
fn as_legacy_events(&self, _: bool) -> Vec<EventMsg> {
vec![EventMsg::AgentMessageDelta(AgentMessageDeltaEvent {
delta: self.delta.clone(),
})]
}
}
Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - **Normal text** → `AgentMessageContentDelta` - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 10:59:30 -08:00
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct PlanDeltaEvent {
pub thread_id: String,
pub turn_id: String,
pub item_id: String,
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ReasoningContentDeltaEvent {
pub thread_id: String,
pub turn_id: String,
pub item_id: String,
pub delta: String,
// load with default value so it's backward compatible with the old format.
#[serde(default)]
pub summary_index: i64,
}
impl HasLegacyEvent for ReasoningContentDeltaEvent {
fn as_legacy_events(&self, _: bool) -> Vec<EventMsg> {
vec![EventMsg::AgentReasoningDelta(AgentReasoningDeltaEvent {
delta: self.delta.clone(),
})]
}
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ReasoningRawContentDeltaEvent {
pub thread_id: String,
pub turn_id: String,
pub item_id: String,
pub delta: String,
// load with default value so it's backward compatible with the old format.
#[serde(default)]
pub content_index: i64,
}
impl HasLegacyEvent for ReasoningRawContentDeltaEvent {
fn as_legacy_events(&self, _: bool) -> Vec<EventMsg> {
vec![EventMsg::AgentReasoningRawContentDelta(
AgentReasoningRawContentDeltaEvent {
delta: self.delta.clone(),
},
)]
}
}
impl HasLegacyEvent for EventMsg {
fn as_legacy_events(&self, show_raw_agent_reasoning: bool) -> Vec<EventMsg> {
match self {
EventMsg::ItemStarted(event) => event.as_legacy_events(show_raw_agent_reasoning),
EventMsg::ItemCompleted(event) => event.as_legacy_events(show_raw_agent_reasoning),
EventMsg::AgentMessageContentDelta(event) => {
event.as_legacy_events(show_raw_agent_reasoning)
}
EventMsg::ReasoningContentDelta(event) => {
event.as_legacy_events(show_raw_agent_reasoning)
}
EventMsg::ReasoningRawContentDelta(event) => {
event.as_legacy_events(show_raw_agent_reasoning)
}
_ => Vec::new(),
}
}
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
2025-09-14 17:34:33 -07:00
pub struct ExitedReviewModeEvent {
pub review_output: Option<ReviewOutputEvent>,
}
// Individual event payload types matching each `EventMsg` variant.
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ErrorEvent {
pub message: String,
#[serde(default)]
pub codex_error_info: Option<CodexErrorInfo>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct WarningEvent {
pub message: String,
}
2025-11-25 16:12:14 +00:00
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ContextCompactedEvent;
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
2026-01-09 17:31:17 +00:00
pub struct TurnCompleteEvent {
pub last_agent_message: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
2026-01-09 17:31:17 +00:00
pub struct TurnStartedEvent {
// TODO(aibrahim): make this not optional
pub model_context_window: Option<i64>,
Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - **Normal text** → `AgentMessageContentDelta` - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. (`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 10:59:30 -08:00
#[serde(default)]
pub collaboration_mode_kind: ModeKind,
}
#[derive(Debug, Clone, Deserialize, Serialize, Default, PartialEq, Eq, JsonSchema, TS)]
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
pub struct TokenUsage {
#[ts(type = "number")]
pub input_tokens: i64,
#[ts(type = "number")]
pub cached_input_tokens: i64,
#[ts(type = "number")]
pub output_tokens: i64,
#[ts(type = "number")]
pub reasoning_output_tokens: i64,
#[ts(type = "number")]
pub total_tokens: i64,
feat: show number of tokens remaining in UI (#1388) When using the OpenAI Responses API, we now record the `usage` field for a `"response.completed"` event, which includes metrics about the number of tokens consumed. We also introduce `openai_model_info.rs`, which includes current data about the most common OpenAI models available via the API (specifically `context_window` and `max_output_tokens`). If Codex does not recognize the model, you can set `model_context_window` and `model_max_output_tokens` explicitly in `config.toml`. When then introduce a new event type to `protocol.rs`, `TokenCount`, which includes the `TokenUsage` for the most recent turn. Finally, we update the TUI to record the running sum of tokens used so the percentage of available context window remaining can be reported via the placeholder text for the composer: ![Screenshot 2025-06-25 at 11 20 55 PM](https://github.com/user-attachments/assets/6fd6982f-7247-4f14-84b2-2e600cb1fd49) We could certainly get much fancier with this (such as reporting the estimated cost of the conversation), but for now, we are just trying to achieve feature parity with the TypeScript CLI. Though arguably this improves upon the TypeScript CLI, as the TypeScript CLI uses heuristics to estimate the number of tokens used rather than using the `usage` information directly: https://github.com/openai/codex/blob/296996d74e345b1b05d8c3451a06ace21c5ada96/codex-cli/src/utils/approximate-tokens-used.ts#L3-L16 Fixes https://github.com/openai/codex/issues/1242
2025-06-25 23:31:11 -07:00
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, Eq, JsonSchema, TS)]
pub struct TokenUsageInfo {
pub total_token_usage: TokenUsage,
pub last_token_usage: TokenUsage,
// TODO(aibrahim): make this not optional
#[ts(type = "number | null")]
pub model_context_window: Option<i64>,
}
impl TokenUsageInfo {
pub fn new_or_append(
info: &Option<TokenUsageInfo>,
last: &Option<TokenUsage>,
model_context_window: Option<i64>,
) -> Option<Self> {
if info.is_none() && last.is_none() {
return None;
}
let mut info = match info {
Some(info) => info.clone(),
None => Self {
total_token_usage: TokenUsage::default(),
last_token_usage: TokenUsage::default(),
model_context_window,
},
};
if let Some(last) = last {
info.append_last_usage(last);
}
Some(info)
}
pub fn append_last_usage(&mut self, last: &TokenUsage) {
self.total_token_usage.add_assign(last);
self.last_token_usage = last.clone();
}
pub fn fill_to_context_window(&mut self, context_window: i64) {
let previous_total = self.total_token_usage.total_tokens;
let delta = (context_window - previous_total).max(0);
self.model_context_window = Some(context_window);
self.total_token_usage = TokenUsage {
total_tokens: context_window,
..TokenUsage::default()
};
self.last_token_usage = TokenUsage {
total_tokens: delta,
..TokenUsage::default()
};
}
pub fn full_context_window(context_window: i64) -> Self {
let mut info = Self {
total_token_usage: TokenUsage::default(),
last_token_usage: TokenUsage::default(),
model_context_window: Some(context_window),
};
info.fill_to_context_window(context_window);
info
}
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct TokenCountEvent {
pub info: Option<TokenUsageInfo>,
pub rate_limits: Option<RateLimitSnapshot>,
}
#[derive(Debug, Clone, PartialEq, Deserialize, Serialize, JsonSchema, TS)]
pub struct RateLimitSnapshot {
pub primary: Option<RateLimitWindow>,
pub secondary: Option<RateLimitWindow>,
pub credits: Option<CreditsSnapshot>,
pub plan_type: Option<crate::account::PlanType>,
}
#[derive(Debug, Clone, PartialEq, Deserialize, Serialize, JsonSchema, TS)]
pub struct RateLimitWindow {
/// Percentage (0-100) of the window that has been consumed.
pub used_percent: f64,
/// Rolling window duration, in minutes.
#[ts(type = "number | null")]
pub window_minutes: Option<i64>,
/// Unix timestamp (seconds since epoch) when the window resets.
#[ts(type = "number | null")]
pub resets_at: Option<i64>,
}
#[derive(Debug, Clone, PartialEq, Deserialize, Serialize, JsonSchema, TS)]
pub struct CreditsSnapshot {
pub has_credits: bool,
pub unlimited: bool,
pub balance: Option<String>,
}
// Includes prompts, tools and space to call compact.
const BASELINE_TOKENS: i64 = 12000;
impl TokenUsage {
pub fn is_zero(&self) -> bool {
self.total_tokens == 0
}
pub fn cached_input(&self) -> i64 {
self.cached_input_tokens.max(0)
}
pub fn non_cached_input(&self) -> i64 {
(self.input_tokens - self.cached_input()).max(0)
}
/// Primary count for display as a single absolute value: non-cached input + output.
pub fn blended_total(&self) -> i64 {
(self.non_cached_input() + self.output_tokens.max(0)).max(0)
}
pub fn tokens_in_context_window(&self) -> i64 {
self.total_tokens
}
/// Estimate the remaining user-controllable percentage of the model's context window.
///
/// `context_window` is the total size of the model's context window.
/// `BASELINE_TOKENS` should capture tokens that are always present in
/// the context (e.g., system prompt and fixed tool instructions) so that
/// the percentage reflects the portion the user can influence.
///
/// This normalizes both the numerator and denominator by subtracting the
/// baseline, so immediately after the first prompt the UI shows 100% left
/// and trends toward 0% as the user fills the effective window.
pub fn percent_of_context_window_remaining(&self, context_window: i64) -> i64 {
if context_window <= BASELINE_TOKENS {
return 0;
}
let effective_window = context_window - BASELINE_TOKENS;
let used = (self.tokens_in_context_window() - BASELINE_TOKENS).max(0);
let remaining = (effective_window - used).max(0);
((remaining as f64 / effective_window as f64) * 100.0)
.clamp(0.0, 100.0)
.round() as i64
}
/// In-place element-wise sum of token counts.
pub fn add_assign(&mut self, other: &TokenUsage) {
self.input_tokens += other.input_tokens;
self.cached_input_tokens += other.cached_input_tokens;
self.output_tokens += other.output_tokens;
self.reasoning_output_tokens += other.reasoning_output_tokens;
self.total_tokens += other.total_tokens;
}
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema)]
pub struct FinalOutput {
pub token_usage: TokenUsage,
}
impl From<TokenUsage> for FinalOutput {
fn from(token_usage: TokenUsage) -> Self {
Self { token_usage }
}
}
impl fmt::Display for FinalOutput {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let token_usage = &self.token_usage;
write!(
f,
"Token usage: total={} input={}{} output={}{}",
format_with_separators(token_usage.blended_total()),
format_with_separators(token_usage.non_cached_input()),
if token_usage.cached_input() > 0 {
format!(
" (+ {} cached)",
format_with_separators(token_usage.cached_input())
)
} else {
String::new()
},
format_with_separators(token_usage.output_tokens),
if token_usage.reasoning_output_tokens > 0 {
format!(
" (reasoning {})",
format_with_separators(token_usage.reasoning_output_tokens)
)
} else {
String::new()
}
)
}
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentMessageEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct UserMessageEvent {
pub message: String,
/// Image URLs sourced from `UserInput::Image`. These are safe
/// to replay in legacy UI history events and correspond to images sent to
/// the model.
#[serde(skip_serializing_if = "Option::is_none")]
pub images: Option<Vec<String>>,
/// Local file paths sourced from `UserInput::LocalImage`. These are kept so
/// the UI can reattach images when editing history, and should not be sent
/// to the model or treated as API-ready URLs.
#[serde(default)]
pub local_images: Vec<std::path::PathBuf>,
/// UI-defined spans within `message` used to render or persist special elements.
#[serde(default)]
pub text_elements: Vec<crate::user_input::TextElement>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentMessageDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentReasoningEvent {
pub text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentReasoningRawContentEvent {
pub text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentReasoningRawContentDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentReasoningSectionBreakEvent {
// load with default value so it's backward compatible with the old format.
#[serde(default)]
pub item_id: String,
#[serde(default)]
pub summary_index: i64,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct AgentReasoningDeltaEvent {
pub delta: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq)]
pub struct McpInvocation {
/// Name of the MCP server as defined in the config.
pub server: String,
/// Name of the tool as given by the MCP server.
pub tool: String,
/// Arguments to the tool call.
pub arguments: Option<serde_json::Value>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq)]
pub struct McpToolCallBeginEvent {
/// Identifier so this can be paired with the McpToolCallEnd event.
pub call_id: String,
pub invocation: McpInvocation,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq)]
pub struct McpToolCallEndEvent {
/// Identifier for the corresponding McpToolCallBegin that finished.
pub call_id: String,
pub invocation: McpInvocation,
#[ts(type = "string")]
pub duration: Duration,
/// Result of the tool call. Note this could be an error.
fix: introduce ResponseInputItem::McpToolCallOutput variant (#1151) The output of an MCP server tool call can be one of several types, but to date, we treated all outputs as text by showing the serialized JSON as the "tool output" in Codex: https://github.com/openai/codex/blob/25a9949c49194d5a64de54a11bcc5b4724ac9bd5/codex-rs/mcp-types/src/lib.rs#L96-L101 This PR adds support for the `ImageContent` variant so we can now display an image output from an MCP tool call. In making this change, we introduce a new `ResponseInputItem::McpToolCallOutput` variant so that we can work with the `mcp_types::CallToolResult` directly when the function call is made to an MCP server. Though arguably the more significant change is the introduction of `HistoryCell::CompletedMcpToolCallWithImageOutput`, which is a cell that uses `ratatui_image` to render an image into the terminal. To support this, we introduce `ImageRenderCache`, cache a `ratatui_image::picker::Picker`, and `ensure_image_cache()` to cache the appropriate scaled image data and dimensions based on the current terminal size. To test, I created a minimal `package.json`: ```json { "name": "kitty-mcp", "version": "1.0.0", "type": "module", "description": "MCP that returns image of kitty", "main": "index.js", "dependencies": { "@modelcontextprotocol/sdk": "^1.12.0" } } ``` with the following `index.js` to define the MCP server: ```js #!/usr/bin/env node import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"; import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; import { readFile } from "node:fs/promises"; import { join } from "node:path"; const IMAGE_URI = "image://Ada.png"; const server = new McpServer({ name: "Demo", version: "1.0.0", }); server.tool( "get-cat-image", "If you need a cat image, this tool will provide one.", async () => ({ content: [ { type: "image", data: await getAdaPngBase64(), mimeType: "image/png" }, ], }) ); server.resource("Ada the Cat", IMAGE_URI, async (uri) => { const base64Image = await getAdaPngBase64(); return { contents: [ { uri: uri.href, mimeType: "image/png", blob: base64Image, }, ], }; }); async function getAdaPngBase64() { const __dirname = new URL(".", import.meta.url).pathname; // From https://github.com/benjajaja/ratatui-image/blob/9705ce2c59ec669abbce2924cbfd1f5ae22c9860/assets/Ada.png const filePath = join(__dirname, "Ada.png"); const imageData = await readFile(filePath); const base64Image = imageData.toString("base64"); return base64Image; } const transport = new StdioServerTransport(); await server.connect(transport); ``` With the local changes from this PR, I added the following to my `config.toml`: ```toml [mcp_servers.kitty] command = "node" args = ["/Users/mbolin/code/kitty-mcp/index.js"] ``` Running the TUI from source: ``` cargo run --bin codex -- --model o3 'I need a picture of a cat' ``` I get: <img width="732" alt="image" src="https://github.com/user-attachments/assets/bf80b721-9ca0-4d81-aec7-77d6899e2869" /> Now, that said, I have only tested in iTerm and there is definitely some funny business with getting an accurate character-to-pixel ratio (sometimes the `CompletedMcpToolCallWithImageOutput` thinks it needs 10 rows to render instead of 4), so there is still work to be done here.
2025-05-28 19:03:17 -07:00
pub result: Result<CallToolResult, String>,
}
impl McpToolCallEndEvent {
pub fn is_success(&self) -> bool {
match &self.result {
Ok(result) => !result.is_error.unwrap_or(false),
Err(_) => false,
}
}
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct WebSearchBeginEvent {
pub call_id: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct WebSearchEndEvent {
pub call_id: String,
pub query: String,
pub action: WebSearchAction,
}
// Conversation kept for backward compatibility.
/// Response payload for `Op::GetHistory` containing the current session's
/// in-memory transcript.
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ConversationPathResponseEvent {
pub conversation_id: ThreadId,
pub path: PathBuf,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ResumedHistory {
pub conversation_id: ThreadId,
pub history: Vec<RolloutItem>,
pub rollout_path: PathBuf,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub enum InitialHistory {
New,
Resumed(ResumedHistory),
Forked(Vec<RolloutItem>),
}
impl InitialHistory {
pub fn forked_from_id(&self) -> Option<ThreadId> {
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => {
resumed.history.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => meta_line.meta.forked_from_id,
_ => None,
})
}
InitialHistory::Forked(items) => items.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => Some(meta_line.meta.id),
_ => None,
}),
}
}
pub fn session_cwd(&self) -> Option<PathBuf> {
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => session_cwd_from_items(&resumed.history),
InitialHistory::Forked(items) => session_cwd_from_items(items),
}
}
pub fn get_rollout_items(&self) -> Vec<RolloutItem> {
match self {
InitialHistory::New => Vec::new(),
InitialHistory::Resumed(resumed) => resumed.history.clone(),
InitialHistory::Forked(items) => items.clone(),
}
}
pub fn get_event_msgs(&self) -> Option<Vec<EventMsg>> {
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => Some(
resumed
.history
.iter()
.filter_map(|ri| match ri {
RolloutItem::EventMsg(ev) => Some(ev.clone()),
_ => None,
})
.collect(),
),
InitialHistory::Forked(items) => Some(
items
.iter()
.filter_map(|ri| match ri {
RolloutItem::EventMsg(ev) => Some(ev.clone()),
_ => None,
})
.collect(),
),
}
}
fix(core) Preserve base_instructions in SessionMeta (#9427) ## Summary This PR consolidates base_instructions onto SessionMeta / SessionConfiguration, so we ensure `base_instructions` is set once per session and should be (mostly) immutable, unless: - overridden by config on resume / fork - sub-agent tasks, like review or collab In a future PR, we should convert all references to `base_instructions` to consistently used the typed struct, so it's less likely that we put other strings there. See #9423. However, this PR is already quite complex, so I'm deferring that to a follow-up. ## Testing - [x] Added a resume test to assert that instructions are preserved. In particular, `resume_switches_models_preserves_base_instructions` fails against main. Existing test coverage thats assert base instructions are preserved across multiple requests in a session: - Manual compact keeps baseline instructions: core/tests/suite/compact.rs:199 - Auto-compact keeps baseline instructions: core/tests/suite/compact.rs:1142 - Prompt caching reuses the same instructions across two requests: core/tests/suite/prompt_caching.rs:150 and core/tests/suite/prompt_caching.rs:157 - Prompt caching with explicit expected string across two requests: core/tests/suite/prompt_caching.rs:213 and core/tests/suite/prompt_caching.rs:222 - Resume with model switch keeps original instructions: core/tests/suite/resume.rs:136 - Compact/resume/fork uses request 0 instructions for later expected payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
pub fn get_base_instructions(&self) -> Option<BaseInstructions> {
// TODO: SessionMeta should (in theory) always be first in the history, so we can probably only check the first item?
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => {
resumed.history.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => meta_line.meta.base_instructions.clone(),
_ => None,
})
}
InitialHistory::Forked(items) => items.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => meta_line.meta.base_instructions.clone(),
_ => None,
}),
}
}
pub fn get_dynamic_tools(&self) -> Option<Vec<DynamicToolSpec>> {
match self {
InitialHistory::New => None,
InitialHistory::Resumed(resumed) => {
resumed.history.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => meta_line.meta.dynamic_tools.clone(),
_ => None,
})
}
InitialHistory::Forked(items) => items.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => meta_line.meta.dynamic_tools.clone(),
_ => None,
}),
}
}
}
fn session_cwd_from_items(items: &[RolloutItem]) -> Option<PathBuf> {
items.iter().find_map(|item| match item {
RolloutItem::SessionMeta(meta_line) => Some(meta_line.meta.cwd.clone()),
_ => None,
})
}
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, JsonSchema, TS, Default)]
#[serde(rename_all = "lowercase")]
#[ts(rename_all = "lowercase")]
pub enum SessionSource {
Cli,
#[default]
VSCode,
Exec,
Mcp,
SubAgent(SubAgentSource),
#[serde(other)]
Unknown,
}
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
#[ts(rename_all = "snake_case")]
pub enum SubAgentSource {
Review,
Compact,
ThreadSpawn {
parent_thread_id: ThreadId,
depth: i32,
},
Other(String),
}
impl fmt::Display for SessionSource {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
SessionSource::Cli => f.write_str("cli"),
SessionSource::VSCode => f.write_str("vscode"),
SessionSource::Exec => f.write_str("exec"),
SessionSource::Mcp => f.write_str("mcp"),
SessionSource::SubAgent(sub_source) => write!(f, "subagent_{sub_source}"),
SessionSource::Unknown => f.write_str("unknown"),
}
}
}
impl fmt::Display for SubAgentSource {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
SubAgentSource::Review => f.write_str("review"),
SubAgentSource::Compact => f.write_str("compact"),
SubAgentSource::ThreadSpawn {
parent_thread_id,
depth,
} => {
write!(f, "thread_spawn_{parent_thread_id}_d{depth}")
}
SubAgentSource::Other(other) => f.write_str(other),
}
}
}
fix(core) Preserve base_instructions in SessionMeta (#9427) ## Summary This PR consolidates base_instructions onto SessionMeta / SessionConfiguration, so we ensure `base_instructions` is set once per session and should be (mostly) immutable, unless: - overridden by config on resume / fork - sub-agent tasks, like review or collab In a future PR, we should convert all references to `base_instructions` to consistently used the typed struct, so it's less likely that we put other strings there. See #9423. However, this PR is already quite complex, so I'm deferring that to a follow-up. ## Testing - [x] Added a resume test to assert that instructions are preserved. In particular, `resume_switches_models_preserves_base_instructions` fails against main. Existing test coverage thats assert base instructions are preserved across multiple requests in a session: - Manual compact keeps baseline instructions: core/tests/suite/compact.rs:199 - Auto-compact keeps baseline instructions: core/tests/suite/compact.rs:1142 - Prompt caching reuses the same instructions across two requests: core/tests/suite/prompt_caching.rs:150 and core/tests/suite/prompt_caching.rs:157 - Prompt caching with explicit expected string across two requests: core/tests/suite/prompt_caching.rs:213 and core/tests/suite/prompt_caching.rs:222 - Resume with model switch keeps original instructions: core/tests/suite/resume.rs:136 - Compact/resume/fork uses request 0 instructions for later expected payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
/// SessionMeta contains session-level data that doesn't correspond to a specific turn.
///
/// NOTE: There used to be an `instructions` field here, which stored user_instructions, but we
/// now save that on TurnContext. base_instructions stores the base instructions for the session,
/// and should be used when there is no config override.
#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema, TS)]
pub struct SessionMeta {
pub id: ThreadId,
#[serde(skip_serializing_if = "Option::is_none")]
pub forked_from_id: Option<ThreadId>,
pub timestamp: String,
pub cwd: PathBuf,
pub originator: String,
pub cli_version: String,
#[serde(default)]
pub source: SessionSource,
pub model_provider: Option<String>,
fix(core) Preserve base_instructions in SessionMeta (#9427) ## Summary This PR consolidates base_instructions onto SessionMeta / SessionConfiguration, so we ensure `base_instructions` is set once per session and should be (mostly) immutable, unless: - overridden by config on resume / fork - sub-agent tasks, like review or collab In a future PR, we should convert all references to `base_instructions` to consistently used the typed struct, so it's less likely that we put other strings there. See #9423. However, this PR is already quite complex, so I'm deferring that to a follow-up. ## Testing - [x] Added a resume test to assert that instructions are preserved. In particular, `resume_switches_models_preserves_base_instructions` fails against main. Existing test coverage thats assert base instructions are preserved across multiple requests in a session: - Manual compact keeps baseline instructions: core/tests/suite/compact.rs:199 - Auto-compact keeps baseline instructions: core/tests/suite/compact.rs:1142 - Prompt caching reuses the same instructions across two requests: core/tests/suite/prompt_caching.rs:150 and core/tests/suite/prompt_caching.rs:157 - Prompt caching with explicit expected string across two requests: core/tests/suite/prompt_caching.rs:213 and core/tests/suite/prompt_caching.rs:222 - Resume with model switch keeps original instructions: core/tests/suite/resume.rs:136 - Compact/resume/fork uses request 0 instructions for later expected payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
/// base_instructions for the session. This *should* always be present when creating a new session,
/// but may be missing for older sessions. If not present, fall back to rendering the base_instructions
/// from ModelsManager.
pub base_instructions: Option<BaseInstructions>,
#[serde(skip_serializing_if = "Option::is_none")]
pub dynamic_tools: Option<Vec<DynamicToolSpec>>,
}
impl Default for SessionMeta {
fn default() -> Self {
SessionMeta {
id: ThreadId::default(),
forked_from_id: None,
timestamp: String::new(),
cwd: PathBuf::new(),
originator: String::new(),
cli_version: String::new(),
source: SessionSource::default(),
model_provider: None,
fix(core) Preserve base_instructions in SessionMeta (#9427) ## Summary This PR consolidates base_instructions onto SessionMeta / SessionConfiguration, so we ensure `base_instructions` is set once per session and should be (mostly) immutable, unless: - overridden by config on resume / fork - sub-agent tasks, like review or collab In a future PR, we should convert all references to `base_instructions` to consistently used the typed struct, so it's less likely that we put other strings there. See #9423. However, this PR is already quite complex, so I'm deferring that to a follow-up. ## Testing - [x] Added a resume test to assert that instructions are preserved. In particular, `resume_switches_models_preserves_base_instructions` fails against main. Existing test coverage thats assert base instructions are preserved across multiple requests in a session: - Manual compact keeps baseline instructions: core/tests/suite/compact.rs:199 - Auto-compact keeps baseline instructions: core/tests/suite/compact.rs:1142 - Prompt caching reuses the same instructions across two requests: core/tests/suite/prompt_caching.rs:150 and core/tests/suite/prompt_caching.rs:157 - Prompt caching with explicit expected string across two requests: core/tests/suite/prompt_caching.rs:213 and core/tests/suite/prompt_caching.rs:222 - Resume with model switch keeps original instructions: core/tests/suite/resume.rs:136 - Compact/resume/fork uses request 0 instructions for later expected payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
base_instructions: None,
dynamic_tools: None,
}
}
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema, TS)]
pub struct SessionMetaLine {
#[serde(flatten)]
pub meta: SessionMeta,
#[serde(skip_serializing_if = "Option::is_none")]
pub git: Option<GitInfo>,
}
#[derive(Serialize, Deserialize, Debug, Clone, JsonSchema, TS)]
#[serde(tag = "type", content = "payload", rename_all = "snake_case")]
pub enum RolloutItem {
SessionMeta(SessionMetaLine),
ResponseItem(ResponseItem),
Compacted(CompactedItem),
TurnContext(TurnContextItem),
EventMsg(EventMsg),
}
#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema, TS)]
pub struct CompactedItem {
pub message: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub replacement_history: Option<Vec<ResponseItem>>,
}
impl From<CompactedItem> for ResponseItem {
fn from(value: CompactedItem) -> Self {
ResponseItem::Message {
id: None,
role: "assistant".to_string(),
content: vec![ContentItem::OutputText {
text: value.message,
}],
end_turn: None,
}
}
}
#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema, TS)]
pub struct TurnContextItem {
pub cwd: PathBuf,
pub approval_policy: AskForApproval,
pub sandbox_policy: SandboxPolicy,
pub model: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub personality: Option<Personality>,
#[serde(default, skip_serializing_if = "Option::is_none")]
pub collaboration_mode: Option<CollaborationMode>,
#[serde(skip_serializing_if = "Option::is_none")]
pub effort: Option<ReasoningEffortConfig>,
pub summary: ReasoningSummaryConfig,
#[serde(skip_serializing_if = "Option::is_none")]
pub user_instructions: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub developer_instructions: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub final_output_json_schema: Option<Value>,
#[serde(skip_serializing_if = "Option::is_none")]
pub truncation_policy: Option<TruncationPolicy>,
}
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq, JsonSchema, TS)]
#[serde(tag = "mode", content = "limit", rename_all = "snake_case")]
pub enum TruncationPolicy {
Bytes(usize),
Tokens(usize),
}
#[derive(Serialize, Deserialize, Clone, JsonSchema)]
pub struct RolloutLine {
pub timestamp: String,
#[serde(flatten)]
pub item: RolloutItem,
}
#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema, TS)]
pub struct GitInfo {
/// Current commit hash (SHA)
#[serde(skip_serializing_if = "Option::is_none")]
pub commit_hash: Option<String>,
/// Current branch name
#[serde(skip_serializing_if = "Option::is_none")]
pub branch: Option<String>,
/// Repository URL (if available from remote)
#[serde(skip_serializing_if = "Option::is_none")]
pub repository_url: Option<String>,
}
2025-11-28 11:34:57 +00:00
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
pub enum ReviewDelivery {
Inline,
Detached,
}
2025-12-02 11:26:27 +00:00
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, JsonSchema, TS)]
#[serde(tag = "type", rename_all = "camelCase")]
#[ts(tag = "type")]
pub enum ReviewTarget {
/// Review the working tree: staged, unstaged, and untracked files.
UncommittedChanges,
/// Review changes between the current branch and the given base branch.
#[serde(rename_all = "camelCase")]
#[ts(rename_all = "camelCase")]
BaseBranch { branch: String },
/// Review the changes introduced by a specific commit.
#[serde(rename_all = "camelCase")]
#[ts(rename_all = "camelCase")]
Commit {
sha: String,
/// Optional human-readable label (e.g., commit subject) for UIs.
title: Option<String>,
},
/// Arbitrary instructions provided by the user.
#[serde(rename_all = "camelCase")]
#[ts(rename_all = "camelCase")]
Custom { instructions: String },
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
2025-11-18 21:58:54 +00:00
/// Review request sent to the review session.
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
pub struct ReviewRequest {
2025-12-02 11:26:27 +00:00
pub target: ReviewTarget,
#[serde(skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub user_facing_hint: Option<String>,
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
}
/// Structured review result produced by a child review session.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
pub struct ReviewOutputEvent {
pub findings: Vec<ReviewFinding>,
pub overall_correctness: String,
pub overall_explanation: String,
pub overall_confidence_score: f32,
}
impl Default for ReviewOutputEvent {
fn default() -> Self {
Self {
findings: Vec::new(),
overall_correctness: String::default(),
overall_explanation: String::default(),
overall_confidence_score: 0.0,
}
}
}
/// A single review finding describing an observed issue or recommendation.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
pub struct ReviewFinding {
pub title: String,
pub body: String,
pub confidence_score: f32,
pub priority: i32,
pub code_location: ReviewCodeLocation,
}
/// Location of the code related to a review finding.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
pub struct ReviewCodeLocation {
pub absolute_file_path: PathBuf,
pub line_range: ReviewLineRange,
}
/// Inclusive line range in a file associated with the finding.
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
Review Mode (Core) (#3401) ## 📝 Review Mode -- Core This PR introduces the Core implementation for Review mode: - New op `Op::Review { prompt: String }:` spawns a child review task with isolated context, a review‑specific system prompt, and a `Config.review_model`. - `EnteredReviewMode`: emitted when the child review session starts. Every event from this point onwards reflects the review session. - `ExitedReviewMode(Option<ReviewOutputEvent>)`: emitted when the review finishes or is interrupted, with optional structured findings: ```json { "findings": [ { "title": "<≤ 80 chars, imperative>", "body": "<valid Markdown explaining *why* this is a problem; cite files/lines/functions>", "confidence_score": <float 0.0-1.0>, "priority": <int 0-3>, "code_location": { "absolute_file_path": "<file path>", "line_range": {"start": <int>, "end": <int>} } } ], "overall_correctness": "patch is correct" | "patch is incorrect", "overall_explanation": "<1-3 sentence explanation justifying the overall_correctness verdict>", "overall_confidence_score": <float 0.0-1.0> } ``` ## Questions ### Why separate out its own message history? We want the review thread to match the training of our review models as much as possible -- that means using a custom prompt, removing user instructions, and starting a clean chat history. We also want to make sure the review thread doesn't leak into the parent thread. ### Why do this as a mode, vs. sub-agents? 1. We want review to be a synchronous task, so it's fine for now to do a bespoke implementation. 2. We're still unclear about the final structure for sub-agents. We'd prefer to land this quickly and then refactor into sub-agents without rushing that implementation.
2025-09-12 16:25:10 -07:00
pub struct ReviewLineRange {
pub start: u32,
pub end: u32,
}
#[derive(
Debug, Clone, Copy, Display, Deserialize, Serialize, PartialEq, Eq, JsonSchema, TS, Default,
)]
#[serde(rename_all = "snake_case")]
pub enum ExecCommandSource {
#[default]
Agent,
UserShell,
UnifiedExecStartup,
UnifiedExecInteraction,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ExecCommandBeginEvent {
/// Identifier so this can be paired with the ExecCommandEnd event.
pub call_id: String,
/// Identifier for the underlying PTY process (when available).
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub process_id: Option<String>,
[app-server] feat: add v2 command execution approval flow (#6758) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. **Note**: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```
2025-11-17 16:23:54 -08:00
/// Turn ID that this command belongs to.
pub turn_id: String,
/// The command to be executed.
pub command: Vec<String>,
/// The command's working directory if not the default cwd for the agent.
pub cwd: PathBuf,
[1/3] Parse exec commands and format them more nicely in the UI (#2095) # Note for reviewers The bulk of this PR is in in the new file, `parse_command.rs`. This file is designed to be written TDD and implemented with Codex. Do not worry about reviewing the code, just review the unit tests (if you want). If any cases are missing, we'll add more tests and have Codex fix them. I think the best approach will be to land and iterate. I have some follow-ups I want to do after this lands. The next PR after this will let us merge (and dedupe) multiple sequential cells of the same such as multiple read commands. The deduping will also be important because the model often reads the same file multiple times in a row in chunks === This PR formats common commands like reading, formatting, testing, etc more nicely: It tries to extract things like file names, tests and falls back to the cmd if it doesn't. It also only shows stdout/err if the command failed. <img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15" src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e" /> <img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32" src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9" /> <img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2" src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f" /> Part 2: https://github.com/openai/codex/pull/2097 Part 3: https://github.com/openai/codex/pull/2110
2025-08-11 11:26:15 -07:00
pub parsed_cmd: Vec<ParsedCommand>,
/// Where the command originated. Defaults to Agent for backward compatibility.
feature: Add "!cmd" user shell execution (#2471) feature: Add "!cmd" user shell execution This change lets users run local shell commands directly from the TUI by prefixing their input with ! (e.g. !ls). Output is truncated to keep the exec cell usable, and Ctrl-C cleanly interrupts long-running commands (e.g. !sleep 10000). **Summary of changes** - Route Op::RunUserShellCommand through a dedicated UserShellCommandTask (core/src/tasks/user_shell.rs), keeping the task logic out of codex.rs. - Reuse the existing tool router: the task constructs a ToolCall for the local_shell tool and relies on ShellHandler, so no manual MCP tool lookup is required. - Emit exec lifecycle events (ExecCommandBegin/ExecCommandEnd) so the TUI can show command metadata, live output, and exit status. **End-to-end flow** **TUI handling** 1. ChatWidget::submit_user_message (TUI) intercepts messages starting with !. 2. Non-empty commands dispatch Op::RunUserShellCommand { command }; empty commands surface a help hint. 3. No UserInput items are created, so nothing is enqueued for the model. **Core submission loop** 4. The submission loop routes the op to handlers::run_user_shell_command (core/src/codex.rs). 5. A fresh TurnContext is created and Session::spawn_user_shell_command enqueues UserShellCommandTask. **Task execution** 6. UserShellCommandTask::run emits TaskStartedEvent, formats the command, and prepares a ToolCall targeting local_shell. 7. ToolCallRuntime::handle_tool_call dispatches to ShellHandler. **Shell tool runtime** 8. ShellHandler::run_exec_like launches the process via the unified exec runtime, honoring sandbox and shell policies, and emits ExecCommandBegin/End. 9. Stdout/stderr are captured for the UI, but the task does not turn the resulting ToolOutput into a model response. **Completion** 10. After ExecCommandEnd, the task finishes without an assistant message; the session marks it complete and the exec cell displays the final output. **Conversation context** - The command and its output never enter the conversation history or the model prompt; the flow is local-only. - Only exec/task events are emitted for UI rendering. **Demo video** https://github.com/user-attachments/assets/fcd114b0-4304-4448-a367-a04c43e0b996
2025-10-29 00:31:20 -07:00
#[serde(default)]
pub source: ExecCommandSource,
/// Raw input sent to a unified exec session (if this is an interaction event).
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub interaction_input: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ExecCommandEndEvent {
/// Identifier for the ExecCommandBegin that finished.
pub call_id: String,
/// Identifier for the underlying PTY process (when available).
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub process_id: Option<String>,
[app-server] feat: add v2 command execution approval flow (#6758) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. **Note**: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```
2025-11-17 16:23:54 -08:00
/// Turn ID that this command belongs to.
pub turn_id: String,
/// The command that was executed.
pub command: Vec<String>,
/// The command's working directory if not the default cwd for the agent.
pub cwd: PathBuf,
pub parsed_cmd: Vec<ParsedCommand>,
/// Where the command originated. Defaults to Agent for backward compatibility.
#[serde(default)]
pub source: ExecCommandSource,
/// Raw input sent to a unified exec session (if this is an interaction event).
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub interaction_input: Option<String>,
/// Captured stdout
pub stdout: String,
/// Captured stderr
pub stderr: String,
/// Captured aggregated output
#[serde(default)]
pub aggregated_output: String,
/// The command's exit code.
pub exit_code: i32,
/// The duration of the command execution.
#[ts(type = "string")]
pub duration: Duration,
/// Formatted output from the command, as seen by the model.
pub formatted_output: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ViewImageToolCallEvent {
/// Identifier for the originating tool call.
pub call_id: String,
/// Local filesystem path provided to the tool.
pub path: PathBuf,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
pub enum ExecOutputStream {
Stdout,
Stderr,
}
#[serde_as]
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct ExecCommandOutputDeltaEvent {
/// Identifier for the ExecCommandBegin that produced this chunk.
pub call_id: String,
/// Which stream produced this chunk.
pub stream: ExecOutputStream,
/// Raw bytes from the stream (may not be valid UTF-8).
#[serde_as(as = "serde_with::base64::Base64")]
#[schemars(with = "String")]
#[ts(type = "string")]
pub chunk: Vec<u8>,
}
#[serde_as]
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct TerminalInteractionEvent {
/// Identifier for the ExecCommandBegin that produced this chunk.
pub call_id: String,
/// Process id associated with the running command.
pub process_id: String,
/// Stdin sent to the running session.
pub stdin: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct BackgroundEventEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct DeprecationNoticeEvent {
/// Concise summary of what is deprecated.
pub summary: String,
/// Optional extra guidance, such as migration steps or rationale.
#[serde(skip_serializing_if = "Option::is_none")]
pub details: Option<String>,
}
2025-10-27 10:55:29 +00:00
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct UndoStartedEvent {
#[serde(skip_serializing_if = "Option::is_none")]
pub message: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct UndoCompletedEvent {
pub success: bool,
#[serde(skip_serializing_if = "Option::is_none")]
pub message: Option<String>,
}
feat(app-server): thread/rollback API (#8454) Add `thread/rollback` to app-server to support IDEs undo-ing the last N turns of a thread. For context, an IDE partner will be supporting an "undo" capability where the IDE (the app-server client) will be responsible for reverting the local changes made during the last turn. To support this well, we also need a way to drop the last turn (or more generally, the last N turns) from the agent's context. This is what `thread/rollback` does. **Core idea**: A Thread rollback is represented as a persisted event message (EventMsg::ThreadRollback) in the rollout JSONL file, not by rewriting history. On resume, both the model's context (core replay) and the UI turn list (app-server v2's thread history builder) apply these markers so the pruned history is consistent across live conversations and `thread/resume`. Implementation notes: - Rollback only affects agent context and appends to the rollout file; clients are responsible for reverting files on disk. - If a thread rollback is currently in progress, subsequent `thread/rollback` calls are rejected. - Because we use `CodexConversation::submit` and codex core tracks active turns, returning an error on concurrent rollbacks is communicated via an `EventMsg::Error` with a new variant `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and sends the BAD_REQUEST RPC response. Tests cover thread rollbacks in both core and app-server, including when `num_turns` > existing turns (which clears all turns). **Note**: this explicitly does **not** behave like `/undo` which we just removed from the CLI, which does the opposite of what `thread/rollback` does. `/undo` reverts local changes via ghost commits/snapshots and does not modify the agent's context / conversation history.
2026-01-06 13:23:48 -08:00
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ThreadRolledBackEvent {
/// Number of user turns that were removed from context.
pub num_turns: u32,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
2025-08-21 01:15:24 -07:00
pub struct StreamErrorEvent {
pub message: String,
#[serde(default)]
pub codex_error_info: Option<CodexErrorInfo>,
/// Optional details about the underlying stream failure (often the same
/// human-readable message that is surfaced as the terminal error if retries
/// are exhausted).
#[serde(default)]
pub additional_details: Option<String>,
2025-08-21 01:15:24 -07:00
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct StreamInfoEvent {
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct PatchApplyBeginEvent {
/// Identifier so this can be paired with the PatchApplyEnd event.
pub call_id: String,
[app-server] feat: v2 apply_patch approval flow (#6760) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)
 - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```
2025-11-19 20:13:31 -08:00
/// Turn ID that this patch belongs to.
/// Uses `#[serde(default)]` for backwards compatibility.
#[serde(default)]
pub turn_id: String,
/// If true, there was no ApplyPatchApprovalRequest for this patch.
pub auto_approved: bool,
/// The changes to be applied.
pub changes: HashMap<PathBuf, FileChange>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct PatchApplyEndEvent {
/// Identifier for the PatchApplyBegin that finished.
pub call_id: String,
[app-server] feat: v2 apply_patch approval flow (#6760) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)
 - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```
2025-11-19 20:13:31 -08:00
/// Turn ID that this patch belongs to.
/// Uses `#[serde(default)]` for backwards compatibility.
#[serde(default)]
pub turn_id: String,
/// Captured stdout (summary printed by apply_patch).
pub stdout: String,
/// Captured stderr (parser errors, IO failures, etc.).
pub stderr: String,
/// Whether the patch was applied successfully.
pub success: bool,
[app-server] feat: v2 apply_patch approval flow (#6760) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)
 - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```
2025-11-19 20:13:31 -08:00
/// The changes that were applied (mirrors PatchApplyBeginEvent::changes).
#[serde(default)]
pub changes: HashMap<PathBuf, FileChange>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct TurnDiffEvent {
pub unified_diff: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
pub struct GetHistoryEntryResponseEvent {
pub offset: usize,
pub log_id: u64,
/// The entry at the requested offset, if available and parseable.
#[serde(skip_serializing_if = "Option::is_none")]
pub entry: Option<HistoryEntry>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct McpListToolsResponseEvent {
/// Fully qualified tool name -> tool definition.
pub tools: std::collections::HashMap<String, McpTool>,
/// Known resources grouped by server name.
pub resources: std::collections::HashMap<String, Vec<McpResource>>,
/// Known resource templates grouped by server name.
pub resource_templates: std::collections::HashMap<String, Vec<McpResourceTemplate>>,
/// Authentication status for each configured MCP server.
pub auth_statuses: std::collections::HashMap<String, McpAuthStatus>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct McpStartupUpdateEvent {
/// Server name being started.
pub server: String,
/// Current startup status.
pub status: McpStartupStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
#[serde(rename_all = "snake_case", tag = "state")]
#[ts(rename_all = "snake_case", tag = "state")]
pub enum McpStartupStatus {
Starting,
Ready,
Failed { error: String },
Cancelled,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, Default)]
pub struct McpStartupCompleteEvent {
pub ready: Vec<String>,
pub failed: Vec<McpStartupFailure>,
pub cancelled: Vec<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct McpStartupFailure {
pub server: String,
pub error: String,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
#[ts(rename_all = "snake_case")]
pub enum McpAuthStatus {
Unsupported,
NotLoggedIn,
BearerToken,
OAuth,
}
impl fmt::Display for McpAuthStatus {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let text = match self {
McpAuthStatus::Unsupported => "Unsupported",
McpAuthStatus::NotLoggedIn => "Not logged in",
McpAuthStatus::BearerToken => "Bearer token",
McpAuthStatus::OAuth => "OAuth",
};
f.write_str(text)
}
}
/// Response payload for `Op::ListCustomPrompts`.
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ListCustomPromptsResponseEvent {
pub custom_prompts: Vec<CustomPrompt>,
}
/// Response payload for `Op::ListSkills`.
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ListSkillsResponseEvent {
pub skills: Vec<SkillsListEntry>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
#[ts(rename_all = "snake_case")]
pub enum SkillScope {
User,
Repo,
System,
Admin,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct SkillMetadata {
pub name: String,
pub description: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
/// Legacy short_description from SKILL.md. Prefer SKILL.json interface.short_description.
pub short_description: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub interface: Option<SkillInterface>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub dependencies: Option<SkillDependencies>,
pub path: PathBuf,
pub scope: SkillScope,
pub enabled: bool,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq, Eq)]
pub struct SkillInterface {
#[ts(optional)]
pub display_name: Option<String>,
#[ts(optional)]
pub short_description: Option<String>,
#[ts(optional)]
pub icon_small: Option<PathBuf>,
#[ts(optional)]
pub icon_large: Option<PathBuf>,
#[ts(optional)]
pub brand_color: Option<String>,
#[ts(optional)]
pub default_prompt: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq, Eq)]
pub struct SkillDependencies {
pub tools: Vec<SkillToolDependency>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS, PartialEq, Eq)]
pub struct SkillToolDependency {
#[serde(rename = "type")]
#[ts(rename = "type")]
pub r#type: String,
pub value: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub description: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub transport: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub command: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub url: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct SkillErrorInfo {
pub path: PathBuf,
pub message: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct SkillsListEntry {
pub cwd: PathBuf,
pub skills: Vec<SkillMetadata>,
pub errors: Vec<SkillErrorInfo>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct SessionConfiguredEvent {
pub session_id: ThreadId,
#[serde(skip_serializing_if = "Option::is_none")]
pub forked_from_id: Option<ThreadId>,
/// Optional user-facing thread name (may be unset).
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub thread_name: Option<String>,
/// Tell the client what model is being queried.
pub model: String,
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
pub model_provider_id: String,
/// When to escalate for approval for execution
pub approval_policy: AskForApproval,
/// How to sandbox commands executed in the system
pub sandbox_policy: SandboxPolicy,
/// Working directory that should be treated as the *root* of the
/// session.
pub cwd: PathBuf,
/// The effort the model is putting into reasoning about the user's request.
#[serde(skip_serializing_if = "Option::is_none")]
pub reasoning_effort: Option<ReasoningEffortConfig>,
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
/// Identifier of the history log file (inode on Unix, 0 otherwise).
pub history_log_id: u64,
/// Current number of entries in the history log.
pub history_entry_count: usize,
Replay EventMsgs from Response Items when resuming a session with history. (#3123) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.
2025-09-03 21:47:00 -07:00
/// Optional initial messages (as events) for resumed sessions.
/// When present, UIs can use these to seed the history.
#[serde(skip_serializing_if = "Option::is_none")]
pub initial_messages: Option<Vec<EventMsg>>,
/// Path in which the rollout is stored. Can be `None` for ephemeral threads
#[serde(skip_serializing_if = "Option::is_none")]
pub rollout_path: Option<PathBuf>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct ThreadNameUpdatedEvent {
pub thread_id: ThreadId,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub thread_name: Option<String>,
}
/// User's decision in response to an ExecApprovalRequest.
#[derive(Debug, Default, Clone, Deserialize, Serialize, PartialEq, Eq, Display, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
pub enum ReviewDecision {
/// User has approved this command and the agent should execute it.
Approved,
/// User has approved this command and wants to apply the proposed execpolicy
/// amendment so future matching commands are permitted.
ApprovedExecpolicyAmendment {
proposed_execpolicy_amendment: ExecPolicyAmendment,
},
/// User has approved this command and wants to automatically approve any
/// future identical instances (`command` and `cwd` match exactly) for the
/// remainder of the session.
ApprovedForSession,
/// User has denied this command and the agent should not execute it, but
/// it should continue the session and try something else.
#[default]
Denied,
/// User has denied this command and the agent should not do anything until
/// the user's next command.
Abort,
}
2026-01-09 13:10:31 +00:00
impl ReviewDecision {
/// Returns an opaque version of the decision without PII. We can't use an ignored flag
/// on `serde` because the serialization is required by some surfaces.
pub fn to_opaque_string(&self) -> &'static str {
match self {
ReviewDecision::Approved => "approved",
ReviewDecision::ApprovedExecpolicyAmendment { .. } => "approved_with_amendment",
ReviewDecision::ApprovedForSession => "approved_for_session",
ReviewDecision::Denied => "denied",
ReviewDecision::Abort => "abort",
}
}
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
#[serde(tag = "type", rename_all = "snake_case")]
#[ts(tag = "type")]
pub enum FileChange {
Add {
content: String,
},
Delete {
content: String,
},
Update {
unified_diff: String,
move_path: Option<PathBuf>,
},
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct Chunk {
/// 1-based line index of the first line in the original file
pub orig_index: u32,
pub deleted_lines: Vec<String>,
pub inserted_lines: Vec<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, JsonSchema, TS)]
pub struct TurnAbortedEvent {
pub reason: TurnAbortReason,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
#[serde(rename_all = "snake_case")]
pub enum TurnAbortReason {
Interrupted,
Replaced,
ReviewEnded,
}
feat: emit events around collab tools (#9095) Emit the following events around the collab tools. On the `app-server` this will be under `item/started` and `item/completed` ``` #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the newly spawned agent, if it was created. pub new_thread_id: Option<ThreadId>, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, /// Last known status of the new agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingBeginEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingEndEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Last known status of the receiver agent reported to the sender agent before /// the close. pub status: AgentStatus, } ```
2026-01-14 17:55:57 +00:00
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the newly spawned agent, if it was created.
pub new_thread_id: Option<ThreadId>,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
/// Last known status of the new agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
/// Last known status of the receiver agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingBeginEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
2026-01-16 12:05:04 +01:00
/// Thread ID of the receivers.
pub receiver_thread_ids: Vec<ThreadId>,
feat: emit events around collab tools (#9095) Emit the following events around the collab tools. On the `app-server` this will be under `item/started` and `item/completed` ``` #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the newly spawned agent, if it was created. pub new_thread_id: Option<ThreadId>, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, /// Last known status of the new agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingBeginEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingEndEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Last known status of the receiver agent reported to the sender agent before /// the close. pub status: AgentStatus, } ```
2026-01-14 17:55:57 +00:00
/// ID of the waiting call.
pub call_id: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingEndEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// ID of the waiting call.
pub call_id: String,
2026-01-16 12:05:04 +01:00
/// Last known status of the receiver agents reported to the sender agent.
pub statuses: HashMap<ThreadId, AgentStatus>,
feat: emit events around collab tools (#9095) Emit the following events around the collab tools. On the `app-server` this will be under `item/started` and `item/completed` ``` #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the newly spawned agent, if it was created. pub new_thread_id: Option<ThreadId>, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, /// Last known status of the new agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingBeginEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingEndEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Last known status of the receiver agent reported to the sender agent before /// the close. pub status: AgentStatus, } ```
2026-01-14 17:55:57 +00:00
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Last known status of the receiver agent reported to the sender agent before
/// the close.
pub status: AgentStatus,
}
#[cfg(test)]
mod tests {
use super::*;
use crate::items::UserMessageItem;
use crate::items::WebSearchItem;
use anyhow::Result;
use pretty_assertions::assert_eq;
use serde_json::json;
use tempfile::NamedTempFile;
#[test]
fn external_sandbox_reports_full_access_flags() {
let restricted = SandboxPolicy::ExternalSandbox {
network_access: NetworkAccess::Restricted,
};
assert!(restricted.has_full_disk_write_access());
assert!(!restricted.has_full_network_access());
let enabled = SandboxPolicy::ExternalSandbox {
network_access: NetworkAccess::Enabled,
};
assert!(enabled.has_full_disk_write_access());
assert!(enabled.has_full_network_access());
}
#[test]
fn item_started_event_from_web_search_emits_begin_event() {
let event = ItemStartedEvent {
thread_id: ThreadId::new(),
turn_id: "turn-1".into(),
item: TurnItem::WebSearch(WebSearchItem {
id: "search-1".into(),
query: "find docs".into(),
action: WebSearchAction::Search {
query: Some("find docs".into()),
queries: None,
},
}),
};
let legacy_events = event.as_legacy_events(false);
assert_eq!(legacy_events.len(), 1);
match &legacy_events[0] {
EventMsg::WebSearchBegin(event) => assert_eq!(event.call_id, "search-1"),
_ => panic!("expected WebSearchBegin event"),
}
}
#[test]
fn item_started_event_from_non_web_search_emits_no_legacy_events() {
let event = ItemStartedEvent {
thread_id: ThreadId::new(),
turn_id: "turn-1".into(),
item: TurnItem::UserMessage(UserMessageItem::new(&[])),
};
assert!(event.as_legacy_events(false).is_empty());
}
feat: expose outputSchema to user_turn/turn_start app_server API (#8377) What changed - Added `outputSchema` support to the app-server APIs, mirroring `codex exec --output-schema` behavior. - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final assistant message for that turn. - V2 `turn/start` now accepts `outputSchema` and constrains the final assistant message for that turn (explicitly per-turn only). Core behavior - `Op::UserTurn` already supported `final_output_json_schema`; now V1 `sendUserTurn` forwards `outputSchema` into that field. - `Op::UserInput` now carries `final_output_json_schema` for per-turn settings updates; core maps it into `SessionSettingsUpdate.final_output_json_schema` so it applies to the created turn context. - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext` (it’s applied only for the current turn). Other overrides (cwd/model/etc) keep their existing persistent behavior. API / docs - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema: Option<serde_json::Value>` to `SendUserTurnParams` (serialized as `outputSchema`). - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema: Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`). - `codex-rs/app-server/README.md`: document `outputSchema` for `turn/start` and clarify it applies only to the current turn. - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1 `sendUserTurn` and v2 `turn/start`. Tests added/updated - New app-server integration tests asserting `outputSchema` is forwarded into outbound `/responses` requests as `text.format`: - `codex-rs/app-server/tests/suite/output_schema.rs` - `codex-rs/app-server/tests/suite/v2/output_schema.rs` - Added per-turn semantics tests (schema does not leak to the next turn): - `send_user_turn_output_schema_is_per_turn_v1` - `turn_start_output_schema_is_per_turn_v2` - Added protocol wire-compat tests for the merged op: - serialize omits `final_output_json_schema` when `None` - deserialize works when field is missing - serialize includes `final_output_json_schema` when `Some(schema)` Call site updates (high level) - Updated all `Op::UserInput { .. }` constructions to include `final_output_json_schema`: - `codex-rs/app-server/src/codex_message_processor.rs` - `codex-rs/core/src/codex_delegate.rs` - `codex-rs/mcp-server/src/codex_tool_runner.rs` - `codex-rs/tui/src/chatwidget.rs` - `codex-rs/tui2/src/chatwidget.rs` - plus impacted core tests. Validation - `just fmt` - `cargo test -p codex-core` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` - `cargo test -p codex-tui` - `cargo test -p codex-tui2` - `cargo test -p codex-protocol` - `cargo clippy --all-features --tests --profile dev --fix -- -D warnings`
2026-01-05 10:27:00 -08:00
#[test]
fn user_input_serialization_omits_final_output_json_schema_when_none() -> Result<()> {
let op = Op::UserInput {
items: Vec::new(),
final_output_json_schema: None,
};
let json_op = serde_json::to_value(op)?;
assert_eq!(json_op, json!({ "type": "user_input", "items": [] }));
Ok(())
}
#[test]
fn user_input_deserializes_without_final_output_json_schema_field() -> Result<()> {
let op: Op = serde_json::from_value(json!({ "type": "user_input", "items": [] }))?;
assert_eq!(
op,
Op::UserInput {
items: Vec::new(),
final_output_json_schema: None,
}
);
Ok(())
}
#[test]
fn user_input_serialization_includes_final_output_json_schema_when_some() -> Result<()> {
let schema = json!({
"type": "object",
"properties": {
"answer": { "type": "string" }
},
"required": ["answer"],
"additionalProperties": false
});
let op = Op::UserInput {
items: Vec::new(),
final_output_json_schema: Some(schema.clone()),
};
let json_op = serde_json::to_value(op)?;
assert_eq!(
json_op,
json!({
"type": "user_input",
"items": [],
"final_output_json_schema": schema,
})
);
Ok(())
}
#[test]
fn user_input_text_serializes_empty_text_elements() -> Result<()> {
let input = UserInput::Text {
text: "hello".to_string(),
text_elements: Vec::new(),
};
let json_input = serde_json::to_value(input)?;
assert_eq!(
json_input,
json!({
"type": "text",
"text": "hello",
"text_elements": [],
})
);
Ok(())
}
#[test]
fn user_message_event_serializes_empty_metadata_vectors() -> Result<()> {
let event = UserMessageEvent {
message: "hello".to_string(),
images: None,
local_images: Vec::new(),
text_elements: Vec::new(),
};
let json_event = serde_json::to_value(event)?;
assert_eq!(
json_event,
json!({
"message": "hello",
"local_images": [],
"text_elements": [],
})
);
Ok(())
}
/// Serialize Event to verify that its JSON representation has the expected
/// amount of nesting.
#[test]
fn serialize_event() -> Result<()> {
let conversation_id = ThreadId::from_string("67e55044-10b1-426f-9247-bb680e5fe0c8")?;
let rollout_file = NamedTempFile::new()?;
let event = Event {
id: "1234".to_string(),
msg: EventMsg::SessionConfigured(SessionConfiguredEvent {
session_id: conversation_id,
forked_from_id: None,
thread_name: None,
model: "codex-mini-latest".to_string(),
model_provider_id: "openai".to_string(),
approval_policy: AskForApproval::Never,
sandbox_policy: SandboxPolicy::ReadOnly,
cwd: PathBuf::from("/home/user/project"),
reasoning_effort: Some(ReasoningEffortConfig::default()),
feat: record messages from user in ~/.codex/history.jsonl (#939) This is a large change to support a "history" feature like you would expect in a shell like Bash. History events are recorded in `$CODEX_HOME/history.jsonl`. Because it is a JSONL file, it is straightforward to append new entries (as opposed to the TypeScript file that uses `$CODEX_HOME/history.json`, so to be valid JSON, each new entry entails rewriting the entire file). Because it is possible for there to be multiple instances of Codex CLI writing to `history.jsonl` at once, we use advisory file locking when working with `history.jsonl` in `codex-rs/core/src/message_history.rs`. Because we believe history is a sufficiently useful feature, we enable it by default. Though to provide some safety, we set the file permissions of `history.jsonl` to be `o600` so that other users on the system cannot read the user's history. We do not yet support a default list of `SENSITIVE_PATTERNS` as the TypeScript CLI does: https://github.com/openai/codex/blob/3fdf9df1335ac9501e3fb0e61715359145711e8b/codex-cli/src/utils/storage/command-history.ts#L10-L17 We are going to take a more conservative approach to this list in the Rust CLI. For example, while `/\b[A-Za-z0-9-_]{20,}\b/` might exclude sensitive information like API tokens, it would also exclude valuable information such as references to Git commits. As noted in the updated documentation, users can opt-out of history by adding the following to `config.toml`: ```toml [history] persistence = "none" ``` Because `history.jsonl` could, in theory, be quite large, we take a[n arguably overly pedantic] approach in reading history entries into memory. Specifically, we start by telling the client the current number of entries in the history file (`history_entry_count`) as well as the inode (`history_log_id`) of `history.jsonl` (see the new fields on `SessionConfiguredEvent`). The client is responsible for keeping new entries in memory to create a "local history," but if the user hits up enough times to go "past" the end of local history, then the client should use the new `GetHistoryEntryRequest` in the protocol to fetch older entries. Specifically, it should pass the `history_log_id` it was given originally and work backwards from `history_entry_count`. (It should really fetch history in batches rather than one-at-a-time, but that is something we can improve upon in subsequent PRs.) The motivation behind this crazy scheme is that it is designed to defend against: * The `history.jsonl` being truncated during the session such that the index into the history is no longer consistent with what had been read up to that point. We do not yet have logic to enforce a `max_bytes` for `history.jsonl`, but once we do, we will aspire to implement it in a way that should result in a new inode for the file on most systems. * New items from concurrent Codex CLI sessions amending to the history. Because, in absence of truncation, `history.jsonl` is an append-only log, so long as the client reads backwards from `history_entry_count`, it should always get a consistent view of history. (That said, it will not be able to read _new_ commands from concurrent sessions, but perhaps we will introduce a `/` command to reload latest history or something down the road.) Admittedly, my testing of this feature thus far has been fairly light. I expect we will find bugs and introduce enhancements/fixes going forward.
2025-05-15 16:26:23 -07:00
history_log_id: 0,
history_entry_count: 0,
Replay EventMsgs from Response Items when resuming a session with history. (#3123) ### Overview This PR introduces the following changes: 1. Adds a unified mechanism to convert ResponseItem into EventMsg. 2. Ensures that when a session is initialized with initial history, a vector of EventMsg is sent along with the session configuration. This allows clients to re-render the UI accordingly. 3. Added integration testing ### Caveats This implementation does not send every EventMsg that was previously dispatched to clients. The excluded events fall into two categories: • “Arguably” rolled-out events Examples include tool calls and apply-patch calls. While these events are conceptually rolled out, we currently only roll out ResponseItems. These events are already being handled elsewhere and transformed into EventMsg before being sent. • Non-rolled-out events Certain events such as TurnDiff, Error, and TokenCount are not rolled out at all. ### Future Directions At present, resuming a session involves maintaining two states: • UI State Clients can replay most of the important UI from the provided EventMsg history. • Model State The model receives the complete session history to reconstruct its internal state. This design provides a solid foundation. If, in the future, more precise UI reconstruction is needed, we have two potential paths: 1. Introduce a third data structure that allows us to derive both ResponseItems and EventMsgs. 2. Clearly divide responsibilities: the core system ensures the integrity of the model state, while clients are responsible for reconstructing the UI.
2025-09-03 21:47:00 -07:00
initial_messages: None,
rollout_path: Some(rollout_file.path().to_path_buf()),
}),
};
let expected = json!({
"id": "1234",
"msg": {
"type": "session_configured",
"session_id": "67e55044-10b1-426f-9247-bb680e5fe0c8",
"model": "codex-mini-latest",
"model_provider_id": "openai",
"approval_policy": "never",
"sandbox_policy": {
"type": "read-only"
},
"cwd": "/home/user/project",
"reasoning_effort": "medium",
"history_log_id": 0,
"history_entry_count": 0,
"rollout_path": format!("{}", rollout_file.path().display()),
}
});
assert_eq!(expected, serde_json::to_value(&event)?);
Ok(())
}
#[test]
fn vec_u8_as_base64_serialization_and_deserialization() -> Result<()> {
let event = ExecCommandOutputDeltaEvent {
call_id: "call21".to_string(),
stream: ExecOutputStream::Stdout,
chunk: vec![1, 2, 3, 4, 5],
};
let serialized = serde_json::to_string(&event)?;
assert_eq!(
r#"{"call_id":"call21","stream":"stdout","chunk":"AQIDBAU="}"#,
serialized,
);
let deserialized: ExecCommandOutputDeltaEvent = serde_json::from_str(&serialized)?;
assert_eq!(deserialized, event);
Ok(())
}
#[test]
fn serialize_mcp_startup_update_event() -> Result<()> {
let event = Event {
id: "init".to_string(),
msg: EventMsg::McpStartupUpdate(McpStartupUpdateEvent {
server: "srv".to_string(),
status: McpStartupStatus::Failed {
error: "boom".to_string(),
},
}),
};
let value = serde_json::to_value(&event)?;
assert_eq!(value["msg"]["type"], "mcp_startup_update");
assert_eq!(value["msg"]["server"], "srv");
assert_eq!(value["msg"]["status"]["state"], "failed");
assert_eq!(value["msg"]["status"]["error"], "boom");
Ok(())
}
#[test]
fn serialize_mcp_startup_complete_event() -> Result<()> {
let event = Event {
id: "init".to_string(),
msg: EventMsg::McpStartupComplete(McpStartupCompleteEvent {
ready: vec!["a".to_string()],
failed: vec![McpStartupFailure {
server: "b".to_string(),
error: "bad".to_string(),
}],
cancelled: vec!["c".to_string()],
}),
};
let value = serde_json::to_value(&event)?;
assert_eq!(value["msg"]["type"], "mcp_startup_complete");
assert_eq!(value["msg"]["ready"][0], "a");
assert_eq!(value["msg"]["failed"][0]["server"], "b");
assert_eq!(value["msg"]["failed"][0]["error"], "bad");
assert_eq!(value["msg"]["cancelled"][0], "c");
Ok(())
}
}