core-agent-ide

Author	SHA1	Message	Date
Matthew Zeng	9f1009540b	Upgrade rmcp to 0.14 (#10718 ) - [x] Upgrade rmcp to 0.14	2026-02-08 15:07:53 -08:00
Eric Traut	10336068db	Fix flaky Windows CI test (#10993 ) Hardens the PTY Python REPL test and makes MCP test startup deterministic Summary - `utils/pty/src/tests.rs` - Added a REPL readiness handshake (`wait_for_python_repl_ready`) that repeatedly sends a marker and waits for it in PTY output before sending test commands. - Updated `pty_python_repl_emits_output_and_exits` to: - wait for readiness first, - preserve startup output, - append output collected through process exit. - Reduces Windows/ConPTY flakiness from early stdin writes racing REPL startup. - `mcp-server/tests/suite/codex_tool.rs` - Avoid remote model refresh during MCP test startup, reducing timeout-prone nondeterminism.	2026-02-07 08:55:42 -08:00
jif-oai	62605fa471	Add resume_agent collab tool (#10903 ) Summary - add the new resume_agent collab tool path through core, protocol, and the app server API, including the resume events - update the schema/TypeScript definitions plus docs so resume_agent appears in generated artifacts and README - note that resumed agents rehydrate rollout history without overwriting their base instructions Testing - Not run (not requested)	2026-02-07 17:31:45 +01:00
xl-openai	f38d181795	feat: add APIs to list and download public remote skills (#10448 ) Add API to list / download from remote public skills	2026-02-03 14:09:37 -08:00
jif-oai	d2394a2494	chore: nuke chat/completions API (#10157 )	2026-02-03 11:31:57 +00:00
Michael Bolin	66447d5d2c	feat: replace custom mcp-types crate with equivalents from rmcp (#10349 ) We started working with MCP in Codex before https://crates.io/crates/rmcp was mature, so we had our own crate for MCP types that was generated from the MCP schema: `8b95d3e082/codex-rs/mcp-types/README.md` Now that `rmcp` is more mature, it makes more sense to use their MCP types in Rust, as they handle details (like the `_meta` field) that our custom version ignored. Though one advantage that our custom types had is that our generated types implemented `JsonSchema` and `ts_rs::TS`, whereas the types in `rmcp` do not. As such, part of the work of this PR is leveraging the adapters between `rmcp` types and the serializable types that are API for us (app server and MCP) introduced in #10356. Note this PR results in a number of changes to `codex-rs/app-server-protocol/schema`, which merit special attention during review. We must ensure that these changes are still backwards-compatible, which is possible because we have: ```diff - export type CallToolResult = { content: Array<ContentBlock>, isError?: boolean, structuredContent?: JsonValue, }; + export type CallToolResult = { content: Array<JsonValue>, structuredContent?: JsonValue, isError?: boolean, _meta?: JsonValue, }; ``` so `ContentBlock` has been replaced with the more general `JsonValue`. Note that `ContentBlock` was defined as: ```typescript export type ContentBlock = TextContent | ImageContent | AudioContent | ResourceLink | EmbeddedResource; ``` so the deletion of those individual variants should not be a cause of great concern. 
Similarly, we have the following change in `codex-rs/app-server-protocol/schema/typescript/Tool.ts`: ``` - export type Tool = { annotations?: ToolAnnotations, description?: string, inputSchema: ToolInputSchema, name: string, outputSchema?: ToolOutputSchema, title?: string, }; + export type Tool = { name: string, title?: string, description?: string, inputSchema: JsonValue, outputSchema?: JsonValue, annotations?: JsonValue, icons?: Array<JsonValue>, _meta?: JsonValue, }; ``` so: - `annotations?: ToolAnnotations` ➡️ `JsonValue` - `inputSchema: ToolInputSchema` ➡️ `JsonValue` - `outputSchema?: ToolOutputSchema` ➡️ `JsonValue` and two new fields: `icons?: Array<JsonValue>, _meta?: JsonValue` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/10349). * #10357 * __->__ #10349 * #10356	2026-02-02 17:41:55 -08:00
Charley Cunningham	ec4a2d07e4	Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786 ) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - Normal text → `AgentMessageContentDelta` - Plan text → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. 
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) <img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM" src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286" /> ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. 
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`	2026-01-30 18:59:30 +00:00
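The line-based tag handling described in this commit can be sketched roughly as follows. This is a simplified illustration, not the code in `tagged_block_parser.rs`: `PlanParser`, `Segment`, and their methods are hypothetical names, and the sketch buffers until a newline arrives instead of releasing text as soon as a tag prefix is disproved.

```rust
// Hypothetical sketch of a line-based `<proposed_plan>` splitter with auto-close.
#[derive(Debug, PartialEq)]
enum Segment {
    Normal(String),
    Plan(String),
}

struct PlanParser {
    buffer: String, // holds the current, not yet terminated line
    in_plan: bool,  // true while inside a <proposed_plan> block
}

impl PlanParser {
    fn new() -> Self {
        PlanParser { buffer: String::new(), in_plan: false }
    }

    /// Feeds a streamed chunk and returns segments for every completed line.
    fn push(&mut self, chunk: &str) -> Vec<Segment> {
        self.buffer.push_str(chunk);
        let mut out = Vec::new();
        while let Some(idx) = self.buffer.find('\n') {
            let line: String = self.buffer.drain(..=idx).collect();
            self.handle_line(&line, &mut out);
        }
        out
    }

    /// Flushes the trailing partial line and closes any unterminated block.
    fn finish(&mut self) -> Vec<Segment> {
        let mut out = Vec::new();
        let rest = std::mem::take(&mut self.buffer);
        if !rest.is_empty() {
            self.handle_line(&rest, &mut out);
        }
        self.in_plan = false; // auto-close on end of message
        out
    }

    fn handle_line(&mut self, line: &str, out: &mut Vec<Segment>) {
        match line.trim() {
            "<proposed_plan>" if !self.in_plan => self.in_plan = true,
            "</proposed_plan>" if self.in_plan => self.in_plan = false,
            _ if self.in_plan => out.push(Segment::Plan(line.to_string())),
            _ => out.push(Segment::Normal(line.to_string())),
        }
    }
}

fn main() {
    let mut parser = PlanParser::new();
    let mut segments = parser.push("Intro\n<proposed_plan>\nStep 1\n");
    segments.extend(parser.push("Step 2"));
    segments.extend(parser.finish());
    println!("{segments:?}");
}
```

In this sketch the tag lines themselves are never emitted, so plan text and normal assistant text can be routed to different event types, and a missing closing tag simply ends the plan at `finish()`.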
pap-openai	1ef5455eb6	Conversation naming (#8991 ) Session renaming: - `/rename my_session` - `/rename` without arg and passing an argument in `customViewPrompt` - AppExitInfo shows resume hint using the session name if set instead of uuid, defaults to uuid if not set - Names are stored in `CODEX_HOME/sessions.jsonl` Session resuming: - `codex resume <name>` looks up the first entry in `CODEX_HOME/sessions.jsonl` matching the name and resumes that session --------- Co-authored-by: jif-oai <jif@openai.com>	2026-01-30 10:40:09 +00:00
jif-oai	d594693d1a	feat: dynamic tools injection (#9539 ) ## Summary Add dynamic tool injection to thread startup in API v2, wire dynamic tool calls through the app server to clients, and plumb responses back into the model tool pipeline. ### Flow (high level) - Thread start injects `dynamic_tools` into the model tool list for that thread (validation is done here). - When the model emits a tool call for one of those names, core raises a `DynamicToolCallRequest` event. - The app server forwards it to the client as `item/tool/call`, waits for the client’s response, then submits a `DynamicToolResponse` back to core. - Core turns that into a `function_call_output` in the next model request so the model can continue. ### What changed - Added dynamic tool specs to v2 thread start params and protocol types; introduced `item/tool/call` (request/response) for dynamic tool execution. - Core now registers dynamic tool specs at request time and routes those calls via a new dynamic tool handler. - App server validates tool names/schemas, forwards dynamic tool call requests to clients, and publishes tool outputs back into the session. - Integration tests	2026-01-26 10:06:44 +00:00
jif-oai	83775f4df1	feat: ephemeral threads (#9765 ) Add ephemeral threads capabilities. Only exposed through the `app-server` v2 The idea is to disable the rollout recorder for those threads.	2026-01-24 14:57:40 +00:00
charley-oai	eb90e20c0b	Persist text elements through TUI input and history (#9393 ) Continuation of breaking up this PR https://github.com/openai/codex/pull/9116 ## Summary - Thread user text element ranges through TUI/TUI2 input, submission, queueing, and history so placeholders survive resume/edit flows. - Preserve local image attachments alongside text elements and rehydrate placeholders when restoring drafts. - Keep model-facing content shapes clean by attaching UI metadata only to user input/events (no API content changes). ## Key Changes - TUI/TUI2 composer now captures text element ranges, trims them with text edits, and restores them when submission is suppressed. - User history cells render styled spans for text elements and keep local image paths for future rehydration. - Initial chat widget bootstraps accept empty `initial_text_elements` to keep initialization uniform. - Protocol/core helpers updated to tolerate the new InputText field shape without changing payloads sent to the API.	2026-01-19 23:49:34 -08:00
Shijie Rao	57ec3a8277	Feat: request user input tool (#9472 ) ### Summary * Add a `requestUserInput` tool that the model can use to gather feedback or ask questions mid-turn. ### Tool input schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput input", "type": "object", "additionalProperties": false, "required": ["questions"], "properties": { "questions": { "type": "array", "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.", "minItems": 1, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["id", "header", "question"], "properties": { "id": { "type": "string", "description": "Stable identifier for mapping answers (snake_case)." }, "header": { "type": "string", "description": "Short header label shown in the UI (12 or fewer chars)." }, "question": { "type": "string", "description": "Single-sentence prompt shown to the user." }, "options": { "type": "array", "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.", "minItems": 2, "maxItems": 3, "items": { "type": "object", "additionalProperties": false, "required": ["value", "label", "description"], "properties": { "value": { "type": "string", "description": "Machine-readable value (snake_case)." }, "label": { "type": "string", "description": "User-facing label (1-5 words)." }, "description": { "type": "string", "description": "One short sentence explaining impact/tradeoff if selected." 
} } } } } } } } } ``` ### Tool output schema ``` { "$schema": "http://json-schema.org/draft-07/schema#", "title": "requestUserInput output", "type": "object", "additionalProperties": false, "required": ["answers"], "properties": { "answers": { "type": "object", "description": "Map of question id to user answer.", "additionalProperties": { "type": "object", "additionalProperties": false, "required": ["selected"], "properties": { "selected": { "type": "array", "items": { "type": "string" } }, "other": { "type": ["string", "null"] } } } } } } ```	2026-01-19 10:17:30 -08:00
Anton Panasenko	c26fe64539	feat: show forked from session id in /status (#9330 ) Summary: - Add forked_from to SessionMeta/SessionConfiguredEvent and persist it for forked sessions. - Surface forked_from in /status for tui + tui2 and add snapshots.	2026-01-16 13:41:46 -08:00
Michael Bolin	99f47d6e9a	fix(mcp): include threadId in both content and structuredContent in CallToolResult (#9338 )	2026-01-15 18:33:11 -08:00
charley-oai	4a9c2bcc5a	Add text element metadata to types (#9235 ) Initial type tweaking PR to make the diff of https://github.com/openai/codex/pull/9116 smaller This should not change any behavior, just adds some fields to types	2026-01-14 16:41:50 -08:00
jif-oai	6a939ed7a4	feat: emit events around collab tools (#9095 ) Emit the following events around the collab tools. On the `app-server` this will be under `item/started` and `item/completed` ``` #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentSpawnEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the newly spawned agent, if it was created. pub new_thread_id: Option<ThreadId>, /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the /// beginning. pub prompt: String, /// Last known status of the new agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. pub prompt: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabAgentInteractionEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT /// leaking at the beginning. 
pub prompt: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingBeginEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabWaitingEndEvent { /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// ID of the waiting call. pub call_id: String, /// Last known status of the receiver agent reported to the sender agent. pub status: AgentStatus, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseBeginEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, } #[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)] pub struct CollabCloseEndEvent { /// Identifier for the collab tool call. pub call_id: String, /// Thread ID of the sender. pub sender_thread_id: ThreadId, /// Thread ID of the receiver. pub receiver_thread_id: ThreadId, /// Last known status of the receiver agent reported to the sender agent before /// the close. pub status: AgentStatus, } ```	2026-01-14 17:55:57 +00:00
Michael Bolin	0c09dc3c03	feat: add threadId to MCP server messages (#9192 ) This favors `threadId` instead of `conversationId` so we use the same terms as https://developers.openai.com/codex/sdk/. To test the local build: ``` cd codex-rs cargo build --bin codex npx -y @modelcontextprotocol/inspector ./target/debug/codex mcp-server ``` I sent: ```json { "method": "tools/call", "params": { "name": "codex", "arguments": { "prompt": "favorite ls option?" }, "_meta": { "progressToken": 0 } } } ``` and got: ```json { "content": [ { "type": "text", "text": "`ls -lah` (or `ls -alh`) — long listing, includes dotfiles, human-readable sizes." } ], "structuredContent": { "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" } } ``` and successfully used the `threadId` in the follow-up with the `codex-reply` tool call: ```json { "method": "tools/call", "params": { "name": "codex-reply", "arguments": { "prompt": "what is the long versoin", "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" }, "_meta": { "progressToken": 1 } } } ``` whose response also has the `threadId`: ```json { "content": [ { "type": "text", "text": "Long listing is `ls -l` (adds permissions, owner/group, size, timestamp)." } ], "structuredContent": { "threadId": "019bbb20-bff6-7130-83aa-bf45ab33250e" } } ``` Fixes https://github.com/openai/codex/issues/3712.	2026-01-13 22:14:41 -08:00
Ahmed Ibrahim	87f7226cca	Assemble sandbox/approval/network prompts dynamically (#8961 ) - Add a single builder for developer permissions messaging that accepts SandboxPolicy and approval policy. This builder now drives the developer “permissions” message that’s injected at session start and any time sandbox/approval settings change. - Trim EnvironmentContext to only include cwd, writable roots, and shell; removed sandbox/approval/network duplication and adjusted XML serialization and tests accordingly. Follow-up: adding a config value to replace the developer permissions message for custom sandboxes.	2026-01-12 23:12:59 +00:00
zbarsky-openai	2a06d64bc9	feat: add support for building with Bazel (#8875 ) This PR configures Codex CLI so it can be built with [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc` includes configuration so that remote builds can be done using [BuildBuddy](https://www.buildbuddy.io). If you are familiar with Bazel, things should work as you expect, e.g., run `bazel test //... --keep-going` to run all the tests in the repo, but we have also added some new aliases in the `justfile` for convenience: - `just bazel-test` to run tests locally - `just bazel-remote-test` to run tests remotely (currently, the remote build is for x86_64 Linux regardless of your host platform). Note we are currently seeing the following test failures in the remote build, so we still need to figure out what is happening here: ``` failures: suite::compact::manual_compact_twice_preserves_latest_user_messages suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view ``` - `just build-for-release` to build release binaries for all platforms/architectures remotely To setup remote execution: - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI employees should also request org access at https://openai.buildbuddy.io/join/ with their `@openai.com` email address.) - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to `~/.bazelrc` (add the line `build --remote_header=x-buildbuddy-api-key=YOUR_KEY`) - Use `--config=remote` in your `bazel` invocations (or add `common --config=remote` to your `~/.bazelrc`, or use the `just` commands) ## CI In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners (we are working on supporting Windows, but that is not ready yet). 
Note that the failures we are seeing in `just bazel-remote-test` do not occur on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml` is green right now. The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so that macOS CI jobs build _remotely_ on Linux hosts (using the `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the root `BUILD.bazel`) using cross-compilation to build the macOS artifacts. Then these artifacts are downloaded locally to GitHub's macOS runner so the tests can be executed natively. This is the relevant config that enables this: ``` common:macos --config=remote common:macos --strategy=remote common:macos --strategy=TestRunner=darwin-sandbox,local ``` Because of the remote caching benefits we get from BuildBuddy, these new CI jobs can be extremely fast! For example, consider these two jobs that ran all the tests on Linux x86_64: - Bazel 1m37s https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875 - Cargo 9m20s https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875 For now, we will continue to run both the Bazel and Cargo jobs for PRs, but once we add support for Windows and running Clippy, we should be able to cutover to using Bazel exclusively for PRs, which should still speed things up considerably. We will probably continue to run the Cargo jobs post-merge for commits that land on `main` as a sanity check. Release builds will also continue to be done by Cargo for now. Earlier attempt at this PR: https://github.com/openai/codex/pull/8832 Earlier attempt to add support for Buck2, now abandoned: https://github.com/openai/codex/pull/8504 --------- Co-authored-by: David Zbarsky <dzbarsky@gmail.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-01-09 11:09:43 -08:00
jif-oai	1aed01e99f	renaming: task to turn (#8963 )	2026-01-09 17:31:17 +00:00
pakrym-oai	018de994b0	Stop using AuthManager as the source of codex_home (#8846 )	2026-01-07 18:56:20 +00:00
jif-oai	116059c3a0	chore: unify conversation with thread name (#8830 ) Done and verified by Codex + refactor feature of RustRover	2026-01-07 17:04:53 +00:00
Owen Lin	8b7ec31ba7	feat(app-server): thread/rollback API (#8454 ) Add `thread/rollback` to app-server to support IDEs undo-ing the last N turns of a thread. For context, an IDE partner will be supporting an "undo" capability where the IDE (the app-server client) will be responsible for reverting the local changes made during the last turn. To support this well, we also need a way to drop the last turn (or more generally, the last N turns) from the agent's context. This is what `thread/rollback` does. Core idea: A Thread rollback is represented as a persisted event message (EventMsg::ThreadRollback) in the rollout JSONL file, not by rewriting history. On resume, both the model's context (core replay) and the UI turn list (app-server v2's thread history builder) apply these markers so the pruned history is consistent across live conversations and `thread/resume`. Implementation notes: - Rollback only affects agent context and appends to the rollout file; clients are responsible for reverting files on disk. - If a thread rollback is currently in progress, subsequent `thread/rollback` calls are rejected. - Because we use `CodexConversation::submit` and codex core tracks active turns, returning an error on concurrent rollbacks is communicated via an `EventMsg::Error` with a new variant `CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and sends the BAD_REQUEST RPC response. Tests cover thread rollbacks in both core and app-server, including when `num_turns` > existing turns (which clears all turns). Note: this explicitly does not behave like `/undo` which we just removed from the CLI, which does the opposite of what `thread/rollback` does. `/undo` reverts local changes via ghost commits/snapshots and does not modify the agent's context / conversation history.	2026-01-06 21:23:48 +00:00
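The "marker, not rewrite" idea in this commit can be illustrated with a small replay loop. This is a sketch under stated assumptions: `RolloutEntry` and `replay` are hypothetical names, and real rollout entries carry far more than a turn label.

```rust
// Hypothetical, simplified model of an append-only rollout log.
#[derive(Debug, Clone, PartialEq)]
enum RolloutEntry {
    /// A completed turn, reduced here to a label.
    Turn(String),
    /// Marker appended by a rollback; earlier entries are never rewritten.
    ThreadRollback { num_turns: usize },
}

/// Replays the log in order and returns the turns that remain visible.
fn replay(entries: &[RolloutEntry]) -> Vec<String> {
    let mut turns = Vec::new();
    for entry in entries {
        match entry {
            RolloutEntry::Turn(label) => turns.push(label.clone()),
            RolloutEntry::ThreadRollback { num_turns } => {
                // Dropping more turns than exist clears the history.
                let keep = turns.len().saturating_sub(*num_turns);
                turns.truncate(keep);
            }
        }
    }
    turns
}

fn main() {
    let log = vec![
        RolloutEntry::Turn("turn 1".to_string()),
        RolloutEntry::Turn("turn 2".to_string()),
        RolloutEntry::ThreadRollback { num_turns: 1 },
        RolloutEntry::Turn("turn 3".to_string()),
    ];
    println!("{:?}", replay(&log)); // prints ["turn 1", "turn 3"]
}
```

Because the same replay applies to a live thread and to a resumed one, both views end up with the same pruned history, which is the consistency property the commit message describes.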
Abdelkader Boudih	06e21c7a65	fix: update model examples to gpt-5.2 (#8566 ) The models are outdated and sometimes get used by GPT when it tries to delegate. I have read the CLA Document and I hereby sign the CLA	2026-01-06 08:47:29 -07:00
Anton Panasenko	807f8a43c2	feat: expose outputSchema to user_turn/turn_start app_server API (#8377 ) What changed - Added `outputSchema` support to the app-server APIs, mirroring `codex exec --output-schema` behavior. - V1 `sendUserTurn` now accepts `outputSchema` and constrains the final assistant message for that turn. - V2 `turn/start` now accepts `outputSchema` and constrains the final assistant message for that turn (explicitly per-turn only). Core behavior - `Op::UserTurn` already supported `final_output_json_schema`; now V1 `sendUserTurn` forwards `outputSchema` into that field. - `Op::UserInput` now carries `final_output_json_schema` for per-turn settings updates; core maps it into `SessionSettingsUpdate.final_output_json_schema` so it applies to the created turn context. - V2 `turn/start` does NOT persist the schema via `OverrideTurnContext` (it’s applied only for the current turn). Other overrides (cwd/model/etc) keep their existing persistent behavior. API / docs - `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema: Option<serde_json::Value>` to `SendUserTurnParams` (serialized as `outputSchema`). - `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema: Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`). - `codex-rs/app-server/README.md`: document `outputSchema` for `turn/start` and clarify it applies only to the current turn. - `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1 `sendUserTurn` and v2 `turn/start`. 
Tests added/updated - New app-server integration tests asserting `outputSchema` is forwarded into outbound `/responses` requests as `text.format`: - `codex-rs/app-server/tests/suite/output_schema.rs` - `codex-rs/app-server/tests/suite/v2/output_schema.rs` - Added per-turn semantics tests (schema does not leak to the next turn): - `send_user_turn_output_schema_is_per_turn_v1` - `turn_start_output_schema_is_per_turn_v2` - Added protocol wire-compat tests for the merged op: - serialize omits `final_output_json_schema` when `None` - deserialize works when field is missing - serialize includes `final_output_json_schema` when `Some(schema)` Call site updates (high level) - Updated all `Op::UserInput { .. }` constructions to include `final_output_json_schema`: - `codex-rs/app-server/src/codex_message_processor.rs` - `codex-rs/core/src/codex_delegate.rs` - `codex-rs/mcp-server/src/codex_tool_runner.rs` - `codex-rs/tui/src/chatwidget.rs` - `codex-rs/tui2/src/chatwidget.rs` - plus impacted core tests. Validation - `just fmt` - `cargo test -p codex-core` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` - `cargo test -p codex-tui` - `cargo test -p codex-tui2` - `cargo test -p codex-protocol` - `cargo clippy --all-features --tests --profile dev --fix -- -D warnings`	2026-01-05 10:27:00 -08:00
Michael Bolin	e61bae12e3	feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496 ) This PR introduces a `codex-utils-cargo-bin` utility crate that wraps/replaces our use of `assert_cmd::Command` and `escargot::CargoBuild`. As you can infer from the introduction of `buck_project_root()` in this PR, I am attempting to make it possible to build Codex under [Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to achieve faster incremental local builds (largely due to Buck2's [dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/) build strategy, as well as benefits from its local build daemon) as well as faster CI builds if we invest in remote execution and caching. See https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages for more details about the performance advantages of Buck2. Buck2 enforces stronger requirements in terms of build and test isolation. It discourages assumptions about absolute paths (which is key to enabling remote execution). Because the `CARGO_BIN_EXE_` environment variables that Cargo provides are absolute paths (which `assert_cmd::Command` reads), this is a problem for Buck2, which is why we need this `codex-utils-cargo-bin` utility. My WIP-Buck2 setup sets the `CARGO_BIN_EXE_` environment variables passed to a `rust_test()` build rule as relative paths. `codex-utils-cargo-bin` will resolve these values to absolute paths, when necessary. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496). * #8498 * __->__ #8496	2025-12-23 19:29:32 -08:00
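Resolving a possibly relative `CARGO_BIN_EXE_` value might look like the sketch below. It is an illustration only and assumes Unix-style paths: `resolve_bin_path` and the `/repo` root are hypothetical, and the actual API of `codex-utils-cargo-bin` is not shown in this log.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns absolute values unchanged and joins
/// relative values onto `root` (for example, a project root directory).
fn resolve_bin_path(value: &str, root: &Path) -> PathBuf {
    let path = Path::new(value);
    if path.is_absolute() {
        path.to_path_buf()
    } else {
        root.join(path)
    }
}

fn main() {
    // Assumed example values; a real caller would read the environment variable.
    let root = Path::new("/repo");
    let resolved = resolve_bin_path("target/debug/codex", root);
    println!("{}", resolved.display()); // prints /repo/target/debug/codex
}
```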
Michael Bolin	a8797019a1	chore: cleanup Config instantiation codepaths (#8226 ) This PR does various types of cleanup before I can proceed with more ambitious changes to config loading. First, I noticed duplicated code across these two methods: `774bd9e432/codex-rs/core/src/config/mod.rs (L314-L324)` `774bd9e432/codex-rs/core/src/config/mod.rs (L334-L344)` This has now been consolidated in `load_config_as_toml_with_cli_overrides()`. Further, I noticed that `Config::load_with_cli_overrides()` took two similar arguments: `774bd9e432/codex-rs/core/src/config/mod.rs (L308-L311)` The difference between `cli_overrides` and `overrides` was not immediately obvious to me. At first glance, it appears that one should be able to be expressed in terms of the other, but it turns out that some fields of `ConfigOverrides` (such as `cwd` and `codex_linux_sandbox_exe`) are, by design, not configurable via a `.toml` file or a command-line `--config` flag. That said, I discovered that many callers of `Config::load_with_cli_overrides()` were passing `ConfigOverrides::default()` for `overrides`, so I created two separate methods: - `Config::load_with_cli_overrides(cli_overrides: Vec<(String, TomlValue)>)` - `Config::load_with_cli_overrides_and_harness_overrides(cli_overrides: Vec<(String, TomlValue)>, harness_overrides: ConfigOverrides)` The latter has a long name, as it is _not_ what should be used in the common case, so the extra typing is designed to draw attention to this fact. I tried to update the existing callsites to use the shorter name, where possible. Further, in the cases where `ConfigOverrides` is used, usually only a limited subset of fields are actually set, so I updated the declarations to leverage `..Default::default()` where possible.	2025-12-17 18:01:17 -08:00
xl-openai	4897efcced	Add public skills + improve repo skill discovery and error UX (#8098 ) 1. Adds SkillScope::Public end-to-end (core + protocol) and loads skills from the public cache directory 2. Improves repo skill discovery by searching upward for the nearest .codex/skills within a git repo 3. Deduplicates skills by name with deterministic ordering to avoid duplicates across sources 4. Fixes garbled “Skill errors” overlay rendering by preventing pending history lines from being injected during the modal 5. Updates the project docs “Skills” intro wording to avoid hardcoded paths	2025-12-17 01:35:49 -08:00
xl-openai	5d77d4db6b	Reimplement skills loading using SkillsManager + skills/list op. (#7914 ) Refactor the way we load and manage skills: 1. Move skill discovery/caching into SkillsManager and reuse it across sessions. 2. Add the skills/list API (Op::ListSkills/SkillsListResponse) to fetch skills for one or more cwds, and update app-server for VSCE/App. 3. Trigger skills/list during session startup so UIs preload skills and handle errors immediately.	2025-12-14 09:58:17 -08:00
jif-oai	29381ba5c2	feat: add shell snapshot for shell command (#7786 )	2025-12-11 13:46:43 +00:00
xl-openai	b36ecb6c32	Inject SKILL.md when it's explicitly mentioned. (#7763 ) 1. Skills load once in core at session start; the cached outcome is reused across core and surfaced to TUI via SessionConfigured. 2. TUI detects explicit skill selections, and core injects the matching SKILL.md content into the turn when a selected skill is present.	2025-12-10 13:59:17 -08:00
Eric Traut	c4af707e09	Removed experimental "command risk assessment" feature (#7799 ) This experimental feature received lukewarm reception during internal testing. Removing from the code base.	2025-12-10 09:48:11 -08:00
jif-oai	0ad54982ae	chore: rework unified exec events (#7775 )	2025-12-10 10:30:38 +00:00
zhao-oai	3d35cb4619	Refactor execpolicy fallback evaluation (#7544 ) ## Refactor of the `execpolicy` crate To illustrate why we need this refactor, consider an agent attempting to run `apple \| rm -rf ./`. Suppose `apple` is allowed by `execpolicy`. Before this PR, `execpolicy` would consider `apple` and `rm -rf ./` and only render one rule match: `Allow`. We would skip any heuristics checks on `rm -rf ./` and immediately approve `apple \| rm -rf ./` to run. To fix this, we now thread a `fallback` evaluation function into `execpolicy` that runs when no `execpolicy` rules match a given command. In our example, we would run `fallback` on `rm -rf ./` and prevent `apple \| rm -rf ./` from being run without approval.	2025-12-03 23:39:48 -08:00
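The fallback idea from #7544 can be sketched roughly as follows. This is not the `execpolicy` crate's API: `Decision`, `Policy`, `check_pipeline`, and `heuristic_fallback` are hypothetical names, and the heuristic is a toy. The sketch only shows the shape of the fix: every command in a pipeline gets a decision, either from a matching rule or from the fallback, and the strictest decision wins.

```rust
/// Ordered so that the strictest decision compares greatest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Decision {
    Allow,
    Prompt,
}

/// Hypothetical rule set: (program name, decision) pairs.
struct Policy {
    rules: Vec<(&'static str, Decision)>,
}

impl Policy {
    /// Returns the decision of the first rule matching the program name, if any.
    fn match_rule(&self, command: &[&str]) -> Option<Decision> {
        let program = command.first()?;
        self.rules
            .iter()
            .find(|(name, _)| name == program)
            .map(|(_, decision)| *decision)
    }

    /// Evaluates every command in the pipeline. Commands with no matching
    /// rule are handed to `fallback` instead of being skipped.
    fn check_pipeline<F>(&self, commands: &[Vec<&str>], fallback: F) -> Decision
    where
        F: Fn(&[&str]) -> Decision,
    {
        commands
            .iter()
            .map(|cmd| {
                self.match_rule(cmd.as_slice())
                    .unwrap_or_else(|| fallback(cmd.as_slice()))
            })
            .max()
            .unwrap_or(Decision::Allow)
    }
}

/// Toy heuristic (an assumption): a short known-safe list is allowed,
/// everything else requires approval.
fn heuristic_fallback(command: &[&str]) -> Decision {
    match command.first() {
        Some(&"ls") | Some(&"cat") => Decision::Allow,
        _ => Decision::Prompt,
    }
}
```

With a single rule allowing `apple`, the pipeline `apple | rm -rf ./` evaluates to `Prompt` in this sketch, because `rm` has no rule and the fallback does not allow it.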
zhao-oai	e925a380dc	whitelist command prefix integration in core and tui (#7033 ) this PR enables TUI to approve commands and add their prefixes to an allowlist: <img width="708" height="605" alt="Screenshot 2025-11-21 at 4 18 07 PM" src="https://github.com/user-attachments/assets/56a19893-4553-4770-a881-becf79eeda32" /> note: we only show the option to whitelist the command when 1) command is not multi-part (e.g `git add -A && git commit -m 'hello world'`) 2) command is not already matched by an existing rule	2025-12-03 23:17:02 -08:00
Ahmed Ibrahim	00cc00ead8	Introduce `ModelsManager` and migrate `app-server` to use it. (#7552 )	2025-12-03 17:17:56 -08:00
Ahmed Ibrahim	71504325d3	Migrate model preset (#7542 ) - Introduce `openai_models` in `/core` - Move `PRESETS` under it - Move `ModelPreset`, `ModelUpgrade`, and `ReasoningEffortPreset` to `protocol` - Introduce `Op::ListModels` and `EventMsg::AvailableModels` Next steps: - migrate `app-server` and `tui` to use the introduced Operation	2025-12-03 20:30:43 +00:00
jif-oai	9ba27cfa0a	feat: add compaction event (#7289 )	2025-11-25 16:12:14 +00:00
Josh McKinney	ec49b56874	chore: add cargo-deny configuration (#7119 ) - add GitHub workflow running cargo-deny on push/PR - document cargo-deny allowlist with workspace-dep notes and advisory ignores - align workspace crates to inherit version/edition/license for consistent checks	2025-11-24 12:22:18 -08:00
Jeremy Rose	7561a6aaf0	support MCP elicitations (#6947 ) No support for request schema yet, but we'll at least show the message and allow accept/decline. <img width="823" height="551" alt="Screenshot 2025-11-21 at 2 44 05 PM" src="https://github.com/user-attachments/assets/6fbb892d-ca12-4765-921e-9ac4b217534d" />	2025-11-21 14:44:53 -08:00
pakrym-oai	767b66f407	Migrate coverage to shell_command (#7042 )	2025-11-21 03:44:00 +00:00
Owen Lin	d6c30ed25e	[app-server] feat: v2 apply_patch approval flow (#6760 ) This PR adds the API V2 version of the apply_patch approval flow, which centers around `ThreadItem::FileChange`. This PR wires the new RPC (`item/fileChange/requestApproval`, V2 only) and related events (`item/started`, `item/completed` for `ThreadItem::FileChange`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. Similar to https://github.com/openai/codex/pull/6758, the approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add a few additional fields to `EventMsg::PatchApplyBegin` and `EventMsg::PatchApplyEnd`, but those were fairly lightweight. However, the `EventMsg`s emitted by core are the following: ``` 1) Auto-approved (no request for approval)  - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 2) Approved by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd 3) Declined by user - EventMsg::ApplyPatchApprovalRequest - EventMsg::PatchApplyBegin - EventMsg::PatchApplyEnd ``` For a request triggering an approval, this would result in: ``` item/fileChange/requestApproval item/started item/completed ``` which is different from the `ThreadItem::CommandExecution` flow introduced in https://github.com/openai/codex/pull/6758, which does the below and is preferable: ``` item/started item/commandExecution/requestApproval item/completed ``` To fix this, we leverage `TurnSummaryStore` on codex_message_processor to store a little bit of state, allowing us to fire `item/started` and `item/fileChange/requestApproval` whenever we receive the underlying `EventMsg::ApplyPatchApprovalRequest`, and no-oping when we receive the `EventMsg::PatchApplyBegin` later. 
This is much less invasive than modifying the order of EventMsg within core (I tried). The resulting payloads: ``` { "method": "item/started", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "inProgress", "type": "fileChange" } } } ``` ``` { "id": 0, "method": "item/fileChange/requestApproval", "params": { "grantRoot": null, "itemId": "call_Nxnwj7B3YXigfV6Mwh03d686", "reason": null, "threadId": "019a9e11-8295-7883-a283-779e06502c6f", "turnId": "1" } } ``` ``` { "id": 0, "result": { "decision": "accept" } } ``` ``` { "method": "item/completed", "params": { "item": { "changes": [ { "diff": "Hello from Codex!\n", "kind": "add", "path": "/Users/owen/repos/codex/codex-rs/APPROVAL_DEMO.txt" } ], "id": "call_Nxnwj7B3YXigfV6Mwh03d686", "status": "completed", "type": "fileChange" } } } ```	2025-11-19 20:13:31 -08:00
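The reordering trick described in #6760 (emit `item/started` when the approval request arrives, then no-op on the later `PatchApplyBegin`) can be sketched as below. This is a simplified illustration, not the app-server code: `CoreEvent` and `FileChangeTranslator` are hypothetical names, and a plain `HashSet` stands in for the state that the commit message says is kept in `TurnSummaryStore`.

```rust
use std::collections::HashSet;

/// Simplified stand-ins for the core events named in the commit message.
#[derive(Debug)]
enum CoreEvent {
    ApplyPatchApprovalRequest { call_id: String },
    PatchApplyBegin { call_id: String },
    PatchApplyEnd { call_id: String },
}

/// Hypothetical translator that remembers which items already had
/// `item/started` emitted, so it is never sent twice.
#[derive(Default)]
struct FileChangeTranslator {
    started: HashSet<String>,
}

impl FileChangeTranslator {
    /// Maps one core event to the notification/RPC method names to send.
    fn translate(&mut self, event: &CoreEvent) -> Vec<&'static str> {
        match event {
            CoreEvent::ApplyPatchApprovalRequest { call_id } => {
                self.started.insert(call_id.clone());
                vec!["item/started", "item/fileChange/requestApproval"]
            }
            CoreEvent::PatchApplyBegin { call_id } => {
                // `insert` returns false if the approval path already
                // announced this item; in that case emit nothing.
                if self.started.insert(call_id.clone()) {
                    vec!["item/started"]
                } else {
                    Vec::new()
                }
            }
            CoreEvent::PatchApplyEnd { call_id } => {
                self.started.remove(call_id);
                vec!["item/completed"]
            }
        }
    }
}
```

Feeding the approval sequence through this sketch yields `item/started`, `item/fileChange/requestApproval`, `item/completed`, matching the ordering of the command-execution flow, while the auto-approved sequence still yields `item/started`, `item/completed`.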
Michael Bolin	a75321a64c	fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847 ) This adds the following fields to `ThreadStartResponse` and `ThreadResumeResponse`: ```rust pub model: String, pub model_provider: String, pub cwd: PathBuf, pub approval_policy: AskForApproval, pub sandbox: SandboxPolicy, pub reasoning_effort: Option<ReasoningEffort>, ``` This is important because these fields are optional in `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be able to determine what values were ultimately used to start/resume the conversation. (Though note that any of these could be changed later between turns in the conversation.) Though to get this information reliably, it must be read from the internal `SessionConfiguredEvent` that is created in response to the start of a conversation. Because `SessionConfiguredEvent` (as defined in `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a number of them had to be added as part of this PR. Because `SessionConfiguredEvent` is referenced in many tests, test instances of `SessionConfiguredEvent` had to be updated, as well, which is why this PR touches so many files.	2025-11-18 21:18:43 -08:00
Owen Lin	cecbd5b021	[app-server] feat: add v2 command execution approval flow (#6758 ) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. 
Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```	2025-11-18 00:23:54 +00:00
Jeremy Rose	03ffe4d595	core/tui: non-blocking MCP startup (#6334 ) This makes MCP startup not block TUI startup. Messages sent while MCPs are booting will be queued. https://github.com/user-attachments/assets/96e1d234-5d8f-4932-a935-a675d35c05e0 Fixes #6317 --------- Co-authored-by: pakrym-oai <pakrym@openai.com>	2025-11-17 11:26:11 -08:00
Ahmed Ibrahim	c8ebb2a0dc	Add warning on compact (#6052 ) This PR introduces the ability for `core` to send `warnings`, just as it can send `errors`. It also sends a warning on compaction. <img width="811" height="187" alt="image" src="https://github.com/user-attachments/assets/0947a42d-b720-420d-b7fd-115f8a65a46a" />	2025-10-31 13:27:33 -07:00
Celia Chen	6ef658a9f9	[Hygiene] Remove `include_view_image_tool` config (#5976 ) There's still some debate about whether we want to expose `tools.view_image` or `feature.view_image`, so those are left unchanged for now, but the old `include_view_image_tool` config can be removed. Also updated the doc to reflect that the `view_image` tool is now enabled by default.	2025-10-30 13:23:24 -07:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) We use developer instructions for code reviews, so we need to pass them in the CLI as well.	2025-10-30 11:18:31 -07:00
jif-oai	f4f9695978	feat: compaction prompt configurable (#5959 ) ``` codex -c compact_prompt="Summarize in bullet points" ```	2025-10-30 14:24:24 +00:00
pakrym-oai	3429e82e45	Add item streaming events (#5546 ) Adds AgentMessageContentDelta, ReasoningContentDelta, ReasoningRawContentDelta item streaming events while maintaining compatibility for old events. --------- Co-authored-by: Owen Lin <owen@openai.com>	2025-10-29 22:33:57 +00:00

224 commits