core-agent-ide

Author	SHA1	Message	Date
Matthew Zeng	45b7763c3f	[apps] Improve app loading. (#10994 ) There are two concepts of apps that we load in the harness: - Directory apps, which are all the apps that the user can install. - Accessible apps, which are the apps the user has actually installed, which can be inserted with $ and used by the model. These are extracted from the tools that are loaded through the gateway MCP. Previously we waited for both sets of apps before returning the full apps list. This caused many issues, because accessible apps weren't available to the UI or the model if directory apps hadn't loaded or failed to load. In this PR we separate them so that accessible apps are loaded independently and are instantly available to be shown in the UI and provided in model context. We also added an app-server event so that clients can subscribe to accessible apps without being blocked on the full app list. - [x] Separate accessible apps and directory apps loading. - [x] `app/list` requests also emit `app/list/updated` notifications that app-server clients can subscribe to. This allows clients to get the accessible apps list to render in the $ menu without being blocked by directory apps. - [x] Cache both accessible and directory apps with a 1 hour TTL to avoid reloading them when creating new threads. - [x] TUI improvements to redraw the $ menu and /apps menu when the app list is updated.	2026-02-08 15:24:56 -08:00
Matthew Zeng	9f1009540b	Upgrade rmcp to 0.14 (#10718 ) - [x] Upgrade rmcp to 0.14	2026-02-08 15:07:53 -08:00
Eric Traut	b3de6c7f2b	Defer persistence of rollout file (#11028 ) - Defer rollout persistence for fresh threads (`InitialHistory::New`): keep rollout events in memory and only materialize rollout file + state DB row on first `EventMsg::UserMessage`. - Keep precomputed rollout path available before materialization. - Change `thread/start` to build thread response from live config snapshot and optional precomputed path. - Improve pre-materialization behavior in app-server/TUI: clearer invalid-request errors for file-backed ops and a friendlier `/fork` “not ready yet” UX. - Update tests to match deferred semantics across start/read/archive/unarchive/fork/resume/review flows. - Improved resilience of user_shell test, which should be unrelated to this change but must be affected by timing changes For Reviewers: * The primary change is in recorder.rs * Most of the other changes were to fix up broken assumptions in existing tests Testing: * Manually tested CLI * Exercised app server paths by manually running IDE Extension with rebuilt CLI binary * Only user-visible change is that `/fork` in TUI generates visible error if used prior to first turn	2026-02-07 23:05:03 -08:00
Charley Cunningham	e6662d6387	app-server: treat null mode developer instructions as built-in defaults (#10983 ) ## Summary - make `turn/start` normalize `collaborationMode.settings.developer_instructions: null` to the built-in instructions for the selected mode - prevent app-server clients from accidentally clearing mode-switch developer instructions by sending `null` - document this behavior in the v2 protocol and app-server docs ## What changed - `codex-rs/app-server/src/codex_message_processor.rs` - added a small `normalize_turn_start_collaboration_mode` helper - in `turn_start`, apply normalization before `OverrideTurnContext` - `codex-rs/app-server/tests/suite/v2/turn_start.rs` - extended `turn_start_accepts_collaboration_mode_override_v2` to assert the outgoing request includes default-mode instruction text when the client sends `developer_instructions: null` - `codex-rs/app-server-protocol/src/protocol/v2.rs` - clarified `TurnStartParams.collaboration_mode` docs: `settings.developer_instructions: null` means use built-in mode instructions - regenerated schema fixture: - `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts` - docs: - `codex-rs/app-server/README.md` - `codex-rs/docs/codex_mcp_interface.md`	2026-02-07 12:59:41 -08:00
Eric Traut	4d52428fa2	Fixed a flaky test (#10970 ) ## Summary Stabilize v2 review integration tests by making them hermetic with respect to model discovery. `app-server` review tests were intermittently timing out in CI (especially on Windows runners) because their test config allowed remote model refresh. During `thread/start`, the test process could issue live `/v1/models` requests, introducing external network latency and nondeterministic timing before review flow assertions. This change disables remote model fetching in the review test config helper used by these tests.	2026-02-06 21:26:26 -08:00
Ahmed Ibrahim	ba8b5d9018	Treat compaction failure as failure state (#10927 ) - Return compaction errors from local and remote compaction flows. - Stop turns/tasks when auto-compaction fails instead of continuing execution.	2026-02-06 13:51:46 -08:00
canvrno-oai	36c16e0c58	Add app configs to config.toml (#10822 ) Adds app configs to config.toml + tests	2026-02-06 10:29:08 -08:00
jif-oai	aab61934af	Handle required MCP startup failures across components (#10902 ) Summary - add a `required` flag for MCP servers everywhere config/CLI data is touched so mandatory helpers can be round-tripped - have `codex exec` and `codex app-server` thread start/resume fail fast when required MCPs fail to initialize	2026-02-06 17:14:37 +01:00
Eric Traut	dd80e332c4	Removed the "remote_compaction" feature flag (#10840 ) This feature is always on now	2026-02-05 23:54:57 -08:00
Owen Lin	0d8b2b74c4	feat(app-server): turn/steer API (#10821 ) This PR adds a dedicated `turn/steer` API for appending user input to an in-flight turn. ## Motivation Currently, steering in the app is implemented by just calling `turn/start` while a turn is running. This has some really weird quirks: - Client gets back a new `turn.id`, even though streamed events/approvals remain tied to the original active turn ID. - All the various turn-level override params on `turn/start` do not apply to the "steer", and would only apply to the next real turn. - There can also be a race condition where the client thinks the turn is active but the server has already completed it, so there might be bugs if the client has baked in some client-specific behavior thinking it's a steer when in fact the server kicked off a new turn. This is particularly possible when running a client against a remote app-server. Having a dedicated `turn/steer` API eliminates all those quirks. `turn/steer` behavior: - Requires an active turn on threadId. Returns a JSON-RPC error if there is no active turn. - If expectedTurnId is provided, it must match the active turn (more useful when connecting to a remote app-server). - Does not emit `turn/started`. - Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or `outputSchema` to accurately reflect that these are not applied when steering.	2026-02-06 00:35:04 +00:00
Matthew Zeng	729b016515	Add stage field for experimental flags. (#10793 ) - [x] Add stage field for experimental flags.	2026-02-05 23:31:04 +00:00
Max Johnson	8473096efb	Add app-server transport layer with websocket support (#10693 ) - Adds --listen <URL> to codex app-server with two listen modes: - stdio:// (default, existing behavior) - ws://IP:PORT (new websocket transport) - Refactors message routing to be connection-aware: - Tracks per-connection session state (initialize/experimental capability) - Routes responses/errors to the originating connection - Broadcasts server notifications/requests to initialized connections - Updates initialization semantics to be per connection (not process-global), and updates app-server docs accordingly. - Adds websocket accept/read/write handling (JSON-RPC per text frame, ping/pong handling, connection lifecycle events). Testing - Unit tests for transport URL parsing and targeted response/error routing. - New websocket integration test validating: - per-connection initialization requirements - no cross-connection response leakage - same request IDs on different connections route independently.	2026-02-05 20:56:34 +00:00
Matthew Zeng	7e81f63698	[app-server] Add a method to list experimental features. (#10721 ) - [x] Add a method to list experimental features.	2026-02-05 20:04:01 +00:00
jif-oai	c67120f4a0	fix: flaky landlock (#10689 ) https://openai.slack.com/archives/C095U48JNL9/p1770243347893959	2026-02-05 10:30:18 +00:00
Dylan Hurd	fe8b474acd	fix(core,app-server) resume with different model (#10719 ) ## Summary When resuming with a different model, we should also append a developer message with the model instructions ## Testing - [x] Added unit tests	2026-02-05 00:40:05 -08:00
Dylan Hurd	a05aadfa1b	chore(config) Default Personality Pragmatic (#10705 ) ## Summary Switch back to Pragmatic personality ## Testing - [x] Updated unit tests	2026-02-04 21:22:47 -08:00
Dylan Hurd	73f32840c6	chore(core) personality migration tests (#10650 ) ## Summary Adds additional tests for personality edge cases ## Testing - [x] These are tests	2026-02-04 19:03:14 -08:00
Owen Lin	5ea107a088	feat(app-server, core): allow text + image content items for dynamic tool outputs (#10567 ) Took over the work that @aaronl-openai started here: https://github.com/openai/codex/pull/10397 Now that app-server clients are able to set up custom tools (called `dynamic_tools` in app-server), we should expose a way for clients to pass in not just text, but also image outputs. This is something the Responses API already supports for function call outputs, where you can pass in either a string or an array of content outputs (text, image, file): https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image So let's just plumb it through in Codex (with the caveat that we only support text and image for now). This is implemented end-to-end across app-server v2 protocol types and core tool handling. ## Breaking API change NOTE: This introduces a breaking change with dynamic tools, but I think it's ok since this concept was only recently introduced (https://github.com/openai/codex/pull/9539) and it's better to get the API contract correct. I don't think there are any real consumers of this yet (not even the Codex App). Old shape: `{ "output": "dynamic-ok", "success": true }` New shape: ``` { "contentItems": [ { "type": "inputText", "text": "dynamic-ok" }, { "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" } ], "success": true } ```	2026-02-04 16:12:47 -08:00
gt-oai	7c6d21a414	Fix test_shell_command_interruption flake (#10649 ) ## Human summary Sandboxing (specifically `LandlockRestrict`) means that e.g. `sleep 10` fails immediately. Therefore it cannot be interrupted. In suite::interrupt::test_shell_command_interruption, sleep 10 is issued at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}), then fails at 17:28:16.589 with duration_ms=34, success=false, exit_code=101, and Sandbox(LandlockRestrict). ## Codex summary - set `sandbox_mode = "danger-full-access"` in `interrupt` and `v2/turn_interrupt` integration tests - set `sandbox: Some(SandboxMode::DangerFullAccess)` in `test_codex_jsonrpc_conversation_flow` - set `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in `command_execution_notifications_include_process_id` ## Why On some Linux CI environments, command execution fails immediately with `LandlockRestrict` when sandboxed. These tests are intended to validate JSON-RPC/task lifecycle behavior (interrupt semantics, command notification shape/process id, request flow), but early sandbox startup failure changes turn flow and can trigger extra follow-up requests, causing flakes. This change removes environment-specific sandbox startup dependency from these tests while preserving their primary intent. ## Testing - not run in this environment (per request)	2026-02-04 22:19:06 +00:00
Ahmed Ibrahim	38a47700b5	Add thread/compact v2 (#10445 ) - add `thread/compact` as a trigger-only v2 RPC that submits `Op::Compact` and returns `{}` immediately. - add v2 compaction e2e coverage for success and invalid/unknown thread ids, and update protocol schemas/docs.	2026-02-03 18:15:55 -08:00
Shijie Rao	750ebe154d	Feat: add upgrade to app server modelList (#10556 ) ### Summary * Add model upgrade to the listModel app server endpoint to support dynamically showing a model upgrade banner.	2026-02-03 14:53:36 -08:00
Owen Lin	d9ad5c3c49	fix(app-server): fix approval events in review mode (#10416 ) One of our partners flagged that they were seeing the wrong order of events when running `review/start` with command exec approvals: ``` {"method":"item/commandExecution/requestApproval","id":0,"params":{"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0","itemId":"0","reason":"`/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'` requires approval: Xcode-required approval: Require explicit user confirmation for all commands.","proposedExecpolicyAmendment":null}} {"method":"item/started","params":{"item":{"type":"commandExecution","id":"call_AEjlbHqLYNM7kbU3N6uw1CNi","command":"/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'","cwd":"/Users/devingreen/Desktop/SampleProject","processId":null,"status":"inProgress","commandActions":[{"type":"unknown","command":"git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat"}],"aggregatedOutput":null,"exitCode":null,"durationMs":null},"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0"}} ``` Key fix: In the review sub‑agent delegate we were forwarding exec (and patch) approvals using the parent turn id (`parent_ctx.sub_id`) as the approval call_id. That made `item/commandExecution/requestApproval.itemId` differ from the actual `item/started` id. We now forward the sub‑agent’s `call_id` from the approval event instead, so the approval item id matches the commandExecution item id in review flows. Here’s the expected event order for an inline `review/start` that triggers an exec approval after this fix: 1. Response to review/start (JSON‑RPC response) - Includes `turn` (status inProgress) and `review_thread_id` (same as parent thread for inline). 2. `turn/started` notification - turnId is the review turn id (e.g., "0"). 3. `item/started` → EnteredReviewMode - item.id == turnId, marks entry into review mode. 4. 
`item/started` → commandExecution - item.id == <call_id> (e.g., "review-call-1"), status: inProgress. 5. `item/commandExecution/requestApproval` request - JSON‑RPC request (not a notification). - params.itemId == <call_id> and params.turnId == turnId. 6. Client replies to approval request (Approved / Declined / etc). 7. If approved: - Optional `item/commandExecution/outputDelta` notifications. - `item/completed` → commandExecution with status and exitCode. 8. Review finishes: - `item/started` → ExitedReviewMode - `item/completed` → ExitedReviewMode - (Agent message items may also appear, depending on review output.) 9. `turn/completed` notification The key being #4 and #5 are now in the proper order with the correct item id.	2026-02-03 12:08:17 -08:00
Charley Cunningham	d509df676b	Cleanup collaboration mode variants (#10404 ) ## Summary This PR simplifies collaboration modes to the visible set `default | plan`, while preserving backward compatibility for older partners that may still send legacy mode names. Specifically: - Renames the old Code behavior to Default. - Keeps Plan as-is. - Removes Custom mode behavior (fallbacks now resolve to Default). - Keeps `PairProgramming` and `Execute` internally for compatibility plumbing, while removing them from schema/API and UI visibility. - Adds legacy input aliasing so older clients can still send old mode names. ## What Changed 1. Mode enum and compatibility - `ModeKind` now uses `Plan` + `Default` as active/public modes. - `ModeKind::Default` deserialization accepts legacy values: - `code` - `pair_programming` - `execute` - `custom` - `PairProgramming` and `Execute` variants remain in code but are hidden from protocol/schema generation. - `Custom` variant is removed; previous custom fallbacks now map to `Default`. 2. Collaboration presets and templates - Built-in presets now return only: - `Plan` - `Default` - Template rename: - `core/templates/collaboration_mode/code.md` -> `default.md` - `execute.md` and `pair_programming.md` remain on disk but are not surfaced in visible preset lists. 3. TUI updates - Updated user-facing naming and prompts from “Code” to “Default”. - Updated mode-cycle and indicator behavior to reflect only visible `Plan` and `Default`. - Updated corresponding tests and snapshots. 4. request_user_input behavior - `request_user_input` remains allowed only in `Plan` mode. - Rejection messaging now consistently treats non-plan modes as `Default`. 5. Schemas - Regenerated config and app-server schemas. - Public schema types now advertise mode values as: - `plan` - `default` ## Backward Compatibility Notes - Incoming legacy mode names (`code`, `pair_programming`, `execute`, `custom`) are accepted and coerced to `default`. 
- Outgoing/public schema surfaces intentionally expose only `plan | default`. - This allows tolerant ingestion of older partner payloads while standardizing new integrations on the reduced mode set. ## Codex author `codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`	2026-02-03 09:23:53 -08:00
Colin Young	7e07ec8f73	[Codex][CLI] Gate image inputs by model modalities (#10271 ) ###### Summary - Add input_modalities to model metadata so clients can determine supported input types. - Gate image paste/attach in TUI when the selected model does not support images. - Block submits that include images for unsupported models and show a clear warning. - Propagate modality metadata through app-server protocol/model-list responses. - Update related tests/fixtures. ###### Rationale - Models support different input modalities. - Clients need an explicit capability signal to prevent unsupported requests. - Backward-compatible defaults preserve existing behavior when modality metadata is absent. ###### Scope - codex-rs/protocol, codex-rs/core, codex-rs/tui - codex-rs/app-server-protocol, codex-rs/app-server - Generated app-server types / schema fixtures ###### Trade-offs - Default behavior assumes text + image when field is absent for compatibility. - Server-side validation remains the source of truth. ###### Follow-up - Non-TUI clients should consume input_modalities to disable unsupported attachments. - Model catalogs should explicitly set input_modalities for text-only models. ###### Testing - cargo fmt --all - cargo test -p codex-tui - env -u GITHUB_APP_KEY cargo test -p codex-core --lib - just write-app-server-schema - cargo run -p codex-cli --bin codex -- app-server generate-ts --out app-server-types - test against local backend --------- Co-authored-by: Josh McKinney <joshka@openai.com>	2026-02-02 18:56:39 -08:00
Ahmed Ibrahim	b8addcddb9	Require models refresh on cli version mismatch (#10414 )	2026-02-02 18:55:25 -08:00
sayan-oai	fc05374344	chore: add phase to message responseitem (#10455 ) ### What add wiring for `phase` field on `ResponseItem::Message` to lay groundwork for differentiating model preambles and final messages. currently optional. follows pattern in #9698. updated schemas with `just write-app-server-schema` so we can see type changes. ### Tests Updated existing tests for SSE parsing and hydrating from history	2026-02-03 02:52:26 +00:00
jif-oai	3cc9122ee2	feat: experimental flags (#10231 ) ## Problem being solved - We need a single, reliable way to mark app-server API surface as experimental so that: 1. the runtime can reject experimental usage unless the client opts in 2. generated TS/JSON schemas can exclude experimental methods/fields for stable clients. Right now that’s easy to drift or miss when done ad-hoc. ## How to declare experimental methods and fields - Experimental method: add `#[experimental("method/name")]` to the `ClientRequest` variant in `client_request_definitions!`. - Experimental field: on the params struct, derive `ExperimentalApi` and annotate the field with `#[experimental("method/name.field")]` + set `inspect_params: true` for the method variant so `ClientRequest::experimental_reason()` inspects params for experimental fields. ## How the macro solves it - The new derive macro lives in `codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via `#[derive(ExperimentalApi)]` plus `#[experimental("reason")]` attributes. - Structs: - Generates `ExperimentalApi::experimental_reason(&self)` that checks only annotated fields. - The “presence” check is type-aware: - `Option<T>`: `is_some_and(...)` recursively checks inner. - `Vec`/`HashMap`/`BTreeMap`: must be non-empty. - `bool`: must be `true`. - Other types: considered present (returns `true`). - Registers each experimental field in an `inventory` with `(type_name, serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for that type. Field names are converted from `snake_case` to `camelCase` for schema/TS filtering. - Enums: - Generates an exhaustive `match` returning `Some(reason)` for annotated variants and `None` otherwise (no wildcard arm). - Wiring: - Runtime gating uses `ExperimentalApi::experimental_reason()` in `codex-rs/app-server/src/message_processor.rs` to reject requests unless `InitializeParams.capabilities.experimental_api == true`. 
- Schema/TS export filters use the inventory list and `EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to strip experimental methods/fields when `experimental_api` is false.	2026-02-02 11:06:50 +00:00
Charley Cunningham	d3514bbdd2	Bump thread updated_at on unarchive to refresh sidebar ordering (#10280 ) ## Summary - Touch restored rollout files on `thread/unarchive` so `updatedAt` reflects the unarchive time. - Add a regression test to ensure unarchiving bumps `updated_at` from an old mtime. ## Notes This fixes the UX issue where unarchived old threads don’t reappear near the top of recent threads.	2026-02-01 12:53:47 -08:00
Dylan Hurd	8a461765f3	chore(core) Default to friendly personality (#10305 ) ## Summary Update default personality to friendly ## Testing - [x] Unit tests pass	2026-01-31 17:11:32 -07:00
Dylan Hurd	ed9e02c9dc	chore(app-server) add personality update test (#10306 ) ## Summary Add some additional validation to ensure app-server handles Personality changes ## Testing - [x] These are tests	2026-01-31 14:49:55 -07:00
Michael Bolin	e6d913af2d	chore: rename ChatGpt -> Chatgpt in type names (#10244 ) When using ChatGPT in names of types, we should be consistent, so this renames some types with `ChatGpt` in the name to `Chatgpt`. From https://rust-lang.github.io/api-guidelines/naming.html: > In `UpperCamelCase`, acronyms and contractions of compound words count as one word: use `Uuid` rather than `UUID`, `Usize` rather than `USize` or `Stdin` rather than `StdIn`. In `snake_case`, acronyms and contractions are lower-cased: `is_xid_start`. This PR updates existing uses of `ChatGpt` and changes them to `Chatgpt`. In all cases where this could affect the wire format, I visually inspected the code to confirm that nothing changes there. That said, this _will_ change the codegen because it will affect the spelling of type names. For example, this renames `AuthMode::ChatGPT` to `AuthMode::Chatgpt` in `app-server-protocol`, but the wire format is still `"chatgpt"`. This PR also updates a number of types in `codex-rs/core/src/auth.rs`.	2026-01-30 11:18:39 -08:00
Charley Cunningham	ec4a2d07e4	Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786 ) ## Summary - Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed in core, emitting plan deltas plus a plan `ThreadItem`, while stripping tags from normal assistant output. - Persist plan items and rebuild them on resume so proposed plans show in thread history. - Wire plan items/deltas through app-server protocol v2 and render a dedicated proposed-plan view in the TUI, including the “Implement this plan?” prompt only when a plan item is present. ## Changes ### Core (`codex-rs/core`) - Added a generic, line-based tag parser that buffers each line until it can disprove a tag prefix; implements auto-close on `finish()` for unterminated tags. `codex-rs/core/src/tagged_block_parser.rs` - Refactored proposed plan parsing to wrap the generic parser. `codex-rs/core/src/proposed_plan_parser.rs` - In plan mode, stream assistant deltas as: - Normal text → `AgentMessageContentDelta` - Plan text → `PlanDelta` + `TurnItem::Plan` start/completion (`codex-rs/core/src/codex.rs`) - Final plan item content is derived from the completed assistant message (authoritative), not necessarily the concatenated deltas. - Strips `<proposed_plan>` blocks from assistant text in plan mode so tags don’t appear in normal messages. (`codex-rs/core/src/stream_events_utils.rs`) - Persist `ItemCompleted` events only for plan items for rollout replay. (`codex-rs/core/src/rollout/policy.rs`) - Guard `update_plan` tool in Plan Mode with a clear error message. (`codex-rs/core/src/tools/handlers/plan.rs`) - Updated Plan Mode prompt to: - keep `<proposed_plan>` out of non-final reasoning/preambles - require exact tag formatting - allow only one `<proposed_plan>` block per turn (`codex-rs/core/templates/collaboration_mode/plan.md`) ### Protocol / App-server protocol - Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items. 
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`) - Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with EXPERIMENTAL markers and note that deltas may not match the final plan item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`) - Added plan delta route in app-server protocol common mapping. (`codex-rs/app-server-protocol/src/protocol/common.rs`) - Rebuild plan items from persisted `ItemCompleted` events on resume. (`codex-rs/app-server-protocol/src/protocol/thread_history.rs`) ### App-server - Forward plan deltas to v2 clients and map core plan items to v2 plan items. (`codex-rs/app-server/src/bespoke_event_handling.rs`, `codex-rs/app-server/src/codex_message_processor.rs`) - Added v2 plan item tests. (`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ### TUI - Added a dedicated proposed plan history cell with special background and padding, and moved “• Proposed Plan” outside the highlighted block. (`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`) - Only show “Implement this plan?” when a plan item exists. (`codex-rs/tui/src/chatwidget.rs`, `codex-rs/tui/src/chatwidget/tests.rs`) ### Docs / Misc - Updated protocol docs to mention plan deltas. (`codex-rs/docs/protocol_v1.md`) - Minor plumbing updates in exec/debug clients to tolerate plan deltas. (`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`) ## Tests - Added core integration tests: - Plan mode strips plan from agent messages. - Missing `</proposed_plan>` closes at end-of-message. (`codex-rs/core/tests/suite/items.rs`) - Added unit tests for generic tag parser (prefix buffering, non-tag lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`) - Existing app-server plan item tests in v2. 
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`) ## Notes / Behavior - Plan output no longer appears in standard assistant text in Plan Mode; it streams via `PlanDelta` and completes as a `TurnItem::Plan`. - The final plan item content is authoritative and may diverge from streamed deltas (documented as experimental). - Reasoning summaries are not filtered; prompt instructs the model not to include `<proposed_plan>` outside the final plan message. ## Codex Author `codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`	2026-01-30 18:59:30 +00:00
Shijie Rao	a0ccef9d5c	Chore: plan mode do not include free form question and always include isOther (#10210 ) We should never ask a freeform question when planning and we should always include isOther as an escape hatch.	2026-01-30 01:19:24 -08:00
Dylan Hurd	e3ab0bd973	chore(personality) new schema with fallbacks (#10147 ) ## Summary Let's dial in this API contract a bit more, with more robust fallback behavior when model_instructions_template is false. Switches to a more explicit template / variables structure, with more fallbacks. ## Testing - [x] Adding unit tests - [x] Tested locally	2026-01-30 00:10:12 -07:00
Celia Chen	7151387474	[feat] persist dynamic tools in session rollout file (#10130 ) Add dynamic tools to rollout file for persistence & read from rollout on resume. Ran a real example and spotted the following in the rollout file: ``` {"timestamp":"2026-01-29T01:27:57.468Z","type":"session_meta","payload":{"id":"019c075d-3f0b-77e3-894e-c1c159b04b1e","timestamp":"2026-01-29T01:27:57.451Z","...."dynamic_tools":[{"name":"demo_tool","description":"Demo dynamic tool","inputSchema":{"additionalProperties":false,"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}}],"git":{"commit_hash":"ebc573f15c01b8af158e060cfedd401f043e9dfa","branch":"dev/cc/dynamic-tools","repository_url":"https://github.com/openai/codex.git"}}} ```	2026-01-30 01:10:00 +00:00
Owen Lin	81a17bb2c1	feat(app-server): support external auth mode (#10012 ) This enables a new use case where `codex app-server` is embedded into a parent application that will directly own the user's ChatGPT auth lifecycle, which means it owns the user’s auth tokens and refreshes them when necessary. The parent application just needs a way to pass in the auth tokens for codex to use directly. The idea is that we are introducing a new "auth mode" currently only exposed via app server: `chatgptAuthTokens`, which consists of the `id_token` (stores account metadata) and `access_token` (the bearer token used directly for backend API calls). These auth tokens are only stored in-memory. This new mode is in addition to the existing `apiKey` and `chatgpt` auth modes. This PR reuses the shape of our existing app-server account APIs as much as possible: - Update `account/login/start` with a new `chatgptAuthTokens` variant, which will allow the client to pass in the tokens and have codex app-server use them directly. Upon success, the server emits `account/login/completed` and `account/updated` notifications. - A new server->client request called `account/chatgptAuthTokens/refresh` which the server can use whenever the access token previously passed in has expired and it needs a new one from the parent application. I leveraged the core 401 retry loop which typically triggers auth token refreshes automatically, but made it pluggable: - chatgpt mode refreshes internally, as usual. - chatgptAuthTokens mode calls the client via `account/chatgptAuthTokens/refresh`, the client responds with updated tokens, codex updates its in-memory auth, then retries. This RPC has a 10s timeout and handles JSON-RPC errors from the client. Also some additional things: - chatgpt logins are blocked while external auth is active (have to log out first. 
Typically clients will pick one OR the other, not support both) - `account/logout` clears external auth in memory - Ensures that if `forced_chatgpt_workspace_id` is set via the user's config, we respect it in both: - `account/login/start` with `chatgptAuthTokens` (returns a JSON-RPC error back to the client) - `account/chatgptAuthTokens/refresh` (fails the turn, and on next request app-server will send another `account/chatgptAuthTokens/refresh` request to the client).	2026-01-29 23:46:04 +00:00
Matthew Zeng	b9cd089d1f	[connectors] Support connectors part 2 - slash command and tui (#9728 ) - [x] Support `/apps` slash command to browse the apps in tui. - [x] Support inserting apps to prompt using `$`. - [x] Lots of simplification/renaming from connectors to apps.	2026-01-28 19:51:58 -08:00
Ahmed Ibrahim	52609c6f42	Add app-server compaction item notifications tests (#10123 ) - add v2 tests covering local + remote auto-compaction item started/completed notifications	2026-01-29 01:00:38 +00:00
Jeremy Rose	b8156706e6	file-search: improve file query perf (#9939 ) switch nucleo-matcher for nucleo and use a "file search session" w/ live updating query instead of a single hermetic run per query.	2026-01-28 10:54:43 -08:00
Dylan Hurd	996e09ca24	feat(core) RequestRule (#9489 ) ## Summary Instead of trying to derive the prefix_rule for a command mechanically, let's let the model decide for us. ## Testing - [x] tested locally	2026-01-28 08:43:17 +00:00
Owen Lin	fc0fd85349	fix(app-server, core): defer initial context write to rollout file until first turn (#9950 ) ### Overview Currently, calling `thread/resume` always bumps the thread's `updated_at` timestamp. This PR makes the `updated_at` timestamp change only if a turn is triggered. ### Additional context When resuming a thread, we have always written “initial context” to the rollout file immediately. This initial context includes: - Developer instructions derived from sandbox/approval policy + cwd - Optional developer instructions (if provided) - Optional collaboration-mode instructions - Optional user instructions (if provided) - Environment context (cwd, shell, etc.) This PR defers writing the “initial context” to the rollout file until the first `turn/start`, so we don't inadvertently bump the thread's `updated_at` timestamp until a turn is actually triggered. This works even though both `thread/resume` and `turn/start` accept overrides (such as `model`, `cwd`, etc.) because the initial context is seeded from the effective `TurnContext` in memory, computed at `turn/start` time, after both sets of overrides have been applied. NOTE: This is a very short-lived solution until we introduce sqlite, at which point we can remove it.	2026-01-27 10:41:54 -08:00
jif-oai	247fb2de64	[app-server] feat: add filtering on thread list (#9897 )	2026-01-26 21:54:19 +00:00
Charley Cunningham	62266b13f8	Add thread/unarchive to restore archived rollouts (#9843 ) ## Summary - Adds a new `thread/unarchive` RPC to move archived thread rollouts back into the active `sessions/` tree. ## What changed - Protocol - Adds `thread/unarchive` request/response types and wiring. - Server - Implements `thread_unarchive` in the app server. - Validates the archived rollout path and thread ID. - Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout filename timestamp. - Core - Adds `find_archived_thread_path_by_id_str` helper for archived rollouts. - Docs - Documents the new RPC and usage example. - Tests - Adds an end-to-end server test that: 1) starts a thread, 2) archives it, 3) unarchives it, 4) asserts the file is restored to `sessions/`. ## How to use ```json { "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } } ``` ## Author Codex Session `codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this kind of session UUID forkable by anyone with the right `session_object_storage_url` line in their config, but for now just pasting it here for my reference)	2026-01-26 11:24:36 -08:00
Shijie Rao	3ba702c5b6	Feat: add isOther to question returned by request user input tool (#9890 ) ### Summary Add `isOther` to question object from request_user_input tool input and remove `other` option from the tool prompt to better handle tool input.	2026-01-26 09:52:38 -08:00
gt-oai	48aeb67f7a	Fix flakey conversation flow test (#9784 ) I've seen this test fail with: ``` - Mock #1. Expected range of matching incoming requests: == 2 Number of matched incoming requests: 1 ``` This is because we pop the wrong task_complete events and then the test exits. I think this is because the MCP events are now buffered after https://github.com/openai/codex/pull/8874. So: 1. clear the buffer before we do any user message sending 2. additionally listen for task start before task complete 3. use the ID from task start to find the correct task complete event.	2026-01-26 15:58:14 +00:00
jif-oai	d594693d1a	feat: dynamic tools injection (#9539 ) ## Summary Add dynamic tool injection to thread startup in API v2, wire dynamic tool calls through the app server to clients, and plumb responses back into the model tool pipeline. ### Flow (high level) - Thread start injects `dynamic_tools` into the model tool list for that thread (validation is done here). - When the model emits a tool call for one of those names, core raises a `DynamicToolCallRequest` event. - The app server forwards it to the client as `item/tool/call`, waits for the client’s response, then submits a `DynamicToolResponse` back to core. - Core turns that into a `function_call_output` in the next model request so the model can continue. ### What changed - Added dynamic tool specs to v2 thread start params and protocol types; introduced `item/tool/call` (request/response) for dynamic tool execution. - Core now registers dynamic tool specs at request time and routes those calls via a new dynamic tool handler. - App server validates tool names/schemas, forwards dynamic tool call requests to clients, and publishes tool outputs back into the session. - Integration tests	2026-01-26 10:06:44 +00:00
Dylan Hurd	25fccc3d4d	chore(core) move model_instructions_template config (#9871 ) ## Summary Move `model_instructions_template` config to the experimental slug while we iterate on this feature ## Testing - [x] Tested locally, unit tests still pass	2026-01-26 07:02:11 +00:00
Dylan Hurd	031bafd1fb	feat(tui) /personality (#9718 ) ## Summary Adds a /personality selector in the TUI, which leverages the new core interface in #9644. Notes: - We are doing some of our own state management for model_info loading here, but we're not sure if that's ideal. We're open to opinions on a simpler approach, but would like to avoid blocking on a larger refactor - Right now, the `/personality` selector just hides when the model doesn't support it. We can update this behavior down the line ## Testing - [x] Tested locally - [x] Added snapshot tests	2026-01-25 21:59:42 -08:00
Ahmed Ibrahim	58450ba2a1	Use collaboration mode masks without mutating base settings (#9806 ) Keep an unmasked base collaboration mode and apply the active mask on demand. Simplify the TUI mask helpers and update tests/docs to match the mask contract.	2026-01-25 07:35:31 +00:00
jif-oai	83775f4df1	feat: ephemeral threads (#9765 ) Add ephemeral thread capabilities, only exposed through the `app-server` v2. The idea is to disable the rollout recorder for those threads.	2026-01-24 14:57:40 +00:00

187 commits