core-agent-ide

Author	SHA1	Message	Date
Ahmed Ibrahim	2bc3e52a91	Stabilize app list update ordering test (#14052 ) ## Summary - make `list_apps_waits_for_accessible_data_before_emitting_directory_updates` accept the two valid notification paths the server can emit - keep rejecting the real bug this test is meant to catch: a directory-only `app/list/updated` notification before accessible app data is available ## Why this fixes the flake The old test used a fixed `150ms` silence window and assumed the first notification after that window had to be the fully merged final update. In CI, scheduling occasionally lets accessible app data arrive before directory data, so the first valid notification can be an accessible-only interim update. That made the test fail even though the server behavior was correct. This change makes the test deterministic by reading notifications until the final merged payload arrives. Any interim update is only accepted if it contains accessible apps only; if the server ever emits inaccessible directory data before accessible data is ready, the test still fails immediately. ## Change type - test-only; no production app-list logic changes	2026-03-09 00:16:13 -07:00
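The acceptance rule described above can be sketched as a small validator. Accessible-only interim updates are allowed, directory data that arrives before accessible data is a failure, and reading stops at the merged payload. This is a minimal sketch, not the actual test. `AppEntry`, `classify`, and `wait_for_final` are hypothetical names, and a real payload carries more than an id and an accessibility flag.

```rust
/// Simplified, hypothetical view of one app in an `app/list/updated` payload.
#[derive(Debug, Clone, PartialEq)]
pub struct AppEntry {
    pub id: String,
    /// True for accessible apps, false for directory-only (inaccessible) apps.
    pub is_accessible: bool,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Accessible-only interim update: valid, keep reading.
    Interim,
    /// The fully merged payload the test is waiting for.
    Final,
}

/// Classifies one notification against the expected merged payload.
pub fn classify(update: &[AppEntry], expected_final: &[AppEntry]) -> Result<Verdict, String> {
    if update == expected_final {
        return Ok(Verdict::Final);
    }
    if update.iter().all(|app| app.is_accessible) {
        return Ok(Verdict::Interim);
    }
    // Inaccessible directory data showed up before the final merge: the real bug.
    Err(format!("directory data emitted before accessible data: {update:?}"))
}

/// Reads notifications until the final payload arrives. Returns how many
/// interim updates were seen, or fails fast on an invalid one.
pub fn wait_for_final<I>(updates: I, expected_final: &[AppEntry]) -> Result<usize, String>
where
    I: IntoIterator<Item = Vec<AppEntry>>,
{
    let mut interim = 0;
    for update in updates {
        match classify(&update, expected_final)? {
            Verdict::Interim => interim += 1,
            Verdict::Final => return Ok(interim),
        }
    }
    Err("notifications ended before the final merged update".to_string())
}
```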
Jack Mousseau	e6b93841c5	Add request permissions tool (#13092 ) Adds a built-in `request_permissions` tool and wires it through the Codex core, protocol, and app-server layers so a running turn can ask the client for additional permissions instead of relying on a static session policy. The new flow emits a `RequestPermissions` event from core, tracks the pending request by call ID, forwards it through app-server v2 as an `item/permissions/requestApproval` request, and resumes the tool call once the client returns an approved subset of the requested permission profile.	2026-03-08 20:23:06 -07:00
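The sketch below shows the pending-request bookkeeping, assuming permissions can be modeled as a set of strings. Requests are tracked by call ID, and the client's answer is accepted only if it is a subset of what was requested. `PendingPermissions` and the permission names are hypothetical. The real permission profile and event types in Codex are not shown in this log.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical tracker for pending `request_permissions` calls, keyed by call ID.
/// Permissions are modeled as plain strings; the real profile type is richer.
#[derive(Default)]
pub struct PendingPermissions {
    pending: HashMap<String, BTreeSet<String>>,
}

impl PendingPermissions {
    /// Records a request that was forwarded to the client.
    pub fn request(&mut self, call_id: &str, requested: BTreeSet<String>) {
        self.pending.insert(call_id.to_string(), requested);
    }

    /// Resolves a request with the client's answer. The grant must be a subset
    /// of what was requested; on success the tool call can resume with it.
    pub fn resolve(
        &mut self,
        call_id: &str,
        approved: BTreeSet<String>,
    ) -> Result<BTreeSet<String>, String> {
        let requested = self
            .pending
            .get(call_id)
            .ok_or_else(|| format!("no pending permission request for call {call_id}"))?;
        if !approved.is_subset(requested) {
            return Err(format!("approved permissions exceed the request for call {call_id}"));
        }
        // Only clear the pending entry once a valid answer has been accepted.
        self.pending.remove(call_id);
        Ok(approved)
    }
}
```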
Matthew Zeng	a684a36091	[app-server] Support hot-reload user config when batch writing config. (#13839 ) - [x] Support hot-reload user config when batch writing config.	2026-03-08 17:38:01 -07:00
Charley Cunningham	7ba1fccfc1	fix(ci): restore guardian coverage and bazel unit tests (#13912 ) ## Summary - restore the guardian review request snapshot test and its tracked snapshot after it was dropped from `main` - make Bazel Rust unit-test wrappers resolve runfiles correctly on manifest-only platforms like macOS and point Insta at the real workspace root - harden the shell-escalation socket-closure assertion so the musl Bazel test no longer depends on fd reuse behavior ## Verification - cargo test -p codex-core guardian_review_request_layout_matches_model_visible_request_snapshot - cargo test -p codex-shell-escalation - bazel test //codex-rs/exec:exec-unit-tests //codex-rs/shell-escalation:shell-escalation-unit-tests Supersedes #13894. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com> Co-authored-by: viyatb-oai <viyatb@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-08 12:05:19 -07:00
Ahmed Ibrahim	dc19e78962	Stabilize abort task follow-up handling (#13874 ) - production logic plus tests; cancel running tasks before clearing pending turn state - suppress follow-up model requests after cancellation and assert on stabilized request counts instead of fixed sleeps	2026-03-07 22:56:00 -08:00
jif-oai	cf143bf71e	feat: simplify DB further (#13771 )	2026-03-07 03:48:36 -08:00
iceweasel-oai	4b4f61d379	app-server: require absolute cwd for windowsSandbox/setupStart (#13833 ) ## Summary - require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf - reject relative cwd values at request parsing instead of normalizing them later in the setup flow - add RPC-layer coverage for relative cwd rejection and update the checked-in protocol schemas/docs ## Why windowsSandbox/setupStart was carrying the client-provided cwd as a raw PathBuf for command_cwd while config derivation normalized the same value into an absolute policy_cwd. That left room for relative-path ambiguity in the setup path, especially for inputs like cwd: "repo". Making the RPC accept only absolute paths removes that split entirely: the handler now receives one already-validated absolute path and uses it for both config derivation and setup. This keeps the trust model unchanged. Trusted clients could already choose the session cwd; this change is only about making the setup RPC reject relative paths so command_cwd and policy_cwd cannot diverge. ## Testing - cargo test -p codex-app-server windows_sandbox_setup (run locally by user) - cargo test -p codex-app-server-protocol windows_sandbox (run locally by user)	2026-03-06 22:47:08 -08:00
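The core idea is to validate at parse time with a type that cannot hold a relative path. It can be sketched with `std::path::Path::is_absolute`. `AbsoluteCwd` is a hypothetical stand-in for `AbsolutePathBuf`, whose real API is not shown here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for an `AbsolutePathBuf`-style newtype: it can only be
/// built from an absolute path, so a handler never receives a relative cwd.
#[derive(Debug, Clone, PartialEq)]
pub struct AbsoluteCwd(PathBuf);

impl AbsoluteCwd {
    /// Rejects relative input (e.g. `"repo"`) at request-parsing time
    /// instead of normalizing it later in the setup flow.
    pub fn parse(raw: &str) -> Result<Self, String> {
        let path = Path::new(raw);
        if path.is_absolute() {
            Ok(Self(path.to_path_buf()))
        } else {
            Err(format!("cwd must be an absolute path, got {raw:?}"))
        }
    }

    /// The single validated path, used for both config derivation and setup.
    pub fn as_path(&self) -> &Path {
        &self.0
    }
}
```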
Ruslan Nigmatullin	e9bd8b20a1	app-server: Add streaming and tty/pty capabilities to `command/exec` (#13640 ) * Add an ability to stream stdin, stdout, and stderr * Streaming of stdout and stderr has a configurable cap for total amount of transmitted bytes (with an ability to disable it) * Add support for overriding environment variables * Add an ability to terminate running applications (using `command/exec/terminate`) * Add TTY/PTY support, with an ability to resize the terminal (using `command/exec/resize`)	2026-03-06 17:30:17 -08:00
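The sketch below shows one way to implement a configurable total-bytes cap shared by stdout and stderr, where `None` disables the cap. `OutputCap` is hypothetical and says nothing about how the app-server actually enforces the limit. Truncating the chunk that crosses the limit is an assumption.

```rust
/// Hypothetical cap on the total bytes streamed across stdout and stderr.
/// `limit: None` disables the cap entirely.
pub struct OutputCap {
    limit: Option<usize>,
    sent: usize,
}

impl OutputCap {
    pub fn new(limit: Option<usize>) -> Self {
        Self { limit, sent: 0 }
    }

    /// Returns the prefix of `chunk` that may still be transmitted and
    /// counts it against the shared budget.
    pub fn admit<'a>(&mut self, chunk: &'a [u8]) -> &'a [u8] {
        let allowed = match self.limit {
            None => chunk.len(),
            Some(limit) => chunk.len().min(limit.saturating_sub(self.sent)),
        };
        self.sent += allowed;
        &chunk[..allowed]
    }

    /// True once a configured cap has been reached.
    pub fn exhausted(&self) -> bool {
        matches!(self.limit, Some(limit) if self.sent >= limit)
    }
}
```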
Rohan Mehta	61098c7f51	Allow full web search tool config (#13675 ) Previously, we could only configure whether web search was on or off. This PR enables sending a full web search config, including everything the Responses API supports: filters, location, etc.	2026-03-07 00:50:50 +00:00
xl-openai	0243734300	feat: Add curated plugin marketplace + Metadata Cleanup. (#13712 ) 1. Add a synced curated plugin marketplace and include it in marketplace discovery. 2. Expose optional plugin.json interface metadata in plugin/list. 3. Tighten plugin and marketplace path handling using validated absolute paths. 4. Let manifests override skill, MCP, and app config paths. 5. Restrict plugin enablement/config loading to the user config layer so that plugin enablement applies at the global level.	2026-03-06 19:39:35 -05:00
Ruslan Nigmatullin	51fcdc760d	app-server: Emit `thread/name/updated` event globally (#13674 )	2026-03-06 10:25:18 -08:00
Owen Lin	6c98a59dbd	fix(app-server): fix turn_start_shell_zsh_fork_executes_command_v2 flake (#13770 ) This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test. The interrupt path can race with the follow-up `/responses` request that reports the aborted tool call, so the test now allows that extra no-op response instead of assuming there will only ever be one request. The assertions still stay focused on the behavior the test actually cares about: starting the zsh-forked command correctly. Testing: - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2 -- --exact --nocapture`	2026-03-06 10:10:16 -08:00
Matthew Zeng	98dca99db7	[elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621 ) - [x] Switch to use MCP style elicitation payload for mcp tool approvals. - [ ] TODO: Update the UI to support the full spec.	2026-03-06 01:50:26 -08:00
sayan-oai	014a59fb0b	check app auth in plugin/install (#13685 ) #### What On `plugin/install`, check whether installed apps are already authed on ChatGPT, and return a list of all apps that are not. Clients can use this list to trigger auth workflows as needed. Checks are best effort, based on `codex_apps` loading, much like `app/list`. #### Tests Added integration tests; tested locally.	2026-03-06 06:45:00 +00:00
xl-openai	520ed724d2	support plugin/list. (#13540 ) Introduce a plugin/list which reads from local marketplace.json. Also update the signature for plugin/install.	2026-03-05 21:58:50 -05:00
Ahmed Ibrahim	6cf0ed4e79	Refine realtime startup context formatting (#13560 ) ## Summary - group recent work by git repo when available, otherwise by directory - render recent work as bounded user asks with per-thread cwd context - exclude hidden files and directories from workspace trees	2026-03-05 16:31:20 -08:00
sayan-oai	4e77ea0ec7	add @plugin mentions (#13510 ) ## Note: added plugin mentions via `@`, but that conflicts with file mentions. Depends on and builds upon #13433. - introduces explicit `@plugin` mentions. This injects the plugin's MCP servers, app names, and skill name format into the turn context as a dev message. - we do not yet have UI for these mentions, so we currently parse raw text (as opposed to skills and apps, which have UI chips, autocomplete, etc.). UI support depends on an upcoming `plugins/list` app-server endpoint that can feed the UI. - also annotate MCP and app tool descriptions with the plugin(s) they come from. This gives the model a first-class way of understanding which tools come from which plugins, which will help implicit invocation. ### Tests Added and updated unit and integration tests. Also confirmed locally that a raw `@plugin` mention injects the dev message, and that the model knows about the plugin's apps, MCPs, and skills.	2026-03-06 00:03:39 +00:00
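Because there is no UI yet, mentions are parsed from raw text. The sketch below assumes a mention is a whitespace-delimited `@name` token that exactly matches an installed plugin name, so file mentions such as `@src/main.rs` are ignored. `plugin_mentions` and the plugin names are hypothetical, and the real parser may use different rules.

```rust
use std::collections::BTreeSet;

/// Hypothetical: returns installed plugin names mentioned as `@name` in raw text,
/// in first-mention order and without duplicates.
pub fn plugin_mentions<'a>(text: &str, installed: &'a BTreeSet<String>) -> Vec<&'a str> {
    let mut found: Vec<&'a str> = Vec::new();
    for word in text.split_whitespace() {
        // Only tokens that start with '@' are candidates (so "me@example.com" is skipped).
        let Some(name) = word.strip_prefix('@') else { continue };
        // Drop trailing punctuation such as "@figma," or "@linear."
        let name = name.trim_end_matches(|c: char| matches!(c, ',' | '.' | '!' | '?' | ':' | ';'));
        // Tokens that are not installed plugins (e.g. file paths) are left alone.
        if let Some(plugin) = installed.get(name) {
            if !found.contains(&plugin.as_str()) {
                found.push(plugin.as_str());
            }
        }
    }
    found
}
```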
Max Johnson	1980b6ce00	treat SIGTERM like ctrl-c for graceful shutdown (#13594 ) treat SIGTERM the same as SIGINT for graceful app-server websocket shutdown	2026-03-05 18:16:58 +00:00
Owen Lin	926b2f19e8	feat(app-server): support mcp elicitations in v2 api (#13425 ) This adds a first-class server request for MCP server elicitations: `mcpServer/elicitation/request`. Until now, MCP elicitation requests only showed up as a raw `codex/event/elicitation_request` event from core. That made it hard for v2 clients to handle elicitations using the same request/response flow as other server-driven interactions (like shell and `apply_patch` tools). This also updates the underlying MCP elicitation request handling in core to pass through the full MCP request (including URL and form data) so we can expose it properly in app-server. ### Why not `item/mcpToolCall/elicitationRequest`? This is because MCP elicitations are related to MCP servers first, and only optionally to a specific MCP tool call. In the MCP protocol, elicitation is a server-to-client capability: the server sends `elicitation/create`, and the client replies with an elicitation result. RMCP models it that way as well. In practice an elicitation is often triggered by an MCP tool call, but not always. ### What changed - add `mcpServer/elicitation/request` to the v2 app-server API - translate core `codex/event/elicitation_request` events into the new v2 server request - map client responses back into `Op::ResolveElicitation` so the MCP server can continue - update app-server docs and generated protocol schema - add an end-to-end app-server test that covers the full round trip through a real RMCP elicitation flow - The new test exercises a realistic case where an MCP tool call triggers an elicitation, the app-server emits `mcpServer/elicitation/request`, the client accepts it, and the tool call resumes and completes successfully. ### app-server API flow - Client starts a thread with `thread/start`. - Client starts a turn with `turn/start`. - App-server sends `item/started` for the `mcpToolCall`. - While that tool call is in progress, app-server sends `mcpServer/elicitation/request`. - Client responds to that request with `{ action: "accept" | "decline" | "cancel" }`. - App-server sends `serverRequest/resolved`. - App-server sends `item/completed` for the mcpToolCall. - App-server sends `turn/completed`. - If the turn is interrupted while the elicitation is pending, app-server still sends `serverRequest/resolved` before the turn finishes.	2026-03-05 07:20:20 -08:00
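The client's reply and the notification order listed above can be sketched as follows. `ElicitationAction` and `is_ordered_subsequence` are hypothetical helpers, not the app-server's types. The order check allows unrelated notifications to be interleaved between the expected ones.

```rust
/// Hypothetical mirror of the client's `action` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElicitationAction {
    Accept,
    Decline,
    Cancel,
}

impl ElicitationAction {
    pub fn parse(raw: &str) -> Result<Self, String> {
        match raw {
            "accept" => Ok(Self::Accept),
            "decline" => Ok(Self::Decline),
            "cancel" => Ok(Self::Cancel),
            other => Err(format!("unknown elicitation action: {other}")),
        }
    }
}

/// Server-to-client messages from the flow above, in the order a client sees them.
pub const EXPECTED_ORDER: [&str; 5] = [
    "item/started",
    "mcpServer/elicitation/request",
    "serverRequest/resolved",
    "item/completed",
    "turn/completed",
];

/// True if `expected` appears in `observed` in order; other messages may be interleaved.
pub fn is_ordered_subsequence(observed: &[&str], expected: &[&str]) -> bool {
    let mut next = 0;
    for method in observed {
        if next < expected.len() && *method == expected[next] {
            next += 1;
        }
    }
    next == expected.len()
}
```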
sayan-oai	03d55f0e6f	chore: add web_search_tool_type for image support (#13538 ) add `web_search_tool_type` on `model_info` that can be populated from the backend. It will be used to filter which models can use `web_search` with images and which can't. Added a small unit test.	2026-03-05 07:02:27 +00:00
joeytrasatti-openai	22f4113ac1	Preserve persisted thread git info in resume (#13504 ) ## Summary - ensure `thread.resume` reuses the stored `gitInfo` instead of rebuilding it from the live working tree - persist and apply thread git metadata through the resume flow and add a regression test covering branch mismatch cases ## Testing - Not run (not requested)	2026-03-04 17:16:43 -08:00
iceweasel-oai	54a1c81d73	allow apps to specify cwd for sandbox setup. (#13484 ) The Electron app doesn't start the app-server in a particular workspace directory, so sandbox setup happens in the app's install directory instead of the project workspace. This allows the app to specify the workspace cwd so that sandbox setup actually sets up the ACLs, instead of exiting early and leaving the first shell command to be slow.	2026-03-04 10:54:30 -08:00
Val Kharitonov	4f6c4bb143	support 'flex' tier in app-server in addition to 'fast' (#13391 )	2026-03-03 22:46:05 -08:00
Owen Lin	0fbd84081b	feat(app-server): add a skills/changed v2 notification (#13414 ) This adds a first-class app-server v2 `skills/changed` notification for the existing skills live-reload signal. Before this change, clients only had the legacy raw `codex/event/skills_update_available` event. With this PR, v2 clients can listen for a typed JSON-RPC notification instead of depending on the legacy `codex/event/*` stream, which we want to remove soon.	2026-03-03 17:01:00 -08:00
Curtis 'Fjord' Hawthorne	b92146d48b	Add under-development original-resolution view_image support (#13050 ) ## Summary Add original-resolution support for `view_image` behind the under-development `view_image_original_resolution` feature flag. When the flag is enabled and the target model is `gpt-5.3-codex` or newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends `detail: "original"` to the Responses API instead of using the legacy resize/compress path. ## What changed - Added `view_image_original_resolution` as an under-development feature flag. - Added `ImageDetail` to the protocol models and support for serializing `detail: "original"` on tool-returned images. - Added `PromptImageMode::Original` to `codex-utils-image`. - Preserves original PNG/JPEG/WebP bytes. - Keeps legacy behavior for the resize path. - Updated `view_image` to: - use the shared `local_image_content_items_with_label_number(...)` helper in both code paths - select original-resolution mode only when: - the feature flag is enabled, and - the model slug parses as `gpt-5.3-codex` or newer - Kept local user image attachments on the existing resize path; this change is specific to `view_image`. - Updated history/image accounting so only `detail: "original"` images use the docs-based GPT-5 image cost calculation; legacy images still use the old fixed estimate. - Added JS REPL guidance, gated on the same feature flag, to prefer JPEG at 85% quality unless lossless is required, while still allowing other formats when explicitly requested. - Updated tests and helper code that construct `FunctionCallOutputContentItem::InputImage` to carry the new `detail` field. ## Behavior ### Feature off - `view_image` keeps the existing resize/re-encode behavior. - History estimation keeps the existing fixed-cost heuristic. ### Feature on + `gpt-5.3-codex+` - `view_image` sends original-resolution images with `detail: "original"`. - PNG/JPEG/WebP source bytes are preserved when possible. - History estimation uses the GPT-5 docs-based image-cost calculation for those `detail: "original"` images. #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/13050 - ⏳ `2` https://github.com/openai/codex/pull/13331 - ⏳ `3` https://github.com/openai/codex/pull/13049	2026-03-03 15:56:54 -08:00
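The gating condition requires the feature flag to be on and the model to be at least `gpt-5.3-codex`. The sketch below assumes slugs of the form `gpt-<major>.<minor>-codex`. The real slug parser is not shown in the log, and `parse_codex_version` and `use_original_resolution` are hypothetical.

```rust
/// Hypothetical parser: accepts `gpt-<major>.<minor>-codex` and returns (major, minor).
pub fn parse_codex_version(slug: &str) -> Option<(u32, u32)> {
    let version = slug.strip_prefix("gpt-")?.strip_suffix("-codex")?;
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Original-resolution mode only when the flag is on AND the model is 5.3 or newer.
/// Tuples compare lexicographically, so (5, 10) >= (5, 3) and (6, 0) >= (5, 3).
pub fn use_original_resolution(feature_enabled: bool, model_slug: &str) -> bool {
    feature_enabled
        && matches!(parse_codex_version(model_slug), Some(version) if version >= (5, 3))
}
```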
joeytrasatti-openai	935754baa3	Add thread metadata update endpoint to app server (#13280 ) ## Summary - add the v2 `thread/metadata/update` API, including protocol/schema/TypeScript exports and app-server docs - patch stored thread `gitInfo` in sqlite without resuming the thread, with validation plus support for explicit `null` clears - repair missing sqlite thread rows from rollout data before patching, and make those repairs safe by inserting only when absent and updating only git columns so newer metadata is not clobbered - keep sqlite authoritative for mutable thread git metadata by preserving existing sqlite git fields during reconcile/backfill and only using rollout `SessionMeta` git fields to fill gaps - add regression coverage for the endpoint, repair paths, concurrent sqlite writes, clearing git fields, and rollout/backfill reconciliation - fix the login server shutdown race so cancelling before the waiter starts still terminates `block_until_done()` correctly ## Testing - `cargo test -p codex-state apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields` - `cargo test -p codex-state update_thread_git_info_preserves_newer_non_git_metadata` - `cargo test -p codex-core backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields` - `cargo test -p codex-app-server thread_metadata_update` - `cargo test` - currently fails in existing `codex-core` grep-files tests with `unsupported call: grep_files`: - `suite::grep_files::grep_files_tool_collects_matches` - `suite::grep_files::grep_files_tool_reports_empty_results`	2026-03-03 15:56:11 -08:00
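The patch semantics map naturally onto `Option<Option<T>>` in Rust: an absent field leaves the column alone, and an explicit `null` clears it. The sketch below also shows the reconcile rule of keeping existing values and only filling gaps from rollout data. The struct and field names are hypothetical; the actual sqlite columns are not listed here.

```rust
/// Hypothetical stored git metadata for a thread.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct GitInfo {
    pub branch: Option<String>,
    pub sha: Option<String>,
    pub origin_url: Option<String>,
}

/// Per field: `None` = absent from the request (leave as-is),
/// `Some(None)` = explicit null (clear), `Some(Some(v))` = set to `v`.
#[derive(Debug, Default)]
pub struct GitInfoPatch {
    pub branch: Option<Option<String>>,
    pub sha: Option<Option<String>>,
    pub origin_url: Option<Option<String>>,
}

impl GitInfo {
    /// Applies an update request; only fields present in the patch change.
    pub fn apply(&mut self, patch: GitInfoPatch) {
        if let Some(branch) = patch.branch {
            self.branch = branch;
        }
        if let Some(sha) = patch.sha {
            self.sha = sha;
        }
        if let Some(url) = patch.origin_url {
            self.origin_url = url;
        }
    }

    /// Reconcile/backfill: stored values win; rollout data only fills gaps.
    pub fn fill_gaps_from(&mut self, rollout: &GitInfo) {
        if self.branch.is_none() {
            self.branch = rollout.branch.clone();
        }
        if self.sha.is_none() {
            self.sha = rollout.sha.clone();
        }
        if self.origin_url.is_none() {
            self.origin_url = rollout.origin_url.clone();
        }
    }
}
```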
Owen Lin	167158f93c	chore(app-server): delete v1 RPC methods and notifications (#13375 ) ## Summary This removes the old app-server v1 methods and notifications we no longer need, while keeping the small set the main codex app client still depends on for now. The remaining legacy surface is: - `initialize` - `getConversationSummary` - `getAuthStatus` - `gitDiffToRemote` - `fuzzyFileSearch` - `fuzzyFileSearch/sessionStart` - `fuzzyFileSearch/sessionUpdate` - `fuzzyFileSearch/sessionStop` And the raw `codex/event/*` notifications emitted from core. These notifications will be removed in a followup PR. ## What changed - removed deprecated v1 request variants from the protocol and app-server dispatcher - removed deprecated typed notifications: `authStatusChange`, `loginChatGptComplete`, and `sessionConfigured` - updated the app-server test client to use v2 flows instead of deleted v1 flows - deleted legacy-only app-server test suites and added focused coverage for `getConversationSummary` - regenerated app-server schema fixtures and updated the MCP interface docs to match the remaining compatibility surface ## Testing - `just write-app-server-schema` - `cargo test -p codex-app-server-protocol` - `cargo test -p codex-app-server`	2026-03-03 13:18:25 -08:00
pash-openai	07e532dcb9	app-server service tier plumbing (plus some cleanup) (#13334 ) Follow-up to https://github.com/openai/codex/pull/13212 to expose fast-tier controls to the app server (most of this PR is generated schema JSON; the actual code is +69 / -35 and +24 tests). - add service tier fields to the app-server protocol surfaces used by thread lifecycle, turn start, config, and session configured events - thread service tier through the app-server message processor and core thread config snapshots - allow runtime config overrides to carry service tier for app-server callers cleanup: - remove unneeded "legacy" code supporting "standard"; we moved to `None | "fast"`, so "standard" is no longer needed.	2026-03-03 02:35:09 -08:00
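Below is a minimal sketch of the resulting tier model. It assumes an absent tier is represented as `None` and that the `flex` tier added later in #13391 sits alongside `fast`. `ServiceTier` and `parse_service_tier` are hypothetical, and rejecting `"standard"` with an error is an assumption rather than documented behavior.

```rust
/// Hypothetical tier enum; "no tier" is modeled as `Option::None`, not a variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceTier {
    Fast,
    Flex,
}

/// Parses an optional wire value. The legacy "standard" value is treated as
/// unsupported here (an assumption of this sketch).
pub fn parse_service_tier(raw: Option<&str>) -> Result<Option<ServiceTier>, String> {
    match raw {
        None => Ok(None),
        Some("fast") => Ok(Some(ServiceTier::Fast)),
        Some("flex") => Ok(Some(ServiceTier::Flex)),
        Some(other) => Err(format!("unsupported service tier: {other}")),
    }
}
```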
Ruslan Nigmatullin	9022cdc563	app-server: Silence thread status changes caused by thread being created (#13079 ) Currently we emit `thread/status/changed` with `Idle` status right before sending the `thread/started` event (which also carries `Idle` status). There is no point in that, because the client cannot know the thread's prior state: the thread did not exist yet. This change silences these notifications.	2026-03-03 00:52:28 +00:00
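The suppression rule can be sketched as a per-thread tracker that stays silent on a thread's first observed status. `StatusTracker` is hypothetical. It also suppresses repeated identical statuses, which is an extra assumption of this sketch and not something the commit states.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreadStatus {
    Idle,
    Active,
}

/// Hypothetical tracker deciding when to broadcast `thread/status/changed`.
#[derive(Default)]
pub struct StatusTracker {
    last: HashMap<String, ThreadStatus>,
}

impl StatusTracker {
    /// Returns the status to broadcast, or `None` to stay silent:
    /// - first status of a new thread (already carried by `thread/started`);
    /// - same status as last time (assumption of this sketch).
    pub fn observe(&mut self, thread_id: &str, status: ThreadStatus) -> Option<ThreadStatus> {
        match self.last.insert(thread_id.to_string(), status) {
            None => None,
            Some(previous) if previous == status => None,
            Some(_) => Some(status),
        }
    }
}
```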
Owen Lin	146b798129	fix(app-server): emit turn/started only when turn actually starts (#13261 ) This is a follow-up for https://github.com/openai/codex/pull/13047 ## Why We had a race where `turn/started` could be observed before the thread had actually transitioned to `Active`. This was because we eagerly emitted `turn/started` in the request handler for `turn/start` (and `review/start`). That was showing up as flaky `thread/resume` tests, but the real issue was broader: a client could see `turn/started` and still get back an idle thread immediately afterward. The first idea was to eagerly call `thread_watch_manager.note_turn_started(...)` from the `turn/start` request path. That turns out to be unsafe, because `submit(Op::UserInput)` only queues work. If a turn starts and completes quickly, request-path bookkeeping can race with the real lifecycle events and leave stale running state behind. The real fix is to move `turn/started` to emit only after the turn _actually_ starts, so we do that by waiting for the `EventMsg::TurnStarted` notification emitted by codex core. We do this for both `turn/start` and `review/start`. I also verified this change is safe for our first-party codex apps - they don't have any assumptions that `turn/started` is emitted before the RPC response to `turn/start` (which is correct anyway). I also removed `single_client_mode` since it isn't really necessary now. ## Testing - `cargo test -p codex-app-server thread_resume -- --nocapture` - `cargo test -p codex-app-server 'suite::v2::turn_start::turn_start_emits_notifications_and_accepts_model_override' -- --exact --nocapture` - `cargo test -p codex-app-server`	2026-03-02 16:43:31 -08:00
Ahmed Ibrahim	b20b6aa46f	Update realtime websocket API (#13265 ) - migrate the realtime websocket transport to the new session and handoff flow - make the realtime model configurable in config.toml and use API-key auth for the websocket --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-02 16:05:40 -08:00
Owen Lin	d473e8d56d	feat(app-server): add tracing to all app-server APIs (#13285 ) ### Overview This PR adds the first piece of tracing for app-server JSON-RPC requests. There are two main changes: - JSON-RPC requests can now take an optional W3C trace context at the top level via a `trace` field (`traceparent` / `tracestate`). - app-server now creates a dedicated request span for every inbound JSON-RPC request in `MessageProcessor`, and uses the request-level trace context as the parent when present. For compatibility with existing flows, app-server still falls back to the TRACEPARENT env var when there is no request-level traceparent. This PR is intentionally scoped to the app-server boundary. In a followup, we'll actually propagate trace context through the async handoff into core execution spans like run_turn, which will make app-server traces much more useful. ### Spans A few details on the app-server span shape: - each inbound request gets its own server span - span/resource names are based on the JSON-RPC method (`initialize`, `thread/start`, `turn/start`, etc.) - spans record transport (stdio vs websocket), request id, connection id, and client name/version when available - `initialize` stores client metadata in session state so later requests on the same connection can reuse it	2026-03-02 16:01:41 -08:00
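For reference, a version-`00` W3C `traceparent` value has the form `version-trace_id-parent_id-flags`: 2, 32, 16, and 2 lowercase hex digits, with all-zero IDs invalid. The sketch below parses that form and prefers a request-level value over the `TRACEPARENT` environment variable. The function names are hypothetical. Returning `None` when a request-level header is present but invalid, rather than falling back to the environment, is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: String,
    pub parent_id: String,
    pub sampled: bool,
}

/// True if `s` is exactly `len` lowercase hex digits.
fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
}

/// Minimal parser for a version-00 `traceparent` header.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    let [version, trace_id, parent_id, flags] = parts.as_slice() else {
        return None;
    };
    if *version != "00"
        || !is_lower_hex(trace_id, 32)
        || !is_lower_hex(parent_id, 16)
        || !is_lower_hex(flags, 2)
    {
        return None;
    }
    // All-zero trace or parent IDs are invalid per the spec.
    if trace_id.bytes().all(|b| b == b'0') || parent_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flags = u8::from_str_radix(flags, 16).ok()?;
    Some(TraceParent {
        trace_id: trace_id.to_string(),
        parent_id: parent_id.to_string(),
        sampled: (flags & 0x01) == 1,
    })
}

/// Request-level context wins; the environment value is used only when the
/// request carries no traceparent at all.
pub fn choose_parent(request: Option<&str>, env: Option<&str>) -> Option<TraceParent> {
    match request {
        Some(header) => parse_traceparent(header),
        None => env.and_then(parse_traceparent),
    }
}
```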
Ruslan Nigmatullin	14fcb6645c	app-server: Update `thread/name/set` to support not-loaded threads (#13282 ) Currently `thread/name/set` only works for loaded threads. This expands the scope to also support persisted but not-yet-loaded threads, for a more predictable API surface. This will make it possible to rename threads discovered via `thread/list` and similar operations.	2026-03-02 15:13:18 -08:00
Josh McKinney	75e7c804ea	test(app-server): increase flow test timeout to reduce flake (#11814 ) ## Summary - increase `DEFAULT_READ_TIMEOUT` in `codex_message_processor_flow` from 20s to 45s - keep test behavior the same while avoiding platform timing flakes ## Why Windows ARM64 CI showed these tests taking about 24s before `task_complete`, which could fail early and produce wiremock request-count mismatches. ## Testing - just fmt - cargo test -p codex-app-server codex_message_processor_flow -- --nocapture	2026-03-02 12:29:28 -08:00
jif-oai	b649953845	feat: polluted memories (#13008 ) Add a feature flag to disable memory creation for "polluted"	2026-03-02 11:57:32 +00:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
Thibault Sottiaux	c9cef6ba9e	[codex] include plan type in account updates (#13181 ) This change fixes a Codex app account-state sync bug where clients could know the user was signed in but still miss the ChatGPT subscription tier, which could lead to incorrect upgrade messaging for paid users. The root cause was that `account/updated` only carried `authMode`, while plan information was available separately via `account/read` and rate-limit snapshots. This update adds `planType` to `account/updated` and populates it consistently across login and refresh paths.	2026-03-01 13:43:37 -08:00
Ruslan Nigmatullin	8c1e3f3e64	app-server: Add `ephemeral` field to `Thread` object (#13084 ) Currently there is no other way to know that a thread is ephemeral; only the client that created it has that knowledge.	2026-02-27 17:42:25 -08:00
Eric Traut	ff5cbfd7d4	Handle missing plan info for ChatGPT accounts (#13072 ) Addresses https://github.com/openai/codex/issues/13007 and https://github.com/openai/codex/issues/12170 There are situations where the ChatGPT auth backend might return a JWT that contains no plan information. Most code paths already handle this case well, but the internal implementation of the "account/read" app server call was failing in this case (returning an error rather than properly returning None for the plan). This resulted in a situation where users needed to log in every time the extension or app started even if they successfully logged in the last time. Summary - allow ChatGPT-authenticated accounts to fall back to `AccountPlanType::Unknown` when the token omits the plan claim - add regression coverage in `app-server/tests/suite/v2/account.rs` to confirm `account/read` returns `plan_type: Unknown` when the claim is absent - ensure the Rust auth helpers and fixtures treat missing plan claims as Optional and default to `Unknown`	2026-02-27 17:51:21 -07:00
Matthew Zeng	392fa7de50	[apps] Stabilize app list updated event. (#13067 ) Stabilize the app list updated event so that we send only two updates: one when installed apps become available, and one when all directory apps are available. Previously it also sent an update when directory apps became available before installed apps, which cut off the installed apps.	2026-02-27 15:23:24 -08:00
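The emission rule can be sketched as a tiny state machine. Directory data alone never produces an update, installed apps alone produce an interim update, and both together produce the merged update. `AppListUpdates` and its types are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Installed,
    Directory,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Update {
    /// Installed apps are available; directory apps are still loading.
    InstalledOnly,
    /// Both sources are available.
    Merged,
}

/// Hypothetical gate deciding which `app/list/updated` notifications to send.
#[derive(Default)]
pub struct AppListUpdates {
    installed: bool,
    directory: bool,
}

impl AppListUpdates {
    /// Records that `source` finished loading and returns the update to emit, if any.
    pub fn on_loaded(&mut self, source: Source) -> Option<Update> {
        match source {
            Source::Installed => self.installed = true,
            Source::Directory => self.directory = true,
        }
        match (self.installed, self.directory) {
            (true, true) => Some(Update::Merged),
            (true, false) => Some(Update::InstalledOnly),
            // Directory data alone is never emitted: it would cut off installed apps.
            (false, _) => None,
        }
    }
}
```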
Ruslan Nigmatullin	69d7a456bb	app-server: Replay pending item requests on `thread/resume` (#12560 ) Replay pending client requests after `thread/resume` and emit resolved notifications when those requests clear so approval/input UI state stays in sync after reconnects and across subscribed clients. Affected RPCs: - `item/commandExecution/requestApproval` - `item/fileChange/requestApproval` - `item/tool/requestUserInput` Motivation: - Resumed clients need to see pending approval/input requests that were already outstanding before the reconnect. - Clients also need an explicit signal when a pending request resolves or is cleared so stale UI can be removed on turn start, completion, or interruption. Implementation notes: - Use pending client requests from `OutgoingMessageSender` in order to replay them after `thread/resume` attaches the connection, using original request ids. - Emit `serverRequest/resolved` when pending requests are answered or cleared by lifecycle cleanup. - Update the app-server protocol schema, generated TypeScript bindings, and README docs for the replay/resolution flow. High-level test plan: - Added automated coverage for replaying pending command execution and file change approval requests on `thread/resume`. - Added automated coverage for resolved notifications in command approval, file change approval, request_user_input, turn start, and turn interrupt flows. - Verified schema/docs updates in the relevant protocol and app-server tests. Manual testing: - Tested reconnect/resume with multiple connections. - Confirmed state stayed in sync between connections.	2026-02-27 12:45:59 -08:00
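Below is a minimal sketch of the replay bookkeeping. It assumes pending requests are stored by their original JSON-RPC id, modeled here as `u64` (JSON-RPC also allows string ids). `PendingRequests` is hypothetical and is not the `OutgoingMessageSender` API.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
pub struct PendingRequest {
    pub id: u64,
    pub thread_id: String,
    pub method: String,
}

/// Hypothetical registry of outstanding server-to-client requests.
#[derive(Default)]
pub struct PendingRequests {
    by_id: BTreeMap<u64, PendingRequest>,
}

impl PendingRequests {
    pub fn insert(&mut self, request: PendingRequest) {
        self.by_id.insert(request.id, request);
    }

    /// On `thread/resume`, re-send every outstanding request for the thread,
    /// keeping its original id so the client's answer still matches.
    pub fn replay_for(&self, thread_id: &str) -> Vec<PendingRequest> {
        self.by_id.values().filter(|r| r.thread_id == thread_id).cloned().collect()
    }

    /// A request was answered: drop it and return the id for `serverRequest/resolved`.
    pub fn resolve(&mut self, id: u64) -> Option<u64> {
        self.by_id.remove(&id).map(|r| r.id)
    }

    /// Lifecycle cleanup (turn start, completion, interrupt): clear a thread's
    /// requests and return their ids so each can be announced as resolved.
    pub fn clear_thread(&mut self, thread_id: &str) -> Vec<u64> {
        let ids: Vec<u64> = self.by_id.values().filter(|r| r.thread_id == thread_id).map(|r| r.id).collect();
        for id in &ids {
            self.by_id.remove(id);
        }
        ids
    }
}
```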
Michael Bolin	66b0adb34c	app-server: deflake running thread resume tests (#13047 ) ## Why CI has been intermittently failing in `suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` because these running-thread resume tests treated `turn/started` as proof that the thread was already active. That signal is too early for this path. `turn/started` is emitted optimistically from `turn_start` (`1103d0037e/codex-rs/app-server/src/codex_message_processor.rs`, lines 5757-5767). In `single_client_mode`, the listener skips `current_turn_history` tracking in `codex_message_processor.rs` (`1103d0037e/codex-rs/app-server/src/codex_message_processor.rs`, lines 6461-6465), so running-thread resume still depends on `ThreadWatchManager` observing the core `TurnStarted` event in `bespoke_event_handling.rs` (`1103d0037e/codex-rs/app-server/src/bespoke_event_handling.rs`, lines 152-156). If `thread/resume` lands in that window, the thread can still look `Idle` and the assertion flakes. ## What - Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs` that waits for `thread/status/changed` to report `Active` for the target thread. - Use that public v2 notification as the synchronization barrier in the four running-thread resume tests instead of relying on `turn/started`. ## Follow-up This PR keeps the fix at the test layer so we can remove the flake without changing server behavior. A broader runtime fix should still be considered separately, for example: - make `turn/start` eagerly transition the thread to `Active` so `turn/started` and `thread/status/changed` are coherent - or revisit the `single_client_mode` guard that skips current-turn tracking for running-thread resume ## Testing - `cargo test -p codex-app-server thread_resume -- --nocapture` - `for i in $(seq 1 10); do cargo test -p codex-app-server 'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch' -- --exact --nocapture; done`	2026-02-27 19:47:30 +00:00
jif-oai	8cf5b00aef	fix: more stable notify script (#13011 )	2026-02-27 16:05:44 +01:00
Michael Bolin	e6cd75a684	notify: include client in legacy hook payload (#12968 ) ## Why The `notify` hook payload did not identify which Codex client started the turn. That meant downstream notification hooks could not distinguish between completions coming from the TUI and completions coming from app-server clients such as VS Code or Xcode. Now that the Codex App provides its own desktop notifications, it would be nice to be able to filter those out. This change adds that context without changing the existing payload shape for callers that do not know the client name, and keeps the new end-to-end test cross-platform. ## What changed - added an optional top-level `client` field to the legacy `notify` JSON payload - threaded that value through `core` and `hooks`; the internal session and turn state now carries it as `app_server_client_name` - set the field to `codex-tui` for TUI turns - captured `initialize.clientInfo.name` in the app server and applied it to subsequent turns before dispatching hooks - replaced the notify integration test hook with a `python3` script so the test does not rely on Unix shell permissions or `bash` - documented the new field in `docs/config.md` ## Testing - `cargo test -p codex-hooks` - `cargo test -p codex-tui` - `cargo test -p codex-app-server suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name -- --exact --nocapture` - `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs` still has unrelated existing failures in this environment) ## Docs The public config reference on `developers.openai.com/codex` should mention that the legacy `notify` payload may include a top-level `client` field. The TUI reports `codex-tui`, and the app server reports `initialize.clientInfo.name` when it is available.	2026-02-26 22:27:34 -08:00
Ahmed Ibrahim	4d180ae428	Add model availability NUX metadata (#12972 ) - replace show_nux with structured availability_nux model metadata - expose availability NUX data through the app-server model API - update shared fixtures and tests for the new field	2026-02-26 22:02:57 -08:00
Matthew Zeng	6fe3dc2e22	[apps] Improve app/list with force_fetch=true (#12745 ) - [x] Improve app/list with force_fetch=true: we now keep the cached snapshot until both installed apps and directory apps have loaded.	2026-02-27 03:54:03 +00:00
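The `force_fetch=true` behavior amounts to a small state machine: keep serving the old snapshot until both loads have completed, then swap in the merged result. The following is a sketch under that reading. `AppSnapshot`, `AppListRefresh`, and the method names are hypothetical and do not reflect the app server's actual types.

```rust
/// Hypothetical merged app list: installed apps plus directory apps.
#[derive(Debug, Clone, PartialEq)]
struct AppSnapshot {
    installed: Vec<String>,
    directory: Vec<String>,
}

/// Hypothetical refresh tracker for a forced fetch.
struct AppListRefresh {
    cached: AppSnapshot,
    pending_installed: Option<Vec<String>>,
    pending_directory: Option<Vec<String>>,
}

impl AppListRefresh {
    fn new(cached: AppSnapshot) -> Self {
        Self { cached, pending_installed: None, pending_directory: None }
    }

    fn on_installed_loaded(&mut self, apps: Vec<String>) {
        self.pending_installed = Some(apps);
        self.try_swap();
    }

    fn on_directory_loaded(&mut self, apps: Vec<String>) {
        self.pending_directory = Some(apps);
        self.try_swap();
    }

    /// Replace the cached snapshot only once both halves are available;
    /// until then, readers keep seeing the previous snapshot.
    fn try_swap(&mut self) {
        if self.pending_installed.is_some() && self.pending_directory.is_some() {
            self.cached = AppSnapshot {
                installed: self.pending_installed.take().unwrap(),
                directory: self.pending_directory.take().unwrap(),
            };
        }
    }

    fn current(&self) -> &AppSnapshot {
        &self.cached
    }
}
```

Swapping only when both halves are present prevents clients from ever seeing a half-refreshed list.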
Shijie Rao	8715a6ef84	Feat: cxa-1833 update model/list (#12958 ) ### Summary Update `model/list` in app server to include more upgrade information.	2026-02-26 17:02:24 -08:00
Celia Chen	90cc4e79a2	feat: add local date/timezone to turn environment context (#12947 ) ## Summary This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`. ## What changed - captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails - includes `current_date` and `timezone` in the serialized `<environment_context>` payload - stores those fields on `TurnContextItem` so they survive rollout/history handling, subagent review threads, and resume flows - treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context - updates tests to validate the new environment fields without depending on a single hardcoded environment-context string ## Testing Built locally and confirmed the new fields in the rollout file: ``` {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}} ```	2026-02-26 23:17:35 +00:00
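A sketch of the three behaviors listed above, under simplifying assumptions. It covers capturing the timezone with a UTC fallback, rendering the `<environment_context>` fragment, and treating a date or timezone change as an environment update. `EnvironmentContext`, `needs_context_update`, and the two-space indentation are illustrative assumptions rather than Codex's real types or exact output. The caller is assumed to supply the date string and the result of the IANA lookup.

```rust
/// Hypothetical stand-in for the turn environment context.
#[derive(Debug, Clone, PartialEq)]
struct EnvironmentContext {
    shell: String,
    current_date: String, // e.g. "2026-02-26"
    timezone: String,     // IANA name, or "UTC" when the lookup failed
}

impl EnvironmentContext {
    /// `timezone_lookup` is `None` when the IANA timezone could not be determined.
    fn new(shell: &str, current_date: &str, timezone_lookup: Option<String>) -> Self {
        Self {
            shell: shell.to_string(),
            current_date: current_date.to_string(),
            timezone: timezone_lookup.unwrap_or_else(|| "UTC".to_string()),
        }
    }

    /// Render the model-visible fragment (indentation is an assumption).
    fn render(&self) -> String {
        format!(
            "<environment_context>\n  <shell>{}</shell>\n  <current_date>{}</current_date>\n  <timezone>{}</timezone>\n</environment_context>",
            self.shell, self.current_date, self.timezone
        )
    }
}

/// Date and timezone participate in equality, so a change to either
/// counts as an environment update instead of reusing stale context.
fn needs_context_update(previous: &EnvironmentContext, next: &EnvironmentContext) -> bool {
    previous != next
}
```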
pakrym-oai	ba41e84a50	Use model catalog default for reasoning summary fallback (#12873 ) ## Summary - make `Config.model_reasoning_summary` optional so unset means use model default - resolve the optional config value to a concrete summary when building `TurnContext` - add protocol support for `default_reasoning_summary` in model metadata ## Validation - `cargo test -p codex-core --lib client::tests -- --nocapture` --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-26 09:31:13 -08:00
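Making `Config.model_reasoning_summary` optional means "unset" defers to the model catalog. Resolution then reduces to an `Option::unwrap_or` against the model's `default_reasoning_summary`. The sketch below shows this. The enum variants and the `ModelInfo`/`Config` shapes are assumptions for illustration; only the two field names come from the commit message.

```rust
/// Illustrative reasoning-summary setting; the variants are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReasoningSummary {
    Auto,
    Concise,
    Detailed,
}

/// Hypothetical slice of model catalog metadata.
struct ModelInfo {
    default_reasoning_summary: ReasoningSummary,
}

/// Hypothetical slice of user config.
struct Config {
    /// `None` means "not set by the user": use the model's default.
    model_reasoning_summary: Option<ReasoningSummary>,
}

/// Resolve to a concrete value, as done when building the turn context.
fn resolve_reasoning_summary(config: &Config, model: &ModelInfo) -> ReasoningSummary {
    config
        .model_reasoning_summary
        .unwrap_or(model.default_reasoning_summary)
}
```

An explicit user setting always wins. The model default is only consulted when the setting is absent.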
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. 
## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer relies on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and `event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
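The bundling rule states that any number of contextual fragments collapse into at most one developer message plus one contextual user message, each with multiple content parts. It can be sketched as a simple partition. `Role`, `Message`, and `bundle_context` are hypothetical simplifications of `ResponseItem`. The developer-before-user ordering is also an assumption of this sketch.

```rust
/// Hypothetical message roles relevant to contextual injection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    Developer,
    User,
}

/// Hypothetical multi-part message (a simplified stand-in for `ResponseItem`).
#[derive(Debug, PartialEq)]
struct Message {
    role: Role,
    parts: Vec<String>,
}

/// Collapse contextual fragments into at most one developer envelope and at
/// most one user envelope, preserving fragment order within each envelope.
fn bundle_context(fragments: Vec<(Role, String)>) -> Vec<Message> {
    let mut developer = Vec::new();
    let mut user = Vec::new();
    for (role, text) in fragments {
        match role {
            Role::Developer => developer.push(text),
            Role::User => user.push(text),
        }
    }
    let mut out = Vec::new();
    // Emit an envelope only if it has at least one part.
    if !developer.is_empty() {
        out.push(Message { role: Role::Developer, parts: developer });
    }
    if !user.is_empty() {
        out.push(Message { role: Role::User, parts: user });
    }
    out
}
```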


308 commits