core-agent-ide

Author	SHA1	Message	Date
Ruslan Nigmatullin	14fcb6645c	app-server: Update `thread/name/set` to support not-loaded threads (#13282) Currently `thread/name/set` only works for loaded threads. Expand the scope to also support persisted but not-yet-loaded ones for a more predictable API surface. This will make it possible to rename threads discovered via `thread/list` and similar operations.	2026-03-02 15:13:18 -08:00
Josh McKinney	75e7c804ea	test(app-server): increase flow test timeout to reduce flake (#11814 ) ## Summary - increase `DEFAULT_READ_TIMEOUT` in `codex_message_processor_flow` from 20s to 45s - keep test behavior the same while avoiding platform timing flakes ## Why Windows ARM64 CI showed these tests taking about 24s before `task_complete`, which could fail early and produce wiremock request-count mismatches. ## Testing - just fmt - cargo test -p codex-app-server codex_message_processor_flow -- --nocapture	2026-03-02 12:29:28 -08:00
jif-oai	b649953845	feat: polluted memories (#13008 ) Add a feature flag to disable memory creation for "polluted"	2026-03-02 11:57:32 +00:00
Ahmed Ibrahim	0aeb55bf08	Record realtime close marker on replacement (#13058 ) ## Summary - record a realtime close developer message when a new realtime session replaces an active one - assert the replacement marker through the mocked responses request path --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Charles Cunningham <ccunningham@openai.com>	2026-03-01 13:54:12 -08:00
Thibault Sottiaux	c9cef6ba9e	[codex] include plan type in account updates (#13181 ) This change fixes a Codex app account-state sync bug where clients could know the user was signed in but still miss the ChatGPT subscription tier, which could lead to incorrect upgrade messaging for paid users. The root cause was that `account/updated` only carried `authMode` while plan information was available separately via `account/read` and rate-limit snapshots, so this update adds `planType` to `account/updated`, populates it consistently across login and refresh paths.	2026-03-01 13:43:37 -08:00
Ruslan Nigmatullin	8c1e3f3e64	app-server: Add `ephemeral` field to `Thread` object (#13084) Currently there is no other way to tell that a thread is ephemeral; only the client that created it knows.	2026-02-27 17:42:25 -08:00
Eric Traut	ff5cbfd7d4	Handle missing plan info for ChatGPT accounts (#13072 ) Addresses https://github.com/openai/codex/issues/13007 and https://github.com/openai/codex/issues/12170 There are situations where the ChatGPT auth backend might return a JWT that contains no plan information. Most code paths already handle this case well, but the internal implementation of the "account/read" app server call was failing in this case (returning an error rather than properly returning None for the plan). This resulted in a situation where users needed to log in every time the extension or app started even if they successfully logged in the last time. Summary - allow ChatGPT-authenticated accounts to fall back to `AccountPlanType::Unknown` when the token omits the plan claim - add regression coverage in `app-server/tests/suite/v2/account.rs` to confirm `account/read` returns `plan_type: Unknown` when the claim is absent - ensure the Rust auth helpers and fixtures treat missing plan claims as Optional and default to `Unknown`	2026-02-27 17:51:21 -07:00
Matthew Zeng	392fa7de50	[apps] Stabilize app list updated event. (#13067) Stabilize the app list updated event so that we send only two updates: one when installed apps become available, and one when all directory apps are available. Previously it also sent an update when directory apps became available before installed apps, which cut off the installed apps.	2026-02-27 15:23:24 -08:00
Ruslan Nigmatullin	69d7a456bb	app-server: Replay pending item requests on `thread/resume` (#12560 ) Replay pending client requests after `thread/resume` and emit resolved notifications when those requests clear so approval/input UI state stays in sync after reconnects and across subscribed clients. Affected RPCs: - `item/commandExecution/requestApproval` - `item/fileChange/requestApproval` - `item/tool/requestUserInput` Motivation: - Resumed clients need to see pending approval/input requests that were already outstanding before the reconnect. - Clients also need an explicit signal when a pending request resolves or is cleared so stale UI can be removed on turn start, completion, or interruption. Implementation notes: - Use pending client requests from `OutgoingMessageSender` in order to replay them after `thread/resume` attaches the connection, using original request ids. - Emit `serverRequest/resolved` when pending requests are answered or cleared by lifecycle cleanup. - Update the app-server protocol schema, generated TypeScript bindings, and README docs for the replay/resolution flow. High-level test plan: - Added automated coverage for replaying pending command execution and file change approval requests on `thread/resume`. - Added automated coverage for resolved notifications in command approval, file change approval, request_user_input, turn start, and turn interrupt flows. - Verified schema/docs updates in the relevant protocol and app-server tests. Manual testing: - Tested reconnect/resume with multiple connections. - Confirmed state stayed in sync between connections.	2026-02-27 12:45:59 -08:00
Michael Bolin	66b0adb34c	app-server: deflake running thread resume tests (#13047 ) ## Why CI has been intermittently failing in `suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` because these running-thread resume tests treated `turn/started` as proof that the thread was already active. That signal is too early for this path. `turn/started` is emitted optimistically from [`turn_start`](`1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L5757-L5767)`). In `single_client_mode`, the listener skips `current_turn_history` tracking in [`codex_message_processor.rs`](`1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L6461-L6465)`), so running-thread resume still depends on `ThreadWatchManager` observing the core `TurnStarted` event in [`bespoke_event_handling.rs`](`1103d0037e/codex-rs/app-server/src/bespoke_event_handling.rs (L152-L156)`). If `thread/resume` lands in that window, the thread can still look `Idle` and the assertion flakes. ## What - Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs` that waits for `thread/status/changed` to report `Active` for the target thread. - Use that public v2 notification as the synchronization barrier in the four running-thread resume tests instead of relying on `turn/started`. ## Follow-up This PR keeps the fix at the test layer so we can remove the flake without changing server behavior. 
A broader runtime fix should still be considered separately, for example: - make `turn/start` eagerly transition the thread to `Active` so `turn/started` and `thread/status/changed` are coherent - or revisit the `single_client_mode` guard that skips current-turn tracking for running-thread resume ## Testing - `cargo test -p codex-app-server thread_resume -- --nocapture` - `for i in $(seq 1 10); do cargo test -p codex-app-server 'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch' -- --exact --nocapture; done`	2026-02-27 19:47:30 +00:00
jif-oai	8cf5b00aef	fix: more stable notify script (#13011 )	2026-02-27 16:05:44 +01:00
Michael Bolin	e6cd75a684	notify: include client in legacy hook payload (#12968 ) ## Why The `notify` hook payload did not identify which Codex client started the turn. That meant downstream notification hooks could not distinguish between completions coming from the TUI and completions coming from app-server clients such as VS Code or Xcode. Now that the Codex App provides its own desktop notifications, it would be nice to be able to filter those out. This change adds that context without changing the existing payload shape for callers that do not know the client name, and keeps the new end-to-end test cross-platform. ## What changed - added an optional top-level `client` field to the legacy `notify` JSON payload - threaded that value through `core` and `hooks`; the internal session and turn state now carries it as `app_server_client_name` - set the field to `codex-tui` for TUI turns - captured `initialize.clientInfo.name` in the app server and applied it to subsequent turns before dispatching hooks - replaced the notify integration test hook with a `python3` script so the test does not rely on Unix shell permissions or `bash` - documented the new field in `docs/config.md` ## Testing - `cargo test -p codex-hooks` - `cargo test -p codex-tui` - `cargo test -p codex-app-server suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name -- --exact --nocapture` - `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs` still has unrelated existing failures in this environment) ## Docs The public config reference on `developers.openai.com/codex` should mention that the legacy `notify` payload may include a top-level `client` field. The TUI reports `codex-tui`, and the app server reports `initialize.clientInfo.name` when it is available.	2026-02-26 22:27:34 -08:00
Ahmed Ibrahim	4d180ae428	Add model availability NUX metadata (#12972 ) - replace show_nux with structured availability_nux model metadata - expose availability NUX data through the app-server model API - update shared fixtures and tests for the new field	2026-02-26 22:02:57 -08:00
Matthew Zeng	6fe3dc2e22	[apps] Improve app/list with force_fetch=true (#12745) - [x] Improve app/list with force_fetch=true: we now keep the cached snapshot until both installed apps and directory apps load.	2026-02-27 03:54:03 +00:00
Shijie Rao	8715a6ef84	Feat: cxa-1833 update model/list (#12958 ) ### Summary Update `model/list` in app server to include more upgrade information.	2026-02-26 17:02:24 -08:00
Celia Chen	90cc4e79a2	feat: add local date/timezone to turn environment context (#12947 ) ## Summary This PR includes the session's local date and timezone in the model-visible environment context and persists that data in `TurnContextItem`. ## What changed - captures the current local date and IANA timezone when building a turn context, with a UTC fallback if the timezone lookup fails - includes current_date and timezone in the serialized <environment_context> payload - stores those fields on TurnContextItem so they survive rollout/history handling, subagent review threads, and resume flows - treats date/timezone changes as environment updates, so prompt caching and context refresh logic do not silently reuse stale time context - updates tests to validate the new environment fields without depending on a single hardcoded environment-context string ## test built a local build and saw it in the rollout file: ``` {"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}} ```	2026-02-26 23:17:35 +00:00
pakrym-oai	ba41e84a50	Use model catalog default for reasoning summary fallback (#12873 ) ## Summary - make `Config.model_reasoning_summary` optional so unset means use model default - resolve the optional config value to a concrete summary when building `TurnContext` - add protocol support for `default_reasoning_summary` in model metadata ## Validation - `cargo test -p codex-core --lib client::tests -- --nocapture` --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-26 09:31:13 -08:00
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. 
## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and 
`event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
Eric Traut	28bfbb8f2b	Enforce user input length cap (#12823 ) Currently there is no bound on the length of a user message submitted in the TUI or through the app server interface. That means users can paste many megabytes of text, which can lead to bad performance, hangs, and crashes. In extreme cases, it can lead to a [kernel panic](https://github.com/openai/codex/issues/12323). This PR limits the length of a user input to 2**20 (about 1M) characters. This value was chosen because it fills the entire context window on the latest models, so accepting longer inputs wouldn't make sense anyway. Summary - add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol and surface it in TUI and app server code - block oversized submissions in the TUI submit flow and emit error history cells when validation fails - reject heavy app-server requests with JSON-RPC `-32602` and structured `input_too_large` data, plus document the behavior Testing - ran the IDE extension with this change and verified that when I attempt to paste a user message that's several MB long, it correctly reports an error instead of crashing or making my computer hot.	2026-02-25 22:23:51 -08:00
Celia Chen	4f45668106	Revert "Add skill approval event/response (#12633)" (#12811) This reverts commit https://github.com/openai/codex/pull/12633. We no longer need that PR because we now prefer sending a normal exec command approval server request whose `additional_permissions` carry the skill permissions.	2026-02-26 01:02:42 +00:00
Charley Cunningham	2f4d6ded1d	Enable request_user_input in Default mode (#12735 ) ## Summary - allow `request_user_input` in Default collaboration mode as well as Plan - update the Default-mode instructions to prefer assumptions first and use `request_user_input` only when a question is unavoidable - update request_user_input and app-server tests to match the new Default-mode behavior - refactor collaboration-mode availability plumbing into `CollaborationModesConfig` for future mode-related flags ## Codex author `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`	2026-02-25 15:20:46 -08:00
Celia Chen	b6d20748e0	Revert "Ensure shell command skills trigger approval (#12697)" (#12721) This reverts commit `daf0f03ac8`.	2026-02-25 22:49:53 +00:00
Owen Lin	21f7032dbb	feat(app-server): thread/unsubscribe API (#10954) Adds a new v2 app-server API that lets a client unsubscribe from a thread: - New RPC method: `thread/unsubscribe` - New server notification: `thread/closed` Today clients can start/resume/archive threads, but there was no way to explicitly unload a live thread from memory without archiving it. With `thread/unsubscribe`, a client can indicate it is no longer actively working with a live thread. If this is the only client subscribed to that thread, app-server automatically closes the thread, at which point the server sends `thread/closed` and `thread/status/changed` (with `status: notLoaded`) notifications. This gives clients a way to prevent long-running app-server processes from accumulating too many thread (and related) objects in memory. Closed threads are also removed from `thread/loaded/list`.	2026-02-25 13:14:30 -08:00
sayan-oai	d45ffd5830	make 5.3-codex visible in cli for api users (#12808 ) 5.3-codex released in api, mark it visible for API users via bundled `models.json`.	2026-02-25 13:01:40 -08:00
Michael Bolin	be5bca6f8d	fix: harden zsh fork tests and keep subcommand approvals deterministic (#12809 ) ## Why The prior `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` assertion was brittle under Bazel: command approval payloads in the test could include environment-dependent wrapper/command formatting differences, which makes exact command-string matching flaky even when behavior is correct. (This regression was knowingly introduced in https://github.com/openai/codex/pull/12800, but it was urgent to land that PR.) ## What changed - Hardened `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` in [`turn_start_zsh_fork.rs`](https://github.com/openai/codex/blob/main/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs): - Replaced strict `approval_command.starts_with("/bin/rm")` checks with intent-based subcommand matching. - Subcommand approvals are now recognized by file-target semantics (`first.txt` or `second.txt`) plus `rm` intent. - Parent approval recognition is now more tolerant of command-format differences while still requiring a definitive parent command context. - Uses a defensive loop that waits for all target subcommand decisions and the parent approval request. - Preserved the existing regression and unit test fixes from earlier commits in `unix_escalation.rs` and `skill_approval.rs`. ## Verification - Ran the zsh fork subcommand decline regression under this change: - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` - Confirmed the test is now robust against approval-command-string variation instead of hardcoding one expected command shape.	2026-02-25 12:23:30 -08:00
Owen Lin	a0fd94bde6	feat(app-server): add ThreadItem::DynamicToolCall (#12732 ) Previously, clients would call `thread/start` with dynamic_tools set, and when a model invokes a dynamic tool, it would just make the server->client `item/tool/call` request and wait for the client's response to complete the tool call. This works, but it doesn't have an `item/started` or `item/completed` event. Now we are doing this: - [new] emit `item/started` with `DynamicToolCall` populated with the call arguments - send an `item/tool/call` server request - [new] once the client responds, emit `item/completed` with `DynamicToolCall` populated with the response. Also, with `persistExtendedHistory: true`, dynamic tool calls are now reconstructable in `thread/read` and `thread/resume` as `ThreadItem::DynamicToolCall`.	2026-02-25 12:00:10 -08:00
Ahmed Ibrahim	947092283a	Add app-server v2 thread realtime API (#12715 ) Add experimental `thread/realtime/*` v2 requests and notifications, then route app-server realtime events through that thread-scoped surface with integration coverage. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-25 09:59:10 -08:00
jif-oai	f46b767b7e	feat: add search term to thread list (#12578) Add `searchTerm` to `thread/list`, which searches for a match in thread titles (the condition being `searchTerm` $\in$ `title`).	2026-02-25 09:59:41 +00:00
jif-oai	10c04e11b8	feat: add service name to app-server (#12319) Add a service name to the app-server so that the app can use its own service name. This is set at the thread level because we may later make the app-server a singleton on the computer.	2026-02-25 09:51:42 +00:00
Max Johnson	5163850025	codex-rs/app-server: graceful websocket restart on Ctrl-C (#12517 ) ## Summary - add graceful websocket app-server restart on Ctrl-C by draining until no assistant turns are running - stop the websocket acceptor and disconnect existing connections once the drain condition is met - add a websocket integration test that verifies Ctrl-C waits for an in-flight turn before exit ## Verification - `cargo check -p codex-app-server --quiet` - `cargo test -p codex-app-server --test all suite::v2::connection_handling_websocket` - I (maxj) tested remote and local Codex.app --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-24 16:27:59 -08:00
daveaitel-openai	dcab40123f	Agent jobs (spawn_agents_on_csv) + progress UI (#10935 ) ## Summary - Add agent job support: spawn a batch of sub-agents from CSV, auto-run, auto-export, and store results in SQLite. - Simplify workflow: remove run/resume/get-status/export tools; spawn is deterministic and completes in one call. - Improve exec UX: stable, single-line progress bar with ETA; suppress sub-agent chatter in exec. ## Why Enables map-reduce style workflows over arbitrarily large repos using the existing Codex orchestrator. This addresses review feedback about overly complex job controls and non-deterministic monitoring. ## Demo (progress bar) ``` ./codex-rs/target/debug/codex exec \ --enable collab \ --enable sqlite \ --full-auto \ --progress-cursor \ -c agents.max_threads=16 \ -C /Users/daveaitel/code/codex \ - <<'PROMPT' Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows: path = item-01..item-30, area = test. Then call spawn_agents_on_csv with: - csv_path: /tmp/agent_job_progress_demo.csv - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1." - output_csv_path: /tmp/agent_job_progress_demo_out.csv PROMPT ``` ## Review feedback addressed - Auto-start jobs on spawn; removed run/resume/status/export tools. - Auto-export on success. - More descriptive tool spec + clearer prompts. - Avoid deadlocks on spawn failure; pending/running handled safely. - Progress bar no longer scrolls; stable single-line redraw. ## Tests - `cd codex-rs && cargo test -p codex-exec` - `cd codex-rs && cargo build -p codex-cli`	2026-02-24 21:00:19 +00:00
pakrym-oai	daf0f03ac8	Ensure shell command skills trigger approval (#12697 ) Summary - detect skill-invoking shell commands based on the original command string, request approvals when needed, and cache positive decisions per session - keep implicit skill invocation emitted after approval and keep skill approval decline messaging centralized to the shell handler - expand and adjust skill approval tests to cover shell-based skill scripts while matching the new detection expectations Testing - Not run (not requested)	2026-02-24 12:13:20 -08:00
Michael Bolin	3ca0e7673b	feat: run zsh fork shell tool via shell-escalation (#12649 ) ## Why This PR switches the `shell_command` zsh-fork path over to `codex-shell-escalation` so the new shell tool can use the shared exec-wrapper/escalation protocol instead of the `zsh_exec_bridge` implementation that was introduced in https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on UNIX domain sockets, which is not as tamper-proof as the FD-based approach in `codex-shell-escalation`. ## What Changed - Added a Unix zsh-fork runtime adapter in `core` (`core/src/tools/runtimes/shell/unix_escalation.rs`) that: - runs zsh-fork commands through `codex_shell_escalation::run_escalate_server` - bridges exec-policy / approval decisions into `ShellActionProvider` - executes escalated commands via a `ShellCommandExecutor` that calls `process_exec_tool_call` - Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving the generic `shell` tool path unchanged. - Removed the `zsh_exec_bridge`-based session service and deleted `core/src/zsh_exec_bridge/mod.rs`. - Moved exec-wrapper entrypoint dispatch to `arg0` by handling the `codex-execve-wrapper` arg0 alias there, and removed the old `codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and `app-server` mains. - Added the needed `codex-shell-escalation` dependencies for `core` and `arg0`. 
## Tests - `cargo test -p codex-core shell_zsh_fork_prefers_shell_command_over_unified_exec` - `cargo test -p codex-app-server turn_start_shell_zsh_fork -- --nocapture` - verifies zsh-fork command execution and approval flows through the new backend - includes subcommand approve/decline coverage using the shared zsh DotSlash fixture in `app-server/tests/suite/zsh` - To test manually, I added the following to `~/.codex/config.toml`: ```toml zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh" [features] shell_zsh_fork = true ``` Then I ran `just c` to run the dev build of Codex with these changes and sent it the message: ``` run `echo $0` ``` And it replied with: ``` echo $0 printed: /Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh In this tool context, $0 reflects the script path used to invoke the shell, not just zsh. ``` so the tool appears to be wired up correctly. ## Notes - The zsh subcommand-decline integration test now uses `rm` under a `WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is auto-allowed by the new `shell-escalation` policy path, which no longer produces subcommand approval prompts.	2026-02-24 10:31:08 -08:00
Dylan Hurd	f6053fdfb3	feat(core) Introduce Feature::RequestPermissions (#11871) ## Summary Introduces the initial implementation of Feature::RequestPermissions. RequestPermissions allows the model to request that a command be run inside the sandbox, with additional permissions, like writing to a specific folder. Eventually this will include other rules as well, and the ability to persist these permissions, but this PR is already quite large - let's get the core flow working and go from there! ## Testing - [x] Added tests - [x] Tested locally - [x] Feature	2026-02-24 09:48:57 -08:00
sayan-oai	7e46e5b9c2	chore: rm hardcoded PRESETS list (#12650) Remove the `PRESETS` list hardcoded in `model_presets`, as we now have a bundled `models.json` with equivalent info. Update the logic to rely on bundled models instead, and update tests.	2026-02-23 22:35:51 -08:00
pakrym-oai	58763afa0f	Add skill approval event/response (#12633 ) Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.	2026-02-23 22:28:58 -08:00
Michael Bolin	38f84b6b29	refactor: delete exec-server and move execve wrapper into shell-escalation (#12632 ) ## Why We already plan to remove the shell-tool MCP path, and doing that cleanup first makes the follow-on `shell-escalation` work much simpler. This change removes the last remaining reason to keep `codex-rs/exec-server` around by moving the `codex-execve-wrapper` binary and shared shell test fixtures to the crates/tests that now own that functionality. ## What Changed ### Delete `codex-rs/exec-server` - Remove the `exec-server` crate, including the MCP server binary, MCP-specific modules, and its test support/test suite - Remove `exec-server` from the `codex-rs` workspace and update `Cargo.lock` ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation` - Move the wrapper implementation into `shell-escalation` (`src/unix/execve_wrapper.rs`) - Add the `codex-execve-wrapper` binary entrypoint under `shell-escalation/src/bin/` - Update `shell-escalation` exports/module layout so the wrapper entrypoint is hosted there - Move the wrapper README content from `exec-server` to `shell-escalation/README.md` ### Move shared shell test fixtures to `app-server` - Move the DotSlash `bash`/`zsh` test fixtures from `exec-server/tests/suite/` to `app-server/tests/suite/` - Update `app-server` zsh-fork tests to reference the new fixture paths ### Keep `shell-tool-mcp` as a shell-assets package - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm artifact contains only patched Bash/Zsh payloads (no Rust binaries) - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`, and docs to reflect the shell-assets-only package shape - `shell-tool-mcp-ci.yml` does not need changes because it is already JS-only ## Verification - `cargo shear` - `cargo clippy -p codex-shell-escalation --tests` - `just clippy`	2026-02-23 20:10:22 -08:00
Michael Bolin	e8949f4507	test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518 ) ## Why The zsh integration tests were still brittle in two ways: - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so they often did not exercise the patched zsh fork that `shell-tool-mcp` ships - once the tests consistently used the vendored zsh fork, they exposed real Linux-specific zsh-fork issues in CI In particular, the Linux failures were not just test noise: - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode could receive malformed arguments - the `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` test uses the zsh exec bridge (which talks to the parent over a Unix socket), but Linux restricted sandbox seccomp denies `connect(2)`, causing timeouts on `ubuntu-24.04` x86/arm This PR makes the zsh tests consistently run against the intended vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI signal is meaningful. ## What Changed - Added a single shared test-only DotSlash file for the patched zsh fork at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing `bash` test resource). - Updated both app-server and exec-server zsh tests to use that shared DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH` dependency). - Updated the app-server zsh-fork test helper to resolve the shared DotSlash zsh and avoid silently falling back to host zsh. - Kept the app-server zsh-fork tests configured via `config.toml`, using a test wrapper path where needed to force `zsh -df` (and rewrite `-lc` to `-c`) for the subcommand-decline test. 
- Hardened the app-server subcommand-decline zsh-fork test for CI variability: - tolerate an extra `/responses` POST with a no-op mock response - tolerate non-target approval ordering while remaining strict on the two `/usr/bin/true` approvals and decline behavior - use `DangerFullAccess` on Linux for this one test because it validates zsh approval flow, not Linux sandbox socket restrictions - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox` arg0 dispatch continues to work. - Moved `maybe_run_zsh_exec_wrapper_mode()` under `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode handling coexists correctly with arg0-dispatched helper modes. - Consolidated duplicated `dotslash -- fetch` resolution logic into shared test support (`core/tests/common/lib.rs`). - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh differences by: - resolving an absolute `git` path - running `git init --quiet .` - asserting success / `.git` creation instead of relying on banner text ## Verification - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture` - `cargo test -p codex-exec-server accept_elicitation -- --nocapture` - `bazel test //codex-rs/exec-server:exec-server-all-test --test_output=streamed --test_arg=--nocapture --test_arg=accept_elicitation_for_prompt_rule_with_zsh` - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 - x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm - aarch64-unknown-linux-gnu` passed in [run 22291424358](https://github.com/openai/codex/actions/runs/22291424358)	2026-02-22 19:39:56 -08:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
Yaroslav Volovich	dca9c40dd5	test(app-server): wait for turn/completed in turn_start tests (#12376 ) ## Summary - switch a few app-server `turn_start` tests from `codex/event/task_complete` waits to `turn/completed` waits - avoid matching unrelated/background `task_complete` events - keep this flaky test fix separate from the /title feature PR ## Why On Windows ARM CI, these tests can return early after observing a generic `codex/event/task_complete` notification from another task. That can leave the mock Responses server with fewer calls than expected and fail the test with a wiremock verification mismatch. Using `turn/completed` matches the app-server turn lifecycle notification the tests actually care about. ## Validation - `cargo test -p codex-app-server turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture` - `cargo test -p codex-app-server turn_start_exec_approval_ -- --nocapture` - `just fmt`	2026-02-20 21:15:21 -08:00
natea-oai	936e744c93	Add field to Thread object for the latest rename set for a given thread (#12301) Exposes the most recently set name for a thread through the app server. This enables other surfaces to use the core as the source of truth for thread naming. `threadName` is gathered using the helper functions used to interact with `session_index.jsonl`, and is hydrated in: - `thread/list` - `thread/read` - `thread/resume` - `thread/unarchive` - `thread/rollback` We don't do this for `thread/start` and `thread/fork`.	2026-02-20 18:26:57 -08:00
Matthew Zeng	354e7fedd2	[apps] Enforce simple logo url format. (#12374 ) - [x] Enforce simple logo url format when loading apps directory to save bandwidth.	2026-02-20 22:05:55 +00:00
Matthew Zeng	aa121a115e	[apps] Implement apps configs. (#12086 ) - [x] Implement apps configs.	2026-02-20 12:05:21 -08:00
jif-oai	0f9eed3a6f	feat: add nick name to sub-agents (#12320) Adds a random nickname to each sub-agent, used for UX. Also stores and wires through the role of the sub-agent.	2026-02-20 14:39:49 +00:00
Michael Bolin	366ecaf17a	app-server: fix flaky list_apps_returns_connectors_with_accessible_flags test (#12286 ) ## Why `app/list` emits `app/list/updated` after whichever async load finishes first (directory connectors or accessible tools). This test assumed the directory-backed update always arrived first because it injected a tools delay, but that assumption is not stable when the process-global Codex Apps tools cache is already warm. In that case the accessible-tools path can return immediately and the first notification shape flips, which makes the assertion flaky. Relevant code paths: - [`codex-rs/app-server/src/codex_message_processor.rs`](`13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034)`) (concurrent loads + per-load `app/list/updated` notifications) - [`codex-rs/core/src/mcp_connection_manager.rs`](`13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197)`) (Codex Apps tools cache hit path) ## What Changed Updated `suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags` in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either valid first `app/list/updated` payload: - the directory-first snapshot - the accessible-tools-first snapshot The test still keeps the later assertions strict: - the second `app/list/updated` notification must be the fully merged result - the final `app/list` response must match the same merged result I also added an inline comment explaining why the first notification is intentionally order-insensitive. ## Verification - `cargo test -p codex-app-server`	2026-02-20 02:27:18 +00:00
Michael Bolin	4fa304306b	tests: centralize in-flight turn cleanup helper (#12271 ) ## Why Several tests intentionally exercise behavior while a turn is still active. The cleanup sequence for those tests (`turn/interrupt` + waiting for `codex/event/turn_aborted`) was duplicated across files, which made the rationale easy to lose and the pattern easy to apply inconsistently. This change centralizes that cleanup in one place with a single explanatory doc comment. ## What Changed ### Added shared helper In `codex-rs/app-server/tests/common/mcp_process.rs`: - Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`. - Added a doc comment explaining why explicit interrupt + terminal wait is required for tests that intentionally leave a turn in-flight. ### Migrated call sites Replaced duplicated interrupt/aborted blocks with the helper in: - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` - `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs` - `turn_start_shell_zsh_fork_executes_command_v2` - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` - `codex-rs/app-server/tests/suite/v2/turn_steer.rs` - `turn_steer_returns_active_turn_id` ### Existing cleanup retained In `codex-rs/app-server/tests/suite/v2/turn_start.rs`: - `turn_start_accepts_local_image_input` continues to explicitly wait for `turn/completed` so the turn lifecycle is fully drained before test exit. ## Verification - `cargo test -p codex-app-server`	2026-02-20 01:47:34 +00:00
Michael Bolin	7ed3e3760d	tests(thread_resume): interrupt running turns in resume error-path tests (#12269 ) ## Why `thread_resume` tests can intentionally create an in-flight turn, assert a `thread/resume` error path, and return immediately. That leaves turn work active during teardown, which can surface as intermittent `LEAK` failures. Sample output that motivated this investigation (reported during test runs): ```text LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch ``` ## What Changed Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`: - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` Both tests now: 1. capture the running turn id from `TurnStartResponse` 2. assert the expected `thread/resume` error 3. call `turn/interrupt` for that running turn 4. wait for `codex/event/turn_aborted` before returning ## Why This Is The Correct Fix These tests are specifically validating resume behavior while a turn is active. They should also own cleanup of that active turn before exiting. Explicitly interrupting and waiting for the terminal abort notification removes teardown races and avoids relying on process-drop behavior to clean up in-flight work. ## Repro / Verification Repro command used for investigation: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) \| test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)' ``` Observed before this change: intermittent `LEAK` in `thread_resume_rejects_history_when_thread_is_running`. 
Also verified with: - `cargo test -p codex-app-server`	2026-02-19 21:51:18 +00:00
Michael Bolin	2f3d0b186b	app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266 ) ## Why `cargo nextest` was intermittently reporting `LEAK` for `codex-app-server` tests even when assertions passed. This adds noise and flakiness to local/CI signals. Sample output used as the basis of this investigation: ```text LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1 LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2 LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2 LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2 LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2 ``` ## How I Reproduced I focused on the suspect tests and ran them under `nextest` stress mode with leak reporting enabled. 
```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) \| test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) \| test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) \| test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) \| test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) \| test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) \| test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)' ``` This reproduced intermittent `LEAK` statuses while tests still passed. ## What Changed In `codex-rs/app-server/tests/common/mcp_process.rs`: - Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown can explicitly close stdin. - In `Drop`, close stdin first to trigger EOF-based graceful shutdown. - Wait briefly for graceful exit. - If still running, fall back to `start_kill()` and the existing bounded `try_wait()` loop. - Updated send-path handling to bail if stdin is already closed. ## Why This Is the Right Fix The leak signal was caused by child-process teardown timing, not test-logic assertion failure. The helper previously relied mostly on force-kill timing in `Drop`; that can race with nextest leak detection. Closing stdin first gives `codex-app-server` a deterministic, graceful shutdown path before force-kill. Keeping the force-kill fallback preserves robustness if graceful shutdown does not complete in time. ## Verification - `cargo test -p codex-app-server` - Re-ran the stress repro above after this change: no `LEAK` statuses observed. 
- Additional high-signal stress run also showed no leaks: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)' ```	2026-02-19 20:19:42 +00:00
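The teardown order described in #12266 (close stdin, wait briefly for a graceful exit, then force-kill) can be sketched as follows. This is a minimal illustration and not the code from `mcp_process.rs`: it uses `std::process` rather than the async process API the commit's `start_kill()` suggests, and `ChildGuard`, `spawn`, `send`, `shutdown`, and the 2-second grace period are all hypothetical names and values chosen for the example.

```rust
use std::io::Write;
use std::process::{Child, ChildStdin, Command, Stdio};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical test helper that owns a child process and its stdin.
struct ChildGuard {
    child: Child,
    // `Option` so teardown can take and drop (close) stdin explicitly.
    stdin: Option<ChildStdin>,
}

impl ChildGuard {
    fn spawn(program: &str) -> std::io::Result<Self> {
        let mut child = Command::new(program)
            .stdin(Stdio::piped())
            .stdout(Stdio::null())
            .spawn()?;
        let stdin = child.stdin.take();
        Ok(Self { child, stdin })
    }

    /// Send one line to the child; bail if stdin was already closed.
    fn send(&mut self, line: &str) -> std::io::Result<()> {
        match self.stdin.as_mut() {
            Some(stdin) => writeln!(stdin, "{line}"),
            None => Err(std::io::Error::new(
                std::io::ErrorKind::BrokenPipe,
                "stdin already closed",
            )),
        }
    }

    /// Returns true if the child exited on its own after seeing EOF,
    /// false if the force-kill fallback was needed.
    fn shutdown(&mut self, grace: Duration) -> bool {
        // Dropping the handle closes the pipe, so the child reads EOF.
        drop(self.stdin.take());

        // Poll for a graceful exit until the grace period runs out.
        let deadline = Instant::now() + grace;
        while Instant::now() < deadline {
            if let Ok(Some(_status)) = self.child.try_wait() {
                return true;
            }
            sleep(Duration::from_millis(10));
        }

        // Fallback: force-kill and reap so no process is left behind.
        let _ = self.child.kill();
        let _ = self.child.wait();
        false
    }
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Grace period is an assumption for this sketch.
        let _ = self.shutdown(Duration::from_secs(2));
    }
}
```

With a child that exits on EOF (for example `cat` on a Unix host), `shutdown` should take the graceful path, and any later `send` returns an error instead of writing to a closed pipe. The force-kill branch only runs for children that ignore EOF.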
Charley Cunningham	abb018383f	Undo stack size Bazel test hack (#12258 ) Undo hack from https://github.com/openai/codex/pull/12203/changes	2026-02-19 11:04:45 -08:00
Charley Cunningham	16c3c47535	Stabilize app-server detached review and running-resume tests (#12203 ) ## Summary - stabilize `thread_resume_rejoins_running_thread_even_with_override_mismatch` by using a valid delayed second SSE response instead of an intentionally truncated stream - set `RUST_MIN_STACK=4194304` for spawned app-server test processes in `McpProcess` to avoid stack-sensitive CI overflows in detached review tests ## Why - the thread-resume assertion could race with a mocked stream-disconnect error and intermittently observe `systemError` - detached review startup is stack-sensitive in some CI environments; pinning a larger stack in the test harness removes that flake without changing product behavior ## Validation - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` - `cargo test -p codex-app-server --test all suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`	2026-02-18 19:05:35 -08:00


376 commits