core-agent-ide

Author	SHA1	Message	Date
Michael Bolin	366ecaf17a	app-server: fix flaky list_apps_returns_connectors_with_accessible_flags test (#12286 ) ## Why `app/list` emits `app/list/updated` after whichever async load finishes first (directory connectors or accessible tools). This test assumed the directory-backed update always arrived first because it injected a tools delay, but that assumption is not stable when the process-global Codex Apps tools cache is already warm. In that case the accessible-tools path can return immediately and the first notification shape flips, which makes the assertion flaky. Relevant code paths: - [`codex-rs/app-server/src/codex_message_processor.rs`](`13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034)`) (concurrent loads + per-load `app/list/updated` notifications) - [`codex-rs/core/src/mcp_connection_manager.rs`](`13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197)`) (Codex Apps tools cache hit path) ## What Changed Updated `suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags` in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either valid first `app/list/updated` payload: - the directory-first snapshot - the accessible-tools-first snapshot The test still keeps the later assertions strict: - the second `app/list/updated` notification must be the fully merged result - the final `app/list` response must match the same merged result I also added an inline comment explaining why the first notification is intentionally order-insensitive. ## Verification - `cargo test -p codex-app-server`	2026-02-20 02:27:18 +00:00
Michael Bolin	4fa304306b	tests: centralize in-flight turn cleanup helper (#12271 ) ## Why Several tests intentionally exercise behavior while a turn is still active. The cleanup sequence for those tests (`turn/interrupt` + waiting for `codex/event/turn_aborted`) was duplicated across files, which made the rationale easy to lose and the pattern easy to apply inconsistently. This change centralizes that cleanup in one place with a single explanatory doc comment. ## What Changed ### Added shared helper In `codex-rs/app-server/tests/common/mcp_process.rs`: - Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`. - Added a doc comment explaining why explicit interrupt + terminal wait is required for tests that intentionally leave a turn in-flight. ### Migrated call sites Replaced duplicated interrupt/aborted blocks with the helper in: - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` - `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs` - `turn_start_shell_zsh_fork_executes_command_v2` - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` - `codex-rs/app-server/tests/suite/v2/turn_steer.rs` - `turn_steer_returns_active_turn_id` ### Existing cleanup retained In `codex-rs/app-server/tests/suite/v2/turn_start.rs`: - `turn_start_accepts_local_image_input` continues to explicitly wait for `turn/completed` so the turn lifecycle is fully drained before test exit. ## Verification - `cargo test -p codex-app-server`	2026-02-20 01:47:34 +00:00
Michael Bolin	7ed3e3760d	tests(thread_resume): interrupt running turns in resume error-path tests (#12269 ) ## Why `thread_resume` tests can intentionally create an in-flight turn, assert a `thread/resume` error path, and return immediately. That leaves turn work active during teardown, which can surface as intermittent `LEAK` failures. Sample output that motivated this investigation (reported during test runs): ```text LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch ``` ## What Changed Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`: - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` Both tests now: 1. capture the running turn id from `TurnStartResponse` 2. assert the expected `thread/resume` error 3. call `turn/interrupt` for that running turn 4. wait for `codex/event/turn_aborted` before returning ## Why This Is The Correct Fix These tests are specifically validating resume behavior while a turn is active. They should also own cleanup of that active turn before exiting. Explicitly interrupting and waiting for the terminal abort notification removes teardown races and avoids relying on process-drop behavior to clean up in-flight work. ## Repro / Verification Repro command used for investigation: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) \| test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)' ``` Observed before this change: intermittent `LEAK` in `thread_resume_rejects_history_when_thread_is_running`. Also verified with: - `cargo test -p codex-app-server` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12269). * #12271 * __->__ #12269	2026-02-19 21:51:18 +00:00
Michael Bolin	2f3d0b186b	app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266 ) ## Why `cargo nextest` was intermittently reporting `LEAK` for `codex-app-server` tests even when assertions passed. This adds noise and flakiness to local/CI signals. Sample output used as the basis of this investigation: ```text LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1 LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2 LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2 LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2 LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2 ``` ## How I Reproduced I focused on the suspect tests and ran them under `nextest` stress mode with leak reporting enabled. ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) \| test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) \| test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) \| test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) \| test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) \| test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) \| test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)' ``` This reproduced intermittent `LEAK` statuses while tests still passed. ## What Changed In `codex-rs/app-server/tests/common/mcp_process.rs`: - Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown can explicitly close stdin. - In `Drop`, close stdin first to trigger EOF-based graceful shutdown. - Wait briefly for graceful exit. - If still running, fall back to `start_kill()` and the existing bounded `try_wait()` loop. - Updated send-path handling to bail if stdin is already closed. ## Why This Is the Right Fix The leak signal was caused by child-process teardown timing, not test-logic assertion failure. The helper previously relied mostly on force-kill timing in `Drop`; that can race with nextest leak detection. Closing stdin first gives `codex-app-server` a deterministic, graceful shutdown path before force-kill. Keeping the force-kill fallback preserves robustness if graceful shutdown does not complete in time. ## Verification - `cargo test -p codex-app-server` - Re-ran the stress repro above after this change: no `LEAK` statuses observed. - Additional high-signal stress run also showed no leaks: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)' ```	2026-02-19 20:19:42 +00:00
Charley Cunningham	abb018383f	Undo stack size Bazel test hack (#12258 ) Undo hack from https://github.com/openai/codex/pull/12203/changes	2026-02-19 11:04:45 -08:00
Charley Cunningham	16c3c47535	Stabilize app-server detached review and running-resume tests (#12203 ) ## Summary - stabilize `thread_resume_rejoins_running_thread_even_with_override_mismatch` by using a valid delayed second SSE response instead of an intentionally truncated stream - set `RUST_MIN_STACK=4194304` for spawned app-server test processes in `McpProcess` to avoid stack-sensitive CI overflows in detached review tests ## Why - the thread-resume assertion could race with a mocked stream-disconnect error and intermittently observe `systemError` - detached review startup is stack-sensitive in some CI environments; pinning a larger stack in the test harness removes that flake without changing product behavior ## Validation - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` - `cargo test -p codex-app-server --test all suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`	2026-02-18 19:05:35 -08:00
Ruslan Nigmatullin	1f54496c48	app-server: expose loaded thread status via read/list and notifications (#11786 ) Motivation - Today, a newly connected client has no direct way to determine the current runtime status of threads from read/list responses alone. - This forces clients to infer state from transient events, which can lead to stale or inconsistent UI when reconnecting or attaching late. Changes - Add `status` to `thread/read` responses. - Add `statuses` to `thread/list` responses. - Emit `thread/status/changed` notifications with `threadId` and the new status. - Track runtime status for all loaded threads and default unknown threads to `idle`. - Update protocol/docs/tests/schema fixtures for the revised API. Testing - Validated protocol API changes with automated protocol tests and regenerated schema/type fixtures. - Validated app-server behavior with unit and integration test suites, including status transitions and notifications.	2026-02-18 15:20:03 -08:00
iceweasel-oai	292542616a	app-server support for Windows sandbox setup. (#12025 ) app-server support for initiating Windows sandbox setup. server responds quickly to setup request and makes a future RPC call back to client when the setup finishes. The TUI implementation is unaffected but in a future PR I'll update the TUI to use the shared setup helper (`windows_sandbox.run_windows_sandbox_setup`)	2026-02-18 13:03:16 -08:00
Owen Lin	edacbf7b6e	feat(core): zsh exec bridge (#12052 ) zsh fork PR stack: - https://github.com/openai/codex/pull/12051 - https://github.com/openai/codex/pull/12052 👈 ### Summary This PR introduces a feature-gated native shell runtime path that routes shell execution through a patched zsh exec bridge, removing MCP-specific behavior from the shell hot path while preserving existing CommandExecution lifecycle semantics. When shell_zsh_fork is enabled, shell commands run via patched zsh with per-`execve` interception through EXEC_WRAPPER. Core receives wrapper IPC requests over a Unix socket, applies existing approval policy, and returns allow/deny before the subcommand executes. ### What’s included 1) New zsh exec bridge runtime in core - Wrapper-mode entrypoint (maybe_run_zsh_exec_wrapper_mode) for EXEC_WRAPPER invocations. - Per-execution Unix-socket IPC handling for wrapper requests/responses. - Approval callback integration using existing core approval orchestration. - Streaming stdout/stderr deltas to existing command output event pipeline. - Error handling for malformed IPC, denial/abort, and execution failures. 2) Session lifecycle integration SessionServices now owns a `ZshExecBridge`. Session startup initializes bridge state; shutdown tears it down cleanly. 3) Shell runtime routing (feature-gated) When `shell_zsh_fork` is enabled: - Build execution env/spec as usual. - Add wrapper socket env wiring. - Execute via `zsh_exec_bridge.execute_shell_request(...)` instead of the regular shell path. - Non-zsh-fork behavior remains unchanged. 4) Config + feature wiring - Added `Feature::ShellZshFork` (under development). - Added config support for `zsh_path` (optional absolute path to patched zsh): - `Config`, `ConfigToml`, `ConfigProfile`, overrides, and schema. - Session startup validates that `zsh_path` exists/usable when zsh-fork is enabled. - Added startup test for missing `zsh_path` failure mode. 5) Seatbelt/sandbox updates for wrapper IPC - Extended seatbelt policy generation to optionally allow outbound connection to explicitly permitted Unix sockets. - Wired sandboxing path to pass wrapper socket path through to seatbelt policy generation. - Added/updated seatbelt tests for explicit socket allow rule and argument emission. 6) Runtime entrypoint hooks - This allows the same binary to act as the zsh wrapper subprocess when invoked via `EXEC_WRAPPER`. 7) Tool selection behavior - ToolsConfig now prefers ShellCommand type when shell_zsh_fork is enabled. - Added test coverage for precedence with unified-exec enabled.	2026-02-17 20:19:53 -08:00
Owen Lin	db4d2599b5	feat(core): plumb distinct approval ids for command approvals (#12051 ) zsh fork PR stack: - https://github.com/openai/codex/pull/12051 👈 - https://github.com/openai/codex/pull/12052 With upcoming support for a fork of zsh that allows us to intercept `execve` and run execpolicy checks for each subcommand as part of a `CommandExecution`, it will be possible for there to be multiple approval requests for a shell command like `/path/to/zsh -lc 'git status && rg \"TODO\" src && make test'`. To support that, this PR introduces a new `approval_id` field across core, protocol, and app-server so that we can associate approvals properly for subcommands.	2026-02-18 01:55:57 +00:00
Shijie Rao	b3a8571219	Chore: remove response model check and rely on header model for downgrade (#12061 ) ### Summary Ensure that we use the model value from the response header only so that we are guaranteed with the correct slug name. We are no longer checking against the model value from response so that we are less likely to have false positive. There are two different treatments - for SSE we use the header from the response and for websocket we check top-level events.	2026-02-18 01:50:06 +00:00
Ruslan Nigmatullin	31cbebd3c2	app-server: Emit thread archive/unarchive notifications (#12030 ) * Add v2 server notifications `thread/archived` and `thread/unarchived` with a `threadId` payload. * Wire new events into `thread/archive` and `thread/unarchive` success paths. * Update app-server protocol/schema/docs accordingly. Testing: - Updated archive/unarchive end-to-end tests to verify both notifications are emitted with the expected thread id payload.	2026-02-17 14:53:58 -08:00
Matthew Zeng	16fa195fce	[apps] Expose more fields from apps listing endpoints. (#11706 ) - [x] Expose app_metadata, branding, and labels in AppInfo.	2026-02-17 11:45:04 -08:00
sayan-oai	41800fc876	chore: rm remote models fflag (#11699 ) rm `remote_models` feature flag. We see issues like #11527 when a user has `remote_models` disabled, as we always use the default fallback `ModelInfo`. This causes issues with model performance. Builds on #11690, which helps by warning the user when they are using the default fallback. This PR will make that happen much less frequently as an accidental consequence of disabling `remote_models`.	2026-02-17 11:43:16 -08:00
Shijie Rao	48018e9eac	Feat: add model reroute notification (#12001 ) ### Summary Builiding off `5c75aa7b89 (diff-058ae8f109a8b84b4b79bbfa45f522c2233b9d9e139696044ae374d50b6196e0)`, we have created a `model/rerouted` notification that captures the event so that consumers can render as expected. Keep the `EventMsg::Warning` path in core so that this does not affect TUI rendering. `model/rerouted` is meant to be generic to account for future usage including capacity planning etc.	2026-02-17 11:02:23 -08:00
Fouad Matin	02e9006547	add(core): safety check downgrade warning (#11964 ) Add per-turn notice when a request is downgraded to a fallback model due to cyber safety checks. Changes - codex-api: Emit a ServerModel event based on the openai-model response header and/or response payload (SSE + WebSocket), including when the model changes mid-stream. - core: When the server-reported model differs from the requested model, emit a single per-turn warning explaining the reroute to gpt-5.2 and directing users to Trusted Access verification and the cyber safety explainer. - app-server (v2): Surface these cyber model-routing warnings as synthetic userMessage items with text prefixed by Warning: (and document this behavior).	2026-02-16 22:13:36 -08:00
Dylan Hurd	19afbc35c1	chore(core) rm Feature::RequestRule (#11866 ) ## Summary This feature is now reasonably stable, let's remove it so we can simplify our upcoming iterations here. ## Testing - [x] Existing tests pass	2026-02-16 22:30:23 +00:00
sayan-oai	060a320e7d	fix: show user warning when using default fallback metadata (#11690 ) ### What It's currently unclear when the harness falls back to the default, generic `ModelInfo`. This happens when the `remote_models` feature is disabled or the model is truly unknown, and can lead to bad performance and issues in the harness. Add a user-facing warning when this happens so they are aware when their setup is broken. ### Tests Added tests, tested locally.	2026-02-15 18:46:05 -08:00
sayan-oai	6b466df146	fix: send unfiltered models over model/list (#11793 ) ### What to unblock filtering models in VSCE, change `model/list` app-server endpoint to send all models + visibility field `showInPicker` so filtering can be done in VSCE if desired. ### Tests Updated tests.	2026-02-13 16:26:32 -08:00
Max Johnson	fb0aaf94de	codex-rs: fix thread resume rejoin semantics (#11756 ) ## Summary - always rejoin an in-memory running thread on `thread/resume`, even when overrides are present - reject `thread/resume` when `history` is provided for a running thread - reject `thread/resume` when `path` mismatches the running thread rollout path - warn (but do not fail) on override mismatches for running threads - add more `thread_resume` integration tests and fixes; including restart-based resume-with-overrides coverage ## Validation - `just fmt` - `cargo test -p codex-app-server --test all thread_resume` - manual test with app-server-test-client https://github.com/openai/codex/pull/11755 - manual test both stdio and websocket in app	2026-02-13 23:09:58 +00:00
Jeremy Rose	e4f8263798	[app-server] add fuzzyFileSearch/sessionCompleted (#11773 ) this is to allow the client to know when to stop showing a spinner.	2026-02-13 15:08:14 -08:00
Matthew Zeng	f93037f55d	[apps] Fix app loading logic. (#11518 ) When `app/list` is called with `force_refetch=True`, we should seed the results with what is already cached instead of starting from an empty list. Otherwise when we send app/list/updated events, the client will first see an empty list of accessible apps and then get the updated one.	2026-02-13 03:55:10 +00:00
acrognale-oai	ebe359b876	Add cwd as an optional field to thread/list (#11651 ) Add's the ability to filter app-server thread/list by cwd	2026-02-13 02:05:04 +00:00
Charley Cunningham	f24669d444	Persist complete TurnContextItem state via canonical conversion (#11656 ) ## Summary This PR delivers the first small, shippable step toward model-visible state diffing by making `TurnContextItem` more complete and standardizing how it is built. Specifically, it: - Adds persisted network context to `TurnContextItem`. - Introduces a single canonical `TurnContext -> TurnContextItem` conversion path. - Routes existing rollout write sites through that canonical conversion helper. No context injection/diff behavior changes are included in this PR. ## Why this change The design goal is to make `TurnContextItem` the canonical source of truth for context-diff decisions. Before this PR: - `TurnContextItem` did not include all TurnContext-derived environment inputs needed for v1 completeness. - Construction was duplicated at multiple write sites. This PR addresses both with a minimal, reviewable change. ## Changes ### 1) Extend `TurnContextItem` with network state - Added `TurnContextNetworkItem { allowed_domains, denied_domains }`. - Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`. - Kept backward compatibility by making the new field optional and skipped when absent. Files: - `codex-rs/protocol/src/protocol.rs` ### 2) Canonical conversion helper - Added `TurnContext::to_turn_context_item(collaboration_mode)` in core. - Added internal helper to derive network fields from `config_layer_stack.requirements().network`. Files: - `codex-rs/core/src/codex.rs` ### 3) Use canonical conversion at rollout write sites - Replaced ad hoc `TurnContextItem { ... }` construction with `to_turn_context_item(...)` in: - sampling request path - compaction path Files: - `codex-rs/core/src/codex.rs` - `codex-rs/core/src/compact.rs` ### 4) Update fixtures/tests for new optional field - Updated existing `TurnContextItem` literals in tests to include `network: None`. - Added protocol tests for: - deserializing old payloads with no `network` - serializing when `network` is present Files: - `codex-rs/core/tests/suite/resume_warning.rs` - No replay/diff logic changes. - Persisted rollout `TurnContextItem` now carries additional network context when available. - Older rollout lines without `network` remain readable.	2026-02-12 17:22:44 -08:00
Matthew Zeng	c37560069a	[apps] Add is_enabled to app info. (#11417 ) - [x] Add is_enabled to app info and the response of `app/list`. - [x] Update TUI to have Enable/Disable button on the app detail page.	2026-02-13 00:30:52 +00:00
Michael Bolin	aef4af1079	app-server tests: disable shell_snapshot for review suite (#11657 ) ## Why `suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id` was failing on Windows CI due to an unrelated process crash during shell snapshot initialization (`tokio-runtime-worker` stack overflow). This review test suite validates review API behavior and should not depend on shell snapshot behavior. Keeping shell snapshot enabled in this fixture made the test flaky for reasons outside the scenario under test. ## What Changed - Updated the review suite test config in `codex-rs/app-server/tests/suite/v2/review.rs` to set: - `shell_snapshot = false` This keeps the review tests focused on review behavior by disabling shell snapshot initialization in this fixture. ## Verification - `cargo test -p codex-app-server` - Confirmed the previously failing Windows CI job for this test now passes on this PR.	2026-02-12 23:56:43 +00:00
Owen Lin	76256a8cec	fix: skip review_start_with_detached_delivery_returns_new_thread_id o… (#11645 ) …n windows	2026-02-12 15:12:57 -08:00
Jeremy Rose	66e0c3aaa3	app-server: add fuzzy search sessions for streaming file search (#10268 )	2026-02-12 10:49:44 -08:00
Michael Bolin	abbd74e2be	feat: make sandbox read access configurable with `ReadOnlyAccess` (#11387 ) `SandboxPolicy::ReadOnly` previously implied broad read access and could not express a narrower read surface. This change introduces an explicit read-access model so we can support user-configurable read restrictions in follow-up work, while preserving current behavior today. It also ensures unsupported backends fail closed for restricted-read policies instead of silently granting broader access than intended. ## What - Added `ReadOnlyAccess` in protocol with: - `Restricted { include_platform_defaults, readable_roots }` - `FullAccess` - Updated `SandboxPolicy` to carry read-access configuration: - `ReadOnly { access: ReadOnlyAccess }` - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }` - Preserved existing behavior by defaulting current construction paths to `ReadOnlyAccess::FullAccess`. - Threaded the new fields through sandbox policy consumers and call sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and related tests. - Updated Seatbelt policy generation to honor restricted read roots by emitting scoped read rules when full read access is not granted. - Added fail-closed behavior on Linux and Windows backends when restricted read access is requested but not yet implemented there (`UnsupportedOperation`). - Regenerated app-server protocol schema and TypeScript artifacts, including `ReadOnlyAccess`. ## Compatibility / rollout - Runtime behavior remains unchanged by default (`FullAccess`). - API/schema changes are in place so future config wiring can enable restricted read access without another policy-shape migration.	2026-02-11 18:31:14 -08:00
Michael Bolin	572ab66496	test(app-server): stabilize app/list thread feature-flag test by using file-backed MCP OAuth creds (#11521 ) ## Why `suite::v2::app_list::list_apps_uses_thread_feature_flag_when_thread_id_is_provided` has been flaky in CI. The test exercises `thread/start`, which initializes `codex_apps`. In CI/Linux, that path can reach OS keyring-backed MCP OAuth credential lookup (`Codex MCP Credentials`) and intermittently abort the MCP process (observed stack overflow in `zbus`), causing the test to fail before the assertion logic runs. ## What Changed - Updated the test config in `codex-rs/app-server/tests/suite/v2/app_list.rs` to set `mcp_oauth_credentials_store = "file"` in both relevant config-writing paths: - The in-test config override inside `list_apps_uses_thread_feature_flag_when_thread_id_is_provided` - `write_connectors_config(...)`, which is used by the v2 `app_list` test suite - This keeps test coverage focused on thread-scoped app feature flags while removing OS keyring/DBus dependency from this test path. ## How It Was Verified - `cargo test -p codex-app-server` - `cargo test -p codex-app-server list_apps_uses_thread_feature_flag_when_thread_id_is_provided -- --nocapture`	2026-02-11 18:30:18 -08:00
Max Johnson	7053aa5457	Reapply "Add app-server transport layer with websocket support" (#11370 ) Reapply "Add app-server transport layer with websocket support" with additional fixes from https://github.com/openai/codex/pull/11313/changes to avoid deadlocking. This reverts commit `47356ff83c`. ## Summary To avoid deadlocking when queues are full, we maintain separate tokio tasks dedicated to incoming vs outgoing event handling - split the app-server main loop into two tasks in `run_main_with_transport` - inbound handling (`transport_event_rx`) - outbound handling (`outgoing_rx` + `thread_created_rx`) - separate incoming and outgoing websocket tasks ## Validation Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent requests <img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM" src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f" /> --------- Co-authored-by: jif-oai <jif@openai.com>	2026-02-11 18:13:39 +00:00
Michael Bolin	476c1a7160	Remove `test-support` feature from `codex-core` and replace it with explicit test toggles (#11405 ) ## Why `codex-core` was being built in multiple feature-resolved permutations because test-only behavior was modeled as crate features. For a large crate, those permutations increase compile cost and reduce cache reuse. ## Net Change - Removed the `test-support` crate feature and related feature wiring so `codex-core` no longer needs separate feature shapes for test consumers. - Standardized cross-crate test-only access behind `codex_core::test_support`. - External test code now imports helpers from `codex_core::test_support`. - Underlying implementation hooks are kept internal (`pub(crate)`) instead of broadly public. ## Outcome - Fewer `codex-core` build permutations. - Better incremental cache reuse across test targets. - No intended production behavior change.	2026-02-10 22:44:02 -08:00
xl-openai	fdd0cd1de9	feat: support multiple rate limits (#11260 ) Added multi-limit support end-to-end by carrying limit_name in rate-limit snapshots and handling multiple buckets instead of only codex. Extended /usage client parsing to consume additional_rate_limits Updated TUI /status and in-memory state to store/render per-limit snapshots Extended app-server rate-limit read response: kept rate_limits and added rate_limits_by_name. Adjusted usage-limit error messaging for non-default codex limit buckets	2026-02-10 20:09:31 -08:00
Celia Chen	641d5268fa	chore: persist turn_id in rollout session and make turn_id uuid based (#11246 ) Problem: 1. turn id is constructed in-memory; 2. on resuming threads, turn_id might not be unique; 3. client cannot no the boundary of a turn from rollout files easily. This PR does three things: 1. persist `task_started` and `task_complete` events; 1. persist `turn_id` in rollout turn events; 5. generate turn_id as unique uuids instead of incrementing it in memory. This helps us resolve the issue of clients wanting to have unique turn ids for resuming a thread, and knowing the boundry of each turn in rollout files. example debug logs ``` 2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None } 2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None } 2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None } ``` backward compatibility: if you try to resume an old session without task_started and task_complete event populated, the following happens: - If you resume and do nothing: those reconstructed historical IDs can differ next time you resume. - If you resume and send a new turn: the new turn gets a fresh UUID from live submission flow and is persisted, so that new turn’s ID is stable on later resumes. I think this behavior is fine, because we only care about deterministic turn id once a turn is triggered.	2026-02-11 03:56:01 +00:00
pakrym-oai	c68999ee6d	Prefer websocket transport when model opts in (#11386 ) Summary - add a `prefer_websockets` field to `ModelInfo`, defaulting to `false` in all fixtures and constructors - wire the new flag into websocket selection so models that opt in always use websocket transport even when the feature gate is off Testing - Not run (not requested)	2026-02-10 18:50:48 -08:00
pakrym-oai	bfd4e2112c	Disable very flaky tests (#11394 ) Collected from last 20 builds of main in https://github.com/openai/codex/commits/main/.	2026-02-10 18:50:11 -08:00
jif-oai	847a6092e6	fix: reduce usage of `open_if_present` (#11344 )	2026-02-10 19:25:07 +00:00
jif-oai	a364dd8b56	feat: opt-out of events in the app-server (#11319 ) Add `optOutNotificationMethods` in the app-server to opt-out events based on exact method matching	2026-02-10 18:04:52 +00:00
Max Johnson	47356ff83c	Revert "Add app-server transport layer with websocket support (#10693 )" (#11323 ) Suspected cause of deadlocking bug	2026-02-10 17:37:49 +00:00
Matthew Zeng	005e040f97	[apps] Add thread_id param to optionally load thread config for apps feature check. (#11279 ) - [x] Add thread_id param to optionally load thread config for apps feature check	2026-02-09 23:10:26 -08:00
Owen Lin	53741013ab	fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240 ) …ount_id and chatgpt_plan_type ### Summary Following up on external auth mode which was introduced here: https://github.com/openai/codex/pull/10012 Turns out some clients have a differently shaped ID token and don't have a chosen workspace (aka chatgpt_account_id) encoded in their ID token. So, let's replace `id_token` param with `chatgpt_account_id` and `chatgpt_plan_type` (optional) when initializing the external ChatGPT auth mode (`account/login/start` with `chatgptAuthTokens`). The client was able to test end-to-end with a Codex build from this branch and verified it worked!	2026-02-09 20:48:58 -08:00
Dylan Hurd	168c359b71	Adjust shell command timeouts for Windows (#11247 ) Summary - add platform-aware defaults for shell command timeouts so Windows tests get longer waits - keep medium timeout longer on Windows to ensure flakiness is reduced Testing - Not run (not requested)	2026-02-09 20:03:32 -08:00
Josh McKinney	de59e550c0	test: deflake nextest child-process leak in MCP harnesses (#11263 ) ## Summary - add deterministic child-process cleanup to both test `McpProcess` helpers - keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()` polling in `Drop` - document the failure mode and why this avoids nondeterministic `LEAK` flakes ## Why `cargo nextest` leak detection can intermittently report `LEAK` when a spawned server outlives test teardown, making CI flaky. ## Testing - `just fmt` - `cargo test -p codex-app-server` - `cargo test -p codex-mcp-server` ## Failing CI Reference - Original failing job: https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245	2026-02-10 03:43:24 +00:00
xl-openai	a33ee46e3b	feat: extend skills/list to support additional roots. (#10835 ) Add an optional perCwdExtraUserRoots	2026-02-09 13:30:38 -08:00
gt-oai	9fe925b15a	Load requirements on windows (#10770 ) We support requirements on Unix, loading from `/etc/codex/requirements.toml`. On MacOS, we also support MDM. Now, on Windows, we'll load requirements from `%ProgramData%\OpenAI\Codex\requirements.toml`	2026-02-09 16:05:38 +00:00
Matthew Zeng	45b7763c3f	[apps] Improve app loading. (#10994 ) There are two concepts of apps that we load in the harness: - Directory apps, which is all the apps that the user can install. - Accessible apps, which is what the user actually installed and can be $ inserted and be used by the model. These are extracted from the tools that are loaded through the gateway MCP. Previously we wait for both sets of apps before returning the full apps list. Which causes many issues because accessible apps won't be available to the UI or the model if directory apps aren't loaded or failed to load. In this PR we are separating them so that accessible apps can be loaded separately and are instantly available to be shown in the UI and to be provided in model context. We also added an app-server event so that clients can subscribe to also get accessible apps without being blocked on the full app list. - [x] Separate accessible apps and directory apps loading. - [x] `app/list` request will also emit `app/list/updated` notifications that app-server clients can subscribe. Which allows clients to get accessible apps list to render in the $ menu without being blocked by directory apps. - [x] Cache both accessible and directory apps with 1 hour TTL to avoid reloading them when creating new threads. - [x] TUI improvements to redraw $ menu and /apps menu when app list is updated.	2026-02-08 15:24:56 -08:00
Matthew Zeng	9f1009540b	Upgrade rmcp to 0.14 (#10718 ) - [x] Upgrade rmcp to 0.14	2026-02-08 15:07:53 -08:00
Eric Traut	b3de6c7f2b	Defer persistence of rollout file (#11028 ) - Defer rollout persistence for fresh threads (`InitialHistory::New`): keep rollout events in memory and only materialize rollout file + state DB row on first `EventMsg::UserMessage`. - Keep precomputed rollout path available before materialization. - Change `thread/start` to build thread response from live config snapshot and optional precomputed path. - Improve pre-materialization behavior in app-server/TUI: clearer invalid-request errors for file-backed ops and a friendlier `/fork` “not ready yet” UX. - Update tests to match deferred semantics across start/read/archive/unarchive/fork/resume/review flows. - Improved resilience of user_shell test, which should be unrelated to this change but must be affected by timing changes For Reviewers: * The primary change is in recorder.rs * Most of the other changes were to fix up broken assumptions in existing tests Testing: * Manually tested CLI * Exercised app server paths by manually running IDE Extension with rebuilt CLI binary * Only user-visible change is that `/fork` in TUI generates visible error if used prior to first turn	2026-02-07 23:05:03 -08:00
Charley Cunningham	e6662d6387	app-server: treat null mode developer instructions as built-in defaults (#10983 ) ## Summary - make `turn/start` normalize `collaborationMode.settings.developer_instructions: null` to the built-in instructions for the selected mode - prevent app-server clients from accidentally clearing mode-switch developer instructions by sending `null` - document this behavior in the v2 protocol and app-server docs ## What changed - `codex-rs/app-server/src/codex_message_processor.rs` - added a small `normalize_turn_start_collaboration_mode` helper - in `turn_start`, apply normalization before `OverrideTurnContext` - `codex-rs/app-server/tests/suite/v2/turn_start.rs` - extended `turn_start_accepts_collaboration_mode_override_v2` to assert the outgoing request includes default-mode instruction text when the client sends `developer_instructions: null` - `codex-rs/app-server-protocol/src/protocol/v2.rs` - clarified `TurnStartParams.collaboration_mode` docs: `settings.developer_instructions: null` means use built-in mode instructions - regenerated schema fixture: - `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts` - docs: - `codex-rs/app-server/README.md` - `codex-rs/docs/codex_mcp_interface.md`	2026-02-07 12:59:41 -08:00
Eric Traut	4d52428fa2	Fixed a flaky test (#10970 ) ## Summary Stabilize v2 review integration tests by making them hermetic with respect to model discovery. `app-server` review tests were intermittently timing out in CI (especially on Windows runners) because their test config allowed remote model refresh. During `thread/start`, the test process could issue live `/v1/models` requests, introducing external network latency and nondeterministic timing before review flow assertions. This change disables remote model fetching in the review test config helper used by these tests.	2026-02-06 21:26:26 -08:00

1 2 3 4 5

232 commits