core-agent-ide

Author	SHA1	Message	Date
viyatb-oai	0d1539e74c	fix(linux-sandbox): prefer system /usr/bin/bwrap when available (#14963 ) ## Problem Ubuntu/AppArmor hosts started failing in the default Linux sandbox path after the switch to vendored/default bubblewrap in `0.115.0`. The clearest report is in [#14919](https://github.com/openai/codex/issues/14919), especially [this investigation comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751): on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or vendored `bwrap` binary fails with errors like `bwrap: setting up uid map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted`. The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict` profile, which grants `userns` access specifically to `/usr/bin/bwrap`. Once Codex started using a vendored/internal bubblewrap path, that path was no longer covered by the distro AppArmor exception, so sandbox namespace setup could fail even when user namespaces were otherwise enabled and `uidmap` was installed. ## What this PR changes - prefer system `/usr/bin/bwrap` whenever it is available - keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is missing - when `/usr/bin/bwrap` is missing, surface a Codex startup warning through the app-server/TUI warning path instead of printing directly from the sandbox helper with `eprintln!` - use the same launcher decision for both the main sandbox execution path and the `/proc` preflight path - document the updated Linux bubblewrap behavior in the Linux sandbox and core READMEs ## Why this fix This still fixes the Ubuntu/AppArmor regression from [#14919](https://github.com/openai/codex/issues/14919), but it keeps the runtime rule simple and platform-agnostic: if the standard system bubblewrap is installed, use it; otherwise fall back to the vendored helper. The warning now follows that same simple rule. If Codex cannot find `/usr/bin/bwrap`, it tells the user that it is falling back to the vendored helper, and it does so through the existing startup warning plumbing that reaches the TUI and app-server instead of low-level sandbox stderr. ## Testing - `cargo test -p codex-linux-sandbox` - `cargo test -p codex-app-server --lib` - `cargo test -p codex-tui-app-server tests::embedded_app_server_start_failure_is_returned` - `cargo clippy -p codex-linux-sandbox --all-targets` - `cargo clippy -p codex-app-server --all-targets` - `cargo clippy -p codex-tui-app-server --all-targets`	2026-03-17 23:05:34 +00:00
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
xl-openai	1a9555eda9	Cleanup skills/remote/xxx endpoints. (#14977 ) Remote skills/remote/xxx as they are not in used for now.	2026-03-17 15:22:36 -07:00
Felipe Coury	43ee72a9b9	fix(tui): implement /mcp inventory for tui_app_server (#14931 ) ## Problem The `/mcp` command did not work in the app-server TUI (remote mode). On `main`, `add_mcp_output()` called `McpManager::effective_servers()` in-process, which only sees locally configured servers, and then emitted a generic stub message for the app-server to handle. In remote usage, that left `/mcp` without a real inventory view. ## Solution Implement `/mcp` for the app-server TUI by fetching MCP server inventory directly from the app-server via the paginated `mcpServerStatus/list` RPC and rendering the results into chat history. The command now follows a three-phase lifecycle: 1. Loading: `ChatWidget::add_mcp_output()` inserts a transient `McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This gives immediate feedback that the command registered. 2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that calls `fetch_all_mcp_server_statuses()` over an app-server request handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded { result }`. 3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell and renders either `new_mcp_tools_output_from_statuses(...)` or an error message. This keeps the main app event loop responsive, so the TUI can repaint before the remote RPC finishes. ## Notes - No `app-server` changes were required. - The rendered inventory includes auth, tools, resources, and resource templates, plus transport details when they are available from local config for display enrichment. - The app-server RPC does not expose authoritative `enabled` or `disabled_reason` state for MCP servers, so the remote `/mcp` view no longer renders a `Status:` row rather than guessing from local config. - RPC failures surface in history as `Failed to load MCP inventory: ...`. ## Tests - `slash_mcp_requests_inventory_via_app_server` - `mcp_inventory_maps_prefix_tool_names_by_server` - `handle_mcp_inventory_result_clears_committed_loading_cell` - `mcp_tools_output_from_statuses_renders_status_only_servers` - `mcp_inventory_loading_snapshot`	2026-03-17 16:11:27 -06:00
Colin Young	0d2ff40a58	Add auth env observability (#14905 ) CXC-410 Emit Env Var Status with `/feedback` report Add more observability on top of #14611 [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream) [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream) <img width="1063" height="610" alt="image" src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e" /> ###### Summary - Adds auth-env telemetry that records whether key auth-related env overrides were present on session start and request paths. - Threads those auth-env fields through `/responses`, websocket, and `/models` telemetry and feedback metadata. - Buckets custom provider `env_key` configuration to a safe `"configured"` value instead of emitting raw config text. - Keeps the slice observability-only: no raw token values or raw URLs are emitted. ###### Rationale (from spec findings) - 401 and auth-path debugging needs a way to distinguish env-driven auth paths from sessions with no auth env override. - Startup and model-refresh failures need the same auth-env diagnostics as normal request failures. - Feedback and Sentry tags need the same auth-env signal as OTel events so reports can be triaged consistently. - Custom provider config is user-controlled text, so the telemetry contract must stay presence-only / bucketed. ###### Scope - Adds a small `AuthEnvTelemetry` bundle for env presence collection and threads it through the main request/session telemetry paths. - Does not add endpoint/base-url/provider-header/geo routing attribution or broader telemetry API redesign. ###### Trade-offs - `provider_env_key_name` is bucketed to `"configured"` instead of preserving the literal configured env var name. - `/models` is included because startup/model-refresh auth failures need the same diagnostics, but broader parity work remains out of scope. - This slice keeps the existing telemetry APIs and layers auth-env fields onto them rather than redesigning the metadata model. ###### Client follow-up - Add the separate endpoint/base-url attribution slice if routing-source diagnosis is still needed. - Add provider-header or residency attribution only if auth-env presence proves insufficient in real reports. - Revisit whether any additional auth-related env inputs need safe bucketing after more 401 triage data. ###### Testing - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture` - `cargo test -p codex-core collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture` - `cargo test -p codex-core models_request_telemetry_emits_auth_env_feedback_tags_on_failure -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability -- --nocapture` - `cargo test -p codex-core --no-run --message-format short` - `cargo test -p codex-otel --no-run --message-format short` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 14:26:27 -07:00
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
iceweasel-oai	2cc4ee413f	temporarily disable private desktop until it works with elevated IPC path (#14986 )	2026-03-17 21:09:57 +00:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
Ahmed Ibrahim	23a44ddbe8	Stabilize permissions popup selection tests (#14966 ) ## What is flaky The permissions popup tests in the TUI are flaky, especially on Windows. They assume the popup opens on a specific row and that a fixed number of `Up` or `Down` keypresses will land on a specific preset. They also match popup text too loosely, so a non-selected row can satisfy the assertion. ## Why it was flaky These tests were asserting incidental rendering details rather than the actual selected permission preset. On Windows, the initial selection can differ from non-Windows runs. Some tests also searched the entire popup for text like `Guardian Approvals` or `(current)`, which can match a row that is visible but not selected. Once the popup order or current preset shifted slightly, a test could fail even though the UI behavior was still correct. ## How this PR fixes it This PR adds helpers that identify the selected popup row and selected preset name directly. The tests now assert the current selection by name, navigate to concrete target presets instead of assuming a fixed number of keypresses, and explicitly set the reviewer state in the cases that require `Guardian Approvals` to be current. ## Why this fix fixes the flakiness The assertions now track semantic state, not fragile text placement. Navigation is target-based instead of order-based, so Windows/non-Windows row differences and harmless popup layout changes no longer break the tests. That removes the scheduler- and platform-sensitive assumptions that made the popup suite intermittent. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:45:44 +00:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. ## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Matthew Zeng	683c37ce75	[plugins] Support plugin installation elicitation. (#14896 ) It now supports: - Connectors that are from installed and enabled plugins that are not installed yet - Plugins that are on the allowlist that are not installed yet.	2026-03-17 13:19:28 -07:00
Eric Traut	49e7dda2df	Add device-code onboarding and ChatGPT token refresh to app-server TUI (#14952 ) ## Summary - add device-code ChatGPT sign-in to `tui_app_server` onboarding and reuse the existing `chatgptAuthTokens` login path - fall back to browser login when device-code auth is unavailable on the server - treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state during onboarding - add a local ChatGPT auth loader for handing local tokens to the app server and serving refresh requests - handle `account/chatgptAuthTokens/refresh` instead of marking it unsupported, including workspace/account mismatch checks - add focused coverage for onboarding success, existing auth handling, local auth loading, and refresh request behavior ## Testing - `cargo test -p codex-tui-app-server` - `just fix -p codex-tui-app-server`	2026-03-17 14:12:12 -06:00
iceweasel-oai	95bdea93d2	use framed IPC for elevated command runner (#14846 ) ## Summary This is PR 2 of the Windows sandbox runner split. PR 1 introduced the framed IPC runner foundation and related Windows sandbox infrastructure without changing the active elevated one-shot execution path. This PR switches that elevated one-shot path over to the new runner IPC transport and removes the old request-file bootstrap that PR 1 intentionally left in place. After this change, ordinary elevated Windows sandbox commands still behave as one-shot executions, but they now run as the simple case of the same helper/IPC transport that later unified_exec work will build on. ## Why this is needed for unified_exec Windows elevated sandboxed execution crosses a user boundary: the CLI launches a helper as the sandbox user and has to manage command execution from outside that security context. For one-shot commands, the old request-file/bootstrap flow was sufficient. For unified_exec, it is not. Unified_exec needs a long-lived bidirectional channel so the parent can: - send a spawn request - receive structured spawn success/failure - stream stdout and stderr incrementally - eventually support stdin writes, termination, and other session lifecycle events This PR does not add long-lived sessions yet. It converts the existing elevated one-shot path to use the same framed IPC transport so that PR 3 can add unified_exec session semantics on top of a transport that is already exercised by normal elevated command execution. ## Scope This PR: - updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`, and collect framed `Output`/`Exit` messages - removes the old `--request-file=...` execution path from `windows-sandbox-rs/src/elevated/command_runner_win.rs` - keeps the public behavior one-shot: no session reuse or interactive unified_exec behavior is introduced here This PR does not: - add Windows unified_exec session support - add background terminal reuse - add PTY session lifecycle management ## Why Windows needs this and Linux/macOS do not On Linux and macOS, the existing sandbox/process model composes much more directly with long-lived process control. The parent can generally spawn and own the child process (or PTY) directly inside the sandbox model we already use. Windows elevated sandboxing is different. The parent is not directly managing the sandboxed process in the same way; it launches across a different user/security context. That means long-lived control requires an explicit helper process plus IPC for spawn, output, exit, and later stdin/session control. So the extra machinery here is not because unified_exec is conceptually different on Windows. It is because the elevated Windows sandbox boundary requires a helper-mediated transport to support it cleanly. ## Validation - `cargo test -p codex-windows-sandbox`	2026-03-17 11:38:44 -07:00
Keyan Zhang	904dbd414f	generate an internal json schema for `RolloutLine` (#14434 ) ### Why i'm working on something that parses and analyzes codex rollout logs, and i'd like to have a schema for generating a parser/validator. `codex app-server generate-internal-json-schema` writes an `RolloutLine.json` file while doing this, i noticed we have a writer <> reader mismatch issue on `FunctionCallOutputPayload` and reasoning item ID -- added some schemars annotations to fix those ### Test ``` $ just codex app-server generate-internal-json-schema --out ./foo ``` generates an `RolloutLine.json` file, which i validated against jsonl files on disk `just codex app-server --help` doesn't expose the `generate-internal-json-schema` option by default, but you can do `just codex app-server generate-internal-json-schema --help` if you know the command everything else still works --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 11:19:42 -07:00
Ahmed Ibrahim	0d531c05f2	Fix code mode yield startup race (#14959 )	2026-03-17 11:09:12 -07:00
jif-oai	d484bb57d9	feat: add suffix to shell snapshot name (#14938 ) https://github.com/openai/codex/issues/14906	2026-03-17 17:59:27 +00:00
Ahmed Ibrahim	f26ad3c92c	Fix fuzzy search notification buffering in app-server tests (#14955 ) ## What is flaky `codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently loses the expected `fuzzyFileSearch/sessionUpdated` and `fuzzyFileSearch/sessionCompleted` notifications when multiple fuzzy-search sessions are active and CI delivers notifications out of order. ## Why it was flaky The wait helpers were keyed only by JSON-RPC method name. - `wait_for_session_updated` consumed the next `fuzzyFileSearch/sessionUpdated` notification even when it belonged to a different search session. - `wait_for_session_completed` did the same for `fuzzyFileSearch/sessionCompleted`. - Once an unmatched notification was read, it was dropped permanently instead of buffered. - That meant a valid completion for the target search could arrive slightly early, be consumed by the wrong waiter, and disappear before the test started waiting for it. The result depended on notification ordering and runner scheduling instead of on the actual product behavior. ## How this PR fixes it - Add a buffered notification reader in `codex-rs/app-server/tests/common/mcp_process.rs`. - Match fuzzy-search notifications on the identifying payload fields instead of matching only on method name. - Preserve unmatched notifications in the in-process queue so later waiters can still consume them. - Include pending notification methods in timeout failures to make future diagnosis concrete. ## Why this fix fixes the flakiness The test now behaves like a real consumer of an out-of-order event stream: notifications for other sessions stay buffered until the correct waiter asks for them. Reordering no longer loses the target event, so the test result is determined by whether the server emitted the right notifications, not by which one happened to be read first. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 10:52:16 -07:00
Felipe Coury	78e8ee4591	fix(tui): restore remote resume and fork history (#14930 ) ## Problem When the TUI connects to a remote app-server (via WebSocket), resume and fork operations lost all conversation history. `AppServerStartedThread` carried only the `SessionConfigured` event, not the full `Thread` snapshot. After resume or fork, the chat transcript was empty — prior turns were silently discarded. A secondary issue: `primary_session_configured` was not cleared on reset, causing stale session state after reconnection. ## Approach: TUI-side only, zero app-server changes The app-server already returns the full `Thread` object (with populated `turns: Vec<Turn>`) in its `ThreadStartResponse`, `ThreadResumeResponse`, and `ThreadForkResponse`. The data was always there — the TUI was simply throwing it away. The old `AppServerStartedThread` struct only kept the `SessionConfiguredEvent`, discarding the rich turn history that the server had already provided. This PR fixes the problem entirely within `tui_app_server` (3 files changed, 0 changes to `app-server`, `app-server-protocol`, or any other crate). Rather than modifying the server to send history in a different format or adding a new endpoint, the fix preserves the existing `Thread` snapshot and replays it through the TUI's standard event pipeline — making restored sessions indistinguishable from live ones. ## Solution Add a thread snapshot replay path. When the server hands back a `Thread` object (on start, resume, or fork), `restore_started_app_server_thread` converts its historical turns into the same core `Event` sequence the TUI already processes for live interactions, then replays them into the event store so the chat widget renders them. Key changes: - `AppServerStartedThread` now carries the full `Thread` — `started_thread_from_{start,resume,fork}_response` clone the thread into the struct alongside the existing `SessionConfiguredEvent`. - `thread_snapshot_events()` walks the thread's turns and items, producing `TurnStarted` → `ItemCompleted`* → `TurnComplete`/`TurnAborted` event sequences that the TUI already knows how to render. - `restore_started_app_server_thread()` pushes the session event + history events into the thread channel's store, activates the channel, and replays the snapshot — used for initial startup, resume, and fork. - `primary_session_configured` cleared on reset to prevent stale session state after reconnection. ## Tradeoffs - `Thread` is cloned into `AppServerStartedThread`: The full thread snapshot (including all historical turns) is cloned at startup. For long-lived threads this could be large, but it's a one-time cost and avoids lifetime gymnastics with the response. ## Tests - `restore_started_app_server_thread_replays_remote_history` — end-to-end: constructs a `Thread` with one completed turn, restores it, and asserts user/agent messages appear in the transcript. - `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies `thread_snapshot_events` produces the correct event sequence for completed and interrupted turns. ## Test plan - [ ] Verify `cargo check -p codex-tui-app-server` passes - [ ] Verify `cargo test -p codex-tui-app-server` passes - [ ] Manual: connect to a remote app-server, resume an existing thread, confirm history renders in the chat widget - [ ] Manual: fork a thread via remote, confirm prior turns appear	2026-03-17 11:16:08 -06:00
Shijie Rao	8e258eb3f5	Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859 ) ### Summary The goal is for us to get the latest turn model and reasoning effort on thread/resume is no override is provided on the thread/resume func call. This is the part 1 which we write the model and reasoning effort for a thread to the sqlite db and there will be a followup PR to consume the two new fields on thread/resume. [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888) and this one can be merged independently.	2026-03-17 10:14:34 -07:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
jif-oai	e8add54e5d	feat: show effective model in spawn agent event (#14944 ) Show effective model after the full config layering for the sub agent	2026-03-17 16:58:58 +00:00
daveaitel-openai	ef36d39199	Fix agent jobs finalization race and reduce status polling churn (#14843 ) ## Summary - make `report_agent_job_result` atomically transition an item from running to completed while storing `result_json` - remove brittle finalization grace-sleep logic and make finished-item cleanup idempotent - replace blind fixed-interval waiting with status-subscription-based waiting for active worker threads - add state runtime tests for atomic completion and late-report rejection ## Why This addresses the race and polling concerns in #13948 by removing timing-based correctness assumptions and reducing unnecessary status polling churn. ## Validation - `cd codex-rs && just fmt` - `cd codex-rs && cargo test -p codex-state` - `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs` - `cd codex-rs && cargo test` - fails in an unrelated app-server tracing test: `message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children` timed out waiting for response ## Notes - This PR supersedes #14129 with the same agent-jobs fix on a clean branch from `main`. - The earlier PR branch was stacked on unrelated history, which made the review diff include unrelated commits. Fixes #13948	2026-03-17 10:40:14 -04:00
jif-oai	4ed19b0766	feat: rename to get more explicit close agent (#14935 ) https://github.com/openai/codex/issues/14907	2026-03-17 14:37:20 +00:00
jif-oai	31648563c8	feat: centralize package manager version (#14920 )	2026-03-17 12:03:07 +00:00
viyatb-oai	603b6493a9	fix(linux-sandbox): ignore missing writable roots (#14890 ) ## Summary - skip nonexistent `workspace-write` writable roots in the Linux bubblewrap mount builder instead of aborting sandbox startup - keep existing writable roots mounted normally so mixed Windows/WSL configs continue to work - add unit and Linux integration regression coverage for the missing-root case ## Context This addresses regression A from #14875. Regression B will be handled in a separate PR. The old bubblewrap integration added `ensure_mount_targets_exist` as a preflight guard because bubblewrap bind targets must exist, and failing early let Codex return a clearer error than a lower-level mount failure. That policy turned out to be too strict once bubblewrap became the default Linux sandbox: shared Windows/WSL or mixed-platform configs can legitimately contain a well-formed writable root that does not exist on the current machine. This PR keeps bubblewrap's existing-target requirement, but changes Codex to skip missing writable roots instead of treating them as fatal configuration errors.	2026-03-17 00:21:00 -07:00
Eric Traut	d37dcca7e0	Revert tui code so it does not rely on in-process app server (#14899 ) PR https://github.com/openai/codex/pull/14512 added an in-process app server and started to wire up the tui to use it. We were originally planning to modify the `tui` code in place, converting it to use the app server a bit at a time using a hybrid adapter. We've since decided to create an entirely new parallel `tui_app_server` implementation and do the conversion all at once but retain the existing `tui` while we work the bugs out of the new implementation. This PR undoes the changes to the `tui` made in the PR #14512 and restores the old initialization to its previous state. This allows us to modify the `tui_app_server` without the risk of regressing the old `tui` code. For example, we can start to remove support for all legacy core events, like the ones that PR https://github.com/openai/codex/pull/14892 needed to ignore. Testing: * I manually verified that the old `tui` starts and shuts down without a problem.	2026-03-17 00:56:32 -06:00
Eric Traut	57f865c069	Fix tui_app_server: ignore duplicate legacy stream events (#14892 ) The in-process app-server currently emits both typed `ServerNotification`s and legacy `codex/event/*` notifications for the same live turn updates. `tui_app_server` was consuming both paths, so message deltas and completed items could be enqueued twice and rendered as duplicated output in the transcript. Ignore legacy notifications for event types that already have typed (app server) notification handling, while keeping legacy fallback behavior for events that still only arrive on the old path. This preserves compatibility without duplicating streamed commentary or final agent output. We will remove all of the legacy event handlers over time; they're here only during the short window where we're moving the tui to use the app server.	2026-03-17 00:50:25 -06:00
viyatb-oai	db7e02c739	fix: canonicalize symlinked Linux sandbox cwd (#14849 ) ## Problem On Linux, Codex can be launched from a workspace path that is a symlink (for example, a symlinked checkout or a symlinked parent directory). Our sandbox policy intentionally canonicalizes writable/readable roots to the real filesystem path before building the bubblewrap mounts. That part is correct and needed for safety. The remaining bug was that bubblewrap could still inherit the helper process's logical cwd, which might be the symlinked alias instead of the mounted canonical path. In that case, the sandbox starts in a cwd that does not exist inside the sandbox namespace even though the real workspace is mounted. This can cause sandboxed commands to fail in symlinked workspaces. ## Fix This PR keeps the sandbox policy behavior the same, but separates two concepts that were previously conflated: - the canonical cwd used to define sandbox mounts and permissions - the caller's logical cwd used when launching the command On the Linux bubblewrap path, we now thread the logical command cwd through the helper explicitly and only add `--chdir <canonical path>` when the logical cwd differs from the mounted canonical path. That means: - permissions are still computed from canonical paths - bubblewrap starts the command from a cwd that definitely exists inside the sandbox - we do not widen filesystem access or undo the earlier symlink hardening ## Why This Is Safe This is a narrow Linux-only launch fix, not a policy change. - Writable/readable root canonicalization stays intact. - Protected metadata carveouts still operate on canonical roots. - We only override bubblewrap's inherited cwd when the logical path would otherwise point at a symlink alias that is not mounted in the sandbox. ## Tests - kept the existing protocol/core regression coverage for symlink canonicalization - added regression coverage for symlinked cwd handling in the Linux bubblewrap builder/helper path Local validation: - `just fmt` - `cargo test -p codex-protocol` - `cargo test -p codex-core normalize_additional_permissions_canonicalizes_symlinked_write_paths` - `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core --tests -- -D warnings` - `cargo build --bin codex` ## Context This is related to #14694. The earlier writable-root symlink fix addressed the mount/permission side; this PR fixes the remaining symlinked-cwd launch mismatch in the Linux sandbox path.	2026-03-16 22:39:18 -07:00
Ahmed Ibrahim	32e4a5d5d9	[stack 4/4] Reduce realtime self-interruptions during playback (#14827 ) ## Stack Position 4/4. Top-of-stack sibling built on #14830. ## Base - #14830 ## Sibling - #14829 ## Scope - Gate low-level mic chunks while speaker playback is active, while still allowing spoken barge-in. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 05:19:51 +00:00
Ahmed Ibrahim	79f476e47d	[stack 3/4] Add current thread context to realtime startup (#14829 ) ## Stack Position 3/4. Top-of-stack sibling built on #14830. ## Base - #14830 ## Sibling - #14827 ## Scope - Extend the realtime startup context with a bounded summary of the latest thread turns for continuity. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 05:11:05 +00:00
Michael Bolin	15ede607a0	fix: tighten up shell arg quoting in GitHub workflows (#14864 ) Inspired by the work done over in https://github.com/openai/codex-action/pull/74, this tightens up our use of GitHub expressions as shell/environment variables.	2026-03-16 22:01:16 -07:00
Thibault Sottiaux	8e34caffcc	[codex] add Jason as a predefined subagent name (#14881 ) This change adds Jason to codex-core's built-in subagent nickname pool so spawned agents can pick it without any custom role configuration. The default list was simply missing that predefined name (a grave mistake).	2026-03-16 22:01:14 -07:00
xl-openai	e5a28ba0c2	fix: align marketplace display name with existing interface conventions (#14886 ) 1. camelCase for displayName; 2. move displayName under interface.	2026-03-16 21:52:19 -07:00
Ahmed Ibrahim	fbd7f9b986	[stack 2/4] Align main realtime v2 wire and runtime flow (#14830 ) ## Stack Position 2/4. Built on top of #14828. ## Base - #14828 ## Unblocks - #14829 - #14827 ## Scope - Port the realtime v2 wire parsing, session, app-server, and conversation runtime behavior onto the split websocket-method base. - Branch runtime behavior directly on the current realtime session kind instead of parser-derived flow flags. - Keep regression coverage in the existing e2e suites. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 21:38:07 -07:00
xl-openai	1d85fe79ed	feat: support remote_sync for plugin install/uninstall. (#14878 ) - Added forceRemoteSync to plugin/install and plugin/uninstall. - With forceRemoteSync=true, we update the remote plugin status first, then apply the local change only if the backend call succeeds. - Kept plugin/list(forceRemoteSync=true) as the main recon path, and for now it treats remote enabled=false as uninstall. We will eventually migrate to plugin/installed for more precise state handling.	2026-03-16 21:37:27 -07:00
xl-openai	49c2b66ece	Add marketplace display names to plugin/list (#14861 ) Add display_name support to marketplace.json.	2026-03-16 19:04:40 -07:00
xl-openai	59533a2c26	skill-creator: default new skills to ~/.codex/skills (#14837 ) ### Motivation - Prevent newly-created skills from being placed in unexpected locations by prompting for an install path and defaulting to a discoverable location so skills are usable immediately. - Make the `skill-creator` instructions explicit about the recommended default (`~/.codex/skills` / `$CODEX_HOME/skills`) so the agent and users follow a consistent, discoverable convention. ### Description - Updated `codex-rs/skills/src/assets/samples/skill-creator/SKILL.md` to add a user prompt: "Where should I create this skill? If you do not have a preference, I will place it in ~/.codex/skills so Codex can discover it automatically.". - Added guidance before running `init_skill.py` that if the user does not specify a location, the agent should default to `~/.codex/skills` (equivalently `$CODEX_HOME/skills`) for auto-discovery. - Updated the `init_skill.py` examples in the same `SKILL.md` to use `~/.codex/skills` as the recommended default while keeping one custom path example. ### Testing - Ran `cargo test -p codex-skills` and the crate's unit test suite passed (`1 passed; 0 failed`). - Verified relevant discovery behavior in code by checking `codex-rs/utils/home-dir/src/lib.rs` (`find_codex_home` defaults to `~/.codex`) and `codex-rs/core/src/skills/loader.rs` (user skill roots include `$CODEX_HOME/skills`). ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69b75a50bb008322a278e55eb0ddccd6)	2026-03-16 18:36:11 -07:00
Michael Bolin	b77fe8fefe	Apply argument comment lint across codex-rs (#14652 ) ## Why Once the repo-local lint exists, `codex-rs` needs to follow the checked-in convention and CI needs to keep it from drifting. This commit applies the fallback `/param/` style consistently across existing positional literal call sites without changing those APIs. The longer-term preference is still to avoid APIs that require comments by choosing clearer parameter types and call shapes. This PR is intentionally the mechanical follow-through for the places where the existing signatures stay in place. After rebasing onto newer `main`, the rollout also had to cover newly introduced `tui_app_server` call sites. That made it clear the first cut of the CI job was too expensive for the common path: it was spending almost as much time installing `cargo-dylint` and re-testing the lint crate as a representative test job spends running product tests. The CI update keeps the full workspace enforcement but trims that extra overhead from ordinary `codex-rs` PRs. ## What changed - keep a dedicated `argument_comment_lint` job in `rust-ci` - mechanically annotate remaining opaque positional literals across `codex-rs` with exact `/param/` comments, including the rebased `tui_app_server` call sites that now fall under the lint - keep the checked-in style aligned with the lint policy by using `/param/` and leaving string and char literals uncommented - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo registry/git metadata in the lint job - split changed-path detection so the lint crate's own `cargo test` step runs only when `tools/argument-comment-lint/` or `rust-ci.yml` changes - continue to run the repo wrapper over the `codex-rs` workspace, so product-code enforcement is unchanged Most of the code changes in this commit are intentionally mechanical comment rewrites or insertions driven by the lint itself. ## Verification - `./tools/argument-comment-lint/run.sh --workspace` - `cargo test -p codex-tui-app-server -p codex-tui` - parsed `.github/workflows/rust-ci.yml` locally with PyYAML --- -> #14652 * #14651	2026-03-16 16:48:15 -07:00
Ahmed Ibrahim	6f05d8d735	[stack 1/4] Split realtime websocket methods by version (#14828 ) ## Stack Position 1/4. Base PR in the realtime stack. ## Base - `main` ## Unblocks - #14830 ## Scope - Split the realtime websocket request builders into `common`, `v1`, and `v2` modules. - Keep runtime behavior unchanged in this PR. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 16:00:59 -07:00
pakrym-oai	a3ba10b44b	Add exit helper to code mode scripts (#14851 ) - Summary - expose `exit` through the code mode bridge and module so scripts can stop mid-flight - surface the helper in the description documentation - add a regression test ensuring `exit()` terminates execution cleanly - Testing - Not run (not requested)	2026-03-16 22:07:58 +00:00
iceweasel-oai	d0a693e541	windows-sandbox: add runner IPC foundation for future unified_exec (#14139 ) # Summary This PR introduces the Windows sandbox runner IPC foundation that later unified_exec work will build on. The key point is that this is intentionally infrastructure-only. The new IPC transport, runner plumbing, and ConPTY helpers are added here, but the active elevated Windows sandbox path still uses the existing request-file bootstrap. In other words, this change prepares the transport and module layout we need for unified_exec without switching production behavior over yet. Part of this PR is also a source-layout cleanup: some Windows sandbox files are moved into more explicit `elevated/`, `conpty/`, and shared locations so it is clearer which code is for the elevated sandbox flow, which code is legacy/direct-spawn behavior, and which helpers are shared between them. That reorganization is intentional in this first PR so later behavioral changes do not also have to carry a large amount of file-move churn. # Why This Is Needed For unified_exec Windows elevated sandboxed unified_exec needs a long-lived, bidirectional control channel between the CLI and a helper process running under the sandbox user. That channel has to support: - starting a process and reporting structured spawn success/failure - streaming stdout/stderr back incrementally - forwarding stdin over time - terminating or polling a long-lived process - supporting both pipe-backed and PTY-backed sessions The existing elevated one-shot path is built around a request-file bootstrap and does not provide those primitives cleanly. Before we can turn on Windows sandbox unified_exec, we need the underlying runner protocol and transport layer that can carry those lifecycle events and streams. # Why Windows Needs More Machinery Than Linux Or macOS Linux and macOS can generally build unified_exec on top of the existing sandbox/process model: the parent can spawn the child directly, retain normal ownership of stdio or PTY handles, and manage the lifetime of the sandboxed process without introducing a second control process. Windows elevated sandboxing is different. To run inside the sandbox boundary, we cross into a different user/security context and then need to manage a long-lived process from outside that boundary. That means we need an explicit helper process plus an IPC transport to carry spawn, stdin, output, and exit events back and forth. The extra code here is mostly that missing Windows sandbox infrastructure, not a conceptual difference in unified_exec itself. # What This PR Adds - the framed IPC message types and transport helpers for parent <-> runner communication - the renamed Windows command runner with both the existing request-file bootstrap and the dormant IPC bootstrap - named-pipe helpers for the elevated runner path - ConPTY helpers and process-thread attribute plumbing needed for PTY-backed sessions - shared sandbox/process helpers that later PRs will reuse when switching live execution paths over - early file/module moves so later PRs can focus on behavior rather than layout churn # What This PR Does Not Yet Do - it does not switch the active elevated one-shot path over to IPC yet - it does not enable Windows sandbox unified_exec yet - it does not remove the existing request-file bootstrap yet So while this code compiles and the new path has basic validation, it is not yet the exercised production path. That is intentional for this first PR: the goal here is to land the transport and runner foundation cleanly before later PRs start routing real command execution through it. # Follow-Ups Planned follow-up PRs will: 1. switch elevated one-shot Windows sandbox execution to the new runner IPC path 2. layer Windows sandbox unified_exec sessions on top of the same transport 3. remove the legacy request-file path once the IPC-based path is live # Validation - `cargo build -p codex-windows-sandbox`	2026-03-16 19:45:06 +00:00
Andi Liu	4c9dbc1f88	memories: exclude AGENTS and skills from stage1 input (#14268 ) ###### Why/Context/Summary - Exclude injected AGENTS.md instructions and standalone skill payloads from memory stage 1 inputs so memory generation focuses on conversation content instead of prompt scaffolding. - Strip only the AGENTS fragment from mixed contextual user messages during stage-1 serialization, which preserves environment context in the same message. - Keep subagent notifications in the memory input, and add focused unit coverage for the fragment classifier, rollout policy, and stage-1 serialization path. ###### Test plan - `just fmt` - `cargo test -p codex-core --lib contextual_user_message` - `cargo test -p codex-core --lib rollout::policy` - `cargo test -p codex-core --lib memories::phase1`	2026-03-16 19:30:38 +00:00
Anton Panasenko	663dd3f935	fix(core): fix sanitize name to use '_' everywhere (#14833 )	2026-03-16 12:22:10 -07:00
Eric Traut	a0e41f4ff9	Fixed build failures related to PR 14717 (#14826 )	2026-03-16 12:41:25 -06:00
Jack Mousseau	7a6e30b55b	Use request permission profile in app server (#14665 )	2026-03-16 10:12:23 -07:00
Eric Traut	db89b73a9c	Move TUI on top of app server (parallel code) (#14717 ) This PR replicates the `tui` code directory and creates a temporary parallel `tui_app_server` directory. It also implements a new feature flag `tui_app_server` to select between the two tui implementations. Once the new app-server-based TUI is stabilized, we'll delete the old `tui` directory and feature flag.	2026-03-16 10:49:19 -06:00
jif-oai	c04a0a7454	fix: tui freeze when sub-agents are present (#14816 ) The issue was due to a circular `Drop` schema where the embedded app-server wait for some listeners that wait for this app-server them-selves. The fix is an explicit cleaning Repro: * Start codex * Ask it to spawn a sub-agent * Close Codex * It takes 5s to exit	2026-03-16 16:42:43 +00:00
jif-oai	3f266bcd68	feat: make interrupt state not final for multi-agents (#13850 ) Make `interrupted` an agent state and make it not final. As a result, a `wait` won't return on an interrupted agent and no notification will be send to the parent agent. The rationals are: * If a user interrupt a sub-agent for any reason, you don't want the parent agent to instantaneously ask the sub-agent to restart * If a parent agent interrupt a sub-agent, no need to add a noisy notification in the parent agen	2026-03-16 16:39:40 +00:00
jif-oai	18ad67549c	feat: improve skills cache key to take into account config layering (#14806 ) Fix https://github.com/openai/codex/issues/14161 This fixes sub-agent [[skills.config]] overrides being ignored when parent and child share the same cwd. The root cause was that turn skill loading rebuilt from cwd-only state and reused a cwd-scoped cache, so role-local skill enable/disable overrides did not reliably affect the spawned agent's effective skill set. This change switches turn construction to use the effective per-turn config and adds a config-aware skills cache keyed by skill roots plus final disabled paths.	2026-03-16 16:12:44 +00:00

1 2 3 4 5 ...

4714 commits