core-agent-ide

Author	SHA1	Message	Date
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
viyatb-oai	a3613035f3	Pin setup-zig GitHub Action to immutable SHA (#14858 ) ### Motivation - Pinning the action to an immutable commit SHA reduces the risk of arbitrary code execution in runners with repository access and secrets. ### Description - Replaced `uses: mlugg/setup-zig@v2` with `uses: mlugg/setup-zig@d1434d0886 # v2` in three workflow files. - Updated the following files: ` .github/workflows/rust-ci.yml`, ` .github/workflows/rust-release.yml`, and ` .github/workflows/shell-tool-mcp.yml` to reference the immutable SHA while preserving the original `v2` intent in a trailing comment. ### Testing - No automated tests were run because this is a workflow-only change and does not affect repository source code, so CI validation will occur on the next workflow execution. ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69763f570234832d9c67b1b66a27c78d)	2026-03-17 22:40:14 -07:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. 
``` .codex/config.toml ``` [features] codex_hooks = true ``` .codex/hooks.json ``` { "hooks": { "UserPromptSubmit": [ { "hooks": [ { "type": "command", "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py", "timeoutSec": 10, "statusMessage": "reading the observatory notes" } ] } ] } } ``` .codex/hooks/user_prompt_submit_demo.py ``` #!/usr/bin/env python3 import json import sys from pathlib import Path def prompt_from_payload(payload: dict) -> str: prompt = payload.get("prompt") if isinstance(prompt, str) and prompt.strip(): return prompt.strip() event = payload.get("event") if isinstance(event, dict): user_prompt = event.get("user_prompt") if isinstance(user_prompt, str): return user_prompt.strip() return "" def main() -> int: payload = json.load(sys.stdin) prompt = prompt_from_payload(payload) cwd = Path(payload.get("cwd", ".")).name or "wizard-tower" if "[block-user-submit]" in prompt: print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo blocked the prompt on purpose." ), "decision": "block", "reason": ( "Wizard Tower demo block: remove [block-user-submit] to continue." ), } ) ) return 0 prompt_preview = prompt or "(empty prompt)" if len(prompt_preview) > 80: prompt_preview = f"{prompt_preview[:77]}..." print( json.dumps( { "systemMessage": ( f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}" ), "hookSpecificOutput": { "hookEventName": "UserPromptSubmit", "additionalContext": ( "Wizard Tower UserPromptSubmit demo fired. " "For this reply only, include the exact phrase " "'observatory lanterns lit' exactly once near the end." ), }, } ) ) return 0 if __name__ == "__main__": raise SystemExit(main()) ```	2026-03-17 22:09:22 -07:00
Charley Cunningham	226241f035	Use workspace requirements for guardian prompt override (#14727 ) ## Summary - move `guardian_developer_instructions` from managed config into workspace-managed `requirements.toml` - have guardian continue using the override when present and otherwise fall back to the bundled local guardian prompt - keep the generalized prompt-quality improvements in the shared guardian default prompt - update requirements parsing, layering, schema, and tests for the new source of truth ## Context This replaces the earlier managed-config / MDM rollout plan. The intended rollout path is workspace-managed requirements, including cloud enterprise policies, rather than backend model metadata, Statsig, or Jamf-managed config. That keeps the default/fallback behavior local to `codex-rs` while allowing faster policy updates through the enterprise requirements plane. This is intentionally an admin-managed policy input, not a user preference: the guardian prompt should come either from the bundled `codex-rs` default or from enterprise-managed `requirements.toml`, and normal user/project/session config should not override it. ## Updating The OpenAI Prompt After this lands, the OpenAI-specific guardian prompt should be updated through the workspace Policies UI at `/codex/settings/policies` rather than through Jamf or codex-backend model metadata. Operationally: - open the workspace Policies editor as a Codex admin - edit the default `requirements.toml` policy, or a higher-precedence group-scoped override if we ever want different behavior for a subset of users - set `guardian_developer_instructions = """..."""` to the full OpenAI-specific guardian prompt text - save the policy; codex-backend stores the raw TOML and `codex-rs` fetches the effective requirements file from `/wham/config/requirements` When updating the OpenAI-specific prompt, keep it aligned with the shared default guardian policy in `codex-rs` except for intentional OpenAI-only additions. 
## Testing - `cargo check --tests -p codex-core -p codex-config -p codex-cloud-requirements --message-format short` - `cargo run -p codex-core --bin codex-write-config-schema` - `cargo fmt` - `git diff --check` Co-authored-by: Codex <noreply@openai.com>	2026-03-17 22:05:41 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00
pakrym-oai	770616414a	Prefer websockets when providers support them (#13592 ) Remove all flags and model settings. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 19:46:44 -07:00
viyatb-oai	d950543e65	feat: support restricted ReadOnlyAccess in elevated Windows sandbox (#14610 ) ## Summary - support legacy `ReadOnlyAccess::Restricted` on Windows in the elevated setup/runner backend - keep the unelevated restricted-token backend on the legacy full-read model only, and fail closed for restricted read-only policies there - keep the legacy full-read Windows path unchanged while deriving narrower read roots only for elevated restricted-read policies - honor `include_platform_defaults` by adding backend-managed Windows system roots only when requested, while always keeping helper roots and the command `cwd` readable - preserve `workspace-write` semantics by keeping writable roots readable when restricted read access is in use in the elevated backend - document the current Windows boundary: legacy `SandboxPolicy` is supported on both backends, while richer split-only carveouts still fail closed instead of running with weaker enforcement ## Testing - `cargo test -p codex-windows-sandbox` - `cargo check -p codex-windows-sandbox --tests --target x86_64-pc-windows-msvc` - `cargo clippy -p codex-windows-sandbox --tests --target x86_64-pc-windows-msvc -- -D warnings` - `cargo test -p codex-core windows_restricted_token_` ## Notes - local `cargo test -p codex-windows-sandbox` on macOS only exercises the non-Windows stubs; the Windows-targeted compile and clippy runs provide the local signal, and GitHub Windows CI exercises the runtime path	2026-03-17 19:08:50 -07:00
viyatb-oai	6fe8a05dcb	fix: honor active permission profiles in sandbox debug (#14293 ) ## Summary - stop `codex sandbox` from forcing legacy `sandbox_mode` when active `[permissions]` profiles are configured - keep the legacy `read-only` / `workspace-write` fallback for legacy configs and reject `--full-auto` for profile-based configs - use split filesystem and network policies in the macOS/Linux debug sandbox helpers and add regressions for the config-loading behavior assuming "codex/docs/private/secret.txt" = "none" ``` codex -c 'default_permissions="limited-read-test"' sandbox macos -- <command> ... codex sandbox macos -- cat codex/docs/private/secret.txt >/dev/null; echo EXIT:$? cat: codex/docs/private/secret.txt: Operation not permitted EXIT:1 ``` --------- Co-authored-by: celia-oai <celia@openai.com>	2026-03-18 01:52:02 +00:00
pakrym-oai	83a60fdb94	Add FS abstraction and use in view_image (#14960 ) Adds an environment crate and an environment + file system abstraction. An environment is a combination of attributes and services specific to the environment the agent is connected to: file system, process management, OS, default shell. The goal is to move most of the agent logic that makes assumptions about the environment to work through the environment abstraction.	2026-03-17 17:36:23 -07:00
Max Johnson	19b887128e	app-server: reject websocket requests with Origin headers (#14995 ) Reject websocket requests that carry an `Origin` header	2026-03-18 00:24:53 +00:00
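The check described in #14995 above can be sketched as follows. This is only an illustration, not the app-server's code: `should_reject_upgrade` and the plain-mapping header representation are hypothetical names chosen for the example.

```python
from typing import Mapping


def should_reject_upgrade(headers: Mapping[str, str]) -> bool:
    """Return True when a websocket upgrade request carries an Origin header.

    HTTP header names are case-insensitive, so the comparison is done in
    lowercase. The header value is deliberately ignored: in this sketch the
    mere presence of the header is enough to reject the request.
    """
    return any(name.lower() == "origin" for name in headers)


if __name__ == "__main__":
    print(should_reject_upgrade({"Host": "localhost", "Origin": "https://example.com"}))
    print(should_reject_upgrade({"Host": "localhost", "Upgrade": "websocket"}))
```

The sketch keys on header presence rather than comparing against an allowlist, which matches the one-line description in the commit message.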
xl-openai	a5d3114e97	feat: Add product-aware plugin policies and clean up manifest naming (#14993 ) - Add shared Product support to marketplace plugin policy and skill policy (not enforced yet). - Move marketplace installation/authentication under policy and model it as MarketplacePluginPolicy. - Rename plugin/marketplace local manifest types to separate raw serde shapes from resolved in-memory models.	2026-03-17 17:01:34 -07:00
Shaqayeq	fc75d07504	Add Python SDK public API and examples (#14446 ) ## TL;DR WIP esp the examples Thin the Python SDK public surface so the wrapper layer returns canonical app-server generated models directly. - keeps `Codex` / `AsyncCodex` / `Thread` / `Turn` and input helpers, but removes alias-only type layers and custom result models - `metadata` now returns `InitializeResponse` and `run()` returns the generated app-server `Turn` - updates docs, examples, notebook, and tests to use canonical generated types and regenerates `v2_all.py` against current schema - keeps the pinned runtime-package integration flow and real integration coverage ## Validation - `PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests` - `GH_TOKEN="$(gh auth token)" RUN_REAL_CODEX_TESTS=1 PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests -rs` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 16:05:56 -07:00
viyatb-oai	0d1539e74c	fix(linux-sandbox): prefer system /usr/bin/bwrap when available (#14963 ) ## Problem Ubuntu/AppArmor hosts started failing in the default Linux sandbox path after the switch to vendored/default bubblewrap in `0.115.0`. The clearest report is in [#14919](https://github.com/openai/codex/issues/14919), especially [this investigation comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751): on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or vendored `bwrap` binary fails with errors like `bwrap: setting up uid map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted`. The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict` profile, which grants `userns` access specifically to `/usr/bin/bwrap`. Once Codex started using a vendored/internal bubblewrap path, that path was no longer covered by the distro AppArmor exception, so sandbox namespace setup could fail even when user namespaces were otherwise enabled and `uidmap` was installed. ## What this PR changes - prefer system `/usr/bin/bwrap` whenever it is available - keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is missing - when `/usr/bin/bwrap` is missing, surface a Codex startup warning through the app-server/TUI warning path instead of printing directly from the sandbox helper with `eprintln!` - use the same launcher decision for both the main sandbox execution path and the `/proc` preflight path - document the updated Linux bubblewrap behavior in the Linux sandbox and core READMEs ## Why this fix This still fixes the Ubuntu/AppArmor regression from [#14919](https://github.com/openai/codex/issues/14919), but it keeps the runtime rule simple and platform-agnostic: if the standard system bubblewrap is installed, use it; otherwise fall back to the vendored helper. The warning now follows that same simple rule. 
If Codex cannot find `/usr/bin/bwrap`, it tells the user that it is falling back to the vendored helper, and it does so through the existing startup warning plumbing that reaches the TUI and app-server instead of low-level sandbox stderr. ## Testing - `cargo test -p codex-linux-sandbox` - `cargo test -p codex-app-server --lib` - `cargo test -p codex-tui-app-server tests::embedded_app_server_start_failure_is_returned` - `cargo clippy -p codex-linux-sandbox --all-targets` - `cargo clippy -p codex-app-server --all-targets` - `cargo clippy -p codex-tui-app-server --all-targets`	2026-03-17 23:05:34 +00:00
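The launcher rule from #14963 above (use the system bubblewrap if present, otherwise fall back to the vendored helper and warn) can be sketched roughly as below. The function name `choose_bwrap`, the warning text, and the choice to treat "available" as "exists and is executable" are assumptions made for illustration; the real implementation lives in the Rust sandbox code and may differ.

```python
import os
from typing import Callable, Optional, Tuple

SYSTEM_BWRAP = "/usr/bin/bwrap"


def _is_executable(path: str) -> bool:
    # Assumption: "available" means a regular file the current user can execute.
    return os.path.isfile(path) and os.access(path, os.X_OK)


def choose_bwrap(
    vendored_path: str,
    is_available: Callable[[str], bool] = _is_executable,
) -> Tuple[str, Optional[str]]:
    """Pick the bubblewrap binary to launch.

    Returns (path, warning). The warning is None when the system binary is
    used, and a human-readable message when falling back to the vendored one,
    so the caller can route it through its own startup-warning channel
    instead of printing it from here.
    """
    if is_available(SYSTEM_BWRAP):
        return SYSTEM_BWRAP, None
    warning = f"{SYSTEM_BWRAP} not found; falling back to vendored bubblewrap"
    return vendored_path, warning
```

Returning the warning instead of printing it mirrors the commit's point about surfacing the message through existing warning plumbing, and calling the same function from both the main execution path and a preflight check would keep the two decisions consistent.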
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
xl-openai	1a9555eda9	Cleanup skills/remote/xxx endpoints. (#14977 ) Remove skills/remote/xxx as they are not in use for now.	2026-03-17 15:22:36 -07:00
Felipe Coury	43ee72a9b9	fix(tui): implement /mcp inventory for tui_app_server (#14931 ) ## Problem The `/mcp` command did not work in the app-server TUI (remote mode). On `main`, `add_mcp_output()` called `McpManager::effective_servers()` in-process, which only sees locally configured servers, and then emitted a generic stub message for the app-server to handle. In remote usage, that left `/mcp` without a real inventory view. ## Solution Implement `/mcp` for the app-server TUI by fetching MCP server inventory directly from the app-server via the paginated `mcpServerStatus/list` RPC and rendering the results into chat history. The command now follows a three-phase lifecycle: 1. Loading: `ChatWidget::add_mcp_output()` inserts a transient `McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This gives immediate feedback that the command registered. 2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that calls `fetch_all_mcp_server_statuses()` over an app-server request handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded { result }`. 3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell and renders either `new_mcp_tools_output_from_statuses(...)` or an error message. This keeps the main app event loop responsive, so the TUI can repaint before the remote RPC finishes. ## Notes - No `app-server` changes were required. - The rendered inventory includes auth, tools, resources, and resource templates, plus transport details when they are available from local config for display enrichment. - The app-server RPC does not expose authoritative `enabled` or `disabled_reason` state for MCP servers, so the remote `/mcp` view no longer renders a `Status:` row rather than guessing from local config. - RPC failures surface in history as `Failed to load MCP inventory: ...`. 
## Tests - `slash_mcp_requests_inventory_via_app_server` - `mcp_inventory_maps_prefix_tool_names_by_server` - `handle_mcp_inventory_result_clears_committed_loading_cell` - `mcp_tools_output_from_statuses_renders_status_only_servers` - `mcp_inventory_loading_snapshot`	2026-03-17 16:11:27 -06:00
Colin Young	0d2ff40a58	Add auth env observability (#14905 ) CXC-410 Emit Env Var Status with `/feedback` report Add more observability on top of #14611 [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream) [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream) <img width="1063" height="610" alt="image" src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e" /> ###### Summary - Adds auth-env telemetry that records whether key auth-related env overrides were present on session start and request paths. - Threads those auth-env fields through `/responses`, websocket, and `/models` telemetry and feedback metadata. - Buckets custom provider `env_key` configuration to a safe `"configured"` value instead of emitting raw config text. - Keeps the slice observability-only: no raw token values or raw URLs are emitted. ###### Rationale (from spec findings) - 401 and auth-path debugging needs a way to distinguish env-driven auth paths from sessions with no auth env override. - Startup and model-refresh failures need the same auth-env diagnostics as normal request failures. - Feedback and Sentry tags need the same auth-env signal as OTel events so reports can be triaged consistently. - Custom provider config is user-controlled text, so the telemetry contract must stay presence-only / bucketed. ###### Scope - Adds a small `AuthEnvTelemetry` bundle for env presence collection and threads it through the main request/session telemetry paths. - Does not add endpoint/base-url/provider-header/geo routing attribution or broader telemetry API redesign. ###### Trade-offs - `provider_env_key_name` is bucketed to `"configured"` instead of preserving the literal configured env var name. 
- `/models` is included because startup/model-refresh auth failures need the same diagnostics, but broader parity work remains out of scope. - This slice keeps the existing telemetry APIs and layers auth-env fields onto them rather than redesigning the metadata model. ###### Client follow-up - Add the separate endpoint/base-url attribution slice if routing-source diagnosis is still needed. - Add provider-header or residency attribution only if auth-env presence proves insufficient in real reports. - Revisit whether any additional auth-related env inputs need safe bucketing after more 401 triage data. ###### Testing - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture` - `cargo test -p codex-core collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture` - `cargo test -p codex-core models_request_telemetry_emits_auth_env_feedback_tags_on_failure -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability -- --nocapture` - `cargo test -p codex-core --no-run --message-format short` - `cargo test -p codex-otel --no-run --message-format short` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 14:26:27 -07:00
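The presence-only / bucketed contract described in #14905 above can be illustrated with a small sketch. The helper names `env_var_present` and `bucket_env_key_name` are hypothetical, and two details are assumptions not stated in the commit: that an empty value counts as "not present", and that an unconfigured `env_key` maps to `None`.

```python
from typing import Mapping, Optional


def env_var_present(name: str, environ: Mapping[str, str]) -> bool:
    """Report only whether an env override is set, never its value.

    Assumption: an empty string is treated the same as an unset variable.
    """
    return bool(environ.get(name))


def bucket_env_key_name(env_key: Optional[str]) -> Optional[str]:
    """Collapse user-controlled config text into a safe bucket.

    The literal env var name is never emitted; any non-blank value becomes
    the constant "configured".
    """
    if env_key is None or not env_key.strip():
        return None
    return "configured"
```

Because both helpers return only a boolean or a fixed constant, nothing derived from a token value or from user-written config text can reach the telemetry payload.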
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
iceweasel-oai	2cc4ee413f	temporarily disable private desktop until it works with elevated IPC path (#14986 )	2026-03-17 21:09:57 +00:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
Ahmed Ibrahim	23a44ddbe8	Stabilize permissions popup selection tests (#14966 ) ## What is flaky The permissions popup tests in the TUI are flaky, especially on Windows. They assume the popup opens on a specific row and that a fixed number of `Up` or `Down` keypresses will land on a specific preset. They also match popup text too loosely, so a non-selected row can satisfy the assertion. ## Why it was flaky These tests were asserting incidental rendering details rather than the actual selected permission preset. On Windows, the initial selection can differ from non-Windows runs. Some tests also searched the entire popup for text like `Guardian Approvals` or `(current)`, which can match a row that is visible but not selected. Once the popup order or current preset shifted slightly, a test could fail even though the UI behavior was still correct. ## How this PR fixes it This PR adds helpers that identify the selected popup row and selected preset name directly. The tests now assert the current selection by name, navigate to concrete target presets instead of assuming a fixed number of keypresses, and explicitly set the reviewer state in the cases that require `Guardian Approvals` to be current. ## Why this fix fixes the flakiness The assertions now track semantic state, not fragile text placement. Navigation is target-based instead of order-based, so Windows/non-Windows row differences and harmless popup layout changes no longer break the tests. That removes the scheduler- and platform-sensitive assumptions that made the popup suite intermittent. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:45:44 +00:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. 
## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Matthew Zeng	683c37ce75	[plugins] Support plugin installation elicitation. (#14896 ) It now supports: - Connectors that are from installed and enabled plugins that are not installed yet - Plugins that are on the allowlist that are not installed yet.	2026-03-17 13:19:28 -07:00
Eric Traut	49e7dda2df	Add device-code onboarding and ChatGPT token refresh to app-server TUI (#14952 ) ## Summary - add device-code ChatGPT sign-in to `tui_app_server` onboarding and reuse the existing `chatgptAuthTokens` login path - fall back to browser login when device-code auth is unavailable on the server - treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state during onboarding - add a local ChatGPT auth loader for handing local tokens to the app server and serving refresh requests - handle `account/chatgptAuthTokens/refresh` instead of marking it unsupported, including workspace/account mismatch checks - add focused coverage for onboarding success, existing auth handling, local auth loading, and refresh request behavior ## Testing - `cargo test -p codex-tui-app-server` - `just fix -p codex-tui-app-server`	2026-03-17 14:12:12 -06:00
iceweasel-oai	95bdea93d2	use framed IPC for elevated command runner (#14846 ) ## Summary This is PR 2 of the Windows sandbox runner split. PR 1 introduced the framed IPC runner foundation and related Windows sandbox infrastructure without changing the active elevated one-shot execution path. This PR switches that elevated one-shot path over to the new runner IPC transport and removes the old request-file bootstrap that PR 1 intentionally left in place. After this change, ordinary elevated Windows sandbox commands still behave as one-shot executions, but they now run as the simple case of the same helper/IPC transport that later unified_exec work will build on. ## Why this is needed for unified_exec Windows elevated sandboxed execution crosses a user boundary: the CLI launches a helper as the sandbox user and has to manage command execution from outside that security context. For one-shot commands, the old request-file/bootstrap flow was sufficient. For unified_exec, it is not. Unified_exec needs a long-lived bidirectional channel so the parent can: - send a spawn request - receive structured spawn success/failure - stream stdout and stderr incrementally - eventually support stdin writes, termination, and other session lifecycle events This PR does not add long-lived sessions yet. It converts the existing elevated one-shot path to use the same framed IPC transport so that PR 3 can add unified_exec session semantics on top of a transport that is already exercised by normal elevated command execution. 
## Scope This PR: - updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`, and collect framed `Output`/`Exit` messages - removes the old `--request-file=...` execution path from `windows-sandbox-rs/src/elevated/command_runner_win.rs` - keeps the public behavior one-shot: no session reuse or interactive unified_exec behavior is introduced here This PR does not: - add Windows unified_exec session support - add background terminal reuse - add PTY session lifecycle management ## Why Windows needs this and Linux/macOS do not On Linux and macOS, the existing sandbox/process model composes much more directly with long-lived process control. The parent can generally spawn and own the child process (or PTY) directly inside the sandbox model we already use. Windows elevated sandboxing is different. The parent is not directly managing the sandboxed process in the same way; it launches across a different user/security context. That means long-lived control requires an explicit helper process plus IPC for spawn, output, exit, and later stdin/session control. So the extra machinery here is not because unified_exec is conceptually different on Windows. It is because the elevated Windows sandbox boundary requires a helper-mediated transport to support it cleanly. ## Validation - `cargo test -p codex-windows-sandbox`	2026-03-17 11:38:44 -07:00
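To make the "framed IPC" idea in #14846 above concrete, here is a generic sketch of length-prefixed framing over a byte stream. The commit does not describe the actual wire format, so everything specific here is an assumption: the 4-byte little-endian length prefix, the JSON payloads, and the `type` / `stream` / `data` / `code` field names. Only the message kinds (`SpawnRequest`, `SpawnReady`, `Output`, `Exit`) come from the commit text.

```python
import io
import json
import struct
from typing import BinaryIO, Optional


def write_frame(stream: BinaryIO, message: dict) -> None:
    """Write one message as a 4-byte little-endian length followed by JSON."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read up to n bytes, looping because pipe reads may return short chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)


def read_frame(stream: BinaryIO) -> Optional[dict]:
    """Read one frame; return None on a clean end of stream."""
    header = _read_exact(stream, 4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError("stream ended inside a frame header")
    (length,) = struct.unpack("<I", header)
    payload = _read_exact(stream, length)
    if len(payload) < length:
        raise EOFError("stream ended inside a frame payload")
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    pipe = io.BytesIO()  # stands in for a named pipe
    write_frame(pipe, {"type": "SpawnReady"})
    write_frame(pipe, {"type": "Output", "stream": "stdout", "data": "hi\n"})
    write_frame(pipe, {"type": "Exit", "code": 0})
    pipe.seek(0)
    while (frame := read_frame(pipe)) is not None:
        print(frame["type"])
```

Explicit frame boundaries are what let a parent interleave structured events (ready, output chunks, exit) on one channel, which a one-shot request file cannot express.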
Keyan Zhang	904dbd414f	generate an internal json schema for `RolloutLine` (#14434 ) ### Why i'm working on something that parses and analyzes codex rollout logs, and i'd like to have a schema for generating a parser/validator. `codex app-server generate-internal-json-schema` writes a `RolloutLine.json` file. while doing this, i noticed we have a writer <> reader mismatch issue on `FunctionCallOutputPayload` and reasoning item ID -- added some schemars annotations to fix those ### Test ``` $ just codex app-server generate-internal-json-schema --out ./foo ``` generates a `RolloutLine.json` file, which i validated against jsonl files on disk. `just codex app-server --help` doesn't expose the `generate-internal-json-schema` option by default, but you can do `just codex app-server generate-internal-json-schema --help` if you know the command. everything else still works --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 11:19:42 -07:00
Ahmed Ibrahim	0d531c05f2	Fix code mode yield startup race (#14959 )	2026-03-17 11:09:12 -07:00
jif-oai	d484bb57d9	feat: add suffix to shell snapshot name (#14938 ) https://github.com/openai/codex/issues/14906	2026-03-17 17:59:27 +00:00
Ahmed Ibrahim	f26ad3c92c	Fix fuzzy search notification buffering in app-server tests (#14955 ) ## What is flaky `codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently loses the expected `fuzzyFileSearch/sessionUpdated` and `fuzzyFileSearch/sessionCompleted` notifications when multiple fuzzy-search sessions are active and CI delivers notifications out of order. ## Why it was flaky The wait helpers were keyed only by JSON-RPC method name. - `wait_for_session_updated` consumed the next `fuzzyFileSearch/sessionUpdated` notification even when it belonged to a different search session. - `wait_for_session_completed` did the same for `fuzzyFileSearch/sessionCompleted`. - Once an unmatched notification was read, it was dropped permanently instead of buffered. - That meant a valid completion for the target search could arrive slightly early, be consumed by the wrong waiter, and disappear before the test started waiting for it. The result depended on notification ordering and runner scheduling instead of on the actual product behavior. ## How this PR fixes it - Add a buffered notification reader in `codex-rs/app-server/tests/common/mcp_process.rs`. - Match fuzzy-search notifications on the identifying payload fields instead of matching only on method name. - Preserve unmatched notifications in the in-process queue so later waiters can still consume them. - Include pending notification methods in timeout failures to make future diagnosis concrete. ## Why this fix fixes the flakiness The test now behaves like a real consumer of an out-of-order event stream: notifications for other sessions stay buffered until the correct waiter asks for them. Reordering no longer loses the target event, so the test result is determined by whether the server emitted the right notifications, not by which one happened to be read first. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 10:52:16 -07:00
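The buffering idea described in #14955 can be sketched in a few lines. This is a minimal illustration, not the code in `mcp_process.rs`: the `Notification` struct, the `BufferedReader` type, and the `wait_for` / `pending_methods` names are all hypothetical, and a plain iterator stands in for the real notification stream.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a notification: a JSON-RPC method name plus
/// the search session it belongs to.
#[derive(Debug, Clone, PartialEq)]
pub struct Notification {
    pub method: String,
    pub session_id: String,
}

/// Wraps an incoming stream and keeps unmatched notifications for later waiters.
pub struct BufferedReader<I: Iterator<Item = Notification>> {
    incoming: I,
    pending: VecDeque<Notification>,
}

impl<I: Iterator<Item = Notification>> BufferedReader<I> {
    pub fn new(incoming: I) -> Self {
        Self { incoming, pending: VecDeque::new() }
    }

    /// Returns the first notification matching `pred`. The buffer is checked
    /// before the stream, and anything read that does not match is buffered
    /// instead of dropped.
    pub fn wait_for<F>(&mut self, pred: F) -> Option<Notification>
    where
        F: Fn(&Notification) -> bool,
    {
        if let Some(pos) = self.pending.iter().position(|n| pred(n)) {
            return self.pending.remove(pos);
        }
        while let Some(n) = self.incoming.next() {
            if pred(&n) {
                return Some(n);
            }
            self.pending.push_back(n);
        }
        None
    }

    /// Methods of the buffered notifications, useful in a timeout message.
    pub fn pending_methods(&self) -> Vec<&str> {
        self.pending.iter().map(|n| n.method.as_str()).collect()
    }
}
```

With this shape, a waiter for session `a` that reads a completion for session `b` first leaves it in `pending`, so a later waiter for `b` still finds it.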
Felipe Coury	78e8ee4591	fix(tui): restore remote resume and fork history (#14930 ) ## Problem When the TUI connects to a remote app-server (via WebSocket), resume and fork operations lost all conversation history. `AppServerStartedThread` carried only the `SessionConfigured` event, not the full `Thread` snapshot. After resume or fork, the chat transcript was empty — prior turns were silently discarded. A secondary issue: `primary_session_configured` was not cleared on reset, causing stale session state after reconnection. ## Approach: TUI-side only, zero app-server changes The app-server already returns the full `Thread` object (with populated `turns: Vec<Turn>`) in its `ThreadStartResponse`, `ThreadResumeResponse`, and `ThreadForkResponse`. The data was always there — the TUI was simply throwing it away. The old `AppServerStartedThread` struct only kept the `SessionConfiguredEvent`, discarding the rich turn history that the server had already provided. This PR fixes the problem entirely within `tui_app_server` (3 files changed, 0 changes to `app-server`, `app-server-protocol`, or any other crate). Rather than modifying the server to send history in a different format or adding a new endpoint, the fix preserves the existing `Thread` snapshot and replays it through the TUI's standard event pipeline — making restored sessions indistinguishable from live ones. ## Solution Add a thread snapshot replay path. When the server hands back a `Thread` object (on start, resume, or fork), `restore_started_app_server_thread` converts its historical turns into the same core `Event` sequence the TUI already processes for live interactions, then replays them into the event store so the chat widget renders them. Key changes: - `AppServerStartedThread` now carries the full `Thread` — `started_thread_from_{start,resume,fork}_response` clone the thread into the struct alongside the existing `SessionConfiguredEvent`. 
- `thread_snapshot_events()` walks the thread's turns and items, producing `TurnStarted` → `ItemCompleted`* → `TurnComplete`/`TurnAborted` event sequences that the TUI already knows how to render. - `restore_started_app_server_thread()` pushes the session event + history events into the thread channel's store, activates the channel, and replays the snapshot — used for initial startup, resume, and fork. - `primary_session_configured` cleared on reset to prevent stale session state after reconnection. ## Tradeoffs - `Thread` is cloned into `AppServerStartedThread`: The full thread snapshot (including all historical turns) is cloned at startup. For long-lived threads this could be large, but it's a one-time cost and avoids lifetime gymnastics with the response. ## Tests - `restore_started_app_server_thread_replays_remote_history` — end-to-end: constructs a `Thread` with one completed turn, restores it, and asserts user/agent messages appear in the transcript. - `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies `thread_snapshot_events` produces the correct event sequence for completed and interrupted turns. ## Test plan - [ ] Verify `cargo check -p codex-tui-app-server` passes - [ ] Verify `cargo test -p codex-tui-app-server` passes - [ ] Manual: connect to a remote app-server, resume an existing thread, confirm history renders in the chat widget - [ ] Manual: fork a thread via remote, confirm prior turns appear	2026-03-17 11:16:08 -06:00
Shijie Rao	8e258eb3f5	Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859 ) ### Summary The goal is to use the latest turn's model and reasoning effort on thread/resume if no override is provided in the thread/resume call. This is part 1, which writes the model and reasoning effort for a thread to the sqlite db; a follow-up PR will consume the two new fields on thread/resume. [part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888) and this one can be merged independently.	2026-03-17 10:14:34 -07:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
jif-oai	e8add54e5d	feat: show effective model in spawn agent event (#14944 ) Show effective model after the full config layering for the sub agent	2026-03-17 16:58:58 +00:00
daveaitel-openai	ef36d39199	Fix agent jobs finalization race and reduce status polling churn (#14843 ) ## Summary - make `report_agent_job_result` atomically transition an item from running to completed while storing `result_json` - remove brittle finalization grace-sleep logic and make finished-item cleanup idempotent - replace blind fixed-interval waiting with status-subscription-based waiting for active worker threads - add state runtime tests for atomic completion and late-report rejection ## Why This addresses the race and polling concerns in #13948 by removing timing-based correctness assumptions and reducing unnecessary status polling churn. ## Validation - `cd codex-rs && just fmt` - `cd codex-rs && cargo test -p codex-state` - `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs` - `cd codex-rs && cargo test` - fails in an unrelated app-server tracing test: `message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children` timed out waiting for response ## Notes - This PR supersedes #14129 with the same agent-jobs fix on a clean branch from `main`. - The earlier PR branch was stacked on unrelated history, which made the review diff include unrelated commits. Fixes #13948	2026-03-17 10:40:14 -04:00
jif-oai	4ed19b0766	feat: rename to get more explicit close agent (#14935 ) https://github.com/openai/codex/issues/14907	2026-03-17 14:37:20 +00:00
jif-oai	31648563c8	feat: centralize package manager version (#14920 )	2026-03-17 12:03:07 +00:00
viyatb-oai	603b6493a9	fix(linux-sandbox): ignore missing writable roots (#14890 ) ## Summary - skip nonexistent `workspace-write` writable roots in the Linux bubblewrap mount builder instead of aborting sandbox startup - keep existing writable roots mounted normally so mixed Windows/WSL configs continue to work - add unit and Linux integration regression coverage for the missing-root case ## Context This addresses regression A from #14875. Regression B will be handled in a separate PR. The old bubblewrap integration added `ensure_mount_targets_exist` as a preflight guard because bubblewrap bind targets must exist, and failing early let Codex return a clearer error than a lower-level mount failure. That policy turned out to be too strict once bubblewrap became the default Linux sandbox: shared Windows/WSL or mixed-platform configs can legitimately contain a well-formed writable root that does not exist on the current machine. This PR keeps bubblewrap's existing-target requirement, but changes Codex to skip missing writable roots instead of treating them as fatal configuration errors.	2026-03-17 00:21:00 -07:00
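The policy change in #14890 amounts to filtering the configured roots before building mounts. A minimal sketch, assuming the roots arrive as a slice of paths; the function name `existing_writable_roots` is hypothetical and the real mount builder does more than this:

```rust
use std::path::PathBuf;

/// Keeps only the writable roots that exist on this machine, so a root that
/// is valid on another platform is skipped rather than treated as fatal.
pub fn existing_writable_roots(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots.iter().filter(|p| p.exists()).cloned().collect()
}
```

The surviving roots are still mounted normally, which matches the commit's note that bubblewrap's existing-target requirement is unchanged.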
Eric Traut	d37dcca7e0	Revert tui code so it does not rely on in-process app server (#14899 ) PR https://github.com/openai/codex/pull/14512 added an in-process app server and started to wire up the tui to use it. We were originally planning to modify the `tui` code in place, converting it to use the app server a bit at a time using a hybrid adapter. We've since decided to create an entirely new parallel `tui_app_server` implementation and do the conversion all at once but retain the existing `tui` while we work the bugs out of the new implementation. This PR undoes the changes to the `tui` made in the PR #14512 and restores the old initialization to its previous state. This allows us to modify the `tui_app_server` without the risk of regressing the old `tui` code. For example, we can start to remove support for all legacy core events, like the ones that PR https://github.com/openai/codex/pull/14892 needed to ignore. Testing: * I manually verified that the old `tui` starts and shuts down without a problem.	2026-03-17 00:56:32 -06:00
Eric Traut	57f865c069	Fix tui_app_server: ignore duplicate legacy stream events (#14892 ) The in-process app-server currently emits both typed `ServerNotification`s and legacy `codex/event/*` notifications for the same live turn updates. `tui_app_server` was consuming both paths, so message deltas and completed items could be enqueued twice and rendered as duplicated output in the transcript. Ignore legacy notifications for event types that already have typed (app server) notification handling, while keeping legacy fallback behavior for events that still only arrive on the old path. This preserves compatibility without duplicating streamed commentary or final agent output. We will remove all of the legacy event handlers over time; they're here only during the short window where we're moving the tui to use the app server.	2026-03-17 00:50:25 -06:00
viyatb-oai	db7e02c739	fix: canonicalize symlinked Linux sandbox cwd (#14849 ) ## Problem On Linux, Codex can be launched from a workspace path that is a symlink (for example, a symlinked checkout or a symlinked parent directory). Our sandbox policy intentionally canonicalizes writable/readable roots to the real filesystem path before building the bubblewrap mounts. That part is correct and needed for safety. The remaining bug was that bubblewrap could still inherit the helper process's logical cwd, which might be the symlinked alias instead of the mounted canonical path. In that case, the sandbox starts in a cwd that does not exist inside the sandbox namespace even though the real workspace is mounted. This can cause sandboxed commands to fail in symlinked workspaces. ## Fix This PR keeps the sandbox policy behavior the same, but separates two concepts that were previously conflated: - the canonical cwd used to define sandbox mounts and permissions - the caller's logical cwd used when launching the command On the Linux bubblewrap path, we now thread the logical command cwd through the helper explicitly and only add `--chdir <canonical path>` when the logical cwd differs from the mounted canonical path. That means: - permissions are still computed from canonical paths - bubblewrap starts the command from a cwd that definitely exists inside the sandbox - we do not widen filesystem access or undo the earlier symlink hardening ## Why This Is Safe This is a narrow Linux-only launch fix, not a policy change. - Writable/readable root canonicalization stays intact. - Protected metadata carveouts still operate on canonical roots. - We only override bubblewrap's inherited cwd when the logical path would otherwise point at a symlink alias that is not mounted in the sandbox. 
## Tests - kept the existing protocol/core regression coverage for symlink canonicalization - added regression coverage for symlinked cwd handling in the Linux bubblewrap builder/helper path Local validation: - `just fmt` - `cargo test -p codex-protocol` - `cargo test -p codex-core normalize_additional_permissions_canonicalizes_symlinked_write_paths` - `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core --tests -- -D warnings` - `cargo build --bin codex` ## Context This is related to #14694. The earlier writable-root symlink fix addressed the mount/permission side; this PR fixes the remaining symlinked-cwd launch mismatch in the Linux sandbox path.	2026-03-16 22:39:18 -07:00
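The cwd rule from #14849 can be illustrated with a small helper. This is a sketch under stated assumptions: `chdir_args` is a hypothetical name, and the only behavior taken from the commit message is that `--chdir <canonical path>` is added only when the logical cwd differs from the mounted canonical path.

```rust
use std::path::Path;

/// Returns the extra bubblewrap arguments for the command's starting
/// directory. No argument is added when the logical cwd is already the
/// canonical (mounted) path.
pub fn chdir_args(logical_cwd: &Path, canonical_cwd: &Path) -> Vec<String> {
    if logical_cwd == canonical_cwd {
        Vec::new()
    } else {
        vec!["--chdir".to_string(), canonical_cwd.display().to_string()]
    }
}
```

Permissions are not touched here; the helper only decides where the command starts, which is why the commit describes the change as a launch fix and not a policy change.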
Ahmed Ibrahim	32e4a5d5d9	[stack 4/4] Reduce realtime self-interruptions during playback (#14827 ) ## Stack Position 4/4. Top-of-stack sibling built on #14830. ## Base - #14830 ## Sibling - #14829 ## Scope - Gate low-level mic chunks while speaker playback is active, while still allowing spoken barge-in. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 05:19:51 +00:00
Ahmed Ibrahim	79f476e47d	[stack 3/4] Add current thread context to realtime startup (#14829 ) ## Stack Position 3/4. Top-of-stack sibling built on #14830. ## Base - #14830 ## Sibling - #14827 ## Scope - Extend the realtime startup context with a bounded summary of the latest thread turns for continuity. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 05:11:05 +00:00
Michael Bolin	15ede607a0	fix: tighten up shell arg quoting in GitHub workflows (#14864 ) Inspired by the work done over in https://github.com/openai/codex-action/pull/74, this tightens up our use of GitHub expressions as shell/environment variables.	2026-03-16 22:01:16 -07:00
Thibault Sottiaux	8e34caffcc	[codex] add Jason as a predefined subagent name (#14881 ) This change adds Jason to codex-core's built-in subagent nickname pool so spawned agents can pick it without any custom role configuration. The default list was simply missing that predefined name (a grave mistake).	2026-03-16 22:01:14 -07:00
xl-openai	e5a28ba0c2	fix: align marketplace display name with existing interface conventions (#14886 ) 1. camelCase for displayName; 2. move displayName under interface.	2026-03-16 21:52:19 -07:00
Ahmed Ibrahim	fbd7f9b986	[stack 2/4] Align main realtime v2 wire and runtime flow (#14830 ) ## Stack Position 2/4. Built on top of #14828. ## Base - #14828 ## Unblocks - #14829 - #14827 ## Scope - Port the realtime v2 wire parsing, session, app-server, and conversation runtime behavior onto the split websocket-method base. - Branch runtime behavior directly on the current realtime session kind instead of parser-derived flow flags. - Keep regression coverage in the existing e2e suites. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-16 21:38:07 -07:00
xl-openai	1d85fe79ed	feat: support remote_sync for plugin install/uninstall. (#14878 ) - Added forceRemoteSync to plugin/install and plugin/uninstall. - With forceRemoteSync=true, we update the remote plugin status first, then apply the local change only if the backend call succeeds. - Kept plugin/list(forceRemoteSync=true) as the main recon path, and for now it treats remote enabled=false as uninstall. We will eventually migrate to plugin/installed for more precise state handling.	2026-03-16 21:37:27 -07:00
xl-openai	49c2b66ece	Add marketplace display names to plugin/list (#14861 ) Add display_name support to marketplace.json.	2026-03-16 19:04:40 -07:00
xl-openai	59533a2c26	skill-creator: default new skills to ~/.codex/skills (#14837 ) ### Motivation - Prevent newly-created skills from being placed in unexpected locations by prompting for an install path and defaulting to a discoverable location so skills are usable immediately. - Make the `skill-creator` instructions explicit about the recommended default (`~/.codex/skills` / `$CODEX_HOME/skills`) so the agent and users follow a consistent, discoverable convention. ### Description - Updated `codex-rs/skills/src/assets/samples/skill-creator/SKILL.md` to add a user prompt: "Where should I create this skill? If you do not have a preference, I will place it in ~/.codex/skills so Codex can discover it automatically.". - Added guidance before running `init_skill.py` that if the user does not specify a location, the agent should default to `~/.codex/skills` (equivalently `$CODEX_HOME/skills`) for auto-discovery. - Updated the `init_skill.py` examples in the same `SKILL.md` to use `~/.codex/skills` as the recommended default while keeping one custom path example. ### Testing - Ran `cargo test -p codex-skills` and the crate's unit test suite passed (`1 passed; 0 failed`). - Verified relevant discovery behavior in code by checking `codex-rs/utils/home-dir/src/lib.rs` (`find_codex_home` defaults to `~/.codex`) and `codex-rs/core/src/skills/loader.rs` (user skill roots include `$CODEX_HOME/skills`). ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69b75a50bb008322a278e55eb0ddccd6)	2026-03-16 18:36:11 -07:00

4626 commits