core-agent-ide

Author	SHA1	Message	Date
Charley Cunningham	ebbbc52ce4	Align SQLite feedback logs with feedback formatter (#13494 ) ## Summary - store a pre-rendered `feedback_log_body` in SQLite so `/feedback` exports keep span prefixes and structured event fields - render SQLite feedback exports with timestamps and level prefixes to match the old in-memory feedback formatter, while preserving existing trailing newlines - count `feedback_log_body` in the SQLite retention budget so structured or span-prefixed rows still prune correctly - bound `/feedback` row loading in SQL with the retention estimate, then apply exact whole-line truncation in Rust so uploads stay capped without splitting lines ## Details - add a `feedback_log_body` column to `logs` and backfill it from `message` for existing rows - capture span names plus formatted span and event fields at write time, since SQLite does not retain enough structure to reconstruct the old formatter later - keep SQLite feedback queries scoped to the requested thread plus same-process threadless rows - restore a SQL-side cumulative `estimated_bytes` cap for feedback export queries so over-retained partitions do not load every matching row before truncation - add focused formatting coverage for exported feedback lines and parity coverage against `tracing_subscriber` ## Testing - cargo test -p codex-state - just fix -p codex-state - just fmt codex author: `codex resume 019ca1b0-0ecc-78b1-85eb-6befdd7e4f1f` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 22:44:31 +00:00
Ahmed Ibrahim	7b37a0350f	Add final message prefix to realtime handoff output (#15077 ) - prefix realtime handoff output with the agent final message label for both realtime v1 and v2 - update realtime websocket and core expectations to match	2026-03-18 15:19:49 -07:00
xl-openai	86982ca1f9	Revert "fix: harden plugin feature gating" (#15102 ) Reverts openai/codex#15020 I messed up the commit in my PR and accidentally merged changes that were still under review.	2026-03-18 15:19:29 -07:00
Eric Traut	e5de13644d	Add a startup deprecation warning for custom prompts (#15076 ) ## Summary - detect custom prompts in `$CODEX_HOME/prompts` during TUI startup - show a deprecation notice only when prompts are present, with guidance to use `$skill-creator` - add TUI tests and snapshot coverage for present, missing, and empty prompts directories ## Testing - Manually tested	2026-03-18 15:21:30 -06:00
pakrym-oai	5cada46ddf	Return image URL from view_image tool (#15072 ) Clean up image semantics in code mode. `view_image` now returns `{image_url:string, details?: string}`. `image()` now accepts either a string parameter or `{image_url:string, details?: string}`.	2026-03-18 13:58:20 -07:00
pakrym-oai	88e5382fc4	Propagate tool errors to code mode (#15075 ) Clean up the error flow so that `FunctionCallError` propagates all the way up to the dispatcher, allowing code mode to surface it as an exception.	2026-03-18 13:57:55 -07:00
Michael Bolin	392347d436	fix: try to fix "Stage npm package" step in ci.yml (#15092 ) Fix the CI job by updating it to use artifacts from a more recent release (`0.115.0`) instead of the existing one (`0.74.0`). This step in our CI job on PRs started failing today: `334164a6f7/.github/workflows/ci.yml (L33-L47)` I believe it's because this test verifies that the "package npm" script works, but we want it to be fast and not wait for binaries to be built, so it uses a GitHub workflow that's already done. Because it was using a GitHub workflow associated with `0.74.0`, it seems likely that workflow's history has been reaped, so we need to use a newer one.	2026-03-18 13:52:33 -07:00
Felipe Coury	334164a6f7	feat(tui): restore composer history in app-server tui (#14945 ) ## Problem The app-server TUI (`tui_app_server`) lacked composer history support. Pressing Up/Down to recall previous prompts hit a stub that logged a warning and displayed "Not available in app-server TUI yet." New submissions were silently dropped from the shared history file, so nothing persisted for future sessions. ## Mental model Codex maintains a single, append-only history file (`$CODEX_HOME/history.jsonl`) shared across all TUI processes on the same machine. The legacy (in-process) TUI already reads/writes this file through `codex_core::message_history`. The app-server TUI delegates most operations to a separate process over RPC, but history is intentionally not an RPC concern — it's a client-local file. This PR makes the app-server TUI access the same history file directly, bypassing the app-server process entirely. The composer's Up/Down navigation and submit-time persistence now follow the same code paths as the legacy TUI, with the only difference being where the call is dispatched (locally in `App`, rather than inside `CodexThread`). The branch is rebuilt directly on top of `upstream/main`, so it keeps the existing app-server restore architecture intact. `AppServerStartedThread` still restores transcript history from the server `Thread` snapshot via `thread_snapshot_events`; this PR only adds composer-history support. ## Non-goals - Adding history support to the app-server protocol. History remains client-local. - Changing the on-disk format or location of `history.jsonl`. - Surfacing history I/O errors to the user (failures are logged and silently swallowed, matching the legacy TUI). ## Tradeoffs \| Decision \| Why \| Risk \| \|----------\|-----\|------\| \| Widen `message_history` from `pub(crate)` to `pub` \| Avoids duplicating file I/O logic; the module already has a clean, minimal API surface. 
\| Other workspace crates can now call these functions — the contract is no longer crate-private. However, this is consistent with recent precedent: `590cfa617` exposed `mention_syntax` for TUI consumption, `752402c4f` exposed plugin APIs (`PluginsManager`), and `14fcb6645`/`edacbf7b6` widened internal core APIs for other crates. These were all narrow, intentional exposures of specific APIs — not broad "make internals public" moves. `1af2a37ad` even went the other direction, reducing broad re-exports to tighten boundaries. This change follows the same pattern: a small, deliberate API surface (3 functions) rather than a wholesale visibility change. \| \| Intercept `AddToHistory` / `GetHistoryEntryRequest` in `App` before RPC fallback \| Keeps history ops out of the "unsupported op" error path without changing app-server protocol. \| This now routes through a single `submit_thread_op` entry point, which is safer than the original duplicated dispatch. The remaining risk is organizational: future thread-op submission paths need to keep using that shared entry point. \| \| `session_configured_from_thread_response` is now `async` \| Needs `await` on `history_metadata()` to populate real `history_log_id` / `history_entry_count`. \| Adds an async file-stat + full-file newline scan to the session bootstrap path. The scan is bounded by `history.max_bytes` and matches the legacy TUI's cost profile, but startup latency still scales with file size. 
\|
## Architecture
```
User presses Up                      User submits a prompt
       │                                      │
       ▼                                      ▼
ChatComposerHistory                  ChatWidget::do_submit_turn
  navigate_up()                        encode_history_mentions()
       │                                      │
       ▼                                      ▼
AppEvent::CodexOp                    Op::AddToHistory { text }
(GetHistoryEntryRequest)                      │
       │                                      ▼
       ▼                             App::try_handle_local_history_op
App::try_handle_local_history_op       message_history::append_entry()
  spawn_blocking {                            │
    message_history::lookup()                 ▼
  }                                  $CODEX_HOME/history.jsonl
       │
       ▼
AppEvent::ThreadEvent
(GetHistoryEntryResponse)
       │
       ▼
ChatComposerHistory::on_entry_response()
```
## Observability
- `tracing::warn` on `append_entry` failure (includes thread ID).
- `tracing::warn` on `spawn_blocking` lookup join error.
- `tracing::warn` from `message_history` internals on file-open, lock, or parse failures.
## Tests
- `chat_composer_history::tests::navigation_with_async_fetch`: verifies that Up emits `Op::GetHistoryEntryRequest` (was: checked for stub error cell).
- `app::tests::history_lookup_response_is_routed_to_requesting_thread`: verifies multi-thread composer recall routes the lookup result back to the originating thread.
- `app_server_session::tests::resume_response_relies_on_snapshot_replay_not_initial_messages`: verifies app-server session restore still uses the upstream thread-snapshot path.
- `app_server_session::tests::session_configured_populates_history_metadata`: verifies bootstrap sets nonzero `history_log_id` / `history_entry_count` from the shared local history file.	2026-03-18 11:54:11 -06:00
xl-openai	580f32ad2a	fix: harden plugin feature gating (#15020 ) 1. Use requirement-resolved config.features as the plugin gate. 2. Guard plugin/list, plugin/read, and related flows behind that gate. 3. Skip bad marketplace.json files instead of failing the whole list. 4. Simplify plugin state and caching.	2026-03-18 10:11:43 -07:00
pakrym-oai	606d85055f	Add notify to code-mode (#14842 ) Allows model to send an out-of-band notification. The notification is injected as another tool call output for the same call_id.	2026-03-18 09:37:13 -07:00
jif-oai	7ae99576a6	chore: disable memory read path for morpheus (#15059 ) Because we don't want prompt collisions	2026-03-18 15:42:56 +00:00
Eric Traut	347c6b12ec	Removed remaining core events from tui_app_server (#14942 )	2026-03-18 09:35:05 -06:00
jif-oai	58ac2a8773	nit: disable live memory edition (#15058 )	2026-03-18 14:49:57 +00:00
jif-oai	a265d6043e	feat: add memory citation to agent message (#14821 ) Client side to come	2026-03-18 10:03:38 +00:00
jif-oai	0f9484dc8a	feat: adapt artifacts to new packaging and 2.5.6 (#14947 )	2026-03-18 09:17:44 +00:00
Matthew Zeng	40a7d1d15b	[plugins] Support configuration tool suggest allowlist. (#15022 ) - [x] Support configuration tool suggest allowlist. Supports both plugins and connectors.	2026-03-17 23:58:27 -07:00
Dylan Hurd	84f4e7b39d	fix(subagents) share execpolicy by default (#13702 ) ## Summary If a subagent requests approval, and the user persists that approval to the execpolicy, it should (by default) propagate. We'll need to rethink this a bit in light of coming Permissions changes, though I think this is closer to the end state that we'd want, which is that execpolicy changes to one permissions profile should be synced across threads. ## Testing - [x] Added integration test --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-18 06:42:26 +00:00
viyatb-oai	a3613035f3	Pin setup-zig GitHub Action to immutable SHA (#14858 ) ### Motivation - Pinning the action to an immutable commit SHA reduces the risk of arbitrary code execution in runners with repository access and secrets. ### Description - Replaced `uses: mlugg/setup-zig@v2` with `uses: mlugg/setup-zig@d1434d0886 # v2` in three workflow files. - Updated the following files: ` .github/workflows/rust-ci.yml`, ` .github/workflows/rust-release.yml`, and ` .github/workflows/shell-tool-mcp.yml` to reference the immutable SHA while preserving the original `v2` intent in a trailing comment. ### Testing - No automated tests were run because this is a workflow-only change and does not affect repository source code, so CI validation will occur on the next workflow execution. ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_69763f570234832d9c67b1b66a27c78d)	2026-03-17 22:40:14 -07:00
Andrei Eternal	6fef421654	[hooks] userpromptsubmit - hook before user's prompt is executed (#14626 ) - this allows blocking the user's prompts from executing, and also prevents them from entering history - handles the edge case where you can both prevent the user's prompt AND add n amount of additionalContexts - refactors some old code into common.rs where hooks overlap functionality - refactors additionalContext being previously added to user messages, instead we use developer messages for them - handles queued messages correctly Sample hook for testing - if you write "[block-user-submit]" this hook will stop the thread: example run ``` › sup • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (completed) warning: wizard-tower UserPromptSubmit demo inspected: sup hook context: Wizard Tower UserPromptSubmit demo fired. For this reply only, include the exact phrase 'observatory lanterns lit' exactly once near the end. • Just riding the cosmic wave and ready to help, my friend. What are we building today? observatory lanterns lit › and [block-user-submit] • Running UserPromptSubmit hook: reading the observatory notes UserPromptSubmit hook (stopped) warning: wizard-tower UserPromptSubmit demo blocked the prompt on purpose. stop: Wizard Tower demo block: remove [block-user-submit] to continue. 
```
.codex/config.toml
```
[features]
codex_hooks = true
```
.codex/hooks.json
```
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "/usr/bin/python3 .codex/hooks/user_prompt_submit_demo.py",
            "timeoutSec": 10,
            "statusMessage": "reading the observatory notes"
          }
        ]
      }
    ]
  }
}
```
.codex/hooks/user_prompt_submit_demo.py
```
#!/usr/bin/env python3
import json
import sys
from pathlib import Path


def prompt_from_payload(payload: dict) -> str:
    prompt = payload.get("prompt")
    if isinstance(prompt, str) and prompt.strip():
        return prompt.strip()
    event = payload.get("event")
    if isinstance(event, dict):
        user_prompt = event.get("user_prompt")
        if isinstance(user_prompt, str):
            return user_prompt.strip()
    return ""


def main() -> int:
    payload = json.load(sys.stdin)
    prompt = prompt_from_payload(payload)
    cwd = Path(payload.get("cwd", ".")).name or "wizard-tower"
    if "[block-user-submit]" in prompt:
        print(
            json.dumps(
                {
                    "systemMessage": (
                        f"{cwd} UserPromptSubmit demo blocked the prompt on purpose."
                    ),
                    "decision": "block",
                    "reason": (
                        "Wizard Tower demo block: remove [block-user-submit] to continue."
                    ),
                }
            )
        )
        return 0
    prompt_preview = prompt or "(empty prompt)"
    if len(prompt_preview) > 80:
        prompt_preview = f"{prompt_preview[:77]}..."
    print(
        json.dumps(
            {
                "systemMessage": (
                    f"{cwd} UserPromptSubmit demo inspected: {prompt_preview}"
                ),
                "hookSpecificOutput": {
                    "hookEventName": "UserPromptSubmit",
                    "additionalContext": (
                        "Wizard Tower UserPromptSubmit demo fired. "
                        "For this reply only, include the exact phrase "
                        "'observatory lanterns lit' exactly once near the end."
                    ),
                },
            }
        )
    )
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```	2026-03-17 22:09:22 -07:00
Charley Cunningham	226241f035	Use workspace requirements for guardian prompt override (#14727 ) ## Summary - move `guardian_developer_instructions` from managed config into workspace-managed `requirements.toml` - have guardian continue using the override when present and otherwise fall back to the bundled local guardian prompt - keep the generalized prompt-quality improvements in the shared guardian default prompt - update requirements parsing, layering, schema, and tests for the new source of truth ## Context This replaces the earlier managed-config / MDM rollout plan. The intended rollout path is workspace-managed requirements, including cloud enterprise policies, rather than backend model metadata, Statsig, or Jamf-managed config. That keeps the default/fallback behavior local to `codex-rs` while allowing faster policy updates through the enterprise requirements plane. This is intentionally an admin-managed policy input, not a user preference: the guardian prompt should come either from the bundled `codex-rs` default or from enterprise-managed `requirements.toml`, and normal user/project/session config should not override it. ## Updating The OpenAI Prompt After this lands, the OpenAI-specific guardian prompt should be updated through the workspace Policies UI at `/codex/settings/policies` rather than through Jamf or codex-backend model metadata. Operationally: - open the workspace Policies editor as a Codex admin - edit the default `requirements.toml` policy, or a higher-precedence group-scoped override if we ever want different behavior for a subset of users - set `guardian_developer_instructions = """..."""` to the full OpenAI-specific guardian prompt text - save the policy; codex-backend stores the raw TOML and `codex-rs` fetches the effective requirements file from `/wham/config/requirements` When updating the OpenAI-specific prompt, keep it aligned with the shared default guardian policy in `codex-rs` except for intentional OpenAI-only additions. 
## Testing - `cargo check --tests -p codex-core -p codex-config -p codex-cloud-requirements --message-format short` - `cargo run -p codex-core --bin codex-write-config-schema` - `cargo fmt` - `git diff --check` Co-authored-by: Codex <noreply@openai.com>	2026-03-17 22:05:41 -07:00
Ahmed Ibrahim	3ce879c646	Handle realtime conversation end in the TUI (#14903 ) - close live realtime sessions on errors, ctrl-c, and active meter removal - centralize TUI realtime cleanup and avoid duplicate follow-up close info --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 21:04:58 -07:00
pakrym-oai	770616414a	Prefer websockets when providers support them (#13592 ) Remove all flags and model settings. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 19:46:44 -07:00
viyatb-oai	d950543e65	feat: support restricted ReadOnlyAccess in elevated Windows sandbox (#14610 ) ## Summary - support legacy `ReadOnlyAccess::Restricted` on Windows in the elevated setup/runner backend - keep the unelevated restricted-token backend on the legacy full-read model only, and fail closed for restricted read-only policies there - keep the legacy full-read Windows path unchanged while deriving narrower read roots only for elevated restricted-read policies - honor `include_platform_defaults` by adding backend-managed Windows system roots only when requested, while always keeping helper roots and the command `cwd` readable - preserve `workspace-write` semantics by keeping writable roots readable when restricted read access is in use in the elevated backend - document the current Windows boundary: legacy `SandboxPolicy` is supported on both backends, while richer split-only carveouts still fail closed instead of running with weaker enforcement ## Testing - `cargo test -p codex-windows-sandbox` - `cargo check -p codex-windows-sandbox --tests --target x86_64-pc-windows-msvc` - `cargo clippy -p codex-windows-sandbox --tests --target x86_64-pc-windows-msvc -- -D warnings` - `cargo test -p codex-core windows_restricted_token_` ## Notes - local `cargo test -p codex-windows-sandbox` on macOS only exercises the non-Windows stubs; the Windows-targeted compile and clippy runs provide the local signal, and GitHub Windows CI exercises the runtime path	2026-03-17 19:08:50 -07:00
viyatb-oai	6fe8a05dcb	fix: honor active permission profiles in sandbox debug (#14293 ) ## Summary - stop `codex sandbox` from forcing legacy `sandbox_mode` when active `[permissions]` profiles are configured - keep the legacy `read-only` / `workspace-write` fallback for legacy configs and reject `--full-auto` for profile-based configs - use split filesystem and network policies in the macOS/Linux debug sandbox helpers and add regressions for the config-loading behavior
assuming "codex/docs/private/secret.txt" = "none"
```
codex -c 'default_permissions="limited-read-test"' sandbox macos -- <command> ...
codex sandbox macos -- cat codex/docs/private/secret.txt >/dev/null; echo EXIT:$?
cat: codex/docs/private/secret.txt: Operation not permitted
EXIT:1
```
--------- Co-authored-by: celia-oai <celia@openai.com>	2026-03-18 01:52:02 +00:00
pakrym-oai	83a60fdb94	Add FS abstraction and use in view_image (#14960 ) Adds an environment crate with environment and file system abstractions. An environment is the combination of attributes and services specific to the environment the agent is connected to: file system, process management, OS, and default shell. The goal is to route most of the agent logic that makes assumptions about its environment through the environment abstraction.	2026-03-17 17:36:23 -07:00
Max Johnson	19b887128e	app-server: reject websocket requests with Origin headers (#14995 ) Reject websocket requests that carry an `Origin` header	2026-03-18 00:24:53 +00:00
xl-openai	a5d3114e97	feat: Add product-aware plugin policies and clean up manifest naming (#14993 ) - Add shared Product support to marketplace plugin policy and skill policy (not enforced yet). - Move marketplace installation/authentication under policy and model it as MarketplacePluginPolicy. - Rename plugin/marketplace local manifest types to separate raw serde shapes from resolved in-memory models.	2026-03-17 17:01:34 -07:00
Shaqayeq	fc75d07504	Add Python SDK public API and examples (#14446 ) ## TL;DR WIP, especially the examples. Thin the Python SDK public surface so the wrapper layer returns canonical app-server generated models directly. - keeps `Codex` / `AsyncCodex` / `Thread` / `Turn` and input helpers, but removes alias-only type layers and custom result models - `metadata` now returns `InitializeResponse` and `run()` returns the generated app-server `Turn` - updates docs, examples, notebook, and tests to use canonical generated types and regenerates `v2_all.py` against current schema - keeps the pinned runtime-package integration flow and real integration coverage ## Validation - `PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests` - `GH_TOKEN="$(gh auth token)" RUN_REAL_CODEX_TESTS=1 PYTHONPATH=sdk/python/src python3 -m pytest sdk/python/tests -rs` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 16:05:56 -07:00
viyatb-oai	0d1539e74c	fix(linux-sandbox): prefer system /usr/bin/bwrap when available (#14963 ) ## Problem Ubuntu/AppArmor hosts started failing in the default Linux sandbox path after the switch to vendored/default bubblewrap in `0.115.0`. The clearest report is in [#14919](https://github.com/openai/codex/issues/14919), especially [this investigation comment](https://github.com/openai/codex/issues/14919#issuecomment-4076504751): on affected Ubuntu systems, `/usr/bin/bwrap` works, but a copied or vendored `bwrap` binary fails with errors like `bwrap: setting up uid map: Permission denied` or `bwrap: loopback: Failed RTM_NEWADDR: Operation not permitted`. The root cause is Ubuntu's `/etc/apparmor.d/bwrap-userns-restrict` profile, which grants `userns` access specifically to `/usr/bin/bwrap`. Once Codex started using a vendored/internal bubblewrap path, that path was no longer covered by the distro AppArmor exception, so sandbox namespace setup could fail even when user namespaces were otherwise enabled and `uidmap` was installed. ## What this PR changes - prefer system `/usr/bin/bwrap` whenever it is available - keep vendored bubblewrap as the fallback when `/usr/bin/bwrap` is missing - when `/usr/bin/bwrap` is missing, surface a Codex startup warning through the app-server/TUI warning path instead of printing directly from the sandbox helper with `eprintln!` - use the same launcher decision for both the main sandbox execution path and the `/proc` preflight path - document the updated Linux bubblewrap behavior in the Linux sandbox and core READMEs ## Why this fix This still fixes the Ubuntu/AppArmor regression from [#14919](https://github.com/openai/codex/issues/14919), but it keeps the runtime rule simple and platform-agnostic: if the standard system bubblewrap is installed, use it; otherwise fall back to the vendored helper. The warning now follows that same simple rule. 
If Codex cannot find `/usr/bin/bwrap`, it tells the user that it is falling back to the vendored helper, and it does so through the existing startup warning plumbing that reaches the TUI and app-server instead of low-level sandbox stderr. ## Testing - `cargo test -p codex-linux-sandbox` - `cargo test -p codex-app-server --lib` - `cargo test -p codex-tui-app-server tests::embedded_app_server_start_failure_is_returned` - `cargo clippy -p codex-linux-sandbox --all-targets` - `cargo clippy -p codex-app-server --all-targets` - `cargo clippy -p codex-tui-app-server --all-targets`	2026-03-17 23:05:34 +00:00
Ahmed Ibrahim	98be562fd3	Unify realtime shutdown in core (#14902 ) - route realtime startup, input, and transport failures through a single shutdown path - emit one realtime error/closed lifecycle while clearing session state once --------- Co-authored-by: Codex <noreply@openai.com> Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>	2026-03-17 15:58:52 -07:00
Ahmed Ibrahim	c6ab4ee537	Gate realtime audio interruption logic to v2 (#14984 ) - thread the realtime version into conversation start and app-server notifications - keep playback-aware mic gating and playback interruption behavior on v2 only, leaving v1 on the legacy path	2026-03-17 15:24:37 -07:00
xl-openai	1a9555eda9	Cleanup skills/remote/xxx endpoints. (#14977 ) Remove the skills/remote/xxx endpoints, as they are not in use for now.	2026-03-17 15:22:36 -07:00
Felipe Coury	43ee72a9b9	fix(tui): implement /mcp inventory for tui_app_server (#14931 ) ## Problem The `/mcp` command did not work in the app-server TUI (remote mode). On `main`, `add_mcp_output()` called `McpManager::effective_servers()` in-process, which only sees locally configured servers, and then emitted a generic stub message for the app-server to handle. In remote usage, that left `/mcp` without a real inventory view. ## Solution Implement `/mcp` for the app-server TUI by fetching MCP server inventory directly from the app-server via the paginated `mcpServerStatus/list` RPC and rendering the results into chat history. The command now follows a three-phase lifecycle: 1. Loading: `ChatWidget::add_mcp_output()` inserts a transient `McpInventoryLoadingCell` and emits `AppEvent::FetchMcpInventory`. This gives immediate feedback that the command registered. 2. Fetch: `App::fetch_mcp_inventory()` spawns a background task that calls `fetch_all_mcp_server_statuses()` over an app-server request handle. When the RPC completes, it sends `AppEvent::McpInventoryLoaded { result }`. 3. Resolve: `App::handle_mcp_inventory_result()` clears the loading cell and renders either `new_mcp_tools_output_from_statuses(...)` or an error message. This keeps the main app event loop responsive, so the TUI can repaint before the remote RPC finishes. ## Notes - No `app-server` changes were required. - The rendered inventory includes auth, tools, resources, and resource templates, plus transport details when they are available from local config for display enrichment. - The app-server RPC does not expose authoritative `enabled` or `disabled_reason` state for MCP servers, so the remote `/mcp` view no longer renders a `Status:` row rather than guessing from local config. - RPC failures surface in history as `Failed to load MCP inventory: ...`. 
## Tests - `slash_mcp_requests_inventory_via_app_server` - `mcp_inventory_maps_prefix_tool_names_by_server` - `handle_mcp_inventory_result_clears_committed_loading_cell` - `mcp_tools_output_from_statuses_renders_status_only_servers` - `mcp_inventory_loading_snapshot`	2026-03-17 16:11:27 -06:00
Colin Young	0d2ff40a58	Add auth env observability (#14905 ) CXC-410 Emit Env Var Status with `/feedback` report Add more observability on top of #14611 [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream) [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream) <img width="1063" height="610" alt="image" src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e" /> ###### Summary - Adds auth-env telemetry that records whether key auth-related env overrides were present on session start and request paths. - Threads those auth-env fields through `/responses`, websocket, and `/models` telemetry and feedback metadata. - Buckets custom provider `env_key` configuration to a safe `"configured"` value instead of emitting raw config text. - Keeps the slice observability-only: no raw token values or raw URLs are emitted. ###### Rationale (from spec findings) - 401 and auth-path debugging needs a way to distinguish env-driven auth paths from sessions with no auth env override. - Startup and model-refresh failures need the same auth-env diagnostics as normal request failures. - Feedback and Sentry tags need the same auth-env signal as OTel events so reports can be triaged consistently. - Custom provider config is user-controlled text, so the telemetry contract must stay presence-only / bucketed. ###### Scope - Adds a small `AuthEnvTelemetry` bundle for env presence collection and threads it through the main request/session telemetry paths. - Does not add endpoint/base-url/provider-header/geo routing attribution or broader telemetry API redesign. ###### Trade-offs - `provider_env_key_name` is bucketed to `"configured"` instead of preserving the literal configured env var name. 
- `/models` is included because startup/model-refresh auth failures need the same diagnostics, but broader parity work remains out of scope. - This slice keeps the existing telemetry APIs and layers auth-env fields onto them rather than redesigning the metadata model. ###### Client follow-up - Add the separate endpoint/base-url attribution slice if routing-source diagnosis is still needed. - Add provider-header or residency attribution only if auth-env presence proves insufficient in real reports. - Revisit whether any additional auth-related env inputs need safe bucketing after more 401 triage data. ###### Testing - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture` - `cargo test -p codex-core collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture` - `cargo test -p codex-core models_request_telemetry_emits_auth_env_feedback_tags_on_failure -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability -- --nocapture` - `cargo test -p codex-core --no-run --message-format short` - `cargo test -p codex-otel --no-run --message-format short` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 14:26:27 -07:00
pakrym-oai	ee756eb80f	Rename exec_wait tool to wait (#14983 ) Summary - document that code mode only exposes `exec` and the renamed `wait` tool - update code mode tool spec and descriptions to match the new tool name - rename tests and helper references from `exec_wait` to `wait` Testing - Not run (not requested)	2026-03-17 14:22:26 -07:00
iceweasel-oai	2cc4ee413f	temporarily disable private desktop until it works with elevated IPC path (#14986 )	2026-03-17 21:09:57 +00:00
Ahmed Ibrahim	4d9d4b7b0f	Stabilize approval matrix write-file command (#14968 ) ## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 13:52:36 -07:00
Ahmed Ibrahim	23a44ddbe8	Stabilize permissions popup selection tests (#14966 ) ## What is flaky The permissions popup tests in the TUI are flaky, especially on Windows. They assume the popup opens on a specific row and that a fixed number of `Up` or `Down` keypresses will land on a specific preset. They also match popup text too loosely, so a non-selected row can satisfy the assertion. ## Why it was flaky These tests were asserting incidental rendering details rather than the actual selected permission preset. On Windows, the initial selection can differ from non-Windows runs. Some tests also searched the entire popup for text like `Guardian Approvals` or `(current)`, which can match a row that is visible but not selected. Once the popup order or current preset shifted slightly, a test could fail even though the UI behavior was still correct. ## How this PR fixes it This PR adds helpers that identify the selected popup row and selected preset name directly. The tests now assert the current selection by name, navigate to concrete target presets instead of assuming a fixed number of keypresses, and explicitly set the reviewer state in the cases that require `Guardian Approvals` to be current. ## Why this fix fixes the flakiness The assertions now track semantic state, not fragile text placement. Navigation is target-based instead of order-based, so Windows/non-Windows row differences and harmless popup layout changes no longer break the tests. That removes the scheduler- and platform-sensitive assumptions that made the popup suite intermittent. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:45:44 +00:00
Ahmed Ibrahim	b02388672f	Stabilize Windows cmd-based shell test harnesses (#14958 ) ## What is flaky The Windows shell-driven integration tests in `codex-rs/core` were intermittently unstable, especially: - `apply_patch_cli_can_use_shell_command_output_as_patch_input` - `websocket_test_codex_shell_chain` - `websocket_v2_test_codex_shell_chain` ## Why it was flaky These tests were exercising real shell-tool flows through whichever shell Codex selected on Windows, and the `apply_patch` test also nested a PowerShell read inside `cmd /c`. There were multiple independent sources of nondeterminism in that setup: - The test harness depended on the model-selected Windows shell instead of pinning the shell it actually meant to exercise. - `cmd.exe /c powershell.exe -Command "..."` is quoting-sensitive; on CI that could leave the read command wrapped as a literal string instead of executing it. - Even after getting the quoting right, PowerShell could emit CLIXML progress records like module-initialization output onto stdout. - The `apply_patch` test was building a patch directly from shell stdout, so any quoting artifact or progress noise corrupted the patch input. So the failures were driven by shell startup and output-shape variance, not by the `apply_patch` or websocket logic themselves. ## How this PR fixes it - Add a test-only `user_shell_override` path so Windows integration tests can pin `cmd.exe` explicitly. - Use that override in the websocket shell-chain tests and in the `apply_patch` harness. - Change the nested Windows file read in `apply_patch_cli_can_use_shell_command_output_as_patch_input` to a UTF-8 PowerShell `-EncodedCommand` script. - Run that nested PowerShell process with `-NonInteractive`, set `$ProgressPreference = 'SilentlyContinue'`, and read the file with `[System.IO.File]::ReadAllText(...)`. 
## Why this fix fixes the flakiness The outer harness now runs under a deterministic shell, and the inner PowerShell read no longer depends on fragile `cmd` quoting or on progress output staying quiet by accident. The shell tool returns only the file contents, so patch construction and websocket assertions depend on stable test inputs instead of on runner-specific shell behavior. --------- Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 20:21:46 +00:00
Matthew Zeng	683c37ce75	[plugins] Support plugin installation elicitation. (#14896 ) It now supports: - Connectors that are from installed and enabled plugins that are not installed yet - Plugins that are on the allowlist that are not installed yet.	2026-03-17 13:19:28 -07:00
Eric Traut	49e7dda2df	Add device-code onboarding and ChatGPT token refresh to app-server TUI (#14952 ) ## Summary - add device-code ChatGPT sign-in to `tui_app_server` onboarding and reuse the existing `chatgptAuthTokens` login path - fall back to browser login when device-code auth is unavailable on the server - treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state during onboarding - add a local ChatGPT auth loader for handing local tokens to the app server and serving refresh requests - handle `account/chatgptAuthTokens/refresh` instead of marking it unsupported, including workspace/account mismatch checks - add focused coverage for onboarding success, existing auth handling, local auth loading, and refresh request behavior ## Testing - `cargo test -p codex-tui-app-server` - `just fix -p codex-tui-app-server`	2026-03-17 14:12:12 -06:00
iceweasel-oai	95bdea93d2	use framed IPC for elevated command runner (#14846 ) ## Summary This is PR 2 of the Windows sandbox runner split. PR 1 introduced the framed IPC runner foundation and related Windows sandbox infrastructure without changing the active elevated one-shot execution path. This PR switches that elevated one-shot path over to the new runner IPC transport and removes the old request-file bootstrap that PR 1 intentionally left in place. After this change, ordinary elevated Windows sandbox commands still behave as one-shot executions, but they now run as the simple case of the same helper/IPC transport that later unified_exec work will build on. ## Why this is needed for unified_exec Windows elevated sandboxed execution crosses a user boundary: the CLI launches a helper as the sandbox user and has to manage command execution from outside that security context. For one-shot commands, the old request-file/bootstrap flow was sufficient. For unified_exec, it is not. Unified_exec needs a long-lived bidirectional channel so the parent can: - send a spawn request - receive structured spawn success/failure - stream stdout and stderr incrementally - eventually support stdin writes, termination, and other session lifecycle events This PR does not add long-lived sessions yet. It converts the existing elevated one-shot path to use the same framed IPC transport so that PR 3 can add unified_exec session semantics on top of a transport that is already exercised by normal elevated command execution. 
## Scope This PR: - updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`, and collect framed `Output`/`Exit` messages - removes the old `--request-file=...` execution path from `windows-sandbox-rs/src/elevated/command_runner_win.rs` - keeps the public behavior one-shot: no session reuse or interactive unified_exec behavior is introduced here This PR does not: - add Windows unified_exec session support - add background terminal reuse - add PTY session lifecycle management ## Why Windows needs this and Linux/macOS do not On Linux and macOS, the existing sandbox/process model composes much more directly with long-lived process control. The parent can generally spawn and own the child process (or PTY) directly inside the sandbox model we already use. Windows elevated sandboxing is different. The parent is not directly managing the sandboxed process in the same way; it launches across a different user/security context. That means long-lived control requires an explicit helper process plus IPC for spawn, output, exit, and later stdin/session control. So the extra machinery here is not because unified_exec is conceptually different on Windows. It is because the elevated Windows sandbox boundary requires a helper-mediated transport to support it cleanly. ## Validation - `cargo test -p codex-windows-sandbox`	2026-03-17 11:38:44 -07:00
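The commit message above describes a framed IPC transport carrying `SpawnRequest`, `SpawnReady`, `Output`, and `Exit` messages, but does not state the wire format. As a rough illustration only, the sketch below assumes a 4-byte little-endian length prefix per frame; `write_frame` and `read_frame` are hypothetical names, not functions from `windows-sandbox-rs`.

```rust
use std::io::{self, Read, Write};

/// Write one frame: a 4-byte little-endian length, then the payload bytes.
/// (Assumed framing; the real runner's format is not given in the log.)
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    w.write_all(&(payload.len() as u32).to_le_bytes())?;
    w.write_all(payload)
}

/// Read one frame. Returns Ok(None) if the stream ends before a full
/// length header has been read, and an error if the payload is truncated.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut header = [0u8; 4];
    match r.read_exact(&mut header) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let mut payload = vec![0u8; u32::from_le_bytes(header) as usize];
    r.read_exact(&mut payload)?;
    Ok(Some(payload))
}
```

With framing like this, a one-shot command is simply the shortest possible exchange (one request frame, then output frames until an exit frame), which is consistent with the message's point that the one-shot path becomes the simple case of the later session transport.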
Keyan Zhang	904dbd414f	generate an internal json schema for `RolloutLine` (#14434 ) ### Why I'm working on something that parses and analyzes codex rollout logs, and I'd like to have a schema for generating a parser/validator. `codex app-server generate-internal-json-schema` writes a `RolloutLine.json` file. While doing this, I noticed we have a writer <> reader mismatch on `FunctionCallOutputPayload` and the reasoning item ID, so I added some schemars annotations to fix those. ### Test ``` $ just codex app-server generate-internal-json-schema --out ./foo ``` generates a `RolloutLine.json` file, which I validated against jsonl files on disk. `just codex app-server --help` doesn't expose the `generate-internal-json-schema` option by default, but you can run `just codex app-server generate-internal-json-schema --help` if you know the command. Everything else still works. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 11:19:42 -07:00
Ahmed Ibrahim	0d531c05f2	Fix code mode yield startup race (#14959 )	2026-03-17 11:09:12 -07:00
jif-oai	d484bb57d9	feat: add suffix to shell snapshot name (#14938 ) https://github.com/openai/codex/issues/14906	2026-03-17 17:59:27 +00:00
Ahmed Ibrahim	f26ad3c92c	Fix fuzzy search notification buffering in app-server tests (#14955 ) ## What is flaky `codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently loses the expected `fuzzyFileSearch/sessionUpdated` and `fuzzyFileSearch/sessionCompleted` notifications when multiple fuzzy-search sessions are active and CI delivers notifications out of order. ## Why it was flaky The wait helpers were keyed only by JSON-RPC method name. - `wait_for_session_updated` consumed the next `fuzzyFileSearch/sessionUpdated` notification even when it belonged to a different search session. - `wait_for_session_completed` did the same for `fuzzyFileSearch/sessionCompleted`. - Once an unmatched notification was read, it was dropped permanently instead of buffered. - That meant a valid completion for the target search could arrive slightly early, be consumed by the wrong waiter, and disappear before the test started waiting for it. The result depended on notification ordering and runner scheduling instead of on the actual product behavior. ## How this PR fixes it - Add a buffered notification reader in `codex-rs/app-server/tests/common/mcp_process.rs`. - Match fuzzy-search notifications on the identifying payload fields instead of matching only on method name. - Preserve unmatched notifications in the in-process queue so later waiters can still consume them. - Include pending notification methods in timeout failures to make future diagnosis concrete. ## Why this fix fixes the flakiness The test now behaves like a real consumer of an out-of-order event stream: notifications for other sessions stay buffered until the correct waiter asks for them. Reordering no longer loses the target event, so the test result is determined by whether the server emitted the right notifications, not by which one happened to be read first. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-17 10:52:16 -07:00
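The buffering fix described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the helper in `mcp_process.rs`: the `Notification` struct, its `session_id` field, and `BufferedReader` are hypothetical stand-ins for the real JSON-RPC types, and an iterator stands in for the server's notification stream.

```rust
use std::collections::VecDeque;

/// Hypothetical notification shape (the real tests use JSON-RPC payloads).
#[derive(Debug, Clone, PartialEq)]
struct Notification {
    method: String,
    session_id: String,
}

/// Reader that buffers unmatched notifications instead of dropping them.
struct BufferedReader<I: Iterator<Item = Notification>> {
    incoming: I,
    pending: VecDeque<Notification>,
}

impl<I: Iterator<Item = Notification>> BufferedReader<I> {
    fn new(incoming: I) -> Self {
        Self { incoming, pending: VecDeque::new() }
    }

    /// Return the first notification matching `pred`. The buffer is checked
    /// first; non-matching items read from the stream are kept for later waiters.
    fn wait_for(&mut self, pred: impl Fn(&Notification) -> bool) -> Option<Notification> {
        if let Some(pos) = self.pending.iter().position(|n| pred(n)) {
            return self.pending.remove(pos);
        }
        while let Some(n) = self.incoming.next() {
            if pred(&n) {
                return Some(n);
            }
            self.pending.push_back(n);
        }
        None
    }
}
```

If a completion for session `b` arrives before the one for session `a`, a waiter for `a` skips past it without losing it, and a later waiter for `b` finds it in `pending`. Matching on both method and session fields in `pred` corresponds to the commit's "identifying payload fields".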
Felipe Coury	78e8ee4591	fix(tui): restore remote resume and fork history (#14930 ) ## Problem When the TUI connects to a remote app-server (via WebSocket), resume and fork operations lost all conversation history. `AppServerStartedThread` carried only the `SessionConfigured` event, not the full `Thread` snapshot. After resume or fork, the chat transcript was empty — prior turns were silently discarded. A secondary issue: `primary_session_configured` was not cleared on reset, causing stale session state after reconnection. ## Approach: TUI-side only, zero app-server changes The app-server already returns the full `Thread` object (with populated `turns: Vec<Turn>`) in its `ThreadStartResponse`, `ThreadResumeResponse`, and `ThreadForkResponse`. The data was always there — the TUI was simply throwing it away. The old `AppServerStartedThread` struct only kept the `SessionConfiguredEvent`, discarding the rich turn history that the server had already provided. This PR fixes the problem entirely within `tui_app_server` (3 files changed, 0 changes to `app-server`, `app-server-protocol`, or any other crate). Rather than modifying the server to send history in a different format or adding a new endpoint, the fix preserves the existing `Thread` snapshot and replays it through the TUI's standard event pipeline — making restored sessions indistinguishable from live ones. ## Solution Add a thread snapshot replay path. When the server hands back a `Thread` object (on start, resume, or fork), `restore_started_app_server_thread` converts its historical turns into the same core `Event` sequence the TUI already processes for live interactions, then replays them into the event store so the chat widget renders them. Key changes: - `AppServerStartedThread` now carries the full `Thread` — `started_thread_from_{start,resume,fork}_response` clone the thread into the struct alongside the existing `SessionConfiguredEvent`. 
- `thread_snapshot_events()` walks the thread's turns and items, producing `TurnStarted` → `ItemCompleted`* → `TurnComplete`/`TurnAborted` event sequences that the TUI already knows how to render. - `restore_started_app_server_thread()` pushes the session event + history events into the thread channel's store, activates the channel, and replays the snapshot — used for initial startup, resume, and fork. - `primary_session_configured` cleared on reset to prevent stale session state after reconnection. ## Tradeoffs - `Thread` is cloned into `AppServerStartedThread`: The full thread snapshot (including all historical turns) is cloned at startup. For long-lived threads this could be large, but it's a one-time cost and avoids lifetime gymnastics with the response. ## Tests - `restore_started_app_server_thread_replays_remote_history` — end-to-end: constructs a `Thread` with one completed turn, restores it, and asserts user/agent messages appear in the transcript. - `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies `thread_snapshot_events` produces the correct event sequence for completed and interrupted turns. ## Test plan - [ ] Verify `cargo check -p codex-tui-app-server` passes - [ ] Verify `cargo test -p codex-tui-app-server` passes - [ ] Manual: connect to a remote app-server, resume an existing thread, confirm history renders in the chat widget - [ ] Manual: fork a thread via remote, confirm prior turns appear	2026-03-17 11:16:08 -06:00
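The replay idea in the commit above (historical turns converted into the event sequence used for live turns) might look like the sketch below. All types here are simplified, hypothetical stand-ins; the real `Thread`, `Turn`, and `Event` types in `tui_app_server` carry far more data, and `snapshot_events` is not the actual `thread_snapshot_events` implementation.

```rust
/// Hypothetical, simplified turn status.
#[derive(Debug, Clone, PartialEq)]
enum TurnStatus {
    Completed,
    Interrupted,
}

/// Hypothetical turn: an id, its items as plain strings, and a final status.
#[derive(Debug, Clone, PartialEq)]
struct Turn {
    id: String,
    items: Vec<String>,
    status: TurnStatus,
}

/// Event names follow the commit message; payloads are invented for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    TurnStarted { turn_id: String },
    ItemCompleted { turn_id: String, item: String },
    TurnComplete { turn_id: String },
    TurnAborted { turn_id: String },
}

/// Emit TurnStarted, one ItemCompleted per item, then TurnComplete or
/// TurnAborted for each historical turn, in order.
fn snapshot_events(turns: &[Turn]) -> Vec<Event> {
    let mut events = Vec::new();
    for turn in turns {
        events.push(Event::TurnStarted { turn_id: turn.id.clone() });
        for item in &turn.items {
            events.push(Event::ItemCompleted {
                turn_id: turn.id.clone(),
                item: item.clone(),
            });
        }
        events.push(match turn.status {
            TurnStatus::Completed => Event::TurnComplete { turn_id: turn.id.clone() },
            TurnStatus::Interrupted => Event::TurnAborted { turn_id: turn.id.clone() },
        });
    }
    events
}
```

Because the output is an ordinary event list, the consumer needs no separate code path for restored history, which matches the stated goal of making restored sessions indistinguishable from live ones.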
Shijie Rao	8e258eb3f5	Feat: CXA-1831 Persist latest model and reasoning effort in sqlite (#14859 ) ### Summary The goal is to use the latest turn's model and reasoning effort on thread/resume if no override is provided in the thread/resume call. This is part 1, which writes the model and reasoning effort for a thread to the SQLite DB; a follow-up PR will consume the two new fields on thread/resume. [Part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888) and this one can be merged independently.	2026-03-17 10:14:34 -07:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now: - websocket startup has a (configurable) timeout of 15s, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task waits for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout It can be set in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
jif-oai	e8add54e5d	feat: show effective model in spawn agent event (#14944 ) Show effective model after the full config layering for the sub agent	2026-03-17 16:58:58 +00:00

4642 commits