core-agent-ide

Author	SHA1	Message	Date
Charley Cunningham	07aefffb1f	core: bundle settings diff updates into one dev/user envelope (#12417 ) ## Summary - bundle contextual prompt injection into at most one developer message plus one contextual user message in both: - per-turn settings updates - initial context insertion - preserve `<model_switch>` across compaction by rebuilding it through canonical initial-context injection, instead of relying on strip/reattach hacks - centralize contextual user fragment detection in one shared definition table and reuse it for parsing/compaction logic - keep `AGENTS.md` in its natural serialized format: - `# AGENTS.md instructions for {dirname}` - `<INSTRUCTIONS>...</INSTRUCTIONS>` - simplify related tests/helpers and accept the expected snapshot/layout updates from bundled multi-part messages ## Why The goal is to converge toward a simpler, more intentional prompt shape where contextual updates are consistently represented as one developer envelope plus one contextual user envelope, while keeping parsing and compaction behavior aligned with that representation. 
## Notable details - the temporary `SettingsUpdateEnvelope` wrapper was removed; these paths now return `Vec<ResponseItem>` directly - local/remote compaction no longer rely on model-switch strip/restore helpers - contextual user detection is now driven by shared fragment definitions instead of ad hoc matcher assembly - AGENTS/user instructions are still the same logical context; only the synthetic `<user_instructions>` wrapper was replaced by the natural AGENTS text format ## Testing - `just fmt` - `cargo test -p codex-app-server codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages -- --exact` - `cargo test -p codex-core compact::tests::collect_user_messages_filters_session_prefix_entries --lib -- --exact` - `cargo test -p codex-core --test all 'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_apps_guidance_as_developer_message_when_enabled' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_developer_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::includes_user_instructions_message_in_request' -- --exact` - `cargo test -p codex-core --test all 'suite::client::resume_includes_initial_messages_and_sends_prior_items' -- --exact` - `cargo test -p codex-core --test all 'suite::review::review_input_isolated_from_parent_history' -- --exact` - `cargo test -p codex-exec --test all 'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' -- --exact` - `cargo test -p core_test_support context_snapshot::tests::full_text_mode_preserves_unredacted_text -- --exact` ## Notes - I also ran several targeted `compact`, `compact_remote`, `prompt_caching`, `model_visible_layout`, and 
`event_mapping` tests while iterating on prompt-shape changes. - I have not claimed a clean full-workspace `cargo test` from this environment because local sandbox/resource conditions have previously produced unrelated failures in large workspace runs.	2026-02-26 00:12:08 -08:00
Eric Traut	28bfbb8f2b	Enforce user input length cap (#12823 ) Currently there is no bound on the length of a user message submitted in the TUI or through the app server interface. That means users can paste many megabytes of text, which can lead to bad performance, hangs, and crashes. In extreme cases, it can lead to a [kernel panic](https://github.com/openai/codex/issues/12323). This PR limits the length of a user input to 2**20 (about 1M) characters. This value was chosen because it fills the entire context window on the latest models, so accepting longer inputs wouldn't make sense anyway. Summary - add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol and surface it in TUI and app server code - block oversized submissions in the TUI submit flow and emit error history cells when validation fails - reject heavy app-server requests with JSON-RPC `-32602` and structured `input_too_large` data, plus document the behavior Testing - ran the IDE extension with this change and verified that when I attempt to paste a user message that's several MB long, it correctly reports an error instead of crashing or making my computer hot.	2026-02-25 22:23:51 -08:00
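A minimal sketch of the length check described in #12823, under stated assumptions. Only `MAX_USER_INPUT_TEXT_CHARS`, the value 2**20, the JSON-RPC code `-32602`, and the `input_too_large` tag come from the commit message. The `InputTooLarge` struct, its field names, and `validate_user_input` are hypothetical, and counting Unicode scalar values rather than bytes is an assumption.

```rust
/// Cap taken from the commit message: 2**20 characters.
pub const MAX_USER_INPUT_TEXT_CHARS: usize = 1 << 20;

/// JSON-RPC "invalid params" code cited in the commit message.
pub const INVALID_PARAMS_CODE: i64 = -32602;

/// Hypothetical error payload; field names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct InputTooLarge {
    pub code: i64,
    pub reason: &'static str,
    pub max_chars: usize,
    pub actual_chars: usize,
}

/// Hypothetical validator. Assumes the cap applies to Unicode scalar
/// values (`chars()`), not to bytes.
pub fn validate_user_input(text: &str) -> Result<(), InputTooLarge> {
    let actual_chars = text.chars().count();
    if actual_chars > MAX_USER_INPUT_TEXT_CHARS {
        return Err(InputTooLarge {
            code: INVALID_PARAMS_CODE,
            reason: "input_too_large",
            max_chars: MAX_USER_INPUT_TEXT_CHARS,
            actual_chars,
        });
    }
    Ok(())
}
```

Sharing one constant between the TUI and the app server keeps both entry points rejecting at the same threshold.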
Michael Bolin	14116ade8d	feat: include available decisions in command approval requests (#12758 ) Command-approval clients currently infer which choices to show from side-channel fields like `networkApprovalContext`, `proposedExecpolicyAmendment`, and `additionalPermissions`. That makes the request shape harder to evolve, and it forces each client to replicate the server's heuristics instead of receiving the exact decision list for the prompt. This PR introduces a mapping between `CommandExecutionApprovalDecision` and `codex_protocol::protocol::ReviewDecision`: ```rust impl From<CoreReviewDecision> for CommandExecutionApprovalDecision { fn from(value: CoreReviewDecision) -> Self { match value { CoreReviewDecision::Approved => Self::Accept, CoreReviewDecision::ApprovedExecpolicyAmendment { proposed_execpolicy_amendment, } => Self::AcceptWithExecpolicyAmendment { execpolicy_amendment: proposed_execpolicy_amendment.into(), }, CoreReviewDecision::ApprovedForSession => Self::AcceptForSession, CoreReviewDecision::NetworkPolicyAmendment { network_policy_amendment, } => Self::ApplyNetworkPolicyAmendment { network_policy_amendment: network_policy_amendment.into(), }, CoreReviewDecision::Abort => Self::Cancel, CoreReviewDecision::Denied => Self::Decline, } } } ``` It also updates `CommandExecutionRequestApprovalParams` to have a new field: ```rust available_decisions: Option<Vec<CommandExecutionApprovalDecision>> ``` which, if specified, should make it easier for clients to display an appropriate list of options in the UI. This makes it possible for `CoreShellActionProvider::prompt()` in `unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly, adding support for `ApprovedForSession` when approving a skill script, which was previously missing in the TUI. Note this results in a significant change to `exec_options()` in `approval_overlay.rs`, as the displayed options are now derived from `available_decisions: &[ReviewDecision]`. 
## What Changed - Add `available_decisions` to [`ExecApprovalRequestEvent`](`de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)`), including helpers to derive the legacy default choices when older senders omit the field. - Map `codex_protocol::protocol::ReviewDecision` to app-server `CommandExecutionApprovalDecision` and expose the ordered list as experimental `availableDecisions` in [`CommandExecutionRequestApprovalParams`](`de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)`). - Thread optional `available_decisions` through the core approval path so Unix shell escalation can explicitly request `ApprovedForSession` for session-scoped approvals instead of relying on client heuristics. [`unix_escalation.rs`](`de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214)`) - Update the TUI approval overlay to build its buttons from the ordered decision list, while preserving the legacy fallback when `available_decisions` is missing. - Update the app-server README, test client output, and generated schema artifacts to document and surface the new field. ## Testing - Add `approval_overlay.rs` coverage for explicit decision lists, including the generic `ApprovedForSession` path and network approval options. - Update `chatwidget/tests.rs` and app-server protocol tests to populate the new optional field and keep older event shapes working. ## Developers Docs - If we document `item/commandExecution/requestApproval` on [developers.openai.com/codex](https://developers.openai.com/codex), add experimental `availableDecisions` as the preferred source of approval choices and note that older servers may omit it.	2026-02-26 01:10:46 +00:00
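A minimal sketch of the client-side fallback described in #12758: use the explicit ordered list when the server sends one, otherwise fall back to a legacy default. The `Decision` enum is a simplified stand-in for `CommandExecutionApprovalDecision`, and `legacy_default_decisions` and `decisions_to_display` are hypothetical names. The legacy list shown is a placeholder; the real helpers derive it from the older side-channel fields.

```rust
/// Simplified stand-in for `CommandExecutionApprovalDecision`
/// (the amendment-carrying variants are omitted here).
#[derive(Debug, Clone, PartialEq)]
pub enum Decision {
    Accept,
    AcceptForSession,
    Decline,
    Cancel,
}

/// Hypothetical legacy default, used when an older sender omits
/// `available_decisions`. The real default is derived differently.
fn legacy_default_decisions() -> Vec<Decision> {
    vec![Decision::Accept, Decision::Decline, Decision::Cancel]
}

/// Prefer the server-provided ordered list; otherwise fall back.
pub fn decisions_to_display(available: Option<&[Decision]>) -> Vec<Decision> {
    match available {
        Some(list) => list.to_vec(),
        None => legacy_default_decisions(),
    }
}
```

Because the list is ordered, a client can render buttons in exactly the order the server chose without re-deriving any heuristics.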
Celia Chen	4f45668106	Revert "Add skill approval event/response (#12633 )" (#12811 ) This reverts commit https://github.com/openai/codex/pull/12633. We no longer need this PR because we now favor sending a normal exec command approval server request with `additional_permissions` set to the skill's permissions instead.	2026-02-26 01:02:42 +00:00
Charley Cunningham	2f4d6ded1d	Enable request_user_input in Default mode (#12735 ) ## Summary - allow `request_user_input` in Default collaboration mode as well as Plan - update the Default-mode instructions to prefer assumptions first and use `request_user_input` only when a question is unavoidable - update request_user_input and app-server tests to match the new Default-mode behavior - refactor collaboration-mode availability plumbing into `CollaborationModesConfig` for future mode-related flags ## Codex author `codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`	2026-02-25 15:20:46 -08:00
Celia Chen	b6d20748e0	Revert "Ensure shell command skills trigger approval (#12697 )" (#12721 ) This reverts commit `daf0f03ac8`.	2026-02-25 22:49:53 +00:00
Owen Lin	21f7032dbb	feat(app-server): thread/unsubscribe API (#10954 ) Adds a new v2 app-server API that lets a client unsubscribe from a thread: - New RPC method: `thread/unsubscribe` - New server notification: `thread/closed` Today clients can start/resume/archive threads, but there wasn't a way to explicitly unload a live thread from memory without archiving it. With `thread/unsubscribe`, a client can indicate it is no longer actively working with a live Thread. If this is the only client subscribed to that given thread, the thread will be automatically closed by app-server, at which point the server will send `thread/closed` and `thread/status/changed` with `status: notLoaded` notifications. This gives clients a way to prevent long-running app-server processes from accumulating too many threads (and related objects) in memory. Closed threads will also be removed from `thread/loaded/list`.	2026-02-25 13:14:30 -08:00
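A minimal sketch of the "close when the last subscriber leaves" rule described in #10954. `ThreadSubscriptions`, its methods, and the `u64` connection ids are hypothetical; the real app-server bookkeeping is not shown in the commit message.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-thread subscriber bookkeeping.
#[derive(Default)]
pub struct ThreadSubscriptions {
    subscribers: HashMap<String, HashSet<u64>>,
}

impl ThreadSubscriptions {
    pub fn subscribe(&mut self, thread_id: &str, connection_id: u64) {
        self.subscribers
            .entry(thread_id.to_string())
            .or_default()
            .insert(connection_id);
    }

    /// Returns true when the last subscriber left, i.e. the point where
    /// the server would close the thread and send `thread/closed`.
    pub fn unsubscribe(&mut self, thread_id: &str, connection_id: u64) -> bool {
        let Some(set) = self.subscribers.get_mut(thread_id) else {
            return false;
        };
        if !set.remove(&connection_id) {
            return false;
        }
        if set.is_empty() {
            self.subscribers.remove(thread_id);
            return true;
        }
        false
    }

    /// Stand-in for membership in `thread/loaded/list`.
    pub fn is_loaded(&self, thread_id: &str) -> bool {
        self.subscribers.contains_key(thread_id)
    }
}
```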
sayan-oai	d45ffd5830	make 5.3-codex visible in cli for api users (#12808 ) 5.3-codex released in api, mark it visible for API users via bundled `models.json`.	2026-02-25 13:01:40 -08:00
Michael Bolin	be5bca6f8d	fix: harden zsh fork tests and keep subcommand approvals deterministic (#12809 ) ## Why The prior `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` assertion was brittle under Bazel: command approval payloads in the test could include environment-dependent wrapper/command formatting differences, which makes exact command-string matching flaky even when behavior is correct. (This regression was knowingly introduced in https://github.com/openai/codex/pull/12800, but it was urgent to land that PR.) ## What changed - Hardened `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` in [`turn_start_zsh_fork.rs`](https://github.com/openai/codex/blob/main/codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs): - Replaced strict `approval_command.starts_with("/bin/rm")` checks with intent-based subcommand matching. - Subcommand approvals are now recognized by file-target semantics (`first.txt` or `second.txt`) plus `rm` intent. - Parent approval recognition is now more tolerant of command-format differences while still requiring a definitive parent command context. - Uses a defensive loop that waits for all target subcommand decisions and the parent approval request. - Preserved the existing regression and unit test fixes from earlier commits in `unix_escalation.rs` and `skill_approval.rs`. ## Verification - Ran the zsh fork subcommand decline regression under this change: - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` - Confirmed the test is now robust against approval-command-string variation instead of hardcoding one expected command shape.	2026-02-25 12:23:30 -08:00
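A minimal sketch of the intent-based matching described in #12809: rather than requiring the approval command to start with `/bin/rm`, look for `rm` intent plus one of the target files. `is_target_rm_subcommand` is a hypothetical helper, and the token and quote handling is an assumption about what "environment-dependent wrapper/command formatting" might look like.

```rust
/// Hypothetical matcher: true when the command mentions one of the
/// target files and contains an `rm` invocation in some form.
pub fn is_target_rm_subcommand(approval_command: &str) -> bool {
    let mentions_target =
        approval_command.contains("first.txt") || approval_command.contains("second.txt");
    let has_rm_intent = approval_command.split_whitespace().any(|tok| {
        // Strip shell quotes so `'rm` inside `-lc 'rm ...'` still matches.
        let tok = tok.trim_matches(|c| c == '\'' || c == '"');
        tok == "rm" || tok.ends_with("/rm")
    });
    mentions_target && has_rm_intent
}
```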
Owen Lin	a0fd94bde6	feat(app-server): add ThreadItem::DynamicToolCall (#12732 ) Previously, clients would call `thread/start` with dynamic_tools set, and when a model invokes a dynamic tool, it would just make the server->client `item/tool/call` request and wait for the client's response to complete the tool call. This works, but it doesn't have an `item/started` or `item/completed` event. Now we are doing this: - [new] emit `item/started` with `DynamicToolCall` populated with the call arguments - send an `item/tool/call` server request - [new] once the client responds, emit `item/completed` with `DynamicToolCall` populated with the response. Also, with `persistExtendedHistory: true`, dynamic tool calls are now reconstructable in `thread/read` and `thread/resume` as `ThreadItem::DynamicToolCall`.	2026-02-25 12:00:10 -08:00
Ahmed Ibrahim	947092283a	Add app-server v2 thread realtime API (#12715 ) Add experimental `thread/realtime/*` v2 requests and notifications, then route app-server realtime events through that thread-scoped surface with integration coverage. --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-25 09:59:10 -08:00
mcgrew-oai	9a393c9b6f	feat(network-proxy): add embedded OTEL policy audit logging (#12046 ) PR Summary This PR adds embedded-only OTEL policy audit logging for `codex-network-proxy` and threads audit metadata from `codex-core` into managed proxy startup. ### What changed - Added structured audit event emission in `network_policy.rs` with target `codex_otel.network_proxy`. - Emitted: - `codex.network_proxy.domain_policy_decision` once per domain-policy evaluation. - `codex.network_proxy.block_decision` for non-domain denies. - Added required policy/network fields, RFC3339 UTC millisecond `event.timestamp`, and fallback defaults (`http.request.method="none"`, `client.address="unknown"`). - Added non-domain deny audit emission in HTTP/SOCKS handlers for mode-guard and proxy-state denies, including unix-socket deny paths. - Added `REASON_UNIX_SOCKET_UNSUPPORTED` and used it for unsupported unix-socket auditing. - Added `NetworkProxyAuditMetadata` to runtime/state, re-exported from `lib.rs` and `state.rs`. - Added `start_proxy_with_audit_metadata(...)` in core config, with `start_proxy()` delegating to default metadata. - Wired metadata construction in `codex.rs` from session/auth context, including originator sanitization for OTEL-safe tagging. - Updated `network-proxy/README.md` with embedded-mode audit schema and behavior notes. - Refactored HTTP block-audit emission to a small local helper to reduce duplication. - Preserved existing unix-socket proxy-disabled host/path behavior for responses and blocked history while using an audit-only endpoint override (`server.address="unix-socket"`, `server.port=0`). ### Explicit exclusions - No standalone proxy OTEL startup work. - No `main.rs` binary wiring. - No `standalone_otel.rs`. - No standalone docs/tests. ### Tests - Extended `network_policy.rs` tests for event mapping, metadata propagation, fallbacks, timestamp format, and target prefix. - Extended HTTP tests to assert unix-socket deny block audit events. 
- Extended SOCKS tests to cover deny emission from handler deny branches. - Added/updated core tests to verify audit metadata threading into managed proxy state. ### Validation run - `just fmt` - `cargo test -p codex-network-proxy` ✅ - `cargo test -p codex-core` ran with one unrelated flaky timeout (`shell_snapshot::tests::snapshot_shell_does_not_inherit_stdin`), and the test passed when rerun directly ✅ --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-02-25 11:46:37 -05:00
alexsong-oai	6d6570d89d	Support external agent config detect and import (#12660 ) Migration Behavior * Config * Migrates settings.json into config.toml * Only adds fields when config.toml is missing, or when those fields are missing from the existing file * Supported mappings: env -> shell_environment_policy sandbox.enabled = true -> sandbox_mode = "workspace-write" * Skills * Copies home and repo .claude/skills into .agents/skills * Existing skill directories are not overwritten * SKILL.md content is rewritten from Claude-related terms to Codex * AgentsMd * Repo only * Migrates CLAUDE.md into AGENTS.md * Detect/import only proceed when AGENTS.md is missing or present but empty * Content is rewritten from Claude-related terms to Codex	2026-02-25 02:11:51 -08:00
jif-oai	f46b767b7e	feat: add search term to thread list (#12578 ) Add `searchTerm` to `thread/list` that will search for a match in the titles (the condition being `searchTerm` $\in$ `title`)	2026-02-25 09:59:41 +00:00
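A minimal sketch of the `searchTerm` condition from #12578, reading "`searchTerm` in `title`" as a substring test. `filter_by_search_term` is a hypothetical helper. Case-sensitive matching and "no term returns everything" are both assumptions, since the commit message does not specify either.

```rust
/// Hypothetical filter over thread titles.
/// Assumes a case-sensitive substring match; `None` keeps every title.
pub fn filter_by_search_term<'a>(titles: &[&'a str], search_term: Option<&str>) -> Vec<&'a str> {
    match search_term {
        None => titles.to_vec(),
        Some(term) => titles
            .iter()
            .copied()
            .filter(|title| title.contains(term))
            .collect(),
    }
}
```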
jif-oai	10c04e11b8	feat: add service name to app-server (#12319 ) Add a service name to the app-server so that the app can use its own service name. This is at the thread level because we might later make the app-server a singleton on the computer.	2026-02-25 09:51:42 +00:00
viyatb-oai	c086b36b58	feat(ui): add network approval persistence plumbing (#12358 ) ## Summary - add TUI approval options for persistent network host rules - add app-server v2 approval payload plumbing for network approval context + proposed network policy amendments - add app-server handling to translate `applyNetworkPolicyAmendment` decisions back into core review decisions - update docs/test client output and generated app-server schemas/types	2026-02-25 07:06:19 +00:00
Celia Chen	1151972fb2	feat: add experimental additionalPermissions to v2 command execution approval requests (#12737 ) This adds `additionalPermissions` to the app-server v2 `item/commandExecution/requestApproval` payload as an experimental field. The field is now exposed on `CommandExecutionRequestApprovalParams` and is populated from the existing core approval event when a command requests additional sandbox permissions. This PR also contains changes that let server requests support the experimental API. Sample payloads from a real app-server test client run, with the experimental flag off: ``` { < "id": 0, < "method": "item/commandExecution/requestApproval", < "params": { < "command": "/bin/zsh -lc 'mkdir -p ~/some/test && touch ~/some/test/file'", < "commandActions": [ < { < "command": "mkdir -p '~/some/test'", < "type": "unknown" < }, < { < "command": "touch '~/some/test/file'", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_QLp0LWkQ1XkU6VW9T2vUZFWB", < "proposedExecpolicyAmendment": [ < "mkdir", < "-p", < "~/some/test" < ], < "reason": "Do you want to allow creating ~/some/test/file outside the workspace?", < "threadId": "019c9309-e209-7d82-a01b-dcf9556a354d", < "turnId": "019c9309-e27a-7f33-834f-6011e795c2d6" < } < } ``` with experimental flag on: ``` < { < "id": 0, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": null, < "network": true < }, < "command": "/bin/zsh -lc 'install -D /dev/null ~/some/test/file'", < "commandActions": [ < { < "command": "install -D /dev/null '~/some/test/file'", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_K3U4b3dRbj3eMCqslmncbGsq", < "proposedExecpolicyAmendment": [ < "install", < "-D" < ], < "reason": "Do you want to allow creating the file at ~/some/test/file outside the workspace sandbox?", < "threadId": "019c9303-3a8e-76e1-81bf-d67ac446d892", < "turnId": 
"019c9303-3af1-7143-88a1-73132f771234" < } < } ```	2026-02-25 05:16:35 +00:00
Michael Bolin	e88f74d140	feat: pass helper executable paths via Arg0DispatchPaths (#12719 ) ## Why `codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` previously located `codex-execve-wrapper` by scanning `PATH` and sibling directories. That lookup is brittle and can select the wrong binary when the runtime environment differs from startup assumptions. We already pass `codex-linux-sandbox` from `codex-arg0`; `codex-execve-wrapper` should use the same startup-driven path plumbing. ## What changed - Introduced `Arg0DispatchPaths` in `codex-arg0` to carry both helper executable paths: - `codex_linux_sandbox_exe` - `main_execve_wrapper_exe` - Updated `arg0_dispatch_or_else()` to pass `Arg0DispatchPaths` to top-level binaries and preserve helper paths created in `prepend_path_entry_for_codex_aliases()`. - Threaded `Arg0DispatchPaths` through entrypoints in `cli`, `exec`, `tui`, `app-server`, and `mcp-server`. - Added `main_execve_wrapper_exe` to core configuration plumbing (`Config`, `ConfigOverrides`, and `SessionServices`). - Updated zsh-fork shell escalation to consume the configured `main_execve_wrapper_exe` and removed path-sniffing fallback logic. - Updated app-server config reload paths so reloaded configs keep the same startup-provided helper executable paths. ## References - [`Arg0DispatchPaths` definition](`e355b43d5c/codex-rs/arg0/src/lib.rs (L20-L24)`) - [`arg0_dispatch_or_else()` forwarding both paths](`e355b43d5c/codex-rs/arg0/src/lib.rs (L145-L176)`) - [zsh-fork escalation using configured wrapper path](`e355b43d5c/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L109-L150)`) ## Testing - `cargo check -p codex-arg0 -p codex-core -p codex-exec -p codex-tui -p codex-mcp-server -p codex-app-server` - `cargo test -p codex-arg0` - `cargo test -p codex-core tools::runtimes::shell::unix_escalation:: -- --nocapture`	2026-02-24 17:44:38 -08:00
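A minimal sketch of the startup-driven path plumbing described in #12719. The struct and field names come from the commit message, but the `Option<PathBuf>` field types and the `wrapper_path` helper are assumptions. The point illustrated is that the consumer reads the configured path and fails loudly instead of scanning `PATH`.

```rust
use std::path::PathBuf;

/// Field names are from the commit message; the `Option<PathBuf>`
/// types are an assumption.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Arg0DispatchPaths {
    pub codex_linux_sandbox_exe: Option<PathBuf>,
    pub main_execve_wrapper_exe: Option<PathBuf>,
}

/// Hypothetical consumer: use the startup-provided wrapper path and
/// report an error when it is missing, with no `PATH` sniffing.
pub fn wrapper_path(paths: &Arg0DispatchPaths) -> Result<&PathBuf, String> {
    paths
        .main_execve_wrapper_exe
        .as_ref()
        .ok_or_else(|| "main_execve_wrapper_exe was not provided at startup".to_string())
}
```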
Michael Bolin	448fb6ac22	fix: clarify the value of SkillMetadata.path (#12729 ) Rename `SkillMetadata.path` to `SkillMetadata.path_to_skills_md` for clarity. Would ideally change the type to `AbsolutePathBuf`, but that can be done later.	2026-02-24 17:15:54 -08:00
Max Johnson	5163850025	codex-rs/app-server: graceful websocket restart on Ctrl-C (#12517 ) ## Summary - add graceful websocket app-server restart on Ctrl-C by draining until no assistant turns are running - stop the websocket acceptor and disconnect existing connections once the drain condition is met - add a websocket integration test that verifies Ctrl-C waits for an in-flight turn before exit ## Verification - `cargo check -p codex-app-server --quiet` - `cargo test -p codex-app-server --test all suite::v2::connection_handling_websocket` - I (maxj) tested remote and local Codex.app --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-24 16:27:59 -08:00
pakrym-oai	5571a022eb	Add app-server event tracing (#12695 ) To help with debugging	2026-02-24 14:45:50 -08:00
daveaitel-openai	dcab40123f	Agent jobs (spawn_agents_on_csv) + progress UI (#10935 ) ## Summary - Add agent job support: spawn a batch of sub-agents from CSV, auto-run, auto-export, and store results in SQLite. - Simplify workflow: remove run/resume/get-status/export tools; spawn is deterministic and completes in one call. - Improve exec UX: stable, single-line progress bar with ETA; suppress sub-agent chatter in exec. ## Why Enables map-reduce style workflows over arbitrarily large repos using the existing Codex orchestrator. This addresses review feedback about overly complex job controls and non-deterministic monitoring. ## Demo (progress bar) ``` ./codex-rs/target/debug/codex exec \ --enable collab \ --enable sqlite \ --full-auto \ --progress-cursor \ -c agents.max_threads=16 \ -C /Users/daveaitel/code/codex \ - <<'PROMPT' Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows: path = item-01..item-30, area = test. Then call spawn_agents_on_csv with: - csv_path: /tmp/agent_job_progress_demo.csv - instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1." - output_csv_path: /tmp/agent_job_progress_demo_out.csv PROMPT ``` ## Review feedback addressed - Auto-start jobs on spawn; removed run/resume/status/export tools. - Auto-export on success. - More descriptive tool spec + clearer prompts. - Avoid deadlocks on spawn failure; pending/running handled safely. - Progress bar no longer scrolls; stable single-line redraw. ## Tests - `cd codex-rs && cargo test -p codex-exec` - `cd codex-rs && cargo build -p codex-cli`	2026-02-24 21:00:19 +00:00
pakrym-oai	daf0f03ac8	Ensure shell command skills trigger approval (#12697 ) Summary - detect skill-invoking shell commands based on the original command string, request approvals when needed, and cache positive decisions per session - keep implicit skill invocation emitted after approval and keep skill approval decline messaging centralized to the shell handler - expand and adjust skill approval tests to cover shell-based skill scripts while matching the new detection expectations Testing - Not run (not requested)	2026-02-24 12:13:20 -08:00
Michael Bolin	3ca0e7673b	feat: run zsh fork shell tool via shell-escalation (#12649 ) ## Why This PR switches the `shell_command` zsh-fork path over to `codex-shell-escalation` so the new shell tool can use the shared exec-wrapper/escalation protocol instead of the `zsh_exec_bridge` implementation that was introduced in https://github.com/openai/codex/pull/12052. `zsh_exec_bridge` relied on UNIX domain sockets, which is not as tamper-proof as the FD-based approach in `codex-shell-escalation`. ## What Changed - Added a Unix zsh-fork runtime adapter in `core` (`core/src/tools/runtimes/shell/unix_escalation.rs`) that: - runs zsh-fork commands through `codex_shell_escalation::run_escalate_server` - bridges exec-policy / approval decisions into `ShellActionProvider` - executes escalated commands via a `ShellCommandExecutor` that calls `process_exec_tool_call` - Updated `ShellRuntime` / `ShellCommandHandler` / tool spec wiring to select a `shell_command` backend (`classic` vs `zsh-fork`) while leaving the generic `shell` tool path unchanged. - Removed the `zsh_exec_bridge`-based session service and deleted `core/src/zsh_exec_bridge/mod.rs`. - Moved exec-wrapper entrypoint dispatch to `arg0` by handling the `codex-execve-wrapper` arg0 alias there, and removed the old `codex_core::maybe_run_zsh_exec_wrapper_mode()` hooks from `cli` and `app-server` mains. - Added the needed `codex-shell-escalation` dependencies for `core` and `arg0`. 
## Tests - `cargo test -p codex-core shell_zsh_fork_prefers_shell_command_over_unified_exec` - `cargo test -p codex-app-server turn_start_shell_zsh_fork -- --nocapture` - verifies zsh-fork command execution and approval flows through the new backend - includes subcommand approve/decline coverage using the shared zsh DotSlash fixture in `app-server/tests/suite/zsh` - To test manually, I added the following to `~/.codex/config.toml`: ```toml zsh_path = "/Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh" [features] shell_zsh_fork = true ``` Then I ran `just c` to run the dev build of Codex with these changes and sent it the message: ``` run `echo $0` ``` And it replied with: ``` echo $0 printed: /Users/mbolin/code/codex3/codex-rs/app-server/tests/suite/zsh In this tool context, $0 reflects the script path used to invoke the shell, not just zsh. ``` so the tool appears to be wired up correctly. ## Notes - The zsh subcommand-decline integration test now uses `rm` under a `WorkspaceWrite` sandbox. The previous `/usr/bin/true` scenario is auto-allowed by the new `shell-escalation` policy path, which no longer produces subcommand approval prompts.	2026-02-24 10:31:08 -08:00
Dylan Hurd	f6053fdfb3	feat(core) Introduce Feature::RequestPermissions (#11871 ) ## Summary Introduces the initial implementation of Feature::RequestPermissions. RequestPermissions allows the model to request that a command be run inside the sandbox, with additional permissions, like writing to a specific folder. Eventually this will include other rules as well, and the ability to persist these permissions, but this PR is already quite large - let's get the core flow working and go from there! <img width="1279" height="541" alt="Screenshot 2026-02-15 at 2 26 22 PM" src="https://github.com/user-attachments/assets/0ee3ec0f-02ec-4509-91a2-809ac80be368" /> ## Testing - [x] Added tests - [x] Tested locally - [x] Feature	2026-02-24 09:48:57 -08:00
sayan-oai	7e46e5b9c2	chore: rm hardcoded PRESETS list (#12650 ) Remove the `PRESETS` list hardcoded in `model_presets`, as we now have a bundled `models.json` with equivalent info. Update the logic to rely on bundled models instead, and update tests.	2026-02-23 22:35:51 -08:00
pakrym-oai	58763afa0f	Add skill approval event/response (#12633 ) Set the stage for skill-level permission approval in addition to command-level. Behind a feature flag.	2026-02-23 22:28:58 -08:00
Michael Bolin	38f84b6b29	refactor: delete exec-server and move execve wrapper into shell-escalation (#12632 ) ## Why We already plan to remove the shell-tool MCP path, and doing that cleanup first makes the follow-on `shell-escalation` work much simpler. This change removes the last remaining reason to keep `codex-rs/exec-server` around by moving the `codex-execve-wrapper` binary and shared shell test fixtures to the crates/tests that now own that functionality. ## What Changed ### Delete `codex-rs/exec-server` - Remove the `exec-server` crate, including the MCP server binary, MCP-specific modules, and its test support/test suite - Remove `exec-server` from the `codex-rs` workspace and update `Cargo.lock` ### Move `codex-execve-wrapper` into `codex-rs/shell-escalation` - Move the wrapper implementation into `shell-escalation` (`src/unix/execve_wrapper.rs`) - Add the `codex-execve-wrapper` binary entrypoint under `shell-escalation/src/bin/` - Update `shell-escalation` exports/module layout so the wrapper entrypoint is hosted there - Move the wrapper README content from `exec-server` to `shell-escalation/README.md` ### Move shared shell test fixtures to `app-server` - Move the DotSlash `bash`/`zsh` test fixtures from `exec-server/tests/suite/` to `app-server/tests/suite/` - Update `app-server` zsh-fork tests to reference the new fixture paths ### Keep `shell-tool-mcp` as a shell-assets package - Update `.github/workflows/shell-tool-mcp.yml` packaging so the npm artifact contains only patched Bash/Zsh payloads (no Rust binaries) - Update `shell-tool-mcp/package.json`, `shell-tool-mcp/src/index.ts`, and docs to reflect the shell-assets-only package shape - `shell-tool-mcp-ci.yml` does not need changes because it is already JS-only ## Verification - `cargo shear` - `cargo clippy -p codex-shell-escalation --tests` - `just clippy`	2026-02-23 20:10:22 -08:00
Javi	5a3bdcb27b	app-server: fix connecting via websockets with `Sec-WebSocket-Extensions: permessage-deflate` (#12629 )	2026-02-24 02:41:03 +00:00
Charley Cunningham	3cea3e665e	app-server: box request dispatch future to reduce stack pressure (#12421 )	2026-02-23 10:05:41 -08:00
Michael Bolin	e8949f4507	test: vendor zsh fork via DotSlash and stabilize zsh-fork tests (#12518 ) ## Why The zsh integration tests were still brittle in two ways: - they relied on `CODEX_TEST_ZSH_PATH` / environment-specific setup, so they often did not exercise the patched zsh fork that `shell-tool-mcp` ships - once the tests consistently used the vendored zsh fork, they exposed real Linux-specific zsh-fork issues in CI In particular, the Linux failures were not just test noise: - the zsh-fork launch path was dropping `ExecRequest.arg0`, so Linux `codex-linux-sandbox` arg0 dispatch did not run and zsh wrapper-mode could receive malformed arguments - the `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` test uses the zsh exec bridge (which talks to the parent over a Unix socket), but Linux restricted sandbox seccomp denies `connect(2)`, causing timeouts on `ubuntu-24.04` x86/arm This PR makes the zsh tests consistently run against the intended vendored zsh fork and fixes/hardens the zsh-fork path so the Linux CI signal is meaningful. ## What Changed - Added a single shared test-only DotSlash file for the patched zsh fork at `codex-rs/exec-server/tests/suite/zsh` (analogous to the existing `bash` test resource). - Updated both app-server and exec-server zsh tests to use that shared DotSlash zsh (no duplicate zsh DotSlash file, no `CODEX_TEST_ZSH_PATH` dependency). - Updated the app-server zsh-fork test helper to resolve the shared DotSlash zsh and avoid silently falling back to host zsh. - Kept the app-server zsh-fork tests configured via `config.toml`, using a test wrapper path where needed to force `zsh -df` (and rewrite `-lc` to `-c`) for the subcommand-decline test. 
- Hardened the app-server subcommand-decline zsh-fork test for CI variability: - tolerate an extra `/responses` POST with a no-op mock response - tolerate non-target approval ordering while remaining strict on the two `/usr/bin/true` approvals and decline behavior - use `DangerFullAccess` on Linux for this one test because it validates zsh approval flow, not Linux sandbox socket restrictions - Fixed zsh-fork process launching on Linux by preserving `req.arg0` in `ZshExecBridge::execute_shell_request(...)` so `codex-linux-sandbox` arg0 dispatch continues to work. - Moved `maybe_run_zsh_exec_wrapper_mode()` under `arg0_dispatch_or_else(...)` in `app-server` and `cli` so wrapper-mode handling coexists correctly with arg0-dispatched helper modes. - Consolidated duplicated `dotslash -- fetch` resolution logic into shared test support (`core/tests/common/lib.rs`). - Updated `codex-rs/exec-server/tests/suite/accept_elicitation.rs` to use DotSlash zsh and hardened the zsh elicitation test for Bazel/zsh differences by: - resolving an absolute `git` path - running `git init --quiet .` - asserting success / `.git` creation instead of relying on banner text ## Verification - `cargo test -p codex-app-server turn_start_zsh_fork -- --nocapture` - `cargo test -p codex-exec-server accept_elicitation -- --nocapture` - `bazel test //codex-rs/exec-server:exec-server-all-test --test_output=streamed --test_arg=--nocapture --test_arg=accept_elicitation_for_prompt_rule_with_zsh` - CI (`rust-ci`) on the final cleaned commit: `Tests — ubuntu-24.04 - x86_64-unknown-linux-gnu` and `Tests — ubuntu-24.04-arm - aarch64-unknown-linux-gnu` passed in [run 22291424358](https://github.com/openai/codex/actions/runs/22291424358)	2026-02-22 19:39:56 -08:00
Max Johnson	37610240ec	app-server: retain thread listener across disconnects (#12373 ) - keep the per-thread app-server listener alive when the last client unsubscribes or disconnects - preserve listener-side active turn history so running `thread/resume` can merge an in-progress turn snapshot after reconnect - add `ThreadStateManager` regressions for disconnect/unsubscribe retention and explicit thread teardown cleanup Added unit tests, and I manually tested to confirm the fix --------- Co-authored-by: Codex <noreply@openai.com>	2026-02-22 05:33:33 +00:00
Michael Bolin	1af2a37ada	chore: remove codex-core public protocol/shell re-exports (#12432 ) ## Why `codex-rs/core/src/lib.rs` re-exported a broad set of types and modules from `codex-protocol` and `codex-shell-command`. That made it easy for workspace crates to import those APIs through `codex-core`, which in turn hides dependency edges and makes it harder to reduce compile-time coupling over time. This change removes those public re-exports so call sites must import from the source crates directly. Even when a crate still depends on `codex-core` today, this makes dependency boundaries explicit and unblocks future work to drop `codex-core` dependencies where possible. ## What Changed - Removed public re-exports from `codex-rs/core/src/lib.rs` for: - `codex_protocol::protocol` and related protocol/model types (including `InitialHistory`) - `codex_protocol::config_types` (`protocol_config_types`) - `codex_shell_command::{bash, is_dangerous_command, is_safe_command, parse_command, powershell}` - Migrated workspace Rust call sites to import directly from: - `codex_protocol::protocol` - `codex_protocol::config_types` - `codex_protocol::models` - `codex_shell_command` - Added explicit `Cargo.toml` dependencies (`codex-protocol` / `codex-shell-command`) in crates that now import those crates directly. - Kept `codex-core` internal modules compiling by using `pub(crate)` aliases in `core/src/lib.rs` (internal-only, not part of the public API). - Updated the two utility crates that can already drop a `codex-core` dependency edge entirely: - `codex-utils-approval-presets` - `codex-utils-cli` ## Verification - `cargo test -p codex-utils-approval-presets` - `cargo test -p codex-utils-cli` - `cargo check --workspace --all-targets` - `just clippy`	2026-02-20 23:45:35 -08:00
Yaroslav Volovich	dca9c40dd5	test(app-server): wait for turn/completed in turn_start tests (#12376 ) ## Summary - switch a few app-server `turn_start` tests from `codex/event/task_complete` waits to `turn/completed` waits - avoid matching unrelated/background `task_complete` events - keep this flaky test fix separate from the /title feature PR ## Why On Windows ARM CI, these tests can return early after observing a generic `codex/event/task_complete` notification from another task. That can leave the mock Responses server with fewer calls than expected and fail the test with a wiremock verification mismatch. Using `turn/completed` matches the app-server turn lifecycle notification the tests actually care about. ## Validation - `cargo test -p codex-app-server turn_start_updates_sandbox_and_cwd_between_turns_v2 -- --nocapture` - `cargo test -p codex-app-server turn_start_exec_approval_ -- --nocapture` - `just fmt`	2026-02-20 21:15:21 -08:00
Michael Bolin	a73efab8dd	fix: address flakiness in thread_resume_rejoins_running_thread_even_with_override_mismatch (#12381 ) ## Why `thread/resume` responses for already-running threads can be reported as `Idle` even while a turn is still in progress. This is caused by a timing window where the runtime watch state has not yet observed the running-thread transition, so API clients can receive stale status information at resume time. Possibly related: https://github.com/openai/codex/pull/11786 ## What - Add a shared status normalization helper, `resolve_thread_status`, in `codex-rs/app-server/src/thread_status.rs` that resolves `Idle`/`NotLoaded` to `Active { active_flags: [] }` when an in-progress turn is known. - Reuse this helper across thread response paths in `codex-rs/app-server/src/codex_message_processor.rs` (including `thread/start`, `thread/unarchive`, `thread/read`, `thread/resume`, `thread/fork`, and review/thread-started notification responses). - In `handle_pending_thread_resume_request`, use both the in-memory `active_turn_snapshot` and the resumed rollout turns to decide whether a turn is in progress before resolving thread status for the response. - Extend `thread_status` tests to validate the new status-resolution behavior directly. ## Verification - `cargo test -p codex-app-server suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`	2026-02-20 20:36:04 -08:00
natea-oai	936e744c93	Add field to Thread object for the latest rename set for a given thread (#12301 ) Exposes the latest name set for a thread through the app server. This enables other surfaces to use the core as the source of truth for thread naming. `threadName` is gathered using the helper functions used to interact with `session_index.jsonl`, and is hydrated in: - `thread/list` - `thread/read` - `thread/resume` - `thread/unarchive` - `thread/rollback` It is not hydrated for `thread/start` and `thread/fork`.	2026-02-20 18:26:57 -08:00
pakrym-oai	1bb7989b20	Add ability to attach extra files to feedback (#12370 ) Allow clients to provide extra files.	2026-02-20 22:26:14 +00:00
Matthew Zeng	354e7fedd2	[apps] Enforce simple logo url format. (#12374 ) - [x] Enforce simple logo url format when loading apps directory to save bandwidth.	2026-02-20 22:05:55 +00:00
Max Johnson	6b1091fc92	app-server: harden disconnect cleanup paths (#12218 ) Hardens codex-rs/app-server connection lifecycle and outbound routing for websocket clients. Fixes some FUD I was having - Added per-connection disconnect signaling (CancellationToken) for websocket transports. - Split websocket handling into independent inbound/outbound tasks coordinated by cancellation. - Changed outbound routing so websocket connections use non-blocking try_send; slow/full websocket writers are disconnected instead of stalling broadcast delivery. - Kept stdio behavior blocking-on-send (no forced disconnect) so local stdio clients are not dropped when queues are temporarily full. - Simplified outbound router flow by removing deferred pending_closed_connections handling. - Added guards to drop incoming response/notification/error messages from unknown connections. - Fixed listener teardown race in thread listener tasks using a listener_generation check so stale tasks do not clear newer listeners. Fixes https://linear.app/openai/issue/CODEX-4966/multiclient-handle-slow-notification-consumers ## Tests Added/updated transport tests covering: - broadcast does not block on a slow/full websocket connection - stdio connection waits instead of disconnecting on full queue I (maxj) have tested manually and will retest before landing	2026-02-20 20:35:16 +00:00
Matthew Zeng	aa121a115e	[apps] Implement apps configs. (#12086 ) - [x] Implement apps configs.	2026-02-20 12:05:21 -08:00
viyatb-oai	28c0089060	fix(network-proxy): add unix socket allow-all and update seatbelt rules (#11368 ) ## Summary Adds support for a Unix socket escape hatch so we can bypass socket allowlisting when explicitly enabled. ## Description * added a new flag, `network.dangerously_allow_all_unix_sockets` as an explicit escape hatch * In codex-network-proxy, enabling that flag now allows any absolute Unix socket path from x-unix-socket instead of requiring each path to be explicitly allowlisted. Relative paths are still rejected. * updated the macOS seatbelt path in core so it enforces the same Unix socket behavior: * allowlisted sockets generate explicit network* subpath rules * allow-all generates a broad network* (subpath "/") rule --------- Co-authored-by: Codex <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>	2026-02-20 10:56:57 -08:00
viyatb-oai	e8afaed502	Refactor network approvals to host/protocol/port scope (#12140 ) ## Summary Simplify network approvals by removing per-attempt proxy correlation and moving to session-level approval dedupe keyed by (host, protocol, port). Instead of encoding attempt IDs into proxy credentials/URLs, we now treat approvals as a destination policy decision. - Concurrent calls to the same destination share one approval prompt. - Different destinations (or same host on different ports) get separate prompts. - Allow once approves the current queued request group only. - Allow for session caches that (host, protocol, port) and auto-allows future matching requests. - Never policy continues to deny without prompting. Example: - 3 calls: - a.com:443 - b.com:443 - a.com:443 => 2 prompts total (a, b); the second a waits on the first decision. - a.com:80 is treated separately from a.com:443 ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-core tools::network_approval::tests` - `cargo test -p codex-core` (unit tests pass; existing integration-suite failures remain in this environment)	2026-02-20 10:39:55 -08:00
Max Johnson	41f15bf07b	app-server: add JSON tracing logs (#12287 ) - add `LOG_FORMAT=json` support for app-server tracing logs via `tracing_subscriber`'s built-in JSON formatter - keep the default human-readable format unchanged and keep `RUST_LOG` filtering behavior - document the env var and update lockfile	2026-02-20 10:10:51 -08:00
jif-oai	035c4c30bb	fix: nick name at thread/read (#12347 )	2026-02-20 17:53:51 +00:00
jif-oai	4d60c803ba	feat: cleaner TUI for sub-agents (#12327 ) <img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25" src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765" />	2026-02-20 15:26:33 +00:00
jif-oai	0f9eed3a6f	feat: add nick name to sub-agents (#12320 ) Adds a random nickname to sub-agents, used for UX. Also stores and wires the role of the sub-agent.	2026-02-20 14:39:49 +00:00
Max Johnson	b06f91c4fe	app-server: improve thread resume rejoin flow (#11776 ) The `thread/resume` response now includes the latest turn with all items in band, so no events are stale or lost. Testing - e2e tested using app-server-test-client using the flow described in "Testing Thread Rejoin Behavior" in codex-rs/app-server-test-client/README.md - e2e tested in codex desktop by reconnecting to a running turn	2026-02-20 05:29:05 +00:00
Michael Bolin	366ecaf17a	app-server: fix flaky list_apps_returns_connectors_with_accessible_flags test (#12286 ) ## Why `app/list` emits `app/list/updated` after whichever async load finishes first (directory connectors or accessible tools). This test assumed the directory-backed update always arrived first because it injected a tools delay, but that assumption is not stable when the process-global Codex Apps tools cache is already warm. In that case the accessible-tools path can return immediately and the first notification shape flips, which makes the assertion flaky. Relevant code paths: - [`codex-rs/app-server/src/codex_message_processor.rs`](`13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034)`) (concurrent loads + per-load `app/list/updated` notifications) - [`codex-rs/core/src/mcp_connection_manager.rs`](`13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197)`) (Codex Apps tools cache hit path) ## What Changed Updated `suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags` in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either valid first `app/list/updated` payload: - the directory-first snapshot - the accessible-tools-first snapshot The test still keeps the later assertions strict: - the second `app/list/updated` notification must be the fully merged result - the final `app/list` response must match the same merged result I also added an inline comment explaining why the first notification is intentionally order-insensitive. ## Verification - `cargo test -p codex-app-server`	2026-02-20 02:27:18 +00:00
Michael Bolin	4fa304306b	tests: centralize in-flight turn cleanup helper (#12271 ) ## Why Several tests intentionally exercise behavior while a turn is still active. The cleanup sequence for those tests (`turn/interrupt` + waiting for `codex/event/turn_aborted`) was duplicated across files, which made the rationale easy to lose and the pattern easy to apply inconsistently. This change centralizes that cleanup in one place with a single explanatory doc comment. ## What Changed ### Added shared helper In `codex-rs/app-server/tests/common/mcp_process.rs`: - Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`. - Added a doc comment explaining why explicit interrupt + terminal wait is required for tests that intentionally leave a turn in-flight. ### Migrated call sites Replaced duplicated interrupt/aborted blocks with the helper in: - `codex-rs/app-server/tests/suite/v2/thread_resume.rs` - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` - `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs` - `turn_start_shell_zsh_fork_executes_command_v2` - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` - `codex-rs/app-server/tests/suite/v2/turn_steer.rs` - `turn_steer_returns_active_turn_id` ### Existing cleanup retained In `codex-rs/app-server/tests/suite/v2/turn_start.rs`: - `turn_start_accepts_local_image_input` continues to explicitly wait for `turn/completed` so the turn lifecycle is fully drained before test exit. ## Verification - `cargo test -p codex-app-server`	2026-02-20 01:47:34 +00:00
Michael Bolin	7ed3e3760d	tests(thread_resume): interrupt running turns in resume error-path tests (#12269 ) ## Why `thread_resume` tests can intentionally create an in-flight turn, assert a `thread/resume` error path, and return immediately. That leaves turn work active during teardown, which can surface as intermittent `LEAK` failures. Sample output that motivated this investigation (reported during test runs): ```text LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch ``` ## What Changed Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`: - `thread_resume_rejects_history_when_thread_is_running` - `thread_resume_rejects_mismatched_path_when_thread_is_running` Both tests now: 1. capture the running turn id from `TurnStartResponse` 2. assert the expected `thread/resume` error 3. call `turn/interrupt` for that running turn 4. wait for `codex/event/turn_aborted` before returning ## Why This Is The Correct Fix These tests are specifically validating resume behavior while a turn is active. They should also own cleanup of that active turn before exiting. Explicitly interrupting and waiting for the terminal abort notification removes teardown races and avoids relying on process-drop behavior to clean up in-flight work. ## Repro / Verification Repro command used for investigation: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) \| test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) \| test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)' ``` Observed before this change: intermittent `LEAK` in `thread_resume_rejects_history_when_thread_is_running`. 
Also verified with: - `cargo test -p codex-app-server`	2026-02-19 21:51:18 +00:00
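
The session-level approval dedupe described in e8afaed502 (#12140) can be illustrated with a minimal sketch. This is not the codex-core implementation: `Destination`, `Decision`, `SessionApprovals`, and `resolve` are hypothetical names invented for illustration, the `"https"`/`"http"` protocol strings are assumptions (the commit message's example gives only hosts and ports), and the Never policy is left out. It only models the stated rules: one prompt per distinct (host, protocol, port) within a queued group, Allow once scoped to that group, and Allow for session cached for later requests.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical types for illustration only; not the codex-core API.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Destination {
    host: String,
    protocol: String,
    port: u16,
}

impl Destination {
    fn new(host: &str, protocol: &str, port: u16) -> Self {
        Self { host: host.to_string(), protocol: protocol.to_string(), port }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Decision {
    AllowOnce,
    AllowForSession,
    Deny,
}

#[derive(Default)]
struct SessionApprovals {
    // Destinations approved with "allow for session".
    allowed: HashSet<Destination>,
    // Number of prompts shown so far (kept only to make the dedupe observable).
    prompts_shown: usize,
}

impl SessionApprovals {
    // Resolves one group of queued requests and returns one bool per request
    // (true = allowed). `prompt` stands in for asking the user.
    fn resolve(
        &mut self,
        queued: &[Destination],
        mut prompt: impl FnMut(&Destination) -> Decision,
    ) -> Vec<bool> {
        // Decisions made for this group only; "allow once" never outlives it.
        let mut group: HashMap<Destination, Decision> = HashMap::new();
        let mut results = Vec::with_capacity(queued.len());
        for dest in queued {
            if self.allowed.contains(dest) {
                results.push(true); // cached session approval, no prompt
                continue;
            }
            let decision = match group.get(dest).copied() {
                Some(d) => d, // same destination already decided in this group
                None => {
                    self.prompts_shown += 1;
                    let d = prompt(dest);
                    group.insert(dest.clone(), d);
                    d
                }
            };
            if decision == Decision::AllowForSession {
                self.allowed.insert(dest.clone());
            }
            results.push(decision != Decision::Deny);
        }
        results
    }
}

fn main() {
    let mut approvals = SessionApprovals::default();
    // The example from the commit message: a.com:443, b.com:443, a.com:443.
    let queued = vec![
        Destination::new("a.com", "https", 443),
        Destination::new("b.com", "https", 443),
        Destination::new("a.com", "https", 443),
    ];
    let results = approvals.resolve(&queued, |_| Decision::AllowOnce);
    assert_eq!(results, vec![true, true, true]);
    assert_eq!(approvals.prompts_shown, 2); // one prompt each for a and b

    // A different port on the same host is a separate destination.
    let other = approvals.resolve(&[Destination::new("a.com", "http", 80)], |_| Decision::Deny);
    assert_eq!(other, vec![false]);
    assert_eq!(approvals.prompts_shown, 3);
}
```

Keeping the per-group decisions in a local map while storing only session-wide approvals on the struct is one way to keep the Allow once scope distinct from the Allow for session scope. The sketch is sequential and does not model the waiting behaviour of concurrent callers.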

443 commits