core-agent-ide

Author	SHA1	Message	Date
sayan-oai	7b2cee53db	chore: wire through plugin policies + category from marketplace.json (#14305 ) wire plugin marketplace metadata through app-server endpoints: - `plugin/list` has `installPolicy` and `authPolicy` - `plugin/install` has plugin-level `authPolicy`. `plugin/install` also now enforces the `NOT_AVAILABLE` `installPolicy` when installing. Added tests.	2026-03-11 12:33:10 -07:00
Owen Lin	fa1242c83b	fix(otel): make HTTP trace export survive app-server runtimes (#14300 ) ## Summary This PR fixes OTLP HTTP trace export in runtimes where the previous exporter setup was unreliable, especially around app-server usage. It also removes the old `codex_otel::otel_provider` compatibility shim and switches remaining call sites over to the crate-root `codex_otel::OtelProvider` export. ## What changed - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes. - Add an async HTTP client path for trace export when we are already inside a multi-thread Tokio runtime. - Make provider shutdown flush traces before tearing down the tracer provider. - Add loopback coverage that verifies traces are actually sent to `/v1/traces`: - outside Tokio - inside a multi-thread Tokio runtime - inside a current-thread Tokio runtime - Remove the `codex_otel::otel_provider` shim and update remaining imports. ## Why I hit cases where spans were being created correctly but never made it to the collector. The issue turned out to be in exporter/runtime behavior rather than the span plumbing itself. This PR narrows that gap and gives us regression coverage for the actual export path.	2026-03-11 12:33:10 -07:00
pakrym-oai	548583198a	Allow bool web_search in ToolsToml (#14352 ) Summary - add a custom deserializer so `[tools].web_search` can be a bool (treated as disabled) or a config object - extend core and app-server tests to cover bool handling in TOML config Testing - Not run (not requested)	2026-03-11 12:33:10 -07:00
Celia Chen	c1a424691f	chore: add a separate reject-policy flag for skill approvals (#14271 ) ## Summary - add `skill_approval` to `RejectConfig` and the app-server v2 `AskForApproval::Reject` payload so skill-script prompts can be configured independently from sandbox and rule-based prompts - update Unix shell escalation to reject prompts based on the actual decision source, keeping prefix rules tied to `rules`, unmatched command fallbacks tied to `sandbox_approval`, and skill scripts tied to `skill_approval` - regenerate the affected protocol/config schemas and expand unit/integration coverage for the new flag and skill approval behavior	2026-03-11 12:33:09 -07:00
Leo Shimonaka	889b4796fc	feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155 ) Add additional macOS Sandbox Permissions levers for the following: - Launch Services - Contacts - Reminders	2026-03-11 12:33:09 -07:00
joeytrasatti-openai	8ac27b2a16	Add ephemeral flag support to thread fork (#14248 ) ### Summary This PR adds first-class ephemeral support to thread/fork, bringing it in line with thread/start. The goal is to support one-off completions on full forked threads without persisting them as normal user-visible threads. ### Testing	2026-03-11 12:33:08 -07:00
Dylan Hurd	d5694529ca	app-server: propagate nested experimental gating for AskForApproval::Reject (#14191 ) ## Summary This change makes `AskForApproval::Reject` gate correctly anywhere it appears inside otherwise-stable app-server protocol types. Previously, experimental gating for `approval_policy: Reject` was handled with request-specific logic in `ClientRequest` detection. That covered a few request params types, but it did not generalize to other nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or `ConfigRequirements`. This PR replaces that ad hoc handling with a generic nested experimental propagation mechanism. ## Testing Seeing this when running app-server-test-client without the experimental API enabled: ``` initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" } > { > "id": "50244f6a-270a-425d-ace0-e9e98205bde7", > "method": "thread/start", > "params": { > "approvalPolicy": { > "reject": { > "mcp_elicitations": false, > "request_permissions": true, > "rules": false, > "sandbox_approval": true > } > }, > "baseInstructions": null, > "config": null, > "cwd": null, > "developerInstructions": null, > "dynamicTools": null, > "ephemeral": null, > "experimentalRawEvents": false, > "mockExperimentalField": null, > "model": null, > "modelProvider": null, > "persistExtendedHistory": false, > "personality": null, > "sandbox": null, > "serviceName": null > } > } < { < "error": { < "code": -32600, < "message": "askForApproval.reject requires experimentalApi capability" < }, < "id": "50244f6a-270a-425d-ace0-e9e98205bde7" < } [verified] thread/start rejected approvalPolicy=Reject without experimentalApi ``` --------- Co-authored-by: celia-oai <celia@openai.com>	2026-03-11 12:33:08 -07:00
xl-openai	d751e68f44	feat: Allow sync with remote plugin status. (#14176 ) Add `forceRemoteSync` to `plugin/list`. When it is set to `true`, we will sync the local plugin status with the remote one (backend-api/plugins/list).	2026-03-11 12:33:08 -07:00
guinness-oai	4ac6042850	Mark incomplete resumed turns interrupted when idle (#14125 ) Fixes a Codex app bug where quitting the app mid-run could leave the reopened thread stuck in progress and non-interactable. On cold thread resume, app-server could return an idle thread with a replayed turn still marked in progress. This marks incomplete replayed turns as interrupted unless the thread is actually active.	2026-03-11 12:33:07 -07:00
Eric Traut	f9cba5cb16	Log ChatGPT user ID for feedback tags (#13901 ) There are some bug investigations that currently require us to ask users for their user ID even though they've already uploaded logs and session details via `/feedback`. This frustrates users and increases the time for diagnosis. This PR includes the ChatGPT user ID in the metadata uploaded for `/feedback` (both the TUI and app-server).	2026-03-11 12:33:07 -07:00
Ahmed Ibrahim	f3f47cf455	Stabilize app-server notify initialize test (#13939 ) ## What changed - This PR changes only the flaky test setup for `turn_start_notify_payload_includes_initialize_client_name`. - Instead of shelling out to `python3` to write the notify payload, the test uses the first-party `codex-app-server-test-notify-capture` helper. - The helper writes `notify.json` atomically and the test waits for the file to exist before reading it. ## Why this fixes the flake - The old test depended on an external Python interpreter being present and behaving consistently on every CI runner. - It also raced the file write: the test could observe the path before the payload had been fully written, which produced partial reads and intermittent assertion failures. - Moving the write into a repo-owned helper removes the external dependency, and atomic write-plus-wait makes the handoff deterministic. ## Scope - Test-only change.	2026-03-09 23:41:58 -07:00
Ahmed Ibrahim	b39ae9501f	Stabilize websocket test server binding (#14002 ) ## Summary - stop reserving a localhost port in the websocket tests before spawning the server - let the app-server bind `127.0.0.1:0` itself and read back the actual bound websocket address from stderr - update the websocket test helpers and callers to use the discovered address ## Why this fixes the flake The previous harness reserved a port in the test process, dropped it, and then asked the server process to bind that same address. On busy runners there is a race between releasing the reservation and the child process rebinding it, which can produce sporadic startup failures. Binding to port `0` inside the server removes that race entirely, and waiting for the server to report the real bound address makes the tests connect only after the listener is actually ready.	2026-03-09 23:39:56 -07:00
Ahmed Ibrahim	2e24be2134	Use realtime transcript for handoff context (#14132 ) - collect input/output transcript deltas into active handoff transcript state - attach and clear that transcript on each handoff, and regenerate schema/tests	2026-03-09 22:30:03 -07:00
Channing Conger	c6343e0649	Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296 ) ### Purpose While trying to build out CLI tools for the agent to use under skills, we have found that those tools sometimes need to invoke a user elicitation. These elicitations are handled out of band of the codex app-server but need to indicate to the exec manager that the running command is not going to progress on the usual timeout horizon. ### Example Model calls universal exec: `$ download-credit-card-history --start-date 2026-01-19 --end-date 2026-02-19 > credit_history.jsonl` download-credit-card-history might hit a hosted/preauthenticated service to fetch data. That service might decide that the request requires end-user approval to access the personal data. It should be able to signal to the running thread that the command in question is blocked on user elicitation. In that case we want the exec to continue, but the timeout to not expire on the tool call, essentially freezing time until the user approves or rejects the command, at which point the tool would signal the app-server to decrement the outstanding elicitation count. From then on, timeouts proceed as normal. ### What's Added - New v2 RPC methods: - thread/increment_elicitation - thread/decrement_elicitation - Protocol updates in: - codex-rs/app-server-protocol/src/protocol/common.rs - codex-rs/app-server-protocol/src/protocol/v2.rs - App-server handlers wired in: - codex-rs/app-server/src/codex_message_processor.rs ### Behavior - Counter starts at 0 per thread. - increment atomically increases the counter. - decrement atomically decreases the counter; decrement at 0 returns invalid request. - Transition rules: - 0 -> 1: broadcast pause state, pausing all active stopwatches immediately. - >0 -> >0: remain paused. - 1 -> 0: broadcast unpause state, resuming stopwatches. 
- Core thread/session logic: - codex-rs/core/src/codex_thread.rs - codex-rs/core/src/codex.rs - codex-rs/core/src/mcp_connection_manager.rs ### Exec-server stopwatch integration - Added centralized stopwatch tracking/controller: - codex-rs/exec-server/src/posix/stopwatch_controller.rs - Hooked pause/unpause broadcast handling + stopwatch registration: - codex-rs/exec-server/src/posix/mcp.rs - codex-rs/exec-server/src/posix/stopwatch.rs - codex-rs/exec-server/src/posix.rs	2026-03-09 22:29:26 -07:00
Matthew Zeng	566e4cee4b	[apps] Fix apps enablement condition. (#14011 ) - [x] Fix apps enablement condition to check both the feature flag and that the user is not an API key user.	2026-03-09 22:25:43 -07:00
xl-openai	0c33af7746	feat: support disabling bundled system skills (#13792 ) Support disabling bundled system skills with a config: [skills.bundled] enabled = false	2026-03-09 22:02:53 -07:00
pakrym-oai	d71e042694	Enforce single tool output type in codex handlers (#14157 ) We'll need to associate an output schema with each tool. Each tool can only have one output type.	2026-03-09 21:49:44 -07:00
Andrei Eternal	244b2d53f4	start of hooks engine (#13276 ) (Experimental) This PR adds a first MVP for hooks, with SessionStart and Stop. The core design is: - hooks live in a dedicated engine under codex-rs/hooks - each hook type has its own event-specific file - hook execution is synchronous and blocks normal turn progression while running - matching hooks run in parallel, then their results are aggregated into a normalized HookRunSummary. On the AppServer side, hooks are exposed as operational metadata rather than transcript-native items: - new live notifications: hook/started, hook/completed - persisted/replayed hook results live on Turn.hookRuns - we intentionally did not add hook-specific ThreadItem variants. Hook messages are not persisted; they remain ephemeral. The context changes they add, however, are appended to the user's prompt.	2026-03-10 04:11:31 +00:00
viyatb-oai	b0cbc25a48	fix(protocol): preserve legacy workspace-write semantics (#13957 ) ## Summary This is a fast follow to the initial `[permissions]` structure. - keep the new split-policy carveout behavior for narrower non-write entries under broader writable roots - preserve legacy `WorkspaceWrite` semantics by using a cwd-aware bridge that drops only redundant nested readable roots when projecting from `SandboxPolicy` - route the legacy macOS seatbelt adapter through that same legacy bridge so redundant nested readable roots do not become read-only carveouts on macOS - derive the legacy bridge for `command_exec` using the sandbox root cwd rather than the request cwd so policy derivation matches later sandbox enforcement - add regression coverage for the legacy macOS nested-readable-root case ## Examples ### Legacy `workspace-write` on macOS A legacy `workspace-write` policy can redundantly list a nested readable root under an already-writable workspace root. For example, legacy config can effectively mean: - workspace root (`.` / `cwd`) is writable - `docs/` is also listed in `readable_roots` The new shared split-policy helper intentionally treats a narrower non-write entry under a broader writable root as a carveout for real `[permissions]` configs. Without this fast follow, the unchanged macOS seatbelt legacy adapter could project that legacy shape into a `FileSystemSandboxPolicy` that treated `docs/` like a read-only carveout under the writable workspace root. In practice, legacy callers on macOS could unexpectedly lose write access inside `docs/`, even though that path was writable before the `[permissions]` migration work. This change fixes that by routing the legacy seatbelt path through the cwd-aware legacy bridge, so: - legacy `workspace-write` keeps `docs/` writable when `docs/` was only a redundant readable root - explicit `[permissions]` entries like `'.' 
= 'write'` and `'docs' = 'read'` still make `docs/` read-only, which is the new intended split-policy behavior ### Legacy `command_exec` with a subdirectory cwd `command_exec` can run a command from a request cwd that is narrower than the sandbox root cwd. For example: - sandbox root cwd is `/repo` - request cwd is `/repo/subdir` - legacy policy is still `workspace-write` rooted at `/repo` Before this fast follow, `command_exec` derived the legacy bridge using the request cwd, but the sandbox was later built using the sandbox root cwd. That mismatch could miss redundant legacy readable roots during projection and accidentally reintroduce read-only carveouts for paths that should still be writable under the legacy model. This change fixes that by deriving the legacy bridge with the same sandbox root cwd that sandbox enforcement later uses. ## Verification - `just fmt` - `cargo test -p codex-core seatbelt_legacy_workspace_write_nested_readable_root_stays_writable` - `cargo test -p codex-core test_sandbox_config_parsing` - `cargo clippy -p codex-core -p codex-app-server --all-targets -- -D warnings` - `cargo clean`	2026-03-09 18:43:27 -07:00
Dylan Hurd	6da84efed8	feat(approvals) RejectConfig for request_permissions (#14118 ) ## Summary We need to support allowing request_permissions calls when using `Reject` policy. Note that this is a backwards-incompatible change for Reject policy. I'm not sure if we need to add a default based on our current use/setup. ## Testing - [x] Added tests - [x] Tested locally	2026-03-09 18:16:54 -07:00
Max Johnson	66e71cce11	codex-rs/app-server: add health endpoints for --listen websocket server (#13782 ) Healthcheck endpoints for the websocket server - serve `GET /readyz` and `GET /healthz` from the same listener used for `--listen ws://...` - switch the websocket listener over to `axum` upgrade handling instead of manual socket parsing - add websocket transport coverage for the health endpoints and document the new behavior Testing - integration tests - built and tested e2e ``` > curl -i http://127.0.0.1:9234/readyz HTTP/1.1 200 OK content-length: 0 date: Fri, 06 Mar 2026 19:20:23 GMT > curl -i http://127.0.0.1:9234/healthz HTTP/1.1 200 OK content-length: 0 date: Fri, 06 Mar 2026 19:20:24 GMT ```	2026-03-09 22:11:30 +00:00
Dylan Hurd	d241dc598c	feat(core) Persist request_permission data across turns (#14009 ) ## Summary request_permissions flows should support persisting results for the session. Open Question: Still deciding if we need within-turn approvals - this adds complexity but I could see it being useful ## Testing - [x] Updated unit tests --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-09 14:36:38 -07:00
sayan-oai	6ad448b658	chore: plugin/uninstall endpoint (#14111 ) add `plugin/uninstall` app-server endpoint to fully rm plugin from plugins cache dir and rm entry from user config file. plugin-enablement is session-scoped, so uninstalls are only picked up in new sessions (like installs). added tests.	2026-03-09 12:40:25 -07:00
xl-openai	c1f3ef16ec	fix(plugin): Also load curated plugins for TUI. (#14050 ) Also run maybe_start_curated_repo_sync_for_config at TUI start time.	2026-03-09 11:05:02 -07:00
Ahmed Ibrahim	10bf6008f4	Stabilize thread resume replay tests (#13885 ) ## What changed - The thread-resume replay tests now use unchecked mock sequencing so the replay flow can complete before the test asserts. - They also poll outbound `/responses` request counts and fail immediately if replay emits an unexpected extra request. ## Why this fixes the flake - The previous version asserted while the replay machinery was still mid-flight, so the test was sometimes checking an intermediate state instead of the completed behavior. - Strict mock sequencing made that problem worse by forcing the test to care about exact sub-step timing rather than about the end result. - Letting replay settle and then asserting on stabilized request counts makes the test validate the real contract: the replay path finishes and does not send extra model requests. ## Scope - Test-only change.	2026-03-09 10:41:23 -07:00
Ahmed Ibrahim	0dc242a672	Order websocket initialize after handshake (#13943 ) ## What changed - `app-server` now sends initialize notifications to the specific websocket connection before that connection is marked outbound-ready. - `message_processor` now exposes the forwarding hook needed to target that initialize delivery path. ## Why this fixes the flake - This was a real websocket ordering bug. - The old code allowed “connection is ready for outbound broadcasts” to become true before the initialize notification had been routed to the intended client. - On CI this showed up as a race where tests would occasionally miss or misorder initialize delivery depending on scheduler timing. - Sending initialize to the exact connection first, then exposing it to the general outbound path, removes that race instead of hiding it with timing slack. ## Scope - Production logic change.	2026-03-09 10:27:19 -07:00
Ahmed Ibrahim	6b68d1ef66	Stabilize plan item app-server tests (#14058 ) ## What changed - run the two plan-mode app-server tests on a multi-thread Tokio runtime instead of the default single-thread test runtime - stop relying on wiremock teardown expectations for `/responses` and explicitly wait for the expected request count after the turn completes ## Why this fixes the flake - this failure was showing up on Windows ARM as a late wiremock panic saying the mock server saw zero `/responses` calls, but the real issue was that the test could stall around app-server startup and only fail during teardown - moving these tests to the same multi-thread runtime used by the other collaboration-mode app-server tests removes that startup scheduling race - asserting the `/responses` count directly makes the test deterministic: we now wait for the real POST instead of depending on a drop-time verification that can hide the underlying timing issue ## Scope - test-only change; no production logic changes	2026-03-09 10:24:18 -07:00
Ahmed Ibrahim	615ed0e437	Stabilize zsh fork app-server tests (#13872 ) ## What changed - `turn_start_shell_zsh_fork_executes_command_v2` now keeps the shell command alive with a file marker until the interrupt arrives instead of using a command that can finish too quickly. - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2` now waits for `turn/completed` before sending a fallback interrupt and accepts the real terminal outcomes observed across platforms. ## Why this fixes the flake - The original tests assumed a narrow ordering window: the child command would still be running when the interrupt happened, and completion would always arrive in one specific order. - In CI, especially across different shells and runner speeds, those assumptions break. Sometimes the child finishes before the interrupt; sometimes the protocol completes while the fallback path is still arming itself. - Holding the command open until the interrupt and waiting for the explicit protocol completion event makes the tests synchronize on the behavior under test instead of on wall-clock timing. ## Scope - Test-only change.	2026-03-09 09:38:16 -07:00
Ahmed Ibrahim	3f1280ce1c	Reduce app-server test timeout pressure (#13884 ) ## What changed - The auth/account/fuzzy-file-search test configs disable unrelated `shell_snapshot` setup. - The fuzzy-file-search fixture set was reduced so the stop-updates test does less incidental work before reaching the assertion. ## Why this fixes the flake - These failures were caused by cumulative timeout pressure, not by a missing product-level delay. - The old tests were paying for shell snapshot initialization and extra fixture volume that were not part of the behavior being validated. - Removing that incidental work keeps the same coverage but shortens the critical path enough that the tests finish comfortably inside the existing timeout budget, which is the right fix versus simply extending the timeout. ## Scope - Test-only change.	2026-03-09 09:37:41 -07:00
Ahmed Ibrahim	2bc3e52a91	Stabilize app list update ordering test (#14052 ) ## Summary - make `list_apps_waits_for_accessible_data_before_emitting_directory_updates` accept the two valid notification paths the server can emit - keep rejecting the real bug this test is meant to catch: a directory-only `app/list/updated` notification before accessible app data is available ## Why this fixes the flake The old test used a fixed `150ms` silence window and assumed the first notification after that window had to be the fully merged final update. In CI, scheduling occasionally lets accessible app data arrive before directory data, so the first valid notification can be an accessible-only interim update. That made the test fail even though the server behavior was correct. This change makes the test deterministic by reading notifications until the final merged payload arrives. Any interim update is only accepted if it contains accessible apps only; if the server ever emits inaccessible directory data before accessible data is ready, the test still fails immediately. ## Change type - test-only; no production app-list logic changes	2026-03-09 00:16:13 -07:00
Jack Mousseau	e6b93841c5	Add request permissions tool (#13092 ) Adds a built-in `request_permissions` tool and wires it through the Codex core, protocol, and app-server layers so a running turn can ask the client for additional permissions instead of relying on a static session policy. The new flow emits a `RequestPermissions` event from core, tracks the pending request by call ID, forwards it through app-server v2 as an `item/permissions/requestApproval` request, and resumes the tool call once the client returns an approved subset of the requested permission profile.	2026-03-08 20:23:06 -07:00
Celia Chen	340f9c9ecb	app-server: include experimental skill metadata in exec approval requests (#13929 ) ## Summary This change surfaces skill metadata on command approval requests so app-server clients can tell when an approval came from a skill script and identify the originating `SKILL.md`. - add `skill_metadata` to exec approval events in the shared protocol - thread skill metadata through core shell escalation and delegated approval handling for skill-triggered approvals - expose the field in app-server v2 as experimental `skillMetadata` - regenerate the JSON/TypeScript schemas and cover the new field in protocol, transport, core, and TUI tests ## Why Skill-triggered approvals already carry skill context inside core, but app-server clients could not see which skill caused the prompt. Sending the skill metadata with the approval request makes it possible for clients to present better approval UX and connect the prompt back to the relevant skill definition. ## example event in app-server-v2 verified that we see this event when experimental api is on: ``` < { < "id": 11, < "method": "item/commandExecution/requestApproval", < "params": { < "additionalPermissions": { < "fileSystem": null, < "macos": { < "accessibility": false, < "automations": { < "bundle_ids": [ < "com.apple.Notes" < ] < }, < "calendar": false, < "preferences": "read_only" < }, < "network": null < }, < "approvalId": "25d600ee-5a3c-4746-8d17-e2e61fb4c563", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": 
"/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes", < "itemId": "call_jZp3xFpNg4D8iKAD49cvEvZy", < "skillMetadata": { < "pathToSkillsMd": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/SKILL.md" < }, < "threadId": "019ccc10-b7d3-7ff2-84fe-3a75e7681e69", < "turnId": "019ccc10-b848-76f1-81b3-4a1fa225493f" < } < } ``` And verified that this is the event when the experimental API is off: ``` < { < "id": 13, < "method": "item/commandExecution/requestApproval", < "params": { < "approvalId": "5fbbf776-261b-4cf8-899b-c125b547f2c0", < "availableDecisions": [ < "accept", < "acceptForSession", < "cancel" < ], < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "commandActions": [ < { < "command": "/Applications/ChatGPT.app/Contents/Resources/CodexAppServer_CodexAppServerBundledSkills.bundle/Contents/Resources/skills/apple-notes/scripts/notes_info", < "type": "unknown" < } < ], < "cwd": "/Users/celia/code/codex/codex-rs", < "itemId": "call_OV2DHzTgYcbYtWaTTBWlocOt", < "threadId": "019ccc16-2a2b-7be1-8500-e00d45b892d4", < "turnId": "019ccc16-2a8e-7961-98ec-649600e7d06a" < } < } ```	2026-03-08 18:07:46 -07:00
Eric Traut	da3689f0ef	Add in-process app server and wire up exec to use it (#14005 ) This is a subset of PR #13636. See that PR for a full overview of the architectural change. This PR implements the in-process app server and modifies the non-interactive "exec" entry point to use the app server. --------- Co-authored-by: Felipe Coury <felipe.coury@gmail.com>	2026-03-08 18:43:55 -06:00
Matthew Zeng	a684a36091	[app-server] Support hot-reload user config when batch writing config. (#13839 ) - [x] Support hot-reload user config when batch writing config.	2026-03-08 17:38:01 -07:00
Charley Cunningham	7ba1fccfc1	fix(ci): restore guardian coverage and bazel unit tests (#13912 ) ## Summary - restore the guardian review request snapshot test and its tracked snapshot after it was dropped from `main` - make Bazel Rust unit-test wrappers resolve runfiles correctly on manifest-only platforms like macOS and point Insta at the real workspace root - harden the shell-escalation socket-closure assertion so the musl Bazel test no longer depends on fd reuse behavior ## Verification - cargo test -p codex-core guardian_review_request_layout_matches_model_visible_request_snapshot - cargo test -p codex-shell-escalation - bazel test //codex-rs/exec:exec-unit-tests //codex-rs/shell-escalation:shell-escalation-unit-tests Supersedes #13894. --------- Co-authored-by: Ahmed Ibrahim <aibrahim@openai.com> Co-authored-by: viyatb-oai <viyatb@openai.com> Co-authored-by: Codex <noreply@openai.com>	2026-03-08 12:05:19 -07:00
Ahmed Ibrahim	dc19e78962	Stabilize abort task follow-up handling (#13874 ) - production logic plus tests; cancel running tasks before clearing pending turn state - suppress follow-up model requests after cancellation and assert on stabilized request counts instead of fixed sleeps	2026-03-07 22:56:00 -08:00
jif-oai	cf143bf71e	feat: simplify DB further (#13771 )	2026-03-07 03:48:36 -08:00
iceweasel-oai	4b4f61d379	app-server: require absolute cwd for windowsSandbox/setupStart (#13833 ) ## Summary - require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf - reject relative cwd values at request parsing instead of normalizing them later in the setup flow - add RPC-layer coverage for relative cwd rejection and update the checked-in protocol schemas/docs ## Why windowsSandbox/setupStart was carrying the client-provided cwd as a raw PathBuf for command_cwd while config derivation normalized the same value into an absolute policy_cwd. That left room for relative-path ambiguity in the setup path, especially for inputs like cwd: "repo". Making the RPC accept only absolute paths removes that split entirely: the handler now receives one already-validated absolute path and uses it for both config derivation and setup. This keeps the trust model unchanged. Trusted clients could already choose the session cwd; this change is only about making the setup RPC reject relative paths so command_cwd and policy_cwd cannot diverge. ## Testing - cargo test -p codex-app-server windows_sandbox_setup (run locally by user) - cargo test -p codex-app-server-protocol windows_sandbox (run locally by user)	2026-03-06 22:47:08 -08:00
Michael Bolin	22ac6b9aaa	sandboxing: plumb split sandbox policies through runtime (#13439 ) ## Why `#13434` introduces split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy`, but the runtime still made most execution-time sandbox decisions from the legacy `SandboxPolicy` projection. That projection loses information about combinations like unrestricted filesystem access with restricted network access. In practice, that means the runtime can choose the wrong platform sandbox behavior or set the wrong network-restriction environment for a command even when config has already separated those concerns. This PR carries the split policies through the runtime so sandbox selection, process spawning, and exec handling can consult the policy that actually matters. ## What changed - threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through `TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state, unified exec, and app-server exec overrides - updated sandbox selection in `core/src/sandboxing/mod.rs` and `core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus `NetworkSandboxPolicy`, rather than inferring behavior only from the legacy `SandboxPolicy` - updated process spawning in `core/src/spawn.rs` and the platform wrappers to use `NetworkSandboxPolicy` when deciding whether to set `CODEX_SANDBOX_NETWORK_DISABLED` - kept additional-permissions handling and legacy `ExternalSandbox` compatibility projections aligned with the split policies, including explicit user-shell execution and Windows restricted-token routing - updated callers across `core`, `app-server`, and `linux-sandbox` to pass the split policies explicitly ## Verification - added regression coverage in `core/tests/suite/user_shell_cmd.rs` to verify `RunUserShellCommand` does not inherit `CODEX_SANDBOX_NETWORK_DISABLED` from the active turn - added coverage in `core/src/exec.rs` for Windows restricted-token sandbox selection when the legacy projection is `ExternalSandbox` - updated Linux sandbox coverage in `linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy exec path - verified the current PR state with `just clippy` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439). * #13453 * #13452 * #13451 * #13449 * #13448 * #13445 * #13440 * __->__ #13439 --------- Co-authored-by: viyatb-oai <viyatb@openai.com>	2026-03-07 02:30:21 +00:00
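The information loss described in #13439 can be illustrated with simplified enums. This is a sketch under assumptions: `FileSystemPolicyKind`, `NetworkPolicy`, `LegacyPolicy`, and every function below are hypothetical reductions of the real `FileSystemSandboxPolicy`, `NetworkSandboxPolicy`, and `SandboxPolicy` types, and the env value `"1"` is assumed rather than taken from the source.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for the split policies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileSystemPolicyKind {
    Restricted,
    Unrestricted,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkPolicy {
    Enabled,
    Restricted,
}

/// Hypothetical two-state legacy projection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyPolicy {
    FullAccess,
    Sandboxed,
}

/// Lossy: unrestricted filesystem projects to full access, so the
/// network restriction is dropped.
pub fn legacy_projection(fs: FileSystemPolicyKind, _net: NetworkPolicy) -> LegacyPolicy {
    match fs {
        FileSystemPolicyKind::Unrestricted => LegacyPolicy::FullAccess,
        FileSystemPolicyKind::Restricted => LegacyPolicy::Sandboxed,
    }
}

/// Old style: infer the network restriction from the legacy projection.
pub fn network_disabled_legacy(policy: LegacyPolicy) -> bool {
    policy == LegacyPolicy::Sandboxed
}

/// New style: consult the network policy directly.
pub fn network_disabled_split(net: NetworkPolicy) -> bool {
    net == NetworkPolicy::Restricted
}

/// Builds a child environment from the network policy alone. An inherited
/// value is removed when the network is enabled so it cannot leak through.
pub fn spawn_env(net: NetworkPolicy, base: &HashMap<String, String>) -> HashMap<String, String> {
    let mut env = base.clone();
    if network_disabled_split(net) {
        env.insert("CODEX_SANDBOX_NETWORK_DISABLED".to_string(), "1".to_string());
    } else {
        env.remove("CODEX_SANDBOX_NETWORK_DISABLED");
    }
    env
}
```

For unrestricted filesystem plus restricted network, the legacy path reports the network as enabled while the split path correctly reports it as disabled.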
viyatb-oai	25fa974166	fix: support managed network allowlist controls (#12752 ) ## Summary - treat `requirements.toml` `allowed_domains` and `denied_domains` as managed network baselines for the proxy - in restricted modes by default, build the effective runtime policy from the managed baseline plus user-configured allowlist and denylist entries, so common hosts can be pre-approved without blocking later user expansion - add `experimental_network.managed_allowed_domains_only = true` to pin the effective allowlist to managed entries, ignore user allowlist additions, and hard-deny non-managed domains without prompting - apply `managed_allowed_domains_only` anywhere managed network enforcement is active, including full access, while continuing to respect denied domains from all sources - add regression coverage for merged-baseline behavior, managed-only behavior, and full-access managed-only enforcement ## Behavior Assuming `requirements.toml` defines both `experimental_network.allowed_domains` and `experimental_network.denied_domains`. ### Default mode - By default, the effective allowlist is `experimental_network.allowed_domains` plus user or persisted allowlist additions. - By default, the effective denylist is `experimental_network.denied_domains` plus user or persisted denylist additions. - Allowlist misses can go through the network approval flow. - Explicit denylist hits and local or private-network blocks are still hard-denied. - When `experimental_network.managed_allowed_domains_only = true`, only managed `allowed_domains` are respected, user allowlist additions are ignored, and non-managed domains are hard-denied without prompting. - Denied domains continue to be respected from all sources. ### Full access - With managed requirements present, the effective allowlist is pinned to `experimental_network.allowed_domains`. - With managed requirements present, the effective denylist is pinned to `experimental_network.denied_domains`. - There is no allowlist-miss approval path in full access. - Explicit denylist hits are hard-denied. - `experimental_network.managed_allowed_domains_only = true` now also applies in full access, so managed-only behavior remains in effect anywhere managed network enforcement is active.	2026-03-06 17:52:54 -08:00
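The default-mode rules in #12752 can be sketched as a merge step plus a three-way decision. This covers only the default (restricted) mode and assumes exact-match domain strings; wildcard matching, local/private-network blocks, and full-access behavior are omitted. `ManagedNetwork`, `UserNetwork`, `EffectivePolicy`, and `Decision` are hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Prompt,
    Deny,
}

/// Hypothetical view of the managed `experimental_network` requirements.
pub struct ManagedNetwork {
    pub allowed_domains: Vec<String>,
    pub denied_domains: Vec<String>,
    pub managed_allowed_domains_only: bool,
}

/// Hypothetical user or persisted additions.
pub struct UserNetwork {
    pub allowed_domains: Vec<String>,
    pub denied_domains: Vec<String>,
}

pub struct EffectivePolicy {
    allow: BTreeSet<String>,
    deny: BTreeSet<String>,
    managed_only: bool,
}

pub fn effective_policy(managed: &ManagedNetwork, user: &UserNetwork) -> EffectivePolicy {
    let mut allow: BTreeSet<String> = managed.allowed_domains.iter().cloned().collect();
    if !managed.managed_allowed_domains_only {
        // Default: user additions extend the managed baseline.
        allow.extend(user.allowed_domains.iter().cloned());
    }
    // Denied domains are respected from all sources.
    let deny = managed
        .denied_domains
        .iter()
        .chain(user.denied_domains.iter())
        .cloned()
        .collect();
    EffectivePolicy { allow, deny, managed_only: managed.managed_allowed_domains_only }
}

impl EffectivePolicy {
    pub fn decide(&self, host: &str) -> Decision {
        if self.deny.contains(host) {
            Decision::Deny // denylist hits are always hard-denied
        } else if self.allow.contains(host) {
            Decision::Allow
        } else if self.managed_only {
            Decision::Deny // managed-only: no prompt for non-managed domains
        } else {
            Decision::Prompt // allowlist miss goes to the approval flow
        }
    }
}
```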
Ruslan Nigmatullin	e9bd8b20a1	app-server: Add streaming and tty/pty capabilities to `command/exec` (#13640 ) * Add an ability to stream stdin, stdout, and stderr * Streaming of stdout and stderr has a configurable cap for total amount of transmitted bytes (with an ability to disable it) * Add support for overriding environment variables * Add an ability to terminate running applications (using `command/exec/terminate`) * Add TTY/PTY support, with an ability to resize the terminal (using `command/exec/resize`)	2026-03-06 17:30:17 -08:00
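One way to enforce a configurable cap on total streamed bytes, as #13640 describes, is a counter that truncates chunks once the budget is spent. This is a sketch, not the PR's implementation: `OutputCap` is hypothetical, sharing one cap across stdout and stderr is an assumption, and `None` models the "disabled" setting.

```rust
/// Hypothetical cap on total bytes streamed for one process.
/// `None` means the cap is disabled.
pub struct OutputCap {
    remaining: Option<usize>,
}

impl OutputCap {
    pub fn new(limit: Option<usize>) -> Self {
        OutputCap { remaining: limit }
    }

    /// Returns the prefix of `chunk` that may still be sent and
    /// charges it against the remaining budget.
    pub fn admit<'a>(&mut self, chunk: &'a [u8]) -> &'a [u8] {
        match self.remaining.as_mut() {
            None => chunk,
            Some(rem) => {
                let n = chunk.len().min(*rem);
                *rem -= n;
                &chunk[..n]
            }
        }
    }

    /// True once a finite budget has been used up.
    pub fn exhausted(&self) -> bool {
        self.remaining == Some(0)
    }
}
```

A caller would pass each stdout or stderr chunk through `admit` before emitting it, and could stop forwarding output once `exhausted` returns true.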
Rohan Mehta	61098c7f51	Allow full web search tool config (#13675 ) Previously, we could only configure whether web search was on/off. This PR enables sending along a web search config, which includes everything the Responses API supports: filters, location, etc.	2026-03-07 00:50:50 +00:00
xl-openai	0243734300	feat: Add curated plugin marketplace + Metadata Cleanup. (#13712 ) 1. Add a synced curated plugin marketplace and include it in marketplace discovery. 2. Expose optional plugin.json interface metadata in plugin/list 3. Tighten plugin and marketplace path handling using validated absolute paths. 4. Let manifests override skill, MCP, and app config paths. 5. Restrict plugin enablement/config loading to the user config layer so plugin enablement is at global level	2026-03-06 19:39:35 -05:00
sayan-oai	8a54d3caaa	feat: structured plugin parsing (#13711 ) #### What Add structured `@plugin` parsing and TUI support for plugin mentions. - Core: switch from plain-text `@display_name` parsing to structured `plugin://...` mentions via `UserInput::Mention` and `[$...](plugin://...)` links in text, same pattern as apps/skills. - TUI: add plugin mention popup, autocomplete, and chips when typing `$`. Load plugin capability summaries and feed them into the composer; plugin mentions appear alongside skills and apps. - Generalize mention parsing to a sigil parameter, still defaults to `$`. Builds on #13510. Currently clients have to build their own `id` via `plugin@marketplace` and filter plugins to show by `enabled`, but we will add `id` and `available` as fields returned from `plugin/list` soon. #### Tests Added tests, verified locally.	2026-03-06 11:08:36 -08:00
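Sigil-parameterized extraction of `[$...](plugin://...)` links, as described in #13711, might look roughly like the scanner below. It is a simplified sketch: `Mention` and `parse_mentions` are hypothetical, and escaping, nested brackets, and the real `UserInput::Mention` plumbing are not modeled.

```rust
/// Hypothetical mention extracted from composer text.
#[derive(Debug, PartialEq, Eq)]
pub struct Mention {
    pub name: String,
    pub target: String,
}

/// Extracts `[<sigil>name](<scheme>://target)` links from `text`.
pub fn parse_mentions(text: &str, sigil: char, scheme: &str) -> Vec<Mention> {
    let prefix = format!("{scheme}://");
    let mut out = Vec::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        let after_open = &rest[open + 1..];
        // The label runs up to the first "](".
        let Some(close) = after_open.find("](") else { break };
        let label = &after_open[..close];
        let after_label = &after_open[close + 2..];
        let Some(end) = after_label.find(')') else { break };
        let url = &after_label[..end];
        match (label.strip_prefix(sigil), url.strip_prefix(prefix.as_str())) {
            (Some(name), Some(target)) if !name.is_empty() && !target.is_empty() => {
                out.push(Mention { name: name.to_string(), target: target.to_string() });
                rest = &after_label[end + 1..]; // continue after this link
            }
            // Not a mention: resume scanning just past this '['.
            _ => rest = after_open,
        }
    }
    out
}
```

Ordinary markdown links such as `[docs](https://...)` are skipped because their label lacks the sigil and their URL lacks the scheme.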
Ruslan Nigmatullin	51fcdc760d	app-server: Emit `thread/name/updated` event globally (#13674 )	2026-03-06 10:25:18 -08:00
Owen Lin	6c98a59dbd	fix(app-server): fix turn_start_shell_zsh_fork_executes_command_v2 flake (#13770 ) This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test. The interrupt path can race with the follow-up `/responses` request that reports the aborted tool call, so the test now allows that extra no-op response instead of assuming there will only ever be one request. The assertions still stay focused on the behavior the test actually cares about: starting the zsh-forked command correctly. Testing: - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2 -- --exact --nocapture`	2026-03-06 10:10:16 -08:00
jif-oai	fa16c26908	feat: drop sqlite db feature flag (#13750 )	2026-03-06 17:57:52 +01:00
Matthew Zeng	98dca99db7	[elicitations] Switch to use MCP style elicitation payload for mcp tool approvals. (#13621 ) - [x] Switch to use MCP style elicitation payload for mcp tool approvals. - [ ] TODO: Update the UI to support the full spec.	2026-03-06 01:50:26 -08:00
sayan-oai	014a59fb0b	check app auth in plugin/install (#13685 ) #### What On `plugin/install`, check whether installed apps are already authed on ChatGPT, and return a list of all apps that are not. Clients can use this list to trigger auth workflows as needed. Checks are best effort based on `codex_apps` loading, much like `app/list`. #### Tests Added integration tests, tested locally.	2026-03-06 06:45:00 +00:00
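The "return every installed app that is not yet authed" step in #13685 reduces to a set-difference filter. This sketch assumes apps are identified by string ids and that the authed set is already known; `apps_needing_auth` is hypothetical, and the real check is best effort because it depends on `codex_apps` loading.

```rust
use std::collections::HashSet;

/// Returns installed app ids missing from `authed`, preserving install order.
pub fn apps_needing_auth(installed: &[&str], authed: &HashSet<&str>) -> Vec<String> {
    installed
        .iter()
        .filter(|id| !authed.contains(*id))
        .map(|id| id.to_string())
        .collect()
}
```

Preserving install order keeps the returned list stable, which can help clients that trigger auth flows one app at a time.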
viyatb-oai	6a79ed5920	refactor: remove proxy admin endpoint (#13687 ) ## Summary - delete the network proxy admin server and its runtime listener/task plumbing - remove the admin endpoint config, runtime, requirement, protocol, schema, and debug-surface fields - update proxy docs to reflect the remaining HTTP and SOCKS listeners only	2026-03-05 22:03:16 -08:00


541 commits