core-agent-ide

Author	SHA1	Message	Date
sayan-oai	7b2cee53db	chore: wire through plugin policies + category from marketplace.json (#14305 ) wire plugin marketplace metadata through app-server endpoints: - `plugin/list` has `installPolicy` and `authPolicy` - `plugin/install` has plugin-level `authPolicy` `plugin/install` also now enforces `NOT_AVAILABLE` `installPolicy` when installing. added tests.	2026-03-11 12:33:10 -07:00
Owen Lin	fa1242c83b	fix(otel): make HTTP trace export survive app-server runtimes (#14300 ) ## Summary This PR fixes OTLP HTTP trace export in runtimes where the previous exporter setup was unreliable, especially around app-server usage. It also removes the old `codex_otel::otel_provider` compatibility shim and switches remaining call sites over to the crate-root `codex_otel::OtelProvider` export. ## What changed - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes. - Add an async HTTP client path for trace export when we are already inside a multi-thread Tokio runtime. - Make provider shutdown flush traces before tearing down the tracer provider. - Add loopback coverage that verifies traces are actually sent to `/v1/traces`: - outside Tokio - inside a multi-thread Tokio runtime - inside a current-thread Tokio runtime - Remove the `codex_otel::otel_provider` shim and update remaining imports. ## Why I hit cases where spans were being created correctly but never made it to the collector. The issue turned out to be in exporter/runtime behavior rather than the span plumbing itself. This PR narrows that gap and gives us regression coverage for the actual export path.	2026-03-11 12:33:10 -07:00
pakrym-oai	548583198a	Allow bool web_search in ToolsToml (#14352 ) Summary - add a custom deserializer so `[tools].web_search` can be a bool (treated as disabled) or a config object - extend core and app-server tests to cover bool handling in TOML config Testing - Not run (not requested)	2026-03-11 12:33:10 -07:00
Rasmus Rygaard	7f22329389	Revert "Pass more params to compaction" (#14298 )	2026-03-11 12:33:10 -07:00
Channing Conger	fd4a673525	Responses: set x-client-request-id as convesration_id when talking to responses (#14312 ) Right now we're sending the header session_id to responses which is ignored/dropped. This sets a useful x-client-request-id to the conversation_id.	2026-03-11 12:33:10 -07:00
Fouad Matin	f385199cc0	fix(arc_monitor): api path (#14290 ) This PR just fixes the API path for ARC monitor.	2026-03-11 12:33:10 -07:00
gabec-openai	180a5820fc	Add keyboard based fast switching between agents in TUI (#13923 )	2026-03-11 12:33:10 -07:00
pakrym-oai	12ee9eb6e0	Add snippets annotated with types to tools when code mode enabled (#14284 ) Main purpose is for code mode to understand the return type.	2026-03-11 12:33:09 -07:00
Ahmed Ibrahim	a4d884c767	Split spawn_csv from multi_agent (#14282 ) - make `spawn_csv` a standalone feature for CSV agent jobs - keep `spawn_csv -> multi_agent` one-way and preserve restricted subagent disable paths	2026-03-11 12:33:09 -07:00
Ahmed Ibrahim	39c1bc1c68	Add realtime start instructions config override (#14270 ) - add `realtime_start_instructions` config support - thread it into realtime context updates, schema, docs, and tests	2026-03-11 12:33:09 -07:00
pakrym-oai	31bf1dbe63	Make unified exec session_id numeric (#14279 ) It's a number on the write_stdin input, make it a number on the output and also internally.	2026-03-11 12:33:09 -07:00
pakrym-oai	01792a4c61	Prefix code mode output with success or failure message and include error stack (#14272 )	2026-03-11 12:33:09 -07:00
pash-openai	da74da6684	render local file links from target paths (#13857 ) Co-authored-by: Josh McKinney <joshka@openai.com>	2026-03-11 12:33:09 -07:00
Ahmed Ibrahim	c8446d7cf3	Stabilize websocket response.failed error delivery (#14017 ) ## What changed - Drop failed websocket connections immediately after a terminal stream error instead of awaiting a graceful close handshake before forwarding the error to the caller. - Keep the success path and the closed-connection guard behavior unchanged. ## Why this fixes the flake - The failing integration test waits for the second websocket stream to surface the model error before issuing a follow-up request. - On slower runners, the old error path awaited `ws_stream.close().await` before sending the error downstream. If that close handshake stalled, the test kept waiting for an error that had already happened server-side and nextest timed it out. - Dropping the failed websocket immediately makes the terminal error observable right away and marks the session closed so the next request reconnects cleanly instead of depending on a best-effort close handshake. ## Code or test? - This is a production logic fix in `codex-api`. The existing websocket integration test already exercises the regression path.	2026-03-11 12:33:09 -07:00
Ahmed Ibrahim	285b3a5143	Show spawned agent model and effort in TUI (#14273 ) - include the requested sub-agent model and reasoning effort in the spawn begin event\n- render that metadata next to the spawned agent name and role in the TUI transcript --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:09 -07:00
pakrym-oai	8a099b3dfb	Rename code mode tool to exec (#14254 ) Summary - update the code-mode handler, runner, instructions, and error text to refer to the `exec` tool name everywhere that used to say `code_mode` - ensure generated documentation strings and tool specs describe `exec` and rely on the shared `PUBLIC_TOOL_NAME` - refresh the suite tests so they invoke `exec` instead of the old name Testing - Not run (not requested)	2026-03-11 12:33:09 -07:00
maja-openai	e77b2fd925	prompt changes to guardian (#14263 ) ## Summary - update the guardian prompting - clarify the guardian rejection message so an action may still proceed if the user explicitly approves it after being informed of the risk ## Testing - cargo run on selected examples	2026-03-11 12:33:09 -07:00
Ahmed Ibrahim	9b5078d3e8	Stabilize pipe process stdin round-trip test (#14013 ) ## What changed - keep the explicit stdin-close behavior after writing so the child still receives EOF deterministically - on Windows, stop using `python -c` for the round-trip assertion and instead run a native `cmd.exe` pipeline that reads one line from stdin with `set /p` and echoes it back - send ` ` on Windows so the stdin payload matches the platform-native line ending the shell reader expects ## Why this fixes flakiness The failing branch-local flake was not in `spawn_pipe_process` itself. The child exited cleanly, but the Windows ARM runner sometimes produced an empty stdout string when the test used Python as the stdin consumer. That makes the test sensitive to Python startup and stdin-close timing rather than the pipe primitive we actually want to validate. Switching the Windows path to a native `cmd.exe` reader keeps the assertion focused on our pipe behavior: bytes written to stdin should come back on stdout before EOF closes the process. The explicit ` ` write removes line-ending ambiguity on Windows. ## Scope - test-only - no production logic change	2026-03-11 12:33:09 -07:00
Celia Chen	c1a424691f	chore: add a separate reject-policy flag for skill approvals (#14271 ) ## Summary - add `skill_approval` to `RejectConfig` and the app-server v2 `AskForApproval::Reject` payload so skill-script prompts can be configured independently from sandbox and rule-based prompts - update Unix shell escalation to reject prompts based on the actual decision source, keeping prefix rules tied to `rules`, unmatched command fallbacks tied to `sandbox_approval`, and skill scripts tied to `skill_approval` - regenerate the affected protocol/config schemas and expand unit/integration coverage for the new flag and skill approval behavior	2026-03-11 12:33:09 -07:00
pakrym-oai	83b22bb612	Add store/load support for code mode (#14259 ) adds support for transferring state across code mode invocations.	2026-03-11 12:33:09 -07:00
Rasmus Rygaard	2621ba17e3	Pass more params to compaction (#14247 ) Pass more params to /compact. This should give us parity with the /responses endpoint to improve caching. I'm torn about the MCP await. Blocking will give us parity but it seems like we explicitly don't block on MCPs. Happy either way	2026-03-11 12:33:09 -07:00
Leo Shimonaka	889b4796fc	feat: Add additional macOS Sandbox Permissions for Launch Services, Contacts, Reminders (#14155 ) Add additional macOS Sandbox Permissions levers for the following: - Launch Services - Contacts - Reminders	2026-03-11 12:33:09 -07:00
joeytrasatti-openai	8ac27b2a16	Add ephemeral flag support to thread fork (#14248 ) ### Summary This PR adds first-class ephemeral support to thread/fork, bringing it in line with thread/start. The goal is to support one-off completions on full forked threads without persisting them as normal user-visible threads. ### Testing	2026-03-11 12:33:08 -07:00
pakrym-oai	07c22d20f6	Add code_mode output helpers for text and images (#14244 ) Summary - document how code-mode can import `output_text`/`output_image` and ensure `add_content` stays compatible - add a synthetic `@openai/code_mode` module that appends content items and validates inputs - cover the new behavior with integration tests for structured text and image outputs Testing - Not run (not requested)	2026-03-11 12:33:08 -07:00
Ahmed Ibrahim	ce1d9abf11	Clarify close_agent tool description (#14269 ) - clarify the `close_agent` tool description so it nudges models to close agents they no longer need - keep the change scoped to the tool spec text only Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:08 -07:00
gabec-openai	a67660da2d	Load agent metadata from role files (#14177 )	2026-03-11 12:33:08 -07:00
pakrym-oai	3d41ff0b77	Add model-controlled truncation for code mode results (#14258 ) Summary - document that `@openai/code_mode` exposes `set_max_output_tokens_per_exec_call` and that `code_mode` truncates the final Rust-side output when the budget is exceeded - enforce the configured budget in the Rust tool runner, reusing truncation helpers so text-only outputs follow the unified-exec wrapper and mixed outputs still fit within the limit - ensure the new behavior is covered by a code-mode integration test and string spec update Testing - Not run (not requested)	2026-03-11 12:33:08 -07:00
pakrym-oai	ee8f84153e	Add output schema to MCP tools and expose MCP tool results in code mode (#14236 ) Summary - drop `McpToolOutput` in favor of `CallToolResult`, moving its helpers to keep MCP tooling focused on the final result shape - wire the new schema definitions through code mode, context, handlers, and spec modules so MCP tools serialize the exact output shape expected by the model - extend code mode tests to cover multiple MCP call scenarios and ensure the serialized data matches the new schema - refresh JS runner helpers and protocol models alongside the schema changes Testing - Not run (not requested)	2026-03-11 12:33:08 -07:00
Dylan Hurd	d5694529ca	app-server: propagate nested experimental gating for AskForApproval::Reject (#14191 ) ## Summary This change makes `AskForApproval::Reject` gate correctly anywhere it appears inside otherwise-stable app-server protocol types. Previously, experimental gating for `approval_policy: Reject` was handled with request-specific logic in `ClientRequest` detection. That covered a few request params types, but it did not generalize to other nested uses such as `ProfileV2`, `Config`, `ConfigReadResponse`, or `ConfigRequirements`. This PR replaces that ad hoc handling with a generic nested experimental propagation mechanism. ## Testing seeing this when run app-server-test-client without experimental api enabled: ``` initialize response: InitializeResponse { user_agent: "codex-toy-app-server/0.0.0 (Mac OS 26.3.1; arm64) vscode/2.4.36 (codex-toy-app-server; 0.0.0)" } > { > "id": "50244f6a-270a-425d-ace0-e9e98205bde7", > "method": "thread/start", > "params": { > "approvalPolicy": { > "reject": { > "mcp_elicitations": false, > "request_permissions": true, > "rules": false, > "sandbox_approval": true > } > }, > "baseInstructions": null, > "config": null, > "cwd": null, > "developerInstructions": null, > "dynamicTools": null, > "ephemeral": null, > "experimentalRawEvents": false, > "mockExperimentalField": null, > "model": null, > "modelProvider": null, > "persistExtendedHistory": false, > "personality": null, > "sandbox": null, > "serviceName": null > } > } < { < "error": { < "code": -32600, < "message": "askForApproval.reject requires experimentalApi capability" < }, < "id": "50244f6a-270a-425d-ace0-e9e98205bde7" < } [verified] thread/start rejected approvalPolicy=Reject without experimentalApi ``` --------- Co-authored-by: celia-oai <celia@openai.com>	2026-03-11 12:33:08 -07:00
Won Park	722e8f08e1	unifying all image saves to /tmp to bug-proof (#14149 ) image-gen feature will have the model saving to /tmp by default + at all times	2026-03-11 12:33:08 -07:00
Ahmed Ibrahim	91ca20c7c3	Add spawn_agent model overrides (#14160 ) - add `model` and `reasoning_effort` to the `spawn_agent` schema so the values pass through - validate requested models against `model.model` and only check that the selected model supports the requested reasoning effort --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-11 12:33:08 -07:00
alexsong-oai	3d4628c9c4	Add granular metrics for cloud requirements load (#14108 )	2026-03-11 12:33:08 -07:00
xl-openai	d751e68f44	feat: Allow sync with remote plugin status. (#14176 ) Add forceRemoteSync to plugin/list. When it is set to True, we will sync the local plugin status with the remote one (backend-api/plugins/list).	2026-03-11 12:33:08 -07:00
Matthew Zeng	f2d66fadd8	add(core): arc_monitor (#13936 ) ## Summary - add ARC monitor support for MCP tool calls by serializing MCP approval requests into the ARC action shape and sending the relevant conversation/policy context to the `/api/codex/safety/arc` endpoint - route ARC outcomes back into MCP approval flow so `ask-user` falls back to a user prompt and `steer-model` blocks the tool call, with guardian/ARC tests covering the new request shape - update the TUI approval copy from “Approve Once” to “Allow” / “Allow for this session” and refresh the related snapshots --------- Co-authored-by: Fouad Matin <fouad@openai.com> Co-authored-by: Fouad Matin <169186268+fouad-openai@users.noreply.github.com>	2026-03-11 12:33:08 -07:00
Charlie Guo	b7f8e9195a	Add OpenAI Docs skill (#13596 ) ## Summary - add the OpenAI Docs skill under codex-rs/skills/src/assets/samples/openai-docs - include the skill metadata, assets, and GPT-5.4 upgrade reference files - exclude the test harness and test fixtures ## Testing - not run (skill-only asset copy)	2026-03-11 12:33:08 -07:00
Eugene Brevdo	3b1c78a5c5	[skill-creator] Add forward-testing instructions (#13600 ) This updates the `skill-creator` sample skill to explicitly cover forward-testing as part of the skill authoring workflow. The guidance now treats subagent-based validation as a first-class step for complex or fragile skills, with an emphasis on preserving evaluation integrity and avoiding leaked context. The sample initialization script is also updated so newly created skills point authors toward forward-testing after validation. Together, these changes make the sample more opinionated about how skills should be iterated on once the initial implementation is complete. - Add new guidance to `SKILL.md` on protecting validation integrity, when to use subagents for forward-testing, and how to structure realistic test prompts without leaking expected answers. - Expand the skill creation workflow so iteration explicitly includes forward-testing for complex skills, including approval guidance for expensive or risky validation runs.	2026-03-11 12:33:08 -07:00
guinness-oai	4ac6042850	Mark incomplete resumed turns interrupted when idle (#14125 ) Fixes a Codex app bug where quitting the app mid-run could leave the reopened thread stuck in progress and non-interactable. On cold thread resume, app-server could return an idle thread with a replayed turn still marked in progress. This marks incomplete replayed turns as interrupted unless the thread is actually active.	2026-03-11 12:33:07 -07:00
pakrym-oai	c4d35084f5	Reuse McpToolOutput in McpHandler (#14229 ) We already have a type to represent the MCP tool output, reuse it instead of the custom McpHandlerOutput	2026-03-11 12:33:07 -07:00
Ahmed Ibrahim	52a7f4b68b	Stabilize split PTY output on Windows (#14003 ) ## Summary - run the split stdout/stderr PTY test through the normal shell helper on every platform - use a Windows-native command string instead of depending on Python to emit split streams - assert CRLF line endings on Windows explicitly ## Why this fixes the flake The earlier PTY split-output test used a Python one-liner on Windows while the rest of the file exercised shell-command behavior. That made the test depend on runner-local Python availability and masked the real Windows shell output shape. Using a native cmd-compatible command and asserting the actual CRLF output makes the split stdout/stderr coverage deterministic on Windows runners.	2026-03-11 12:33:07 -07:00
pakrym-oai	00ea8aa7ee	Expose strongly-typed result for exec_command (#14183 ) Summary - document output types for the various tool handlers and registry so the API exposes richer descriptions - update unified execution helpers and client tests to align with the new output metadata - clean up unused helpers across tool dispatch paths Testing - Not run (not requested)	2026-03-11 12:33:07 -07:00
Eric Traut	f9cba5cb16	Log ChatGPT user ID for feedback tags (#13901 ) There are some bug investigations that currently require us to ask users for their user ID even though they've already uploaded logs and session details via `/feedback`. This frustrates users and increases the time for diagnosis. This PR includes the ChatGPT user ID in the metadata uploaded for `/feedback` (both the TUI and app-server).	2026-03-11 12:33:07 -07:00
Eric Traut	026cfde023	Fix Linux tmux segfault in user shell lookup (#13900 ) Replace the Unix shell lookup path in `codex-rs/core/src/shell.rs` to use `libc::getpwuid_r()` instead of `libc::getpwuid()` when resolving the current user's shell. Why: - `getpwuid()` can return pointers into libc-managed shared storage - on the musl static Linux build, concurrent callers can race on that storage - this matches the crash pattern reported in tmux/Linux sessions with parallel shell activity Refs: - Fixes #13842	2026-03-11 12:33:07 -07:00
Eric Traut	7144f84c69	Fix release-mode integration test compiler failure (#13603 ) Addresses #13586 This doesn't affect our CI scripts. It was user-reported. Summary - add `wiremock::ResponseTemplate` and `body_string_contains` imports behind `#[cfg(not(debug_assertions))]` in `codex-rs/core/tests/suite/view_image.rs` so release builds only pull the helpers they actually use	2026-03-11 12:33:07 -07:00
Ahmed Ibrahim	f3f47cf455	Stabilize app-server notify initialize test (#13939 ) ## What changed - This PR changes only the flaky test setup for `turn_start_notify_payload_includes_initialize_client_name`. - Instead of shelling out to `python3` to write the notify payload, the test uses the first-party `codex-app-server-test-notify-capture` helper. - The helper writes `notify.json` atomically and the test waits for the file to exist before reading it. ## Why this fixes the flake - The old test depended on an external Python interpreter being present and behaving consistently on every CI runner. - It also raced the file write: the test could observe the path before the payload had been fully written, which produced partial reads and intermittent assertion failures. - Moving the write into a repo-owned helper removes the external dependency, and atomic write-plus-wait makes the handoff deterministic. ## Scope - Test-only change.	2026-03-09 23:41:58 -07:00
Ahmed Ibrahim	b39ae9501f	Stabilize websocket test server binding (#14002 ) ## Summary - stop reserving a localhost port in the websocket tests before spawning the server - let the app-server bind `127.0.0.1:0` itself and read back the actual bound websocket address from stderr - update the websocket test helpers and callers to use the discovered address ## Why this fixes the flake The previous harness reserved a port in the test process, dropped it, and then asked the server process to bind that same address. On busy runners there is a race between releasing the reservation and the child process rebinding it, which can produce sporadic startup failures. Binding to port `0` inside the server removes that race entirely, and waiting for the server to report the real bound address makes the tests connect only after the listener is actually ready.	2026-03-09 23:39:56 -07:00
Ahmed Ibrahim	6b7253b123	Fix unified exec test output assertion (#14184 ) ## Summary - update the unified exec test to use truncated_output() instead of the removed output field - fix the compile failure on latest main after ExecCommandToolOutput changed shape	2026-03-09 23:12:36 -07:00
Ahmed Ibrahim	aa6a57dfa2	Stabilize incomplete SSE retry test (#13879 ) ## What changed - The retry test now uses the same streaming SSE test server used by production-style tests instead of a wiremock sequence. - The fixture is resolved via `find_resource!`, and the test asserts that exactly two outbound requests were sent. ## Why this fixes the flake - The old wiremock sequence approximated early-close behavior, but it did not reproduce the same streaming semantics the real client sees. - That meant the retry path depended on mock implementation details instead of on the actual transport behavior we care about. - Switching to the streaming SSE helper makes the test exercise the real early-close/retry contract, and counting requests directly verifies that we retried exactly once rather than merely hoping the sequence aligned. ## Scope - Test-only change.	2026-03-09 22:34:44 -07:00
Ahmed Ibrahim	2e24be2134	Use realtime transcript for handoff context (#14132 ) - collect input/output transcript deltas into active handoff transcript state - attach and clear that transcript on each handoff, and regenerate schema/tests	2026-03-09 22:30:03 -07:00
Channing Conger	c6343e0649	Implemented thread-level atomic elicitation counter for stopwatch pausing (#12296 ) ### Purpose While trying to build out CLI-Tools for the agent to use under skills we have found that those tools sometimes need to invoke a user elicitation. These elicitations are handled out of band of the codex app-server but need to indicate to the exec manager that the command running is not going to progress on the usual timeout horizon. ### Example Model calls universal exec: `$ download-credit-card-history --start-date 2026-01-19 --end-date 2026-02-19 > credit_history.jsonl` download-cred-card-history might hit a hosted/preauthenticated service to fetch data. That service might decide that the request requires an end user approval the access to the personal data. It should be able to signal to the running thread that the command in question is blocked on user elicitation. In that case we want the exec to continue, but the timeout to not expire on the tool call, essentially freezing time until the user approves or rejects the command at which point the tool would signal the app-server to decrement the outstanding elicitation count. Now timeouts would proceed as normal. ### What's Added - New v2 RPC methods: - thread/increment_elicitation - thread/decrement_elicitation - Protocol updates in: - codex-rs/app-server-protocol/src/protocol/common.rs - codex-rs/app-server-protocol/src/protocol/v2.rs - App-server handlers wired in: - codex-rs/app-server/src/codex_message_processor.rs ### Behavior - Counter starts at 0 per thread. - increment atomically increases the counter. - decrement atomically decreases the counter; decrement at 0 returns invalid request. - Transition rules: - 0 -> 1: broadcast pause state, pausing all active stopwatches immediately. - \>0 -> >0: remain paused. - 1 -> 0: broadcast unpause state, resuming stopwatches. - Core thread/session logic: - codex-rs/core/src/codex_thread.rs - codex-rs/core/src/codex.rs - codex-rs/core/src/mcp_connection_manager.rs ### Exec-server stopwatch integration - Added centralized stopwatch tracking/controller: - codex-rs/exec-server/src/posix/stopwatch_controller.rs - Hooked pause/unpause broadcast handling + stopwatch registration: - codex-rs/exec-server/src/posix/mcp.rs - codex-rs/exec-server/src/posix/stopwatch.rs - codex-rs/exec-server/src/posix.rs	2026-03-09 22:29:26 -07:00
Matthew Zeng	566e4cee4b	[apps] Fix apps enablement condition. (#14011 ) - [x] Fix apps enablement condition to check both the feature flag and that the user is not an API key user.	2026-03-09 22:25:43 -07:00

1 2 3 4 5 ...

3762 commits