core-agent-ide

Author	SHA1	Message	Date
zhao-oai	72af589398	storing credits (#6858 ) Expand the rate-limit cache/TUI: store credit snapshots alongside primary and secondary windows, render “Credits” when the backend reports they exist (unlimited vs rounded integer balances)	2025-11-19 10:49:35 -08:00
Ahmed Ibrahim	d5dfba2509	feat: arcticfox in the wild (#6906 ) <img width="485" height="600" alt="image" src="https://github.com/user-attachments/assets/4341740d-dd58-4a3e-b69a-33a3be0606c5" /> --------- Co-authored-by: jif-oai <jif@openai.com>	2025-11-19 16:31:06 +00:00
Owen Lin	1924500250	[app-server] populate thread>turns>items on thread/resume (#6848 ) This PR allows clients to render historical messages when resuming a thread via `thread/resume` by reading from the list of `EventMsg` payloads loaded from the rollout, and then transforming them into Turns and ThreadItems to be returned on the `Thread` object. This is implemented by leveraging `SessionConfiguredNotification` which returns this list of `EventMsg` objects when resuming a conversation, and then applying a stateful `ThreadHistoryBuilder` that parses from this EventMsg log and transforms it into Turns and ThreadItems. Note that we only persist a subset of `EventMsg`s in a rollout as defined in `policy.rs`, so we lose fidelity whenever we resume a thread compared to when we streamed the thread's turns originally. However, this behavior is at parity with the legacy API.	2025-11-19 15:58:09 +00:00
Ahmed Ibrahim	793063070b	fix: typos in model picker (#6859 ) # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes. Include a link to a bug report or enhancement request.	2025-11-19 06:29:02 +00:00
Michael Bolin	a75321a64c	fix: add more fields to ThreadStartResponse and ThreadResumeResponse (#6847 ) This adds the following fields to `ThreadStartResponse` and `ThreadResumeResponse`: ```rust pub model: String, pub model_provider: String, pub cwd: PathBuf, pub approval_policy: AskForApproval, pub sandbox: SandboxPolicy, pub reasoning_effort: Option<ReasoningEffort>, ``` This is important because these fields are optional in `ThreadStartParams` and `ThreadResumeParams`, so the caller needs to be able to determine what values were ultimately used to start/resume the conversation. (Though note that any of these could be changed later between turns in the conversation.) Though to get this information reliably, it must be read from the internal `SessionConfiguredEvent` that is created in response to the start of a conversation. Because `SessionConfiguredEvent` (as defined in `codex-rs/protocol/src/protocol.rs`) did not have all of these fields, a number of them had to be added as part of this PR. Because `SessionConfiguredEvent` is referenced in many tests, test instances of `SessionConfiguredEvent` had to be updated, as well, which is why this PR touches so many files.	2025-11-18 21:18:43 -08:00
Celia Chen	b395dc1be6	[app-server] introduce `turn/completed` v2 event (#6800 ) similar to logic in `codex/codex-rs/exec/src/event_processor_with_jsonl_output.rs`. translation of v1 -> v2 events: `codex/event/task_complete` -> `turn/completed` `codex/event/turn_aborted` -> `turn/completed` with `interrupted` status `codex/event/error` -> `turn/completed` with `error` status this PR also makes `items` field in `Turn` optional. For now, we only populate it when we resume a thread, and leave it as None for all other places until we properly rewrite core to keep track of items. tested using the codex app server client. example new event: ``` < { < "method": "turn/completed", < "params": { < "turn": { < "id": "0", < "items": [], < "status": "interrupted" < } < } < } ```	2025-11-19 01:55:24 +00:00
jif-oai	8ddae8cde3	feat: review in app server (#6613 )	2025-11-18 21:58:54 +00:00
Ahmed Ibrahim	ddcc60a085	Update defaults to gpt-5.1 (#6652 ) ## Summary - update documentation, example configs, and automation defaults to reference gpt-5.1 / gpt-5.1-codex - bump the CLI and core configuration defaults, model presets, and error messaging to the new models while keeping the model-family/tool coverage for legacy slugs - refresh tests, fixtures, and TUI snapshots so they expect the upgraded defaults ## Testing - `cargo test -p codex-core config::tests::test_precedence_fixture_with_gpt5_profile` ------ [Codex Task](https://chatgpt.com/codex/tasks/task_i_6916c5b3c2b08321ace04ee38604fc6b)	2025-11-17 17:40:11 -08:00
Owen Lin	cecbd5b021	[app-server] feat: add v2 command execution approval flow (#6758 ) This PR adds the API V2 version of the command‑execution approval flow for the shell tool. This PR wires the new RPC (`item/commandExecution/requestApproval`, V2 only) and related events (`item/started`, `item/completed`, and `item/commandExecution/delta`, which are emitted in both V1 and V2) through the app-server protocol. The new approval RPC is only sent when the user initiates a turn with the new `turn/start` API so we don't break backwards compatibility with VSCE. The approach I took was to make as few changes to the Codex core as possible, leveraging existing `EventMsg` core events, and translating those in app-server. I did have to add additional fields to `EventMsg::ExecCommandEndEvent` to capture the command's input so that app-server can statelessly transform these events to a `ThreadItem::CommandExecution` item for the `item/completed` event. Once we stabilize the API and it's complete enough for our partners, we can work on migrating the core to be aware of command execution items as a first-class concept. Note: We'll need followup work to make sure these APIs work for the unified exec tool, but will wait til that's stable and landed before doing a pass on app-server. Example payloads below: ``` { "method": "item/started", "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": null, "exitCode": null, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "inProgress", "type": "commandExecution" } } } ``` ``` { "id": 0, "method": "item/commandExecution/requestApproval", "params": { "itemId": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "reason": "Need to create file in /tmp which is outside workspace sandbox", "risk": null, "threadId": "019a93e8-0a52-7fe3-9808-b6bc40c0989a", "turnId": "1" } } ``` ``` { "id": 0, "result": { "acceptSettings": { "forSession": false }, "decision": "accept" } } ``` ``` { "params": { "item": { "aggregatedOutput": null, "command": "/bin/zsh -lc 'touch /tmp/should-trigger-approval'", "cwd": "/Users/owen/repos/codex/codex-rs", "durationMs": 224, "exitCode": 0, "id": "call_lNWWsbXl1e47qNaYjFRs0dyU", "parsedCmd": [ { "cmd": "touch /tmp/should-trigger-approval", "type": "unknown" } ], "status": "completed", "type": "commandExecution" } } } ```	2025-11-18 00:23:54 +00:00
Owen Lin	db2aa57d73	[app-server] small fixes for JSON schema export and one-of types (#6614 ) A partner is consuming our generated JSON schema bundle for app-server and identified a few issues: - not all polymorphic / one-of types have a type descriminator - `"$ref": "#/definitions/v2/SandboxPolicy"` is missing - "Option<>" is an invalid schema name, and also unnecessary This PR: - adds the type descriminator to the various types that are missing it except for `SessionSource` and `SubAgentSource` because they are serialized to disk (adding this would break backwards compat for resume), and they should not be necessary to consume for an integration with app-server. - removes the special handling in `export.rs` of various types like SandboxPolicy, which turned out to be unnecessary and incorrect - filters out `Option<>` which was auto-generated for request params that don't need a body For context, we currently pull in wayyy more types than we need through the `EventMsg` god object which we are not planning to expose in API v2 (this is how I suspect `SessionSource` and `SubAgentSource` are being pulled in). But until we have all the necessary v2 notifications in place that will allow us to remove `EventMsg`, we will keep exporting it for now.	2025-11-13 16:25:17 -08:00
pakrym-oai	e3dd362c94	Reasoning level update (#6586 ) Automatically update reasoning levels when migrating between models	2025-11-13 06:24:36 +00:00
Ahmed Ibrahim	e3aaee00c8	feat: show gpt mini (#6583 )	2025-11-13 05:21:00 +00:00
Ahmed Ibrahim	ad7eaa80f9	Change model picker to include gpt5.1 (#6569 ) - Change the presets - Change the tests that make sure we keep the list of tools updated - Filter out deprecated models	2025-11-12 19:44:53 -08:00
Owen Lin	964220ac94	[app-server] feat: thread/resume supports history, path, and overrides (#6483 ) This updates `thread/resume` to be at parity with v1's `ResumeConversationParams`. Turns out history is useful for codex cloud and path is useful for the VSCode extension. And config overrides are always useful.	2025-11-12 22:02:43 +00:00
Owen Lin	2030b28083	[app-server] feat: expose additional fields on Thread (#6338 ) Add the following fields to Thread: ``` pub preview: String, pub model_provider: String, pub created_at: i64, ``` Will prob need another PR once this lands: https://github.com/openai/codex/pull/6337	2025-11-07 04:08:45 +00:00
Celia Chen	e84e39940b	[App-server] Implement `account/read` endpoint (#6336 ) This PR does two things: 1. add a new function in core that maps the core-internal plan type to the external plan type; 2. implement account/read that get account status (v2 of `getAuthStatus`).	2025-11-06 19:43:13 -08:00
Owen Lin	fdb9fa301e	chore: move relevant tests to app-server/tests/suite/v2 (#6289 ) These are technically app-server v2 APIs, so move them to the same directory as the others.	2025-11-06 10:53:17 -08:00
Owen Lin	6582554926	[app-server] feat: v2 Turn APIs (#6216 ) Implements: ``` turn/start turn/interrupt ``` along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. However, an improvement made for developer ergonomics: - `turn/start` replaces both `SendUserMessage` (no turn overrides) and `SendUserTurn` (can override model, approval policy, etc.)	2025-11-06 16:36:36 +00:00
Thibault Sottiaux	667e841d3e	feat: support models with single reasoning effort (#6300 )	2025-11-05 23:06:45 -08:00
Celia Chen	229d18f4d2	[App-server] Add account/login/cancel v2 endpoint (#6288 ) Add `account/login/cancel` v2 endpoint for auth. this is similar implementation to `cancelLoginChatgpt` v1 endpoint.	2025-11-06 01:13:55 +00:00
Celia Chen	05f0b4f590	[App-server] Implement v2 for `account/login/start` and `account/login/completed` (#6183 ) This PR implements `account/login/start` and `account/login/completed`. Instead of having separate endpoints for login with chatgpt and api, we have a single enum handling different login methods. For sync auth methods like sign in with api key, we still send a `completed` notification back to be compatible with the async login flow.	2025-11-05 13:52:50 -08:00
Eric Traut	d7953aed74	Fixes intermittent test failures in CI (#6282 ) I'm seeing two tests fail intermittently in CI. This PR attempts to address (or at least mitigate) the flakiness. * summarize_context_three_requests_and_instructions - The test snapshots server.received_requests() immediately after observing TaskComplete. Because the OpenAI /v1/responses call is streamed, the HTTP request can still be draining when that event fires, so wiremock occasionally reports only two captured requests. Fix is to wait for async activity to complete. * archive_conversation_moves_rollout_into_archived_directory - times out on a slow CI run. Mitigation is to increase timeout value from 10s to 20s.	2025-11-05 13:12:25 -08:00
Owen Lin	2ab1650d4d	[app-server] feat: v2 Thread APIs (#6214 ) Implements: ``` thread/list thread/start thread/resume thread/archive ``` along with their integration tests. These are relatively light wrappers around the existing core logic, and changes to core logic are minimal. However, an improvement made for developer ergonomics: - `thread/start` and `thread/resume` automatically attaches a conversation listener internally, so clients don't have to make a separate `AddConversationListener` call like they do today. For consistency, also updated `model/list` and `feedback/upload` (naming conventions, list API params).	2025-11-05 20:28:43 +00:00
Owen Lin	edf4c3f627	[app-server] feat: export.rs supports a v2 namespace, initial v2 notifications (#6212 ) Typescript and JSON schema exports While working on Thread/Turn/Items type definitions, I realize we will run into name conflicts between v1 and v2 APIs (e.g. `RateLimitWindow` which won't be reusable since v1 uses `RateLimitWindow` from `protocol/` which uses snake_case, but we want to expose camelCase everywhere, so we'll define a V2 version of that struct that serializes as camelCase). To set us up for a clean and isolated v2 API, generate types into a `v2/` namespace for both typescript and JSON schema. - TypeScript: v2 types emit under `out_dir/v2/.ts`, and root index.ts now re-exports them via `export as v2 from "./v2"`;. - JSON Schemas: v2 definitions bundle under `#/definitions/v2/` rather than the root. The location for the original types (v1 and types pulled from `protocol/` and other core crates) haven't changed and are still at the root. This is for backwards compatibility: no breaking changes to existing usages of v1 APIs and types. Notifications* While working on export.rs, I: - refactored server/client notifications with macros (like we already do for methods) so they also get exported (I noticed they weren't being exported at all). - removed the hardcoded list of types to export as JSON schema by leveraging the existing macros instead - and took a stab at API V2 notifications. These aren't wired up yet, and I expect to iterate on these this week.	2025-11-05 01:02:39 +00:00
Celia Chen	d3187dbc17	[App-server] v2 for account/updated and account/logout (#6175 ) V2 for `account/updated` and `account/logout` for app server. correspond to old `authStatusChange` and `LogoutChatGpt` respectively. Followup PRs will make other v2 endpoints call `account/updated` instead of `authStatusChange` too.	2025-11-03 22:01:33 -08:00
pakrym-oai	2371d771cc	Update user instruction message format (#6010 )	2025-10-30 18:44:02 -07:00
Anton Panasenko	9572cfc782	[codex] add developer instructions (#5897 ) we are using developer instructions for code reviews, we need to pass them in cli as well.	2025-10-30 11:18:31 -07:00
Rasmus Rygaard	39e09c289d	Add a wrapper around raw response items (#5923 ) We currently have nested enums when sending raw response items in the app-server protocol. This makes downstream schemas confusing because we need to embed `type`-discriminated enums within each other. This PR adds a small wrapper around the response item so we can keep the schemas separate	2025-10-29 20:32:40 +00:00
Anton Panasenko	149e198ce8	[codex][app-server] resume conversation from history (#5893 )	2025-10-28 18:18:03 -07:00
Gabriel Peal	1d76ba5ebe	[App Server] Allow fetching or resuming a conversation summary from the conversation id (#5890 ) This PR adds an option to app server to allow conversation summaries to be fetched from just the conversation id rather than rollout path for convenience at the cost of some latency to discover the rollout path. This convenience is non-trivial as it allows app servers to simply maintain conversation ids rather than rollout paths and the associated platform (Windows) handling associated with storing and encoding them correctly.	2025-10-28 20:17:22 -04:00
Owen Lin	266419217e	chore: use anyhow::Result for all app-server integration tests (#5836 ) There's a lot of visual noise in app-server's integration tests due to the number of `.expect("<some_msg>")` lines which are largely redundant / not very useful. Clean them up by using `anyhow::Result` + `?` consistently. Replaces the existing pattern of: ``` let codex_home = TempDir::new().expect("create temp dir"); create_config_toml(codex_home.path()).expect("write config.toml"); let mut mcp = McpProcess::new(codex_home.path()) .await .expect("spawn mcp process"); timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()) .await .expect("initialize timeout") .expect("initialize request"); ``` With: ``` let codex_home = TempDir::new()?; create_config_toml(codex_home.path())?; let mut mcp = McpProcess::new(codex_home.path()).await?; timeout(DEFAULT_READ_TIMEOUT, mcp.initialize()).await??; ```	2025-10-28 08:10:23 -07:00
Celia Chen	4a42c4e142	[Auth] Choose which auth storage to use based on config (#5792 ) This PR is a follow-up to #5591. It allows users to choose which auth storage mode they want by using the new `cli_auth_credentials_store_mode` config.	2025-10-27 19:41:49 -07:00
Celia Chen	eb5b1b627f	[Auth] Introduce New Auth Storage Abstraction for Codex CLI (#5569 ) This PR introduces a new `Auth Storage` abstraction layer that takes care of read, write, and load of auth tokens based on the AuthCredentialsStoreMode. It is similar to how we handle MCP client oauth [here](https://github.com/openai/codex/blob/main/codex-rs/rmcp-client/src/oauth.rs). Instead of reading and writing directly from disk for auth tokens, Codex CLI workflows now should instead use this auth storage using the public helper functions. This PR is just a refactor of the current code so the behavior stays the same. We will add support for keyring and hybrid mode in follow-up PRs. I have read the CLA Document and I hereby sign the CLA	2025-10-27 11:01:14 -07:00
Michael Bolin	5907422d65	feat: annotate conversations with model_provider for filtering (#5658 ) Because conversations that use the Responses API can have encrypted reasoning messages, trying to resume a conversation with a different provider could lead to confusing "failed to decrypt" errors. (This is reproducible by starting a conversation using ChatGPT login and resuming it as a conversation that uses OpenAI models via Azure.) This changes `ListConversationsParams` to take a `model_providers: Option<Vec<String>>` and adds `model_provider` on each `ConversationSummary` it returns so these cases can be disambiguated. Note this ended up making changes to `codex-rs/core/src/rollout/tests.rs` because it had a number of cases where it expected `Some` for the value of `next_cursor`, but the list of rollouts was complete, so according to this docstring: `bcd64c7e72/codex-rs/app-server-protocol/src/protocol.rs (L334-L337)` If there are no more items to return, then `next_cursor` should be `None`. This PR updates that logic. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/openai/codex/pull/5658). * #5803 * #5793 * __->__ #5658	2025-10-27 02:03:30 -07:00
Ahmed Ibrahim	f178805252	Add feedback upload request handling (#5682 )	2025-10-27 05:53:39 +00:00
Eric Traut	0533bd2e7c	Fixed flaky unit test (#5654 ) This PR fixes a test that is sporadically failing in CI. The problem is that two unit tests (the older `login_and_cancel_chatgpt` and a recently added `login_chatgpt_includes_forced_workspace_query_param`) exercise code paths that start the login server. The server binds to a hard-coded localhost port number, so attempts to start more than one server at the same time will fail. If these two tests happen to run concurrently, one of them will fail. To fix this, I've added a simple mutex. We can use this same mutex for future tests that use the same pattern.	2025-10-24 16:31:24 -07:00
Anton Panasenko	6af83d86ff	[codex][app-server] introduce codex/event/raw_item events (#5578 )	2025-10-24 22:41:52 +00:00
Eric Traut	f8af4f5c8d	Added model summary and risk assessment for commands that violate sandbox policy (#5536 ) This PR adds support for a model-based summary and risk assessment for commands that violate the sandbox policy and require user approval. This aids the user in evaluating whether the command should be approved. The feature works by taking a failed command and passing it back to the model and asking it to summarize the command, give it a risk level (low, medium, high) and a risk category (e.g. "data deletion" or "data exfiltration"). It uses a new conversation thread so the context in the existing thread doesn't influence the answer. If the call to the model fails or takes longer than 5 seconds, it falls back to the current behavior. For now, this is an experimental feature and is gated by a config key `experimental_sandbox_command_assessment`. Here is a screen shot of the approval prompt showing the risk assessment and summary. <img width="723" height="282" alt="image" src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b" />	2025-10-24 15:23:44 -07:00
pakrym-oai	3c90728a29	Add new thread items and rewire event parsing to use them (#5418 ) 1. Adds AgentMessage, Reasoning, WebSearch items. 2. Switches the ResponseItem parsing to use new items and then also emit 3. Removes user-item kind and filters out "special" (environment) user items when returning to clients.	2025-10-22 10:14:50 -07:00
Owen Lin	26f314904a	[app-server] model/list API (#5382 ) Adds a `model/list` paginated API that returns the list of models supported by Codex.	2025-10-21 11:15:17 -07:00
Owen Lin	5c680c6587	[app-server] read rate limits API (#5302 ) Adds a `GET account/rateLimits/read` API to app-server. This calls the codex backend to fetch the user's current rate limits. This would be helpful in checking rate limits without having to send a message. For calling the codex backend usage API, I generated the types and manually copied the relevant ones into `codex-backend-openapi-types`. It'll be nice to extend our internal openapi generator to support Rust so we don't have to run these manual steps. # External (non-OpenAI) Pull Request Requirements Before opening this Pull Request, please read the dedicated "Contributing" markdown file or your PR may be closed: https://github.com/openai/codex/blob/main/docs/contributing.md If your PR conforms to our contribution guidelines, replace this text with a detailed and high quality description of your changes.	2025-10-20 14:11:54 -07:00
Gabriel Peal	d87f87e25b	Add forced_chatgpt_workspace_id and forced_login_method configuration options (#5303 ) This PR adds support for configs to specify a forced login method (chatgpt or api) as well as a forced chatgpt account id. This lets enterprises uses [managed configs](https://developers.openai.com/codex/security#managed-configuration) to force all employees to use their company's workspace instead of their own or any other. When a workspace id is set, a query param is sent to the login flow which auto-selects the given workspace or errors if the user isn't a member of it. This PR is large but a large % of it is tests, wiring, and required formatting changes. API login with chatgpt forced <img width="1592" height="116" alt="CleanShot 2025-10-19 at 22 40 04" src="https://github.com/user-attachments/assets/560c6bb4-a20a-4a37-95af-93df39d057dd" /> ChatGPT login with api forced <img width="1018" height="100" alt="CleanShot 2025-10-19 at 22 40 29" src="https://github.com/user-attachments/assets/d010bbbb-9c8d-4227-9eda-e55bf043b4af" /> Onboarding with api forced <img width="892" height="460" alt="CleanShot 2025-10-19 at 22 41 02" src="https://github.com/user-attachments/assets/cc0ed45c-b257-4d62-a32e-6ca7514b5edd" /> Onboarding with ChatGPT forced <img width="1154" height="426" alt="CleanShot 2025-10-19 at 22 41 27" src="https://github.com/user-attachments/assets/41c41417-dc68-4bb4-b3e7-3b7769f7e6a1" /> Logging in with the wrong workspace <img width="2222" height="84" alt="CleanShot 2025-10-19 at 22 42 31" src="https://github.com/user-attachments/assets/0ff4222c-f626-4dd3-b035-0b7fe998a046" />	2025-10-20 08:50:54 -07:00
Michael Bolin	995f5c3614	feat: add Vec<ParsedCommand> to ExecApprovalRequestEvent (#5222 ) This adds `parsed_cmd: Vec<ParsedCommand>` to `ExecApprovalRequestEvent` in the core protocol (`protocol/src/protocol.rs`), which is also what this field is named on `ExecCommandBeginEvent`. Honestly, I don't love the name (it sounds like a single command, but it is actually a list of them), but I don't want to get distracted by a naming discussion right now. This also adds `parsed_cmd` to `ExecCommandApprovalParams` in `codex-rs/app-server-protocol/src/protocol.rs`, so it will be available via `codex app-server`, as well. For consistency, I also updated `ExecApprovalElicitRequestParams` in `codex-rs/mcp-server/src/exec_approval.rs` to include this field under the name `codex_parsed_cmd`, as that struct already has a number of special `codex_*` fields. Note this is the code for when Codex is used as an MCP _server_ and therefore has to conform to the official spec for an MCP elicitation type.	2025-10-15 13:58:40 -07:00
jif-oai	69cb72f842	chore: sandbox refactor 2 (#4653 ) Revert the revert and fix the UI issue	2025-10-03 11:17:39 +01:00
Shijie Rao	7be3b484ad	feat: add file name to fuzzy search response (#4619 ) ### Summary * Updated fuzzy search result to include the file name. * This should not affect CLI usage and the UI there will be addressed in a separate PR. ### Testing Tested locally and with the extension. ### Screenshot <img width="431" height="244" alt="Screenshot 2025-10-02 at 11 08 44 AM" src="https://github.com/user-attachments/assets/ba2ca299-a81d-4453-9242-1750e945aea2" /> --------- Co-authored-by: shijie.rao <shijie.rao@squareup.com>	2025-10-02 18:19:13 -07:00
Ahmed Ibrahim	ed5d656fa8	Revert "chore: sanbox extraction" (#4626 ) Reverts openai/codex#4286	2025-10-02 21:09:21 +00:00
jif-oai	b8195a17e5	chore: sanbox extraction (#4286 ) # Extract and Centralize Sandboxing - Goal: Improve safety and clarity by centralizing sandbox planning and execution. - Approach: - Add planner (ExecPlan) and backend registry (Direct/Seatbelt/Linux) with run_with_plan. - Refactor codex.rs to plan-then-execute; handle failures/escalation via the plan. - Delegate apply_patch to the codex binary and run it with an empty env for determinism.	2025-10-01 12:05:12 +01:00
Michael Bolin	5881c0d6d4	fix: remove mcp-types from app server protocol (#4537 ) We continue the separation between `codex app-server` and `codex mcp-server`. In particular, we introduce a new crate, `codex-app-server-protocol`, and migrate `codex-rs/protocol/src/mcp_protocol.rs` into it, renaming it `codex-rs/app-server-protocol/src/protocol.rs`. Because `ConversationId` was defined in `mcp_protocol.rs`, we move it into its own file, `codex-rs/protocol/src/conversation_id.rs`, and because it is referenced in a ton of places, we have to touch a lot of files as part of this PR. We also decide to get away from proper JSON-RPC 2.0 semantics, so we also introduce `codex-rs/app-server-protocol/src/jsonrpc_lite.rs`, which is basically the same `JSONRPCMessage` type defined in `mcp-types` except with all of the `"jsonrpc": "2.0"` removed. Getting rid of `"jsonrpc": "2.0"` makes our serialization logic considerably simpler, as we can lean heavier on serde to serialize directly into the wire format that we use now.	2025-10-01 02:16:26 +00:00
Michael Bolin	32853ecbc5	fix: use macros to ensure request/response symmetry (#4529 ) Manually curating `protocol-ts/src/lib.rs` was error-prone, as expected. I finally asked Codex to write some Rust macros so we can ensure that: - For every variant of `ClientRequest` and `ServerRequest`, there is an associated `params` and `response` type. - All response types are included automatically in the output of `codex generate-ts`.	2025-09-30 18:06:05 -07:00
Michael Bolin	d9dbf48828	fix: separate `codex mcp` into `codex mcp-server` and `codex app-server` (#4471 ) This is a very large PR with some non-backwards-compatible changes. Historically, `codex mcp` (or `codex mcp serve`) started a JSON-RPC-ish server that had two overlapping responsibilities: - Running an MCP server, providing some basic tool calls. - Running the app server used to power experiences such as the VS Code extension. This PR aims to separate these into distinct concepts: - `codex mcp-server` for the MCP server - `codex app-server` for the "application server" Note `codex mcp` still exists because it already has its own subcommands for MCP management (`list`, `add`, etc.) The MCP logic continues to live in `codex-rs/mcp-server` whereas the refactored app server logic is in the new `codex-rs/app-server` folder. Note that most of the existing integration tests in `codex-rs/mcp-server/tests/suite` were actually for the app server, so all the tests have been moved with the exception of `codex-rs/mcp-server/tests/suite/mod.rs`. Because this is already a large diff, I tried not to change more than I had to, so `codex-rs/app-server/tests/common/mcp_process.rs` still uses the name `McpProcess` for now, but I will do some mechanical renamings to things like `AppServer` in subsequent PRs. While `mcp-server` and `app-server` share some overlapping functionality (like reading streams of JSONL and dispatching based on message types) and some differences (completely different message types), I ended up doing a bit of copypasta between the two crates, as both have somewhat similar `message_processor.rs` and `outgoing_message.rs` files for now, though I expect them to diverge more in the near future. One material change is that of the initialize handshake for `codex app-server`, as we no longer use the MCP types for that handshake. Instead, we update `codex-rs/protocol/src/mcp_protocol.rs` to add an `Initialize` variant to `ClientRequest`, which takes the `ClientInfo` object we need to update the `USER_AGENT_SUFFIX` in `codex-rs/app-server/src/message_processor.rs`. One other material change is in `codex-rs/app-server/src/codex_message_processor.rs` where I eliminated a use of the `send_event_as_notification()` method I am generally trying to deprecate (because it blindly maps an `EventMsg` into a `JSONNotification`) in favor of `send_server_notification()`, which takes a `ServerNotification`, as that is intended to be a custom enum of all notification types supported by the app server. So to make this update, I had to introduce a new variant of `ServerNotification`, `SessionConfigured`, which is a non-backwards compatible change with the old `codex mcp`, and clients will have to be updated after the next release that contains this PR. Note that `codex-rs/app-server/tests/suite/list_resume.rs` also had to be update to reflect this change. I introduced `codex-rs/utils/json-to-toml/src/lib.rs` as a small utility crate to avoid some of the copying between `mcp-server` and `app-server`.	2025-09-30 07:06:18 +00:00

50 commits