Commit graph

373 commits

Author SHA1 Message Date
Max Johnson
fb0aaf94de
codex-rs: fix thread resume rejoin semantics (#11756)
## Summary
- always rejoin an in-memory running thread on `thread/resume`, even
when overrides are present
- reject `thread/resume` when `history` is provided for a running thread
- reject `thread/resume` when `path` mismatches the running thread
rollout path
- warn (but do not fail) on override mismatches for running threads
- add more `thread_resume` integration tests and fixes; including
restart-based resume-with-overrides coverage

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all thread_resume`
- manual test with app-server-test-client
https://github.com/openai/codex/pull/11755
- manual test both stdio and websocket in app
2026-02-13 23:09:58 +00:00
Jeremy Rose
e4f8263798
[app-server] add fuzzyFileSearch/sessionCompleted (#11773)
this is to allow the client to know when to stop showing a spinner.
2026-02-13 15:08:14 -08:00
Matthew Zeng
8468871e2b
[apps] Improve app listing filtering. (#11697)
- [x] If an installed app is not on the app listing, remove it from the
final list.
2026-02-13 11:54:16 -08:00
Matthew Zeng
f93037f55d
[apps] Fix app loading logic. (#11518)
When `app/list` is called with `force_refetch=True`, we should seed the
results with what is already cached instead of starting from an empty
list. Otherwise when we send app/list/updated events, the client will
first see an empty list of accessible apps and then get the updated one.
2026-02-13 03:55:10 +00:00
acrognale-oai
ebe359b876
Add cwd as an optional field to thread/list (#11651)
Add's the ability to filter app-server thread/list by cwd
2026-02-13 02:05:04 +00:00
Charley Cunningham
f24669d444
Persist complete TurnContextItem state via canonical conversion (#11656)
## Summary

This PR delivers the first small, shippable step toward model-visible
state diffing by making
`TurnContextItem` more complete and standardizing how it is built.

Specifically, it:
- Adds persisted network context to `TurnContextItem`.
- Introduces a single canonical `TurnContext -> TurnContextItem`
conversion path.
- Routes existing rollout write sites through that canonical conversion
helper.

No context injection/diff behavior changes are included in this PR.

## Why this change

The design goal is to make `TurnContextItem` the canonical source of
truth for context-diff
decisions.
Before this PR:
- `TurnContextItem` did not include all TurnContext-derived environment
inputs needed for v1
completeness.
- Construction was duplicated at multiple write sites.

This PR addresses both with a minimal, reviewable change.

## Changes

### 1) Extend `TurnContextItem` with network state
- Added `TurnContextNetworkItem { allowed_domains, denied_domains }`.
- Added `network: Option<TurnContextNetworkItem>` to `TurnContextItem`.
- Kept backward compatibility by making the new field optional and
skipped when absent.

Files:
- `codex-rs/protocol/src/protocol.rs`

### 2) Canonical conversion helper
- Added `TurnContext::to_turn_context_item(collaboration_mode)` in core.
- Added internal helper to derive network fields from
`config_layer_stack.requirements().network`.

Files:
- `codex-rs/core/src/codex.rs`

### 3) Use canonical conversion at rollout write sites
- Replaced ad hoc `TurnContextItem { ... }` construction with
`to_turn_context_item(...)` in:
  - sampling request path
  - compaction path

Files:
- `codex-rs/core/src/codex.rs`
- `codex-rs/core/src/compact.rs`

### 4) Update fixtures/tests for new optional field
- Updated existing `TurnContextItem` literals in tests to include
`network: None`.
- Added protocol tests for:
  - deserializing old payloads with no `network`
  - serializing when `network` is present

Files:
- `codex-rs/core/tests/suite/resume_warning.rs`
- No replay/diff logic changes.
- Persisted rollout `TurnContextItem` now carries additional network
context when available.
- Older rollout lines without `network` remain readable.
2026-02-12 17:22:44 -08:00
Matthew Zeng
c37560069a
[apps] Add is_enabled to app info. (#11417)
- [x] Add is_enabled to app info and the response of `app/list`.
- [x] Update TUI to have Enable/Disable button on the app detail page.
2026-02-13 00:30:52 +00:00
Owen Lin
8d97b5c246
fix(app-server): surface more helpful errors for json-rpc (#11638)
Propagate client JSON-RPC errors for app-server request callbacks.
Previously a number of possible errors were collapsed to `channel
closed`. Now we should be able to see the underlying client error.

### Summary
This change stops masking client JSON-RPC error responses as generic
callback cancellation in app-server server->client request flows.

Previously, when the client responded with a JSON-RPC error, we removed
the callback entry but did not send anything to the waiting oneshot
receiver. Waiters then observed channel closure (for example, auth
refresh request canceled: channel closed), which hid the actual client
error.

Now, client JSON-RPC errors are forwarded through the callback channel
and handled explicitly by request consumers.

### User-visible behavior
- External auth refresh now surfaces real client JSON-RPC errors when
provided.
- True transport/callback-drop cases still report
canceled/channel-closed semantics.

### Example: client JSON-RPC error is now propagated (not masked as
"canceled")

When app-server asks the client to refresh ChatGPT auth tokens, it sends
a server->client JSON-RPC request like:

```json
{
  "id": 42,
  "method": "account/chatgptAuthTokens/refresh",
  "params": {
    "reason": "unauthorized",
    "previousAccountId": "org-abc"
  }
}
```

If the client cannot refresh and responds with a JSON-RPC error:
```
{
  "id": 42,
  "error": {
    "code": -32000,
    "message": "refresh failed",
    "data": null
  }
}
```

app-server now forwards that error through the callback path and
surfaces:
`auth refresh request failed: code=-32000 message=refresh failed`

Previously, this same case could be reported as:
`auth refresh request canceled: channel closed`
2026-02-13 00:14:55 +00:00
Michael Bolin
2825ac85a8
app-server: stabilize detached review start on Windows (#11646)
## Why

`review_start_with_detached_delivery_returns_new_thread_id` has been
failing on Windows CI. The failure mode is a process crash
(`tokio-runtime-worker` stack overflow) during detached review setup,
which causes EOF in the test harness.

This test is intended to validate detached review thread identity, not
shell snapshot behavior. We also still want detached review to avoid
unnecessary rollout-path rediscovery when the parent thread is already
loaded.

## What Changed

- Updated detached review startup in
`codex-rs/app-server/src/codex_message_processor.rs`:
  - `start_detached_review` now receives the loaded parent thread.
  - It prefers `parent_thread.rollout_path()`.
- It falls back to `find_thread_path_by_id_str(...)` only if the
in-memory path is unavailable.
- Hardened the review test fixture in
`codex-rs/app-server/tests/suite/v2/review.rs` by setting
`shell_snapshot = false` in test config, so this test no longer depends
on unrelated Windows PowerShell snapshot initialization.

## Verification

- `cargo test -p codex-app-server`
- Verified
`suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
passes locally.

## Notes

- Related context: rollout-path lookup behavior changed in #10532.
2026-02-12 16:12:44 -08:00
Michael Bolin
aef4af1079
app-server tests: disable shell_snapshot for review suite (#11657)
## Why


`suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
was failing on Windows CI due to an unrelated process crash during shell
snapshot initialization (`tokio-runtime-worker` stack overflow).

This review test suite validates review API behavior and should not
depend on shell snapshot behavior. Keeping shell snapshot enabled in
this fixture made the test flaky for reasons outside the scenario under
test.

## What Changed

- Updated the review suite test config in
`codex-rs/app-server/tests/suite/v2/review.rs` to set:
  - `shell_snapshot = false`

This keeps the review tests focused on review behavior by disabling
shell snapshot initialization in this fixture.

## Verification

- `cargo test -p codex-app-server`
- Confirmed the previously failing Windows CI job for this test now
passes on this PR.
2026-02-12 23:56:43 +00:00
Owen Lin
76256a8cec
fix: skip review_start_with_detached_delivery_returns_new_thread_id o… (#11645)
…n windows
2026-02-12 15:12:57 -08:00
Michael Bolin
a4cc1a4a85
feat: introduce Permissions (#11633)
## Why
We currently carry multiple permission-related concepts directly on
`Config` for shell/unified-exec behavior (`approval_policy`,
`sandbox_policy`, `network`, `shell_environment_policy`,
`windows_sandbox_mode`).

Consolidating these into one in-memory struct makes permission handling
easier to reason about and sets up the next step: supporting named
permission profiles (`[permissions.PROFILE_NAME]`) without changing
behavior now.

This change is mostly mechanical: it updates existing callsites to go
through `config.permissions`, but it does not yet refactor those
callsites to take a single `Permissions` value in places where multiple
permission fields are still threaded separately.

This PR intentionally **does not** change the on-disk `config.toml`
format yet and keeps compatibility with legacy config keys.

## What Changed
- Introduced `Permissions` in `core/src/config/mod.rs`.
- Added `Config::permissions` and moved effective runtime permission
fields under it:
  - `approval_policy`
  - `sandbox_policy`
  - `network`
  - `shell_environment_policy`
  - `windows_sandbox_mode`
- Updated config loading/building so these effective values are still
derived from the same existing config inputs and constraints.
- Updated Windows sandbox helpers/resolution to read/write via
`permissions`.
- Threaded the new field through all permission consumers across core
runtime, app-server, CLI/exec, TUI, and sandbox summary code.
- Updated affected tests to reference `config.permissions.*`.
- Renamed the struct/field from
`EffectivePermissions`/`effective_permissions` to
`Permissions`/`permissions` and aligned variable naming accordingly.

## Verification
- `just fix -p codex-core -p codex-tui -p codex-cli -p codex-app-server
-p codex-exec -p codex-utils-sandbox-summary`
- `cargo build -p codex-core -p codex-tui -p codex-cli -p
codex-app-server -p codex-exec -p codex-utils-sandbox-summary`
2026-02-12 14:42:54 -08:00
Owen Lin
efc8d45750
feat(app-server): experimental flag to persist extended history (#11227)
This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).

### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.

Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.

### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).

This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)

In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit

For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.

And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.

#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
2026-02-12 19:34:22 +00:00
Jeremy Rose
66e0c3aaa3
app-server: add fuzzy search sessions for streaming file search (#10268) 2026-02-12 10:49:44 -08:00
jif-oai
ba6f7a9e15
chore: drop mcp validation of dynamic tools (#11609)
Drop validation of dynamic tools using MCP names to reduce latency
2026-02-12 17:15:25 +00:00
Michael Bolin
abbd74e2be
feat: make sandbox read access configurable with ReadOnlyAccess (#11387)
`SandboxPolicy::ReadOnly` previously implied broad read access and could
not express a narrower read surface.
This change introduces an explicit read-access model so we can support
user-configurable read restrictions in follow-up work, while preserving
current behavior today.

It also ensures unsupported backends fail closed for restricted-read
policies instead of silently granting broader access than intended.

## What

- Added `ReadOnlyAccess` in protocol with:
  - `Restricted { include_platform_defaults, readable_roots }`
  - `FullAccess`
- Updated `SandboxPolicy` to carry read-access configuration:
  - `ReadOnly { access: ReadOnlyAccess }`
  - `WorkspaceWrite { ..., read_only_access: ReadOnlyAccess }`
- Preserved existing behavior by defaulting current construction paths
to `ReadOnlyAccess::FullAccess`.
- Threaded the new fields through sandbox policy consumers and call
sites across `core`, `tui`, `linux-sandbox`, `windows-sandbox`, and
related tests.
- Updated Seatbelt policy generation to honor restricted read roots by
emitting scoped read rules when full read access is not granted.
- Added fail-closed behavior on Linux and Windows backends when
restricted read access is requested but not yet implemented there
(`UnsupportedOperation`).
- Regenerated app-server protocol schema and TypeScript artifacts,
including `ReadOnlyAccess`.

## Compatibility / rollout

- Runtime behavior remains unchanged by default (`FullAccess`).
- API/schema changes are in place so future config wiring can enable
restricted read access without another policy-shape migration.
2026-02-11 18:31:14 -08:00
Michael Bolin
572ab66496
test(app-server): stabilize app/list thread feature-flag test by using file-backed MCP OAuth creds (#11521)
## Why

`suite::v2::app_list::list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
has been flaky in CI. The test exercises `thread/start`, which
initializes `codex_apps`. In CI/Linux, that path can reach OS
keyring-backed MCP OAuth credential lookup (`Codex MCP Credentials`) and
intermittently abort the MCP process (observed stack overflow in
`zbus`), causing the test to fail before the assertion logic runs.

## What Changed

- Updated the test config in
`codex-rs/app-server/tests/suite/v2/app_list.rs` to set
`mcp_oauth_credentials_store = "file"` in both relevant config-writing
paths:
- The in-test config override inside
`list_apps_uses_thread_feature_flag_when_thread_id_is_provided`
- `write_connectors_config(...)`, which is used by the v2 `app_list`
test suite
- This keeps test coverage focused on thread-scoped app feature flags
while removing OS keyring/DBus dependency from this test path.

## How It Was Verified

- `cargo test -p codex-app-server`
- `cargo test -p codex-app-server
list_apps_uses_thread_feature_flag_when_thread_id_is_provided --
--nocapture`
2026-02-11 18:30:18 -08:00
Max Johnson
c0ecc2e1e1
app-server: thread resume subscriptions (#11474)
This stack layer makes app-server thread event delivery connection-aware
so resumed/attached threads only emit notifications and approval prompts
to subscribed connections.

- Added per-thread subscription tracking in `ThreadState`
(`subscribed_connections`) and mapped subscription ids to `(thread_id,
connection_id)`.
- Updated listener lifecycle so removing a subscription or closing a
connection only removes that connection from the thread’s subscriber
set; listener shutdown now happens when the last subscriber is gone.
- Added `connection_closed(connection_id)` plumbing (`lib.rs` ->
`message_processor.rs` -> `codex_message_processor.rs`) so disconnect
cleanup happens immediately.
- Scoped bespoke event handling outputs through `TargetedOutgoing` to
send requests/notifications only to subscribed connections.
- Kept existing threadresume behavior while aligning with the latest
split-loop transport structure.
2026-02-11 16:21:13 -08:00
Max Johnson
b5339a591d
refactor: codex app-server ThreadState (#11419)
this is a no-op functionality wise. consolidates thread-specific message
processor / event handling state in ThreadState
2026-02-11 12:20:54 -08:00
iceweasel-oai
87279de434
Promote Windows Sandbox (#11341)
1. Move Windows Sandbox NUX to right after trust directory screen
2. Don't offer read-only as an option in Sandbox NUX.
Elevated/Legacy/Quit
3. Don't allow new untrusted directories. It's trust or quit
4. move experimental sandbox features to `[windows]
sandbox="elevated|unelevatd"`
5. Copy tweaks = elevated -> default, non-elevated -> non-admin
2026-02-11 11:48:33 -08:00
Max Johnson
7053aa5457
Reapply "Add app-server transport layer with websocket support" (#11370)
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-11 18:13:39 +00:00
gt-oai
886d9377d3
Cache cloud requirements (#11305)
We're loading these from the web on every startup. This puts them in a
local file with a 1hr TTL.

We sign the downloaded requirements with a key compiled into the Codex
CLI to prevent unsophisticated tampering (determined circumvention is
outside of our threat model: after all, one could just compile Codex
without any of these checks).

If any of the following are true, we ignore the local cache and re-fetch
from Cloud:
* The signature is invalid for the payload (== requirements, sign time,
ttl, user identity)
* The identity does not match the auth'd user's identity
* The TTL has expired
* We cannot parse requirements.toml from the payload
2026-02-11 14:06:41 +00:00
Michael Bolin
8b7f8af343
feat: split codex-common into smaller utils crates (#11422)
We are removing feature-gated shared crates from the `codex-rs`
workspace. `codex-common` grouped several unrelated utilities behind
`[features]`, which made dependency boundaries harder to reason about
and worked against the ongoing effort to eliminate feature flags from
workspace crates.

Splitting these utilities into dedicated crates under `utils/` aligns
this area with existing workspace structure and keeps each dependency
explicit at the crate boundary.

## What changed

- Removed `codex-rs/common` (`codex-common`) from workspace members and
workspace dependencies.
- Added six new utility crates under `codex-rs/utils/`:
  - `codex-utils-cli`
  - `codex-utils-elapsed`
  - `codex-utils-sandbox-summary`
  - `codex-utils-approval-presets`
  - `codex-utils-oss`
  - `codex-utils-fuzzy-match`
- Migrated the corresponding modules out of `codex-common` into these
crates (with tests), and added matching `BUILD.bazel` targets.
- Updated direct consumers to use the new crates instead of
`codex-common`:
  - `codex-rs/cli`
  - `codex-rs/tui`
  - `codex-rs/exec`
  - `codex-rs/app-server`
  - `codex-rs/mcp-server`
  - `codex-rs/chatgpt`
  - `codex-rs/cloud-tasks`
- Updated workspace lockfile entries to reflect the new dependency graph
and removal of `codex-common`.
2026-02-11 12:59:24 +00:00
Michael Bolin
476c1a7160
Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
2026-02-10 22:44:02 -08:00
xl-openai
fdd0cd1de9
feat: support multiple rate limits (#11260)
Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets
2026-02-10 20:09:31 -08:00
Celia Chen
641d5268fa
chore: persist turn_id in rollout session and make turn_id uuid based (#11246)
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.

This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.

This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.

example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```

backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
2026-02-11 03:56:01 +00:00
pakrym-oai
c68999ee6d
Prefer websocket transport when model opts in (#11386)
Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off

Testing
- Not run (not requested)
2026-02-10 18:50:48 -08:00
pakrym-oai
bfd4e2112c
Disable very flaky tests (#11394)
Collected from last 20 builds of main in
https://github.com/openai/codex/commits/main/.
2026-02-10 18:50:11 -08:00
jif-oai
847a6092e6
fix: reduce usage of open_if_present (#11344) 2026-02-10 19:25:07 +00:00
jif-oai
a364dd8b56
feat: opt-out of events in the app-server (#11319)
Add `optOutNotificationMethods` in the app-server to opt-out events
based on exact method matching
2026-02-10 18:04:52 +00:00
Shijie Rao
c4b771a16f
Fix: update parallel tool call exec approval to approve on request id (#11162)
### Summary

In parallel tool call, exec command approvals were not approved at
request level but at a turn level. i.e. when a single request is
approved, the system currently treats all requests in turn as approved.

### Before

https://github.com/user-attachments/assets/d50ed129-b3d2-4b2f-97fa-8601eb11f6a8

### After

https://github.com/user-attachments/assets/36528a43-a4aa-4775-9e12-f13287ef19fc
2026-02-10 09:38:00 -08:00
Max Johnson
47356ff83c
Revert "Add app-server transport layer with websocket support (#10693)" (#11323)
Suspected cause of deadlocking bug
2026-02-10 17:37:49 +00:00
Michael Bolin
44ebf4588f
feat: retain NetworkProxy, when appropriate (#11207)
As of this PR, `SessionServices` retains a
`Option<StartedNetworkProxy>`, if appropriate.

Now the `network` field on `Config` is `Option<NetworkProxySpec>`
instead of `Option<NetworkProxy>`.

Over in `Session::new()`, we invoke `NetworkProxySpec::start_proxy()` to
create the `StartedNetworkProxy`, which is a new struct that retains the
`NetworkProxy` as well as the `NetworkProxyHandle`. (Note that `Drop` is
implemented for `NetworkProxyHandle` to ensure the proxies are shutdown
when it is dropped.)

The `NetworkProxy` from the `StartedNetworkProxy` is threaded through to
the appropriate places.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/11207).
* #11285
* __->__ #11207
2026-02-10 02:09:23 -08:00
Matthew Zeng
005e040f97
[apps] Add thread_id param to optionally load thread config for apps feature check. (#11279)
- [x] Add thread_id param to optionally load thread config for apps
feature check
2026-02-09 23:10:26 -08:00
Owen Lin
53741013ab
fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
2026-02-09 20:48:58 -08:00
Dylan Hurd
168c359b71
Adjust shell command timeouts for Windows (#11247)
Summary
- add platform-aware defaults for shell command timeouts so Windows
tests get longer waits
- keep medium timeout longer on Windows to ensure flakiness is reduced

Testing
- Not run (not requested)
2026-02-09 20:03:32 -08:00
Josh McKinney
de59e550c0
test: deflake nextest child-process leak in MCP harnesses (#11263)
## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes

## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.

## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`


## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
2026-02-10 03:43:24 +00:00
xl-openai
a33ee46e3b
feat: extend skills/list to support additional roots. (#10835)
Add an optional perCwdExtraUserRoots
2026-02-09 13:30:38 -08:00
gt-oai
9fe925b15a
Load requirements on windows (#10770)
We support requirements on Unix, loading from
`/etc/codex/requirements.toml`. On MacOS, we also support MDM.

Now, on Windows, we'll load requirements from
`%ProgramData%\OpenAI\Codex\requirements.toml`
2026-02-09 16:05:38 +00:00
jif-oai
6cf61725d0
feat: do not close unified exec processes across turns (#10799)
With this PR we do not close the unified exec processes (i.e. background
terminals) at the end of a turn unless:
* The user interrupt the turn
* The user decide to clean the processes through `app-server` or
`/clean`

I made sure that `codex exec` correctly kill all the processes
2026-02-09 10:27:46 +00:00
Michael Bolin
383b45279e
feat: include NetworkConfig through ExecParams (#11105)
This PR adds the following field to `Config`:

```rust
pub network: Option<NetworkProxy>,
```

Though for the moment, it will always be initialized as `None` (this
will be addressed in a subsequent PR).

This PR does the work to thread `network` through to `execute_exec_env()`, `process_exec_tool_call()`, and `UnifiedExecRuntime.run()` to ensure it is available whenever we span a process.
2026-02-09 03:32:17 +00:00
Matthew Zeng
45b7763c3f
[apps] Improve app loading. (#10994)
There are two concepts of apps that we load in the harness:

- Directory apps, which is all the apps that the user can install.
- Accessible apps, which is what the user actually installed and can be
$ inserted and be used by the model. These are extracted from the tools
that are loaded through the gateway MCP.

Previously we wait for both sets of apps before returning the full apps
list. Which causes many issues because accessible apps won't be
available to the UI or the model if directory apps aren't loaded or
failed to load.

In this PR we are separating them so that accessible apps can be loaded
separately and are instantly available to be shown in the UI and to be
provided in model context. We also added an app-server event so that
clients can subscribe to also get accessible apps without being blocked
on the full app list.

- [x] Separate accessible apps and directory apps loading.
- [x] `app/list` request will also emit `app/list/updated` notifications
that app-server clients can subscribe. Which allows clients to get
accessible apps list to render in the $ menu without being blocked by
directory apps.
- [x] Cache both accessible and directory apps with 1 hour TTL to avoid
reloading them when creating new threads.
- [x] TUI improvements to redraw $ menu and /apps menu when app list is
updated.
2026-02-08 15:24:56 -08:00
Matthew Zeng
9f1009540b
Upgrade rmcp to 0.14 (#10718)
- [x] Upgrade rmcp to 0.14
2026-02-08 15:07:53 -08:00
Eric Traut
b3de6c7f2b
Defer persistence of rollout file (#11028)
- Defer rollout persistence for fresh threads (`InitialHistory::New`):
keep rollout events in memory and only materialize rollout file + state
DB row on first `EventMsg::UserMessage`.
- Keep precomputed rollout path available before materialization.
- Change `thread/start` to build thread response from live config
snapshot and optional precomputed path.
- Improve pre-materialization behavior in app-server/TUI: clearer
invalid-request errors for file-backed ops and a friendlier `/fork` “not
ready yet” UX.
- Update tests to match deferred semantics across
start/read/archive/unarchive/fork/resume/review flows.
- Improved resilience of user_shell test, which should be unrelated to
this change but must be affected by timing changes

For Reviewers:
* The primary change is in recorder.rs
* Most of the other changes were to fix up broken assumptions in
existing tests

Testing:
* Manually tested CLI
* Exercised app server paths by manually running IDE Extension with
rebuilt CLI binary
* Only user-visible change is that `/fork` in TUI generates visible
error if used prior to first turn
2026-02-07 23:05:03 -08:00
Charley Cunningham
e6662d6387
app-server: treat null mode developer instructions as built-in defaults (#10983)
## Summary
- make `turn/start` normalize
`collaborationMode.settings.developer_instructions: null` to the
built-in instructions for the selected mode
- prevent app-server clients from accidentally clearing mode-switch
developer instructions by sending `null`
- document this behavior in the v2 protocol and app-server docs

## What changed
- `codex-rs/app-server/src/codex_message_processor.rs`
  - added a small `normalize_turn_start_collaboration_mode` helper
  - in `turn_start`, apply normalization before `OverrideTurnContext`
- `codex-rs/app-server/tests/suite/v2/turn_start.rs`
- extended `turn_start_accepts_collaboration_mode_override_v2` to assert
the outgoing request includes default-mode instruction text when the
client sends `developer_instructions: null`
- `codex-rs/app-server-protocol/src/protocol/v2.rs`
- clarified `TurnStartParams.collaboration_mode` docs:
`settings.developer_instructions: null` means use built-in mode
instructions
- regenerated schema fixture:
- `codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts`
- docs:
  - `codex-rs/app-server/README.md`
  - `codex-rs/docs/codex_mcp_interface.md`
2026-02-07 12:59:41 -08:00
viyatb-oai
739908a12c
feat(core): add network constraints schema to requirements.toml (#10958)
## Summary

Add `requirements.toml` schema support for admin-defined network
constraints in the requirements layer

example config:

```
[experimental_network]
enabled = true
allowed_domains = ["api.openai.com"]
denied_domains = ["example.com"]
```
2026-02-07 19:48:24 +00:00
jif-oai
62605fa471
Add resume_agent collab tool (#10903)
Summary
- add the new resume_agent collab tool path through core, protocol, and
the app server API, including the resume events
- update the schema/TypeScript definitions plus docs so resume_agent
appears in generated artifacts and README
- note that resumed agents rehydrate rollout history without overwriting
their base instructions

Testing
- Not run (not requested)
2026-02-07 17:31:45 +01:00
Michael Bolin
a118494323
feat: add support for allowed_web_search_modes in requirements.toml (#10964)
This PR makes it possible to disable live web search via an enterprise
config even if the user is running in `--yolo` mode (though cached web
search will still be available). To do this, create
`/etc/codex/requirements.toml` as follows:

```toml
# "live" is not allowed; "disabled" is allowed even though not listed explicitly.
allowed_web_search_modes = ["cached"]
```

Or set `requirements_toml_base64` MDM as explained on
https://developers.openai.com/codex/security/#locations.

### Why
- Enforce admin/MDM/`requirements.toml` constraints on web-search
behavior, independent of user config and per-turn sandbox defaults.
- Ensure per-turn config resolution and review-mode overrides never
crash when constraints are present.

### What
- Add `allowed_web_search_modes` to requirements parsing and surface it
in app-server v2 `ConfigRequirements` (`allowedWebSearchModes`), with
fixtures updated.
- Define a requirements allowlist type (`WebSearchModeRequirement`) and
normalize semantics:
  - `disabled` is always implicitly allowed (even if not listed).
  - An empty list is treated as `["disabled"]`.
- Make `Config.web_search_mode` a `Constrained<WebSearchMode>` and apply
requirements via `ConstrainedWithSource<WebSearchMode>`.
- Update per-turn resolution (`resolve_web_search_mode_for_turn`) to:
- Prefer `Live → Cached → Disabled` when
`SandboxPolicy::DangerFullAccess` is active (subject to requirements),
unless the user preference is explicitly `Disabled`.
- Otherwise, honor the user’s preferred mode, falling back to an allowed
mode when necessary.
- Update TUI `/debug-config` and app-server mapping to display
normalized `allowed_web_search_modes` (including implicit `disabled`).
- Fix web-search integration tests to assert cached behavior under
`SandboxPolicy::ReadOnly` (since `DangerFullAccess` legitimately prefers
`live` when allowed).
2026-02-07 05:55:15 +00:00
Eric Traut
4d52428fa2
Fixed a flaky test (#10970)
## Summary

Stabilize v2 review integration tests by making them hermetic with
respect to model discovery.

`app-server` review tests were intermittently timing out in CI
(especially on Windows runners) because their test config allowed remote
model refresh. During `thread/start`, the test process could issue live
`/v1/models` requests, introducing external network latency and
nondeterministic timing before review flow assertions.

This change disables remote model fetching in the review test config
helper used by these tests.
2026-02-06 21:26:26 -08:00
Javi
87ce50f118
app-server: print help message to console when starting websockets server (#10943)
Follow-up to https://github.com/openai/codex/pull/10693

<img width="596" height="77" alt="image"
src="https://github.com/user-attachments/assets/9140df70-01d1-4c5a-85ee-ca15a09a0e77"
/>
2026-02-07 00:18:42 +00:00