Commit graph

107 commits

Author SHA1 Message Date
Jack Mousseau
e6b93841c5
Add request permissions tool (#13092)
Adds a built-in `request_permissions` tool and wires it through the
Codex core, protocol, and app-server layers so a running turn can ask
the client for additional permissions instead of relying on a static
session policy.

The new flow emits a `RequestPermissions` event from core, tracks the
pending request by call ID, forwards it through app-server v2 as an
`item/permissions/requestApproval` request, and resumes the tool call
once the client returns an approved subset of the requested permission
profile.
2026-03-08 20:23:06 -07:00
Ruslan Nigmatullin
e9bd8b20a1
app-server: Add streaming and tty/pty capabilities to command/exec (#13640)
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
2026-03-06 17:30:17 -08:00
sayan-oai
014a59fb0b
check app auth in plugin/install (#13685)
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.

checks are best effort based on `codex_apps` loading, much like
`app/list`.

#### Tests
Added integration tests, tested locally.
2026-03-06 06:45:00 +00:00
xl-openai
520ed724d2
support plugin/list. (#13540)
Introduce a plugin/list which reads from local marketplace.json.
Also update the signature for plugin/install.
2026-03-05 21:58:50 -05:00
sayan-oai
03d55f0e6f
chore: add web_search_tool_type for image support (#13538)
add `web_search_tool_type` on model_info that can be populated from
backend. will be used to filter which models can use `web_search` with
images and which cant.

added small unit test.
2026-03-05 07:02:27 +00:00
Curtis 'Fjord' Hawthorne
b92146d48b
Add under-development original-resolution view_image support (#13050)
## Summary

Add original-resolution support for `view_image` behind the
under-development `view_image_original_resolution` feature flag.

When the flag is enabled and the target model is `gpt-5.3-codex` or
newer, `view_image` now preserves original PNG/JPEG/WebP bytes and sends
`detail: "original"` to the Responses API instead of using the legacy
resize/compress path.

## What changed

- Added `view_image_original_resolution` as an under-development feature
flag.
- Added `ImageDetail` to the protocol models and support for serializing
`detail: "original"` on tool-returned images.
- Added `PromptImageMode::Original` to `codex-utils-image`.
  - Preserves original PNG/JPEG/WebP bytes.
  - Keeps legacy behavior for the resize path.
- Updated `view_image` to:
- use the shared `local_image_content_items_with_label_number(...)`
helper in both code paths
  - select original-resolution mode only when:
    - the feature flag is enabled, and
    - the model slug parses as `gpt-5.3-codex` or newer
- Kept local user image attachments on the existing resize path; this
change is specific to `view_image`.
- Updated history/image accounting so only `detail: "original"` images
use the docs-based GPT-5 image cost calculation; legacy images still use
the old fixed estimate.
- Added JS REPL guidance, gated on the same feature flag, to prefer JPEG
at 85% quality unless lossless is required, while still allowing other
formats when explicitly requested.
- Updated tests and helper code that construct
`FunctionCallOutputContentItem::InputImage` to carry the new `detail`
field.

## Behavior

### Feature off
- `view_image` keeps the existing resize/re-encode behavior.
- History estimation keeps the existing fixed-cost heuristic.

### Feature on + `gpt-5.3-codex+`
- `view_image` sends original-resolution images with `detail:
"original"`.
- PNG/JPEG/WebP source bytes are preserved when possible.
- History estimation uses the GPT-5 docs-based image-cost calculation
for those `detail: "original"` images.


#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/13050
-  `2` https://github.com/openai/codex/pull/13331
-  `3` https://github.com/openai/codex/pull/13049
2026-03-03 15:56:54 -08:00
joeytrasatti-openai
935754baa3
Add thread metadata update endpoint to app server (#13280)
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly

## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
    - `suite::grep_files::grep_files_tool_collects_matches`
    - `suite::grep_files::grep_files_tool_reports_empty_results`
2026-03-03 15:56:11 -08:00
Owen Lin
167158f93c
chore(app-server): delete v1 RPC methods and notifications (#13375)
## Summary
This removes the old app-server v1 methods and notifications we no
longer need, while keeping the small set the main codex app client still
depends on for now.

The remaining legacy surface is:
- `initialize`
- `getConversationSummary`
- `getAuthStatus`
- `gitDiffToRemote`
- `fuzzyFileSearch`
- `fuzzyFileSearch/sessionStart`
- `fuzzyFileSearch/sessionUpdate`
- `fuzzyFileSearch/sessionStop`

And the raw `codex/event/*` notifications emitted from core. These
notifications will be removed in a followup PR.

## What changed
- removed deprecated v1 request variants from the protocol and
app-server dispatcher
- removed deprecated typed notifications: `authStatusChange`,
`loginChatGptComplete`, and `sessionConfigured`
- updated the app-server test client to use v2 flows instead of deleted
v1 flows
- deleted legacy-only app-server test suites and added focused coverage
for `getConversationSummary`
- regenerated app-server schema fixtures and updated the MCP interface
docs to match the remaining compatibility surface

## Testing
- `just write-app-server-schema`
- `cargo test -p codex-app-server-protocol`
- `cargo test -p codex-app-server`
2026-03-03 13:18:25 -08:00
Owen Lin
d473e8d56d
feat(app-server): add tracing to all app-server APIs (#13285)
### Overview
This PR adds the first piece of tracing for app-server JSON-RPC
requests.

There are two main changes:
- JSON-RPC requests can now take an optional W3C trace context at the
top level via a `trace` field (`traceparent` / `tracestate`).
- app-server now creates a dedicated request span for every inbound
JSON-RPC request in `MessageProcessor`, and uses the request-level trace
context as the parent when present.

For compatibility with existing flows, app-server still falls back to
the TRACEPARENT env var when there is no request-level traceparent.

This PR is intentionally scoped to the app-server boundary. In a
followup, we'll actually propagate trace context through the async
handoff into core execution spans like run_turn, which will make
app-server traces much more useful.

### Spans
A few details on the app-server span shape:
- each inbound request gets its own server span
- span/resource names are based on the JSON-RPC method (`initialize`,
`thread/start`, `turn/start`, etc.)
- spans record transport (stdio vs websocket), request id, connection
id, and client name/version when available
- `initialize` stores client metadata in session state so later requests
on the same connection can reuse it
2026-03-02 16:01:41 -08:00
jif-oai
b649953845
feat: polluted memories (#13008)
Add a feature flag to disable memory creation for "polluted"
2026-03-02 11:57:32 +00:00
Ahmed Ibrahim
4d180ae428
Add model availability NUX metadata (#12972)
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
2026-02-26 22:02:57 -08:00
pakrym-oai
ba41e84a50
Use model catalog default for reasoning summary fallback (#12873)
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata

## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 09:31:13 -08:00
Owen Lin
21f7032dbb
feat(app-server): thread/unsubscribe API (#10954)
Adds a new v2 app-server API for a client to be able to unsubscribe to a
thread:
- New RPC method: `thread/unsubscribe`
- New server notification: `thread/closed`

Today clients can start/resume/archive threads, but there wasn’t a way
to explicitly unload a live thread from memory without archiving it.
With `thread/unsubscribe`, a client can indicate it is no longer
actively working with a live Thread. If this is the only client
subscribed to that given thread, the thread will be automatically closed
by app-server, at which point the server will send `thread/closed` and
`thread/status/changed` with `status: notLoaded` notifications.

This gives clients a way to prevent long-running app-server processes
from accumulating too many thread (and related) objects in memory.

Closed threads will also be removed from `thread/loaded/list`.
2026-02-25 13:14:30 -08:00
Ahmed Ibrahim
947092283a
Add app-server v2 thread realtime API (#12715)
Add experimental `thread/realtime/*` v2 requests and notifications, then
route app-server realtime events through that thread-scoped surface with
integration coverage.

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-25 09:59:10 -08:00
daveaitel-openai
dcab40123f
Agent jobs (spawn_agents_on_csv) + progress UI (#10935)
## Summary
- Add agent job support: spawn a batch of sub-agents from CSV, auto-run,
auto-export, and store results in SQLite.
- Simplify workflow: remove run/resume/get-status/export tools; spawn is
deterministic and completes in one call.
- Improve exec UX: stable, single-line progress bar with ETA; suppress
sub-agent chatter in exec.

## Why
Enables map-reduce style workflows over arbitrarily large repos using
the existing Codex orchestrator. This addresses review feedback about
overly complex job controls and non-deterministic monitoring.

## Demo (progress bar)
```
./codex-rs/target/debug/codex exec \
  --enable collab \
  --enable sqlite \
  --full-auto \
  --progress-cursor \
  -c agents.max_threads=16 \
  -C /Users/daveaitel/code/codex \
  - <<'PROMPT'
Create /tmp/agent_job_progress_demo.csv with columns: path,area and 30 rows:
path = item-01..item-30, area = test.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/agent_job_progress_demo.csv
- instruction: "Run `python - <<'PY'` to sleep a random 0.3–1.2s, then output JSON with keys: path, score (int). Set score = 1."
- output_csv_path: /tmp/agent_job_progress_demo_out.csv
PROMPT
```

## Review feedback addressed
- Auto-start jobs on spawn; removed run/resume/status/export tools.
- Auto-export on success.
- More descriptive tool spec + clearer prompts.
- Avoid deadlocks on spawn failure; pending/running handled safely.
- Progress bar no longer scrolls; stable single-line redraw.

## Tests
- `cd codex-rs && cargo test -p codex-exec`
- `cd codex-rs && cargo build -p codex-cli`
2026-02-24 21:00:19 +00:00
sayan-oai
7e46e5b9c2
chore: rm hardcoded PRESETS list (#12650)
rm `PRESETS` list harcoded in `model_presets` as we now have bundled
`models.json` with equivalent info.

update logic to rely on bundled models instead, update tests.
2026-02-23 22:35:51 -08:00
natea-oai
936e744c93
Add field to Thread object for the latest rename set for a given thread (#12301)
Exposes through the app server updated names set for a thread. This
enables other surfaces to use the core as the source of truth for thread
naming. `threadName` is gathered using the helper functions used to
interact with `session_index.jsonl`, and is hydrated in:
- `thread/list`
- `thread/read`
- `thread/resume`
- `thread/unarchive`
- `thread/rollback`

We don't do this for `thread/start` and `thread/fork`.
2026-02-20 18:26:57 -08:00
jif-oai
0f9eed3a6f
feat: add nick name to sub-agents (#12320)
Adding random nick name to sub-agents. Used for UX

At the same time, also storing and wiring the role of the sub-agent
2026-02-20 14:39:49 +00:00
Michael Bolin
4fa304306b
tests: centralize in-flight turn cleanup helper (#12271)
## Why

Several tests intentionally exercise behavior while a turn is still
active. The cleanup sequence for those tests (`turn/interrupt` + waiting
for `codex/event/turn_aborted`) was duplicated across files, which made
the rationale easy to lose and the pattern easy to apply inconsistently.

This change centralizes that cleanup in one place with a single
explanatory doc comment.

## What Changed

### Added shared helper

In `codex-rs/app-server/tests/common/mcp_process.rs`:

- Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
- Added a doc comment explaining why explicit interrupt + terminal wait
is required for tests that intentionally leave a turn in-flight.

### Migrated call sites

Replaced duplicated interrupt/aborted blocks with the helper in:

- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
  - `thread_resume_rejects_history_when_thread_is_running`
  - `thread_resume_rejects_mismatched_path_when_thread_is_running`
- `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
  - `turn_start_shell_zsh_fork_executes_command_v2`
-
`turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
  - `turn_steer_returns_active_turn_id`

### Existing cleanup retained

In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:

- `turn_start_accepts_local_image_input` continues to explicitly wait
for `turn/completed` so the turn lifecycle is fully drained before test
exit.

## Verification

- `cargo test -p codex-app-server`
2026-02-20 01:47:34 +00:00
Michael Bolin
2f3d0b186b
app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266)
## Why
`cargo nextest` was intermittently reporting `LEAK` for
`codex-app-server` tests even when assertions passed. This adds noise
and flakiness to local/CI signals.

Sample output used as the basis of this investigation:

```text
LEAK [   7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
LEAK [   7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
LEAK [   7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
LEAK [   8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
LEAK [   8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
LEAK [   8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
LEAK [   6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
LEAK [   6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
```

## How I Reproduced
I focused on the suspect tests and ran them under `nextest` stress mode
with leak reporting enabled.

```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
```

This reproduced intermittent `LEAK` statuses while tests still passed.

## What Changed
In `codex-rs/app-server/tests/common/mcp_process.rs`:

- Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
can explicitly close stdin.
- In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
- Wait briefly for graceful exit.
- If still running, fall back to `start_kill()` and the existing bounded
`try_wait()` loop.
- Updated send-path handling to bail if stdin is already closed.

## Why This Is the Right Fix
The leak signal was caused by child-process teardown timing, not
test-logic assertion failure. The helper previously relied mostly on
force-kill timing in `Drop`; that can race with nextest leak detection.

Closing stdin first gives `codex-app-server` a deterministic, graceful
shutdown path before force-kill. Keeping the force-kill fallback
preserves robustness if graceful shutdown does not complete in time.

## Verification
- `cargo test -p codex-app-server`
- Re-ran the stress repro above after this change: no `LEAK` statuses
observed.
- Additional high-signal stress run also showed no leaks:

```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
```
2026-02-19 20:19:42 +00:00
Charley Cunningham
abb018383f
Undo stack size Bazel test hack (#12258)
Undo hack from https://github.com/openai/codex/pull/12203/changes
2026-02-19 11:04:45 -08:00
Charley Cunningham
16c3c47535
Stabilize app-server detached review and running-resume tests (#12203)
## Summary
- stabilize
`thread_resume_rejoins_running_thread_even_with_override_mismatch` by
using a valid delayed second SSE response instead of an intentionally
truncated stream
- set `RUST_MIN_STACK=4194304` for spawned app-server test processes in
`McpProcess` to avoid stack-sensitive CI overflows in detached review
tests

## Why
- the thread-resume assertion could race with a mocked stream-disconnect
error and intermittently observe `systemError`
- detached review startup is stack-sensitive in some CI environments;
pinning a larger stack in the test harness removes that flake without
changing product behavior

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
- `cargo test -p codex-app-server --test all
suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
2026-02-18 19:05:35 -08:00
iceweasel-oai
292542616a
app-server support for Windows sandbox setup. (#12025)
app-server support for initiating Windows sandbox setup.
server responds quickly to setup request and makes a future RPC call
back to client when the setup finishes.

The TUI implementation is unaffected but in a future PR I'll update the
TUI to use the shared setup helper
(`windows_sandbox.run_windows_sandbox_setup`)
2026-02-18 13:03:16 -08:00
sayan-oai
41800fc876
chore: rm remote models fflag (#11699)
rm `remote_models` feature flag.

We see issues like #11527 when a user has `remote_models` disabled, as
we always use the default fallback `ModelInfo`. This causes issues with
model performance.

Builds on #11690, which helps by warning the user when they are using
the default fallback. This PR will make that happen much less frequently
as an accidental consequence of disabling `remote_models`.
2026-02-17 11:43:16 -08:00
sayan-oai
060a320e7d
fix: show user warning when using default fallback metadata (#11690)
### What
It's currently unclear when the harness falls back to the default,
generic `ModelInfo`. This happens when the `remote_models` feature is
disabled or the model is truly unknown, and can lead to bad performance
and issues in the harness.

Add a user-facing warning when this happens so they are aware when their
setup is broken.

### Tests
Added tests, tested locally.
2026-02-15 18:46:05 -08:00
Jeremy Rose
66e0c3aaa3
app-server: add fuzzy search sessions for streaming file search (#10268) 2026-02-12 10:49:44 -08:00
Max Johnson
7053aa5457
Reapply "Add app-server transport layer with websocket support" (#11370)
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.

This reverts commit 47356ff83c.

## Summary

To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
   - inbound handling (`transport_event_rx`)
   - outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks

## Validation

Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests

<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>

---------

Co-authored-by: jif-oai <jif@openai.com>
2026-02-11 18:13:39 +00:00
Michael Bolin
476c1a7160
Remove test-support feature from codex-core and replace it with explicit test toggles (#11405)
## Why

`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.

## Net Change

- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.

## Outcome

- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
2026-02-10 22:44:02 -08:00
pakrym-oai
c68999ee6d
Prefer websocket transport when model opts in (#11386)
Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off

Testing
- Not run (not requested)
2026-02-10 18:50:48 -08:00
jif-oai
a364dd8b56
feat: opt-out of events in the app-server (#11319)
Add `optOutNotificationMethods` in the app-server to opt-out events
based on exact method matching
2026-02-10 18:04:52 +00:00
Owen Lin
53741013ab
fix(app-server): for external auth, replace id_token with chatgpt_acc… (#11240)
…ount_id and chatgpt_plan_type

### Summary
Following up on external auth mode which was introduced here:
https://github.com/openai/codex/pull/10012

Turns out some clients have a differently shaped ID token and don't have
a chosen workspace (aka chatgpt_account_id) encoded in their ID token.
So, let's replace `id_token` param with `chatgpt_account_id` and
`chatgpt_plan_type` (optional) when initializing the external ChatGPT
auth mode (`account/login/start` with `chatgptAuthTokens`).

The client was able to test end-to-end with a Codex build from this
branch and verified it worked!
2026-02-09 20:48:58 -08:00
Josh McKinney
de59e550c0
test: deflake nextest child-process leak in MCP harnesses (#11263)
## Summary
- add deterministic child-process cleanup to both test `McpProcess`
helpers
- keep Tokio `kill_on_drop(true)` but also reap via bounded `try_wait()`
polling in `Drop`
- document the failure mode and why this avoids nondeterministic `LEAK`
flakes

## Why
`cargo nextest` leak detection can intermittently report `LEAK` when a
spawned server outlives test teardown, making CI flaky.

## Testing
- `just fmt`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`


## Failing CI Reference
- Original failing job:
https://github.com/openai/codex/actions/runs/21845226299/job/63039443593?pr=11245
2026-02-10 03:43:24 +00:00
xl-openai
a33ee46e3b
feat: extend skills/list to support additional roots. (#10835)
Add an optional perCwdExtraUserRoots
2026-02-09 13:30:38 -08:00
Owen Lin
0d8b2b74c4
feat(app-server): turn/steer API (#10821)
This PR adds a dedicated `turn/steer` API for appending user input to an
in-flight turn.

## Motivation
Currently, steering in the app is implemented by just calling
`turn/start` while a turn is running. This has some really weird quirks:
- Client gets back a new `turn.id`, even though streamed
events/approvals remained tied to the original active turn ID.
- All the various turn-level override params on `turn/start` do not
apply to the "steer", and would only apply to the next real turn.
- There can also be a race condition where the client thinks the turn is
active but the server has already completed it, so there might be bugs
if the client has baked in some client-specific behavior thinking it's a
steer when in fact the server kicked off a new turn. This is
particularly possible when running a client against a remote app-server.

Having a dedicated `turn/steer` API eliminates all those quirks.

`turn/steer` behavior:
- Requires an active turn on threadId. Returns a JSON-RPC error if there
is no active turn.
- If expectedTurnId is provided, it must match the active turn (more
useful when connecting to a remote app-server).
- Does not emit `turn/started`.
- Does not accept turn overrides (`cwd`, `model`, `sandbox`, etc.) or
`outputSchema` to accurately reflect that these are not applied when
steering.
2026-02-06 00:35:04 +00:00
Matthew Zeng
7e81f63698
[app-server] Add a method to list experimental features. (#10721)
- [x] Add a method to list experimental features.
2026-02-05 20:04:01 +00:00
Ahmed Ibrahim
38a47700b5
Add thread/compact v2 (#10445)
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
2026-02-03 18:15:55 -08:00
Colin Young
7e07ec8f73
[Codex][CLI] Gate image inputs by model modalities (#10271)
###### Summary

- Add input_modalities to model metadata so clients can determine
supported input types.
- Gate image paste/attach in TUI when the selected model does not
support images.
- Block submits that include images for unsupported models and show a
clear warning.
- Propagate modality metadata through app-server protocol/model-list
responses.
  - Update related tests/fixtures.

  ###### Rationale

  - Models support different input modalities.
- Clients need an explicit capability signal to prevent unsupported
requests.
- Backward-compatible defaults preserve existing behavior when modality
metadata is absent.

  ###### Scope

  - codex-rs/protocol, codex-rs/core, codex-rs/tui
  - codex-rs/app-server-protocol, codex-rs/app-server
  - Generated app-server types / schema fixtures

  ###### Trade-offs

- Default behavior assumes text + image when field is absent for
compatibility.
  - Server-side validation remains the source of truth.

  ###### Follow-up

- Non-TUI clients should consume input_modalities to disable unsupported
attachments.
- Model catalogs should explicitly set input_modalities for text-only
models.

  ###### Testing

  - cargo fmt --all
  - cargo test -p codex-tui
  - env -u GITHUB_APP_KEY cargo test -p codex-core --lib
  - just write-app-server-schema
- cargo run -p codex-cli --bin codex -- app-server generate-ts --out
app-server-types
  - test against local backend
  
<img width="695" height="199" alt="image"
src="https://github.com/user-attachments/assets/d22dd04f-5eba-4db9-a7c5-a2506f60ec44"
/>

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
2026-02-02 18:56:39 -08:00
Ahmed Ibrahim
b8addcddb9
Require models refresh on cli version mismatch (#10414) 2026-02-02 18:55:25 -08:00
jif-oai
3cc9122ee2
feat: experimental flags (#10231)
## Problem being solved
- We need a single, reliable way to mark app-server API surface as
experimental so that:
  1. the runtime can reject experimental usage unless the client opts in
2. generated TS/JSON schemas can exclude experimental methods/fields for
stable clients.

Right now that’s easy to drift or miss when done ad-hoc.

## How to declare experimental methods and fields
- **Experimental method**: add `#[experimental("method/name")]` to the
`ClientRequest` variant in `client_request_definitions!`.
- **Experimental field**: on the params struct, derive `ExperimentalApi`
and annotate the field with `#[experimental("method/name.field")]` + set
`inspect_params: true` for the method variant so
`ClientRequest::experimental_reason()` inspects params for experimental
fields.

## How the macro solves it
- The new derive macro lives in
`codex-rs/codex-experimental-api-macros/src/lib.rs` and is used via
`#[derive(ExperimentalApi)]` plus `#[experimental("reason")]`
attributes.
- **Structs**:
- Generates `ExperimentalApi::experimental_reason(&self)` that checks
only annotated fields.
  - The “presence” check is type-aware:
    - `Option<T>`: `is_some_and(...)` recursively checks inner.
    - `Vec`/`HashMap`/`BTreeMap`: must be non-empty.
    - `bool`: must be `true`.
    - Other types: considered present (returns `true`).
- Registers each experimental field in an `inventory` with `(type_name,
serialized field name, reason)` and exposes `EXPERIMENTAL_FIELDS` for
that type. Field names are converted from `snake_case` to `camelCase`
for schema/TS filtering.
- **Enums**:
- Generates an exhaustive `match` returning `Some(reason)` for annotated
variants and `None` otherwise (no wildcard arm).
- **Wiring**:
- Runtime gating uses `ExperimentalApi::experimental_reason()` in
`codex-rs/app-server/src/message_processor.rs` to reject requests unless
`InitializeParams.capabilities.experimental_api == true`.
- Schema/TS export filters use the inventory list and
`EXPERIMENTAL_CLIENT_METHODS` from `client_request_definitions!` to
strip experimental methods/fields when `experimental_api` is false.
2026-02-02 11:06:50 +00:00
Michael Bolin
e6d913af2d
chore: rename ChatGpt -> Chatgpt in type names (#10244)
When using ChatGPT in names of types, we should be consistent, so this
renames some types with `ChatGpt` in the name to `Chatgpt`. From
https://rust-lang.github.io/api-guidelines/naming.html:

> In `UpperCamelCase`, acronyms and contractions of compound words count
as one word: use `Uuid` rather than `UUID`, `Usize` rather than `USize`
or `Stdin` rather than `StdIn`. In `snake_case`, acronyms and
contractions are lower-cased: `is_xid_start`.

This PR updates existing uses of `ChatGpt` and changes them to
`Chatgpt`. Though in all cases where it could affect the wire format, I
visually inspected that we don't change anything there. That said, this
_will_ change the codegen because it will affect the spelling of type
names.

For example, this renames `AuthMode::ChatGPT` to `AuthMode::Chatgpt` in
`app-server-protocol`, but the wire format is still `"chatgpt"`.

This PR also updates a number of types in `codex-rs/core/src/auth.rs`.
2026-01-30 11:18:39 -08:00
Charley Cunningham
ec4a2d07e4
Plan mode: stream proposed plans, emit plan items, and render in TUI (#9786)
## Summary
- Stream proposed plans in Plan Mode using `<proposed_plan>` tags parsed
in core, emitting plan deltas plus a plan `ThreadItem`, while stripping
tags from normal assistant output.
- Persist plan items and rebuild them on resume so proposed plans show
in thread history.
- Wire plan items/deltas through app-server protocol v2 and render a
dedicated proposed-plan view in the TUI, including the “Implement this
plan?” prompt only when a plan item is present.

## Changes

### Core (`codex-rs/core`)
- Added a generic, line-based tag parser that buffers each line until it
can disprove a tag prefix; implements auto-close on `finish()` for
unterminated tags. `codex-rs/core/src/tagged_block_parser.rs`
- Refactored proposed plan parsing to wrap the generic parser.
`codex-rs/core/src/proposed_plan_parser.rs`
- In plan mode, stream assistant deltas as:
  - **Normal text** → `AgentMessageContentDelta`
  - **Plan text** → `PlanDelta` + `TurnItem::Plan` start/completion  
  (`codex-rs/core/src/codex.rs`)
- Final plan item content is derived from the completed assistant
message (authoritative), not necessarily the concatenated deltas.
- Strips `<proposed_plan>` blocks from assistant text in plan mode so
tags don’t appear in normal messages.
(`codex-rs/core/src/stream_events_utils.rs`)
- Persist `ItemCompleted` events only for plan items for rollout replay.
(`codex-rs/core/src/rollout/policy.rs`)
- Guard `update_plan` tool in Plan Mode with a clear error message.
(`codex-rs/core/src/tools/handlers/plan.rs`)
- Updated Plan Mode prompt to:  
  - keep `<proposed_plan>` out of non-final reasoning/preambles  
  - require exact tag formatting  
  - allow only one `<proposed_plan>` block per turn  
  (`codex-rs/core/templates/collaboration_mode/plan.md`)

### Protocol / App-server protocol
- Added `TurnItem::Plan` and `PlanDeltaEvent` to core protocol items.
(`codex-rs/protocol/src/items.rs`, `codex-rs/protocol/src/protocol.rs`)
- Added v2 `ThreadItem::Plan` and `PlanDeltaNotification` with
EXPERIMENTAL markers and note that deltas may not match the final plan
item. (`codex-rs/app-server-protocol/src/protocol/v2.rs`)
- Added plan delta route in app-server protocol common mapping.
(`codex-rs/app-server-protocol/src/protocol/common.rs`)
- Rebuild plan items from persisted `ItemCompleted` events on resume.
(`codex-rs/app-server-protocol/src/protocol/thread_history.rs`)

### App-server
- Forward plan deltas to v2 clients and map core plan items to v2 plan
items. (`codex-rs/app-server/src/bespoke_event_handling.rs`,
`codex-rs/app-server/src/codex_message_processor.rs`)
- Added v2 plan item tests.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

### TUI
- Added a dedicated proposed plan history cell with special background
and padding, and moved “• Proposed Plan” outside the highlighted block.
(`codex-rs/tui/src/history_cell.rs`, `codex-rs/tui/src/style.rs`)
- Only show “Implement this plan?” when a plan item exists.
(`codex-rs/tui/src/chatwidget.rs`,
`codex-rs/tui/src/chatwidget/tests.rs`)

<img width="831" height="847" alt="Screenshot 2026-01-29 at 7 06 24 PM"
src="https://github.com/user-attachments/assets/69794c8c-f96b-4d36-92ef-c1f5c3a8f286"
/>

### Docs / Misc
- Updated protocol docs to mention plan deltas.
(`codex-rs/docs/protocol_v1.md`)
- Minor plumbing updates in exec/debug clients to tolerate plan deltas.
(`codex-rs/debug-client/src/reader.rs`, `codex-rs/exec/...`)

## Tests
- Added core integration tests:
  - Plan mode strips plan from agent messages.
  - Missing `</proposed_plan>` closes at end-of-message.  
  (`codex-rs/core/tests/suite/items.rs`)
- Added unit tests for generic tag parser (prefix buffering, non-tag
lines, auto-close). (`codex-rs/core/src/tagged_block_parser.rs`)
- Existing app-server plan item tests in v2.
(`codex-rs/app-server/tests/suite/v2/plan_item.rs`)

## Notes / Behavior
- Plan output no longer appears in standard assistant text in Plan Mode;
it streams via `PlanDelta` and completes as a `TurnItem::Plan`.
- The final plan item content is authoritative and may diverge from
streamed deltas (documented as experimental).
- Reasoning summaries are not filtered; prompt instructs the model not
to include `<proposed_plan>` outside the final plan message.

## Codex Author
`codex fork 019bec2d-b09d-7450-b292-d7bcdddcdbfb`
2026-01-30 18:59:30 +00:00
Shijie Rao
a0ccef9d5c
Chore: plan mode do not include free form question and always include isOther (#10210)
We should never ask a freeform question when planning and we should
always include isOther as an escape hatch.
2026-01-30 01:19:24 -08:00
Dylan Hurd
e3ab0bd973
chore(personality) new schema with fallbacks (#10147)
## Summary
Let's dial in this api contract in a bit more with more robust fallback
behavior when model_instructions_template is false.

Switches to a more explicit template / variables structure, with more
fallbacks.

## Testing
- [x] Adding unit tests
- [x] Tested locally
2026-01-30 00:10:12 -07:00
Celia Chen
7151387474
[feat] persist dynamic tools in session rollout file (#10130)
Add dynamic tools to rollout file for persistence & read from rollout on
resume. Ran a real example and spotted the following in the rollout
file:
```
{"timestamp":"2026-01-29T01:27:57.468Z","type":"session_meta","payload":{"id":"019c075d-3f0b-77e3-894e-c1c159b04b1e","timestamp":"2026-01-29T01:27:57.451Z","...."dynamic_tools":[{"name":"demo_tool","description":"Demo dynamic tool","inputSchema":{"additionalProperties":false,"properties":{"city":{"type":"string"}},"required":["city"],"type":"object"}}],"git":{"commit_hash":"ebc573f15c01b8af158e060cfedd401f043e9dfa","branch":"dev/cc/dynamic-tools","repository_url":"https://github.com/openai/codex.git"}}}
```
2026-01-30 01:10:00 +00:00
Owen Lin
81a17bb2c1
feat(app-server): support external auth mode (#10012)
This enables a new use case where `codex app-server` is embedded into a
parent application that will directly own the user's ChatGPT auth
lifecycle, which means it owns the user’s auth tokens and refreshes it
when necessary. The parent application would just want a way to pass in
the auth tokens for codex to use directly.

The idea is that we are introducing a new "auth mode" currently only
exposed via app server: **`chatgptAuthTokens`** which consist of the
`id_token` (stores account metadata) and `access_token` (the bearer
token used directly for backend API calls). These auth tokens are only
stored in-memory. This new mode is in addition to the existing `apiKey`
and `chatgpt` auth modes.

This PR reuses the shape of our existing app-server account APIs as much
as possible:
- Update `account/login/start` with a new `chatgptAuthTokens` variant,
which will allow the client to pass in the tokens and have codex
app-server use them directly. Upon success, the server emits
`account/login/completed` and `account/updated` notifications.
- A new server->client request called
`account/chatgptAuthTokens/refresh` which the server can use whenever
the access token previously passed in has expired and it needs a new one
from the parent application.

I leveraged the core 401 retry loop which typically triggers auth token
refreshes automatically, but made it pluggable:
- **chatgpt** mode refreshes internally, as usual.
- **chatgptAuthTokens** mode calls the client via
`account/chatgptAuthTokens/refresh`, the client responds with updated
tokens, codex updates its in-memory auth, then retries. This RPC has a
10s timeout and handles JSON-RPC errors from the client.

Also some additional things:
- chatgpt logins are blocked while external auth is active (have to log
out first. typically clients will pick one OR the other, not support
both)
- `account/logout` clears external auth in memory
- Ensures that if `forced_chatgpt_workspace_id` is set via the user's
config, we respect it in both:
- `account/login/start` with `chatgptAuthTokens` (returns a JSON-RPC
error back to the client)
- `account/chatgptAuthTokens/refresh` (fails the turn, and on next
request app-server will send another `account/chatgptAuthTokens/refresh`
request to the client).
2026-01-29 23:46:04 +00:00
Ahmed Ibrahim
52609c6f42
Add app-server compaction item notifications tests (#10123)
- add v2 tests covering local + remote auto-compaction item
started/completed notifications
2026-01-29 01:00:38 +00:00
jif-oai
247fb2de64
[app-server] feat: add filtering on thread list (#9897) 2026-01-26 21:54:19 +00:00
Charley Cunningham
62266b13f8
Add thread/unarchive to restore archived rollouts (#9843)
## Summary
- Adds a new `thread/unarchive` RPC to move archived thread rollouts
back into the active `sessions/` tree.

## What changed
- **Protocol**
  - Adds `thread/unarchive` request/response types and wiring.
- **Server**
  - Implements `thread_unarchive` in the app server.
  - Validates the archived rollout path and thread ID.
- Restores the rollout to `sessions/YYYY/MM/DD/...` based on the rollout
filename timestamp.
- **Core**
- Adds `find_archived_thread_path_by_id_str` helper for archived
rollouts.
- **Docs**
  - Documents the new RPC and usage example.
- **Tests**
  - Adds an end-to-end server test that:
    1) starts a thread,
    2) archives it,
    3) unarchives it,
    4) asserts the file is restored to `sessions/`.

## How to use
```json
{ "method": "thread/unarchive", "id": 24, "params": { "threadId": "<thread-id>" } }
```

## Author Codex Session

`codex resume 019bf158-54b6-7960-a696-9d85df7e1bc1` (soon I'll make this
kind of session UUID forkable by anyone with the right
`session_object_storage_url` line in their config, but for now just
pasting it here for my reference)
2026-01-26 11:24:36 -08:00
Shijie Rao
3ba702c5b6
Feat: add isOther to question returned by request user input tool (#9890)
### Summary
Add `isOther` to question object from request_user_input tool input and
remove `other` option from the tool prompt to better handle tool input.
2026-01-26 09:52:38 -08:00
Matthew Zeng
a2c829a808
[connectors] Support connectors part 1 - App server & MCP (#9667)
In order to make Codex work with connectors, we add a built-in gateway
MCP that acts as a transparent proxy between the client and the
connectors. The gateway MCP collects actions that are accessible to the
user and sends them down to the user, when a connector action is chosen
to be called, the client invokes the action through the gateway MCP as
well.

 - [x] Add the system built-in gateway MCP to list and run connectors.
 - [x] Add the app server methods and protocol
2026-01-22 16:48:43 -08:00