Emit the following events around the collab tools. On the `app-server`
this will be under `item/started` and `item/completed`
```
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the newly spawned agent, if it was created.
pub new_thread_id: Option<ThreadId>,
/// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
/// beginning.
pub prompt: String,
/// Last known status of the new agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
/// leaking at the beginning.
pub prompt: String,
/// Last known status of the receiver agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingBeginEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// ID of the waiting call.
pub call_id: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingEndEvent {
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// ID of the waiting call.
pub call_id: String,
/// Last known status of the receiver agent reported to the sender agent.
pub status: AgentStatus,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseBeginEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
}
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseEndEvent {
/// Identifier for the collab tool call.
pub call_id: String,
/// Thread ID of the sender.
pub sender_thread_id: ThreadId,
/// Thread ID of the receiver.
pub receiver_thread_id: ThreadId,
/// Last known status of the receiver agent reported to the sender agent before
/// the close.
pub status: AgentStatus,
}
```
## Problem
Codex’s TUI quit behavior has historically been easy to trigger
accidentally and hard to reason
about.
- `Ctrl+C`/`Ctrl+D` could terminate the UI immediately, which is a
common key to press while trying
to dismiss a modal, cancel a command, or recover from a stuck state.
- “Quit” and “shutdown” were not consistently separated, so some exit
paths could bypass the
shutdown/cleanup work that should run before the process terminates.
This PR makes quitting both safer (harder to do by accident) and more
uniform across quit
gestures, while keeping the shutdown-first semantics explicit.
## Mental model
After this change, the system treats quitting as a UI request that is
coordinated by the app
layer.
- The UI requests exit via `AppEvent::Exit(ExitMode)`.
- `ExitMode::ShutdownFirst` is the normal user path: the app triggers
`Op::Shutdown`, continues
rendering while shutdown runs, and only ends the UI loop once shutdown
has completed.
- `ExitMode::Immediate` exists as an escape hatch (and as the
post-shutdown “now actually exit”
signal); it bypasses cleanup and should not be the default for
user-triggered quits.
User-facing quit gestures are intentionally “two-step” for safety:
- `Ctrl+C` and `Ctrl+D` no longer exit immediately.
- The first press arms a 1-second window and shows a footer hint (“ctrl
+ <key> again to quit”).
- Pressing the same key again within the window requests a
shutdown-first quit; otherwise the
hint expires and the next press starts a fresh window.
Key routing remains modal-first:
- A modal/popup gets first chance to consume `Ctrl+C`.
- If a modal handles `Ctrl+C`, any armed quit shortcut is cleared so
dismissing a modal cannot
prime a subsequent `Ctrl+C` to quit.
- `Ctrl+D` only participates in quitting when the composer is empty and
no modal/popup is active.
The design doc `docs/exit-confirmation-prompt-design.md` captures the
intended routing and the
invariants the UI should maintain.
## Non-goals
- This does not attempt to redesign modal UX or make modals uniformly
dismissible via `Ctrl+C`.
It only ensures modals get priority and that quit arming does not leak
across modal handling.
- This does not introduce a persistent confirmation prompt/menu for
quitting; the goal is to keep
the exit gesture lightweight and consistent.
- This does not change the semantics of core shutdown itself; it changes
how the UI requests and
sequences it.
## Tradeoffs
- Quitting via `Ctrl+C`/`Ctrl+D` now requires a deliberate second
keypress, which adds friction for
users who relied on the old “instant quit” behavior.
- The UI now maintains a small time-bounded state machine for the armed
shortcut, which increases
complexity and introduces timing-dependent behavior.
This design was chosen over alternatives (a modal confirmation prompt or
a long-lived “are you
sure” state) because it provides an explicit safety barrier while
keeping the flow fast and
keyboard-native.
## Architecture
- `ChatWidget` owns the quit-shortcut state machine and decides when a
quit gesture is allowed
(idle vs cancellable work, composer state, etc.).
- `BottomPane` owns rendering and local input routing for modals/popups.
It is responsible for
consuming cancellation keys when a view is active and for
showing/expiring the footer hint.
- `App` owns shutdown sequencing: translating
`AppEvent::Exit(ShutdownFirst)` into `Op::Shutdown`
and only terminating the UI loop when exit is safe.
This keeps “what should happen” decisions (quit vs interrupt vs ignore)
in the chat/widget layer,
while keeping “how it looks and which view gets the key” in the
bottom-pane layer.
## Observability
You can tell this is working by running the TUIs and exercising the quit
gestures:
- While idle: pressing `Ctrl+C` (or `Ctrl+D` with an empty composer and
no modal) shows a footer
hint for ~1 second; pressing again within that window exits via
shutdown-first.
- While streaming/tools/review are active: `Ctrl+C` interrupts work
rather than quitting.
- With a modal/popup open: `Ctrl+C` dismisses/handles the modal (if it
chooses to) and does not
arm a quit shortcut; a subsequent quick `Ctrl+C` should not quit unless
the user re-arms it.
Failure modes are visible as:
- Quits that happen immediately (no hint window) from `Ctrl+C`/`Ctrl+D`.
- Quits that occur while a modal is open and consuming `Ctrl+C`.
- UI termination before shutdown completes (cleanup skipped).
## Tests
- Updated/added unit and snapshot coverage in `codex-tui` and
`codex-tui2` to validate:
- The quit hint appears and expires on the expected key.
- Double-press within the window triggers a shutdown-first quit request.
- Modal-first routing prevents quit bypass and clears any armed shortcut
when a modal consumes
`Ctrl+C`.
These tests focus on the UI-level invariants and rendered output; they
do not attempt to validate
real terminal key-repeat timing or end-to-end process shutdown behavior.
---
Screenshot:
<img width="912" height="740" alt="Screenshot 2026-01-13 at 1 05 28 PM"
src="https://github.com/user-attachments/assets/18f3d22e-2557-47f2-a369-ae7a9531f29f"
/>
When an invalid config.toml key or value is detected, the CLI currently
just quits. This leaves the VSCE in a dead state.
This PR changes the behavior to not quit and bubble up the config error
to users to make it actionable. It also surfaces errors related to
"rules" parsing.
This allows us to surface these errors to users in the VSCE, like this:
<img width="342" height="129" alt="Screenshot 2026-01-13 at 4 29 22 PM"
src="https://github.com/user-attachments/assets/a79ffbe7-7604-400c-a304-c5165b6eebc4"
/>
<img width="346" height="244" alt="Screenshot 2026-01-13 at 4 45 06 PM"
src="https://github.com/user-attachments/assets/de874f7c-16a2-4a95-8c6d-15f10482e67b"
/>
User-facing symptom: On terminals that deliver pastes as rapid
KeyCode::Char/Enter streams (notably Windows), paste-burst transient
state
can leak into the next input. Users can see Enter insert a newline when
they meant to submit, or see characters appear late / handled through
the
wrong path.
System problem: PasteBurst is time-based. Clearing only the
classification window (e.g. via clear_window_after_non_char()) can erase
last_plain_char_time without emitting buffered text. If a buffer is
still
non-empty after that, flush_if_due() no longer has a timeout clock to
flush against, so the buffer can get "stuck" until another plain char
arrives.
This was surfaced while adding deterministic regression tests for
paste-burst behavior.
Fix: when disabling burst detection, defuse any in-flight burst state:
flush held/buffered text through handle_paste() (so it follows normal
paste integration), then clear timing and Enter suppression.
Document the rationale inline and update docs/tui-chat-composer.md so
"disable_paste_burst" matches the actual behavior.
Have only the following Methods:
- `list_models`: getting current available models
- `try_list_models`: sync version no refresh for tui use
- `get_default_model`: get the default model (should be tightened to
core and received on session configuration)
- `get_model_info`: get `ModelInfo` for a specific model (should be
tightened to core but used in tests)
- `refresh_if_new_etag`: trigger refresh on different etags
Also move the cache to its own struct
Adds an integration test for the new behavior introduced in
https://github.com/openai/codex/pull/9011. The work to create the test
setup was substantial enough that I thought it merited a separate PR.
This integration test spawns `codex` in TUI mode, which requires
spawning a PTY to run successfully, so I had to introduce quite a bit of
scaffolding in `run_codex_cli()`. I was surprised to discover that we
have not done this in our codebase before, so perhaps this should get
moved to a common location so it can be reused.
The test itself verifies that a malformed `rules` in `$CODEX_HOME`
prints a human-readable error message and exits nonzero.
The underlying issue is that when we encountered an error starting a
conversation (any sort of error, though making `$CODEX_HOME/rules` a
file rather than folder was the example in #8803), then we were writing
the message to stderr, but this could be printed over by our UI
framework so the user would not see it. In general, we disallow the use
of `eprintln!()` in this part of the code for exactly this reason,
though this was suppressed by an `#[allow(clippy::print_stderr)]`.
This attempts to clean things up by changing `handle_event()` and
`handle_tui_event()` to return a `Result<AppRunControl>` instead of a
`Result<bool>`, which is a new type introduced in this PR (and depends
on `ExitReason`, also a new type):
```rust
#[derive(Debug)]
pub(crate) enum AppRunControl {
Continue,
Exit(ExitReason),
}
#[derive(Debug, Clone)]
pub enum ExitReason {
UserRequested,
Fatal(String),
}
```
This makes it possible to exit the primary control flow of the TUI with
richer information. This PR adds `ExitReason` to the existing
`AppExitInfo` struct and updates `handle_app_exit()` to print the error
and exit code `1` in the event of `ExitReason::Fatal`.
I tried to create an integration test for this, but it was a bit
involved, so I published it as a separate PR:
https://github.com/openai/codex/pull/9166. For this PR, please have
faith in my manual testing!
Fixes https://github.com/openai/codex/issues/8803.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/9011).
* #9166
* __->__ #9011
## **Problem**
Codex’s TUI uses a single “task running” indicator (spinner + Esc interrupt hint)
to communicate “the UI is busy”. In practice, “busy” can mean two different
things: an agent turn is running, or MCP servers are still starting up. Without a
clear contract, those lifecycles can interfere: startup completion can clear the
spinner while a turn is still in progress, or the UI can appear idle while MCP is
still booting. This is user-visible confusion during the most important moments
(startup and the first turn), so it was worth making the contract explicit and
guarding it.
## **Mental model**
`ChatWidget` is the UI-side adapter for the `codex_core::protocol` event stream.
It receives `EventMsg` events and updates two major UI surfaces: the transcript
(history/streaming cells) and the bottom pane (composer + status indicator).
The key concept after this change is that the bottom pane’s “task running”
indicator is treated as **derived UI-busy state**, not “agent is running”. It is
considered active while either:
- an agent turn is in progress (`TurnStarted` → completion/abort), or
- MCP startup is in progress (`McpStartupUpdate` → `McpStartupComplete`).
Those lifecycles are tracked independently, and the bottom-pane indicator is
defined as their union.
## **Non-goals**
- This does not introduce separate UI indicators for “turn busy” vs “MCP busy”.
- This does not change MCP startup behavior, ordering guarantees, or core
protocol semantics.
- This does not rework unrelated status/header rendering or transcript layout.
## **Tradeoffs**
- The “one flag represents multiple lifecycles” approach remains lossy: it
preserves correct “busy vs idle” semantics but cannot express *which* kind of
busy is happening without further UI changes.
- The design keeps complexity low by keeping a single derived boolean, rather
than adding a more expressive bottom-pane state machine. That’s chosen because
it matches existing UX and minimizes churn while fixing the confusion.
## **Architecture**
- `codex-core` owns the actual lifecycles and emits `codex_core::protocol`
events.
- `ChatWidget` owns the UI interpretation of those lifecycles. It is responsible
for keeping the bottom pane’s derived “busy” state consistent with the event
stream, and for updating the status header when MCP progress updates arrive.
- The bottom pane remains a dumb renderer of the single “task running” flag; it
does not learn about MCP or agent turns directly.
## **Observability**
- When working: the spinner/Esc hint stays visible during MCP startup and does
not disappear mid-turn when `McpStartupComplete` arrives; startup status
headers can update without clearing “busy” for an active turn.
- When broken: you’ll see the spinner/hint flicker off while output is still
streaming, or the UI appears idle while MCP startup status is still changing.
## **Tests**
- Adds/strengthens a regression test that asserts MCP startup completion does
not clear the “task running” indicator for an active turn (in both `tui` and
`tui2` variants).
- These tests prove the **contract** (“busy is the union of turn + startup”) at
the UI boundary; they do not attempt to validate MCP startup ordering,
real-world startup timing, or backend integration behavior.
Fixes#7017
Signed-off-by: 2mawi2 <2mawi2@users.noreply.github.com>
Co-authored-by: 2mawi2 <2mawi2@users.noreply.github.com>
Co-authored-by: Josh McKinney <joshka@openai.com>
Replace the old timing-dependent non-ASCII paste test with deterministic
coverage by forcing an active `PasteBurst` and asserting the exact flush
payload.
Add focused unit tests for `PasteBurst` transitions, and add short
"Behavior:" rustdoc notes on chat composer tests to make the state
machine contracts explicit.
Add a narrative doc and inline rustdoc explaining how `ChatComposer`
and `PasteBurst` compose into a single state machine on terminals that
lack reliable bracketed paste (notably Windows).
This documents the key states, invariants, and integration points
(`handle_input_basic`, `handle_non_ascii_char`, tick-driven flush) so
future changes are easier to reason about.
Enterprises want to restrict the MCP servers their users can use.
Admins can now specify an allowlist of MCPs in `requirements.toml`. The
MCP servers are matched on both Name and Transport (local path or HTTP
URL) -- both must match to allow the MCP server. This prevents
circumventing the allowlist by renaming MCP servers in user config. (It
is still possible to replace the local path e.g. rewrite say
`/usr/local/github-mcp` with a nefarious MCP. We could allow hash
pinning in the future, but that would break updates. I also think this
represents a broader, out-of-scope problem.)
We introduce a new field to Constrained: "normalizer". In general, it is
a fn(T) -> T and applies when `Constrained<T>.set()` is called. In this
particular case, it disables MCP servers which do not match the
allowlist. An alternative solution would remove this and instead throw a
ConstraintError. That would stop Codex launching if any MCP server was
configured which didn't match. I think this is bad.
We currently reuse the enabled flag on MCP servers to disable them, but
don't propagate any information about why they are disabled. I'd like to
add that in a follow up PR, possibly by switching out enabled with an
enum.
In action:
```
# MCP server config has two MCPs. We are going to allowlist one of them.
➜ codex git:(gt/restrict-mcps) ✗ cat ~/.codex/config.toml | grep mcp_servers -A1
[mcp_servers.hello_world]
command = "hello-world-mcp"
--
[mcp_servers.docs]
command = "docs-mcp"
# Restrict the MCPs to the hello_world MCP.
➜ codex git:(gt/restrict-mcps) ✗ defaults read com.openai.codex requirements_toml_base64 | base64 -d
[mcp_server_allowlist.hello_world]
command = "hello-world-mcp"
# List the MCPs, observe hello_world is enabled and docs is disabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - disabled Unsupported
hello_world hello-world-mcp - - - enabled Unsupported
# Remove the restrictions.
➜ codex git:(gt/restrict-mcps) ✗ defaults delete com.openai.codex requirements_toml_base64
# Observe both MCPs are enabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - enabled Unsupported
hello_world hello-world-mcp - - - enabled Unsupported
# A new requirements that updates the command to one that does not match.
➜ codex git:(gt/restrict-mcps) ✗ cat ~/requirements.toml
[mcp_server_allowlist.hello_world]
command = "hello-world-mcp-v2"
# Use those requirements.
➜ codex git:(gt/restrict-mcps) ✗ defaults write com.openai.codex requirements_toml_base64 "$(base64 -i /Users/gt/requirements.toml)"
# Observe both MCPs are disabled.
➜ codex git:(gt/restrict-mcps) ✗ just codex mcp list
cargo run --bin codex -- "$@"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.75s
Running `target/debug/codex mcp list`
Name Command Args Env Cwd Status Auth
docs docs-mcp - - - disabled Unsupported
hello_world hello-world-mcp - - - disabled Unsupported
```
Sending a message during /review interrupts the review, whereas during
normal operation, sending a message while the agent is running will
queue the message. This is unexpected behavior, and since /review
usually takes a while, it takes away a potentially useful operation.
Summary
- Treat review mode as an active task for message queuing so inputs
don’t inject into the running review turn.
- Prevents user submissions from rendering immediately in the transcript
while the review continues streaming.
- Keeps review UX consistent with normal “task running” behavior and
avoids accidental interrupt/replacement.
Notes
- This change only affects UI queuing logic; core review flow and task
lifecycle remain unchanged.
This is an alternate PR to solving the same problem as
<https://github.com/openai/codex/pull/8227>.
In this PR, when Ollama is used via `--oss` (or via `model_provider =
"ollama"`), we default it to use the Responses format. At runtime, we do
an Ollama version check, and if the version is older than when Responses
support was added to Ollama, we print out a warning.
Because there's no way of configuring the wire api for a built-in
provider, we temporarily add a new `oss_provider`/`model_provider`
called `"ollama-chat"` that will force the chat format.
Once the `"chat"` format is fully removed (see
<https://github.com/openai/codex/discussions/7782>), `ollama-chat` can
be removed as well
---------
Co-authored-by: Eric Traut <etraut@openai.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
Handle image paste on empty paste events.
- Intent: make image paste work in terminals that emit empty paste
events.
- Approach: route paste events through an image-aware handler and read
the clipboard when text is empty.
- That's best effort to detect it. Some terminals don't send the empty
signal.
### Problem
Ctrl+T transcript overlay can omit in-flight coalesced tool calls because it
renders only committed transcript cells while the main viewport can render the
current in-flight ChatWidget.active_cell immediately.
### Mental model
The UI has both committed transcript cells (finalized HistoryCell entries) and
an in-flight active cell that can mutate in place while streaming, often
representing a coalesced exec/tool group. The transcript overlay renders
committed cells plus a render-only live tail derived from the current active
cell. The live tail is cached and only recomputed when its cache key changes,
which is derived from terminal width (wrapping), active-cell revision
(in-place mutations), stream continuation (spacing), and animation tick
(time-based visuals).
### Non-goals
This does not change coalescing rules, flush boundaries, or when active cells
become committed. It does not change tool-call semantics or transcript
persistence; it is a rendering-only improvement for the overlay.
### Tradeoffs
This adds cache invalidation complexity: correctness depends on bumping an
active-cell revision (and/or providing an animation tick) when the active cell
mutates in place. The mechanism is implemented in both codex-tui and codex-tui2,
which keeps behavior consistent but risks drift if future changes are not
applied in lockstep.
### Architecture
App special-cases transcript overlay draws to sync a live tail from ChatWidget
into TranscriptOverlay. TranscriptOverlay remains the owner of committed
transcript cells; the live tail is an optional appended renderable.
HistoryCell::transcript_animation_tick() allows time-dependent transcript output
(spinner/shimmer) to invalidate the cached tail without requiring data mutation.
### Observability
Manual verification is to open Ctrl+T while an exploring/coalesced active cell
is still in-flight and confirm the overlay includes the same in-flight tool-call
group the main viewport shows. The overlay is kept in sync by App passing an
active-cell key and transcript lines into TranscriptOverlay::sync_live_tail; the
key must change when the active cell mutates or animates.
### Tests
Snapshot tests validate that the transcript overlay renders a live tail appended
after committed cells and that identical keys short-circuit recomputation. Unit
tests validate that active-cell revision bumps occur on specific in-place
mutations (e.g. unified exec wait cell command display becoming known late) so
cached tails are invalidated.
## Documentation patches (module, type, function)
### Module-level docs (invariants + mechanisms)
- codex-rs/tui/src/app_backtrack.rs:1
- codex-rs/tui/src/chatwidget.rs:1
- codex-rs/tui/src/pager_overlay.rs:1
- codex-rs/tui/src/history_cell.rs:1
- codex-rs/tui2/src/app_backtrack.rs:1
- codex-rs/tui2/src/chatwidget.rs:1
- codex-rs/tui2/src/pager_overlay.rs:1
- codex-rs/tui2/src/history_cell.rs:1
### Type-level docs (cache key + invariants)
- codex-rs/tui/src/chatwidget.rs (ChatWidget.active_cell_revision, ActiveCellTranscriptKey)
- codex-rs/tui/src/pager_overlay.rs (TranscriptOverlay live tail storage model)
- codex-rs/tui/src/history_cell.rs (HistoryCell::transcript_animation_tick, UnifiedExecWaitCell::update_command_display)
- Mirrored in codex-rs/tui2/src/chatwidget.rs, codex-rs/tui2/src/pager_overlay.rs, codex-rs/tui2/src/history_cell.rs
### Function-level docs (why/when/guarantees/pitfalls)
- codex-rs/tui/src/app_backtrack.rs (overlay_forward_event)
- codex-rs/tui/src/chatwidget.rs (active_cell_transcript_key, active_cell_transcript_lines)
- codex-rs/tui/src/pager_overlay.rs (sync_live_tail, take_live_tail_renderable)
- codex-rs/tui/src/history_cell.rs (transcript_animation_tick, UnifiedExecWaitCell::update_command_display)
- Mirrored in codex-rs/tui2 equivalents where present
### Validation performed
- cd codex-rs && just fmt
- cd codex-rs && cargo test -p codex-tui
- cd codex-rs && cargo test -p codex-tui2
## Design inconsistencies / risks
- Cache invalidation is a distributed responsibility: any future in-place active
cell transcript mutation that forgets to bump active_cell_revision (or expose
an animation tick) can leave the transcript overlay live tail out of sync with
the main viewport.
- TranscriptOverlay tail handling assumes a structural invariant that the live
tail, when present, is exactly one trailing renderable after the committed cell
renderables; if renderable construction changes in a way that violates that
assumption, tail insertion/removal logic becomes incorrect.
- codex-tui and codex-tui2 duplicate the live-tail mechanism; the documentation
is aligned, but the implementation can still drift unless changes continue to
be applied in lockstep.
Agent wouldn't "see" attached images and would instead try to use the
view_file tool:
<img width="1516" height="504" alt="image"
src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
/>
In this PR, we wrap image content items in XML tags with the name of
each image (now just a numbered name like `[Image #1]`), so that the
model can understand inline image references (based on name). We also
put the image content items above the user message which the model seems
to prefer (maybe it's more used to definitions being before references).
We also tweak the view_file tool description which seemed to help a bit
Results on a simple eval set of images:
Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>
After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>
```json
[
{
"id": "single_describe",
"prompt": "Describe the attached image in one sentence.",
"images": ["image_a.png"]
},
{
"id": "single_color",
"prompt": "What is the dominant color in the image? Answer with a single color word.",
"images": ["image_b.png"]
},
{
"id": "orientation_check",
"prompt": "Is the image portrait or landscape? Answer in one sentence.",
"images": ["image_c.png"]
},
{
"id": "detail_request",
"prompt": "Look closely at the image and call out any small details you notice.",
"images": ["image_d.png"]
},
{
"id": "two_images_compare",
"prompt": "I attached two images. Are they the same or different? Briefly explain.",
"images": ["image_a.png", "image_b.png"]
},
{
"id": "two_images_captions",
"prompt": "Provide a short caption for each image (Image 1, Image 2).",
"images": ["image_c.png", "image_d.png"]
},
{
"id": "multi_image_rank",
"prompt": "Rank the attached images from most colorful to least colorful.",
"images": ["image_a.png", "image_b.png", "image_c.png"]
},
{
"id": "multi_image_choice",
"prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
"images": ["image_b.png", "image_d.png"]
}
]
```
Add model provider info to /status if non-default
Enterprises are running Codex and migrating between proxied / API key
auth and SIWC. If you accidentally run Codex with `OPENAI_BASE_URL=...`,
which is surprisingly easy to do, we don't tend to surface this anywhere
and it may lead to breakage. One suggestion was to include this
information in `/status`:
<img width="477" height="157" alt="Screenshot 2026-01-09 at 15 45 34"
src="https://github.com/user-attachments/assets/630ce68f-c856-4a2b-a004-7df2fbe5de93"
/>
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).
If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:
- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:
```
failures:
suite::compact::manual_compact_twice_preserves_latest_user_messages
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```
- `just build-for-release` to build release binaries for all
platforms/architectures remotely
To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)
## CI
In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.
The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:
```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```
Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:
- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.
Release builds will also continue to be done by Cargo for now.
Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504
---------
Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
Fixes#2558
Codex uses alternate screen mode (CSI 1049) which, per xterm spec,
doesn't support scrollback. Zellij follows this strictly, so users can't
scroll back through output.
**Changes:**
- Add `tui.alternate_screen` config: `auto` (default), `always`, `never`
- Add `--no-alt-screen` CLI flag
- Auto-detect Zellij and skip alt screen (uses existing `ZELLIJ` env var
detection)
**Usage:**
```bash
# CLI flag
codex --no-alt-screen
# Or in config.toml
[tui]
alternate_screen = "never"
```
With default `auto` mode, Zellij users get working scrollback without
any config changes.
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
Some enterprises do not want their users to be able to `/feedback`.
<img width="395" height="325" alt="image"
src="https://github.com/user-attachments/assets/2dae9c0b-20c3-4a15-bcd3-0187857ebbd8"
/>
Adds to `config.toml`:
```toml
[feedback]
enabled = false
```
I've deliberately decided to:
1. leave other references to `/feedback` (e.g. in the interrupt message,
tips of the day) unchanged. I think we should continue to promote the
feature even if it is not usable currently.
2. leave the `/feedback` menu item selectable and display an error
saying it's disabled, rather than remove the menu item (which I believe
would raise more questions).
but happy to discuss these.
This will be followed by a change to requirements.toml that admins can
use to force the value of feedback.enabled.
**Motivation**
The `originator` header is important for codex-backend’s Responses API
proxy because it identifies the real end client (codex cli, codex vscode
extension, codex exec, future IDEs) and is used to categorize requests
by client for our enterprise compliance API.
Today the `originator` header is set by either:
- the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var (our VSCode extension
does this)
- calling `set_default_originator()` which sets a global immutable
singleton (`codex exec` does this)
For `codex app-server`, we want the `initialize` JSON-RPC request to set
that header because it is a natural place to do so. Example:
```json
{
"method": "initialize",
"id": 0,
"params": {
"clientInfo": {
"name": "codex_vscode",
"title": "Codex VS Code Extension",
"version": "0.1.0"
}
}
}
```
and when app-server receives that request, it can call
`set_default_originator()`. This is a much more natural interface than
asking third party developers to set an env var.
One hiccup is that `originator()` reads the global singleton and locks
in the value, preventing a later `set_default_originator()` call from
setting it. This would be fine but is brittle, since any codepath that
calls `originator()` before app-server can process an `initialize`
JSON-RPC call would prevent app-server from setting it. This was
actually the case with OTEL initialization which runs on boot, but I
also saw this behavior in certain tests.
Instead, what we now do is:
- [unchanged] If `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var is set,
`originator()` would return that value and `set_default_originator()`
with some other value does NOT override it.
- [new] If no env var is set, `originator()` would return the default
value which is `codex_cli_rs` UNTIL `set_default_originator()` is called
once, in which case it is set to the new value and becomes immutable.
Later calls to `set_default_originator()` returns
`SetOriginatorError::AlreadyInitialized`.
**Other notes**
- I updated `codex_core::otel_init::build_provider` to accepts a service
name override, and app-server sends a hardcoded `codex_app_server`
service name to distinguish it from `codex_cli_rs` used by default (e.g.
TUI).
**Next steps**
- Update VSCE to set the proper value for `clientInfo.name` on
`initialize` and drop the `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` env var.
- Delete support for `CODEX_INTERNAL_ORIGINATOR_OVERRIDE` in codex-rs.
Elevated Sandbox NUX:
* prompt for elevated sandbox setup when agent mode is selected (via
/approvals or at startup)
* prompt for degraded sandbox if elevated setup is declined or fails
* introduce /elevate-sandbox command to upgrade from degraded
experience.
Historically we started with a CodexAuth that knew how to refresh it's
own tokens and then added AuthManager that did a different kind of
refresh (re-reading from disk).
I don't think it makes sense for both `CodexAuth` and `AuthManager` to
be mutable and contain behaviors.
Move all refresh logic into `AuthManager` and keep `CodexAuth` as a data
object.
**Before:**
```
Error loading configuration: value `Never` is not in the allowed set [OnRequest]
```
**After:**
```
Error loading configuration: invalid value for `approval_policy`: `Never` is not in the
allowed set [OnRequest] (set by MDM com.openai.codex:requirements_toml_base64)
```
Done by introducing a new struct `ConfigRequirementsWithSources` onto
which we `merge_unset_fields` now. Also introduces a pair of requirement
value and its `RequirementSource` (inspired by `ConfigLayerSource`):
```rust
pub struct Sourced<T> {
pub value: T,
pub source: RequirementSource,
}
```
## Summary
This PR builds _heavily_ on the work from @occurrent in #8021 - I've
only added a small fix, added additional tests, and propagated the
changes to tui2.
From the original PR:
> On Windows, Codex relies on PasteBurst for paste detection because
bracketed paste is not reliably available via crossterm.
>
> When pasted content starts with non-ASCII characters, input is routed
through handle_non_ascii_char, which bypasses the normal paste burst
logic. This change extends the paste burst window for that path, which
should ensure that Enter is correctly grouped as part of the paste.
## Testing
- [x] tested locally cross-platform
- [x] added regression tests
---------
Co-authored-by: occur <occurring@outlook.com>
<img width="763" height="349" alt="Screenshot 2026-01-07 at 18 37 59"
src="https://github.com/user-attachments/assets/569d01cb-ea91-4113-889b-ba74df24adaf"
/>
It may not make sense to use the `/model` menu with a custom
OPENAI_BASE_URL. But some model proxies may support it, so we shouldn't
disable it completely. A warning is a reasonable compromise.
**Summary**
This PR makes “ApprovalDecision::AcceptForSession / don’t ask again this
session” actually work for `apply_patch` approvals by caching approvals
based on absolute file paths in codex-core, properly wiring it through
app-server v2, and exposing the choice in both TUI and TUI2.
- This brings `apply_patch` calls to be at feature-parity with general
shell commands, which also have a "Yes, and don't ask again" option.
- This also fixes VSCE's "Allow this session" button to actually work.
While we're at it, also split the app-server v2 protocol's
`ApprovalDecision` enum so execpolicy amendments are only available for
command execution approvals.
**Key changes**
- Core: per-session patch approval allowlist keyed by absolute file
paths
- Handles multi-file patches and renames/moves by recording both source
and destination paths for `Update { move_path: Some(...) }`.
- Extend the `Approvable` trait and `ApplyPatchRuntime` to work with
multiple keys, because an `apply_patch` tool call can modify multiple
files. For a request to be auto-approved, we will need to check that all
file paths have been approved previously.
- App-server v2: honor AcceptForSession for file changes
- File-change approval responses now map AcceptForSession to
ReviewDecision::ApprovedForSession (no longer downgraded to plain
Approved).
- Replace `ApprovalDecision` with two enums:
`CommandExecutionApprovalDecision` and `FileChangeApprovalDecision`
- TUI / TUI2: expose “don’t ask again for these files this session”
- Patch approval overlays now include a third option (“Yes, and don’t
ask again for these files this session (s)”).
- Snapshot updates for the approval modal.
**Tests added/updated**
- Core:
- Integration test that proves ApprovedForSession on a patch skips the
next patch prompt for the same file
- App-server:
- v2 integration test verifying
FileChangeApprovalDecision::AcceptForSession works properly
**User-visible behavior**
- When the user approves a patch “for session”, future patches touching
only those previously approved file(s) will no longer prompt gain during
that session (both via app-server v2 and TUI/TUI2).
**Manual testing**
Tested both TUI and TUI2 - see screenshots below.
TUI:
<img width="1082" height="355" alt="image"
src="https://github.com/user-attachments/assets/adcf45ad-d428-498d-92fc-1a0a420878d9"
/>
TUI2:
<img width="1089" height="438" alt="image"
src="https://github.com/user-attachments/assets/dd768b1a-2f5f-4bd6-98fd-e52c1d3abd9e"
/>
> // todo(aibrahim): why are we passing model here while it can change?
we update it on each turn with `.with_model`
> //TODO(aibrahim): run CI in release mode.
although it's good to have, release builds take double the time tests
take.
> // todo(aibrahim): make this async function
we figured out another way of doing this sync
- Merge ModelFamily into ModelInfo
- Remove logic for adding instructions to apply patch
- Add compaction limit and visible context window to `ModelInfo`