ensure status indicator present in all classifications of exec tool.
fixes indicator disappearing after preambles, will look into using
`phase` to avoid this class of error in a few hours.
commands parsed as unknown faced this issue
tested locally, added test for specific failure flow
## Problem
When a turn streamed a preamble line before any tool activity,
`ChatWidget` hid the status row while committing streamed lines and did
not restore it until a later event (commonly `ExecCommandBegin`). During
that idle gap, the UI looked finished even though the turn was still
active.
## Mental model
The bottom status row and transcript stream are separate progress
affordances:
- transcript stream shows committed output
- status row (spinner/shimmer + header) shows liveness of an active turn
While stream output is actively committing, hiding the status row is
acceptable to avoid redundant visual noise. Once stream controllers go
idle, an active turn must restore the status row immediately so liveness
remains visible across preamble-to-tool gaps.
## Non-goals
- No changes to streaming chunking policy or pacing.
- No changes to final completion behavior (status still hides when task
actually ends).
- No refactor of status lifecycle ownership between `ChatWidget` and
`BottomPane`.
## Tradeoffs
- We keep the existing behavior of hiding the status row during active
stream commits.
- We add explicit restoration on the idle boundary when the task is
still running.
- This introduces one extra status update on idle transitions, which is
small overhead but makes liveness semantics consistent.
## Architecture
`run_commit_tick_with_scope` in `chatwidget.rs` now documents and
enforces a two-phase contract:
1. For each committed streamed cell, hide status and append transcript
output.
2. If controllers are present and all idle, restore status iff task is
still running, preserving the current header.
This keeps status ownership in `ChatWidget` while relying on
`BottomPane` helpers:
- `hide_status_indicator()` during active stream commits
- `ensure_status_indicator()` +
`set_status_header(current_status_header)` at stream-idle boundary
Documentation pass additions:
- Clarified the function-level contract and lifecycle intent in
`run_commit_tick_with_scope`.
- Added an explicit regression snapshot test comment describing the
failing sequence.
## Observability
Signal that the fix is present:
- In the preamble-idle state, rendered output still includes `• Working
(… esc to interrupt)`.
- New snapshot:
`codex_tui__chatwidget__tests__preamble_keeps_working_status.snap`.
Debug path for future regressions:
- Start at `run_commit_tick_with_scope` for hide/restore transitions.
- Verify `bottom_pane.is_task_running()` at idle transition.
- Confirm `current_status_header` continuity when status is recreated.
- Use the new snapshot and targeted test sequence to reproduce
deterministic preamble-idle behavior.
## Tests
- Updated regression assertion:
- `streaming_final_answer_keeps_task_running_state` now expects status
widget to remain present while turn is running.
- Renamed/updated behavioral regression:
- `preamble_keeps_status_indicator_visible_until_exec_begin`.
- Added snapshot regression coverage:
- `preamble_keeps_working_status_snapshot`.
- Snapshot file:
`tui/src/chatwidget/snapshots/codex_tui__chatwidget__tests__preamble_keeps_working_status.snap`.
Commands run:
- `just fmt`
- `cargo test -p codex-tui
preamble_keeps_status_indicator_visible_until_exec_begin`
- `cargo test -p codex-tui preamble_keeps_working_status_snapshot`
## Risks / Inconsistencies
- Status visibility policy is still split across multiple event paths
(`commit tick`, `turn complete`, `exec begin`), so future regressions
can reintroduce ordering gaps.
- Restoration depends on `is_task_running()` correctness; if task
lifecycle flags drift, status behavior will drift too.
- Snapshot proves rendered state, not animation cadence; cadence still
relies on frame scheduling behavior elsewhere.
Adds a top-level `log_dir` config key (defaults to `$CODEX_HOME/log`) so
one-off runs can redirect `codex-tui.log` via `-c`, e.g.:
codex -c log_dir=./.codex-log
Also resolves relative paths in CLI `-c/--config` overrides for
`AbsolutePathBuf` values against the effective cwd (when available).
Tests:
- cargo test -p codex-core
Make ModelClient a session-scoped object.
Move state that is session level onto the client, and make state that is
per-turn explicit on corresponding methods.
Stop taking a huge Config object, instead only pass in values that are
actually needed.
---------
Co-authored-by: Josh McKinney <joshka@openai.com>
Took over the work that @aaronl-openai started here:
https://github.com/openai/codex/pull/10397
Now that app-server clients are able to set up custom tools (called
`dynamic_tools` in app-server), we should expose a way for clients to
pass in not just text, but also image outputs. This is something the
Responses API already supports for function call outputs, where you can
pass in either a string or an array of content outputs (text, image,
file):
https://platform.openai.com/docs/api-reference/responses/create#responses_create-input-input_item_list-item-function_tool_call_output-output-array-input_image
So let's just plumb it through in Codex (with the caveat that we only
support text and image for now). This is implemented end-to-end across
app-server v2 protocol types and core tool handling.
## Breaking API change
NOTE: This introduces a breaking change with dynamic tools, but I think
it's ok since this concept was only recently introduced
(https://github.com/openai/codex/pull/9539) and it's better to get the
API contract correct. I don't think there are any real consumers of this
yet (not even the Codex App).
Old shape:
`{ "output": "dynamic-ok", "success": true }`
New shape:
```
{
"contentItems": [
{ "type": "inputText", "text": "dynamic-ok" },
{ "type": "inputImage", "imageUrl": "data:image/png;base64,AAA" }
]
"success": true
}
```
Add a centralized FileWatcher in codex-core (using notify) that watches
skill roots from the config layer stack (recursive)
Send `SkillsChanged` events when relevant file system changes are
detected
On `SkillsChanged`:
* Invalidate the skills cache immediately in ThreadManager
* Emit EventMsg::SkillsUpdateAvailable to active sessions
~~* Broadcast a new app-server notification:
SkillsListUpdatedNotification~~
This change does not inject new items into the event stream. That means
the agent will not know about new skills, so it won't be able to
implicitly invoke new skills. It also won't know about changes to
existing skills, so if it has already read the contents of a modified
skill, it will not honor the new behavior.
This change also does not detect modifications to AGENTS.md.
I plan to address these limitations in a follow-on PR modeled after
#9985. Injection of new skills and AGENTS was deemed to risky, hence the
need to split the feature into two stages. The changes in this PR were
designed to easily accommodate the second stage once we have some other
foundational changes in place.
Testing: In addition to automated tests, I did manual testing to confirm
that newly-created skills, deleted skills, and renamed skills are
reflected in the TUI skill picker menu. Also confirmed that
modifications to behaviors for explicitly-invoked skills are honored.
---------
Co-authored-by: Xin Lin <xl@openai.com>
## Human summary
Sandboxing (specifically `LandlockRestrict`) is means that e.g. `sleep
10` fails immediately. Therefore it cannot be interrupted.
In suite::interrupt::test_shell_command_interruption, sleep 10 is issued
at 17:28:16.554 (ToolCall: shell_command {"command":"sleep 10"...}),
then fails at 17:28:16.589 with duration_ms=34, success=false,
exit_code=101, and
Sandbox(LandlockRestrict).
## Codex summary
- set `sandbox_mode = "danger-full-access"` in `interrupt` and
`v2/turn_interrupt` integration tests
- set `sandbox: Some(SandboxMode::DangerFullAccess)` in
`test_codex_jsonrpc_conversation_flow`
- set `sandbox_policy: Some(SandboxPolicy::DangerFullAccess)` in
`command_execution_notifications_include_process_id`
## Why
On some Linux CI environments, command execution fails immediately with
`LandlockRestrict` when sandboxed. These tests are intended to validate
JSON-RPC/task lifecycle behavior (interrupt semantics, command
notification shape/process id, request flow), but early sandbox startup
failure changes turn flow and can trigger extra follow-up requests,
causing flakes.
This change removes environment-specific sandbox startup dependency from
these tests while preserving their primary intent.
## Testing
- not run in this environment (per request)
MCP actions take a long time to load for users with lots of apps
installed. Adding a cache for these actions with 1hr expiration, given
that they are almost always aren't going to change unless people install
another app, which means they also need to restart codex to pick it up.
This PR fixes jitter in the TUI apps menu by making the description
column stable during rendering and height measurement.
Added a `stable_desc_col` option to
`SelectionViewParams`/`ListSelectionView`, introduced stable variants of
the shared row render/measure helpers in `selection_popup_common`, and
enabled the stable mode for the apps/connectors picker in `chatwidget`.
With these changes, only the apps/connectors picker uses this new
option, though it could be used elsewhere in the future.
Why: previously, the description column was computed from only currently
visible rows, so as you scrolled or filtered, the column could shift and
cause wrapping/height changes that looked jumpy. Computing it from all
rows in this popup keeps alignment and layout consistent as users scroll
through avaialble apps.
**Before:**
https://github.com/user-attachments/assets/3856cb72-5465-4b90-a993-65a2ffb09113
**After:**
https://github.com/user-attachments/assets/37b9d626-0b21-4c0f-8bb8-244c9ef971ff
- Schema: thread_id (PK, FK to threads.id with cascade delete),
trace_summary, memory_summary, updated_at.
- Migration: creates the table and an index on (updated_at DESC,
thread_id DESC) for efficient recent-first reads.
- Runtime API (DB-only):
- `get_thread_memory(thread_id)`: fetch one memory row.
- `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
insert/update by thread id and always advance updated_at.
- `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
threads and return newest n rows for an exact cwd match.
- Model layer: introduced ThreadMemory and row conversion types to keep
query decoding typed and consistent with existing state models.
## Summary
This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
curent Linux sandbox path relies on in-process restrictions (including
Landlock). Bubblewrap gives us a more uniform filesystem isolation
model, especially explicit writable roots with the option to make some
directories read-only and granular network controls.
This is behind a feature flag so we can validate behavior safely before
making it the default.
- Added temporary rollout flag:
- `features.use_linux_sandbox_bwrap`
- Preserved existing default path when the flag is off.
- In Bubblewrap mode:
- Added internal retry without /proc when /proc mount is not permitted
by the host/container.
This PR adds a new approval option for app/MCP tool calls: “Allow and
remember” (session-scoped).
When selected, Codex stores a temporary approval and auto-approves
matching future calls for the rest of the session.
Added a session-scoped approval key (`server`, `connector_id`,
`tool_name`) and persisted it in `tool_approvals` as
`ApprovedForSession`.
On subsequent matching calls, approval is skipped and treated as
accepted.
- Updated the approval question options to conditionally include:
- Accept
- Allow and remember (conditional)
- Decline
- Cancel
The new “Allow and remember” option is only shown when all of these are
true:
1. The call is routed through the Codex Apps MCP server (codex_apps).
2. The tool requires approval based on annotations:
- read_only_hint == false, and
- destructive_hint == true or open_world_hint == true.
3. The tool includes a connector_id in metadata (used to build the
remembered approval key).
If no `connector_id` is present, the prompt still appears (when approval
is required), but only with the existing choices (Accept / Decline /
Cancel). Approval prompting in this path has an explicit early return
unless server == `codex_apps`.
Summary:
- replace the `sse_completed` fixture and related JSON template with
direct `responses::ev_completed` payload builders
- cascade the new SSE helpers through all affected core tests for
consistency and clarity
- remove legacy fixtures that were no longer needed once the helpers are
in place
Testing:
- Not run (not requested)
Summary
- add versioned state sqlite filename helpers and re-export them from
the state crate
- remove legacy state files when initializing the runtime and update
consumers/tests to use the new helpers
- tweak logs client description and database resolution to match the new
path
When communicating over websockets, we can't rely on headers to deliver
rate limit information. This PR adds a `codex.rate_limits` event that
the server can pass to the client to inform them about rate limit usage.
The client parses this data the same way we parse rate limit headers in
HTTP mode.
This PR also wires up the etag and reasoning headers for websockets
If we want to build `/debug-config`, we'll need to know the requirements
sources that supplied the values.
This PR adds those sources such that we can render them in the UI.
Summary
- add Cursor/ThreadsPage conversions so state DB listings can be mapped
back into the rollout list model
- make recorder list helpers query the state DB first (archived flag
included) and only fall back to file traversal if needed, along with
populating head bytes lazily
- add extensive tests to ensure the DB path is honored for active and
archived threads and that the fallback works
Testing
- Not run (not requested)
<img width="1196" height="693" alt="Screenshot 2026-02-03 at 20 42 33"
src="https://github.com/user-attachments/assets/826b3c7a-ef11-4b27-802a-3c343695794a"
/>
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
## Summary
This PR updates the `request_user_input` TUI overlay so `Esc` is
context-aware:
- When notes are visible for an option question, `Esc` now clears notes
and exits notes mode.
- When notes are not visible (normal option selection UI), `Esc` still
interrupts as before.
It also updates footer guidance text to match behavior.
## Changes
- Added a shared notes-clear path for option questions:
- `Tab` and `Esc` now both clear notes and return focus to options when
notes are visible.
- Updated footer hint text in notes-visible state:
- from: `tab to clear notes | ... | esc to interrupt`
- to: `tab or esc to clear notes | ...`
- Hid `esc to interrupt` hint while notes are visible for option
questions.
- Kept `esc to interrupt` visible and functional in normal
option-selection mode.
- Updated tests to assert the new `Esc` behavior in notes mode.
- Updated snapshot output for the notes-visible footer row.
- Updated docs in `docs/tui-request-user-input.md` to reflect
mode-specific `Esc` behavior.
## Summary
- preserve baseline streaming behavior (smooth mode still commits one
line per 50ms tick)
- extract adaptive chunking policy and commit-tick orchestration from
ChatWidget into `streaming/chunking.rs` and `streaming/commit_tick.rs`
- add hysteresis-based catch-up behavior with bounded batch draining to
reduce queue lag without bursty single-frame jumps
- document policy behavior, tuning guidance, and debug flow in rustdoc +
docs
## Testing
- just fmt
- cargo test -p codex-tui
### Motivation
- Ensure `codex exec` exits when a running turn is interrupted (e.g.,
Ctrl-C) so the CLI is not "immortal" when websockets/streaming are used.
### Description
- Return `CodexStatus::InitiateShutdown` when handling
`EventMsg::TurnAborted` in
`exec/src/event_processor_with_human_output.rs` so human-output exec
mode shuts down after an interrupt.
- Treat `protocol::EventMsg::TurnAborted` as
`CodexStatus::InitiateShutdown` in
`exec/src/event_processor_with_jsonl_output.rs` so JSONL output mode
behaves the same.
- Applied formatting with `just fmt`.
### Testing
- Ran `just fmt` successfully.
- Ran `cargo test -p codex-exec`; many unit tests ran and the test
command completed, but the full test run in this environment produced
`35 passed, 11 failed` where the failures are due to Landlock sandbox
panics and 403 responses in the test harness (environmental/integration
issues) and are not caused by the interrupt/shutdown changes.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_698165cec4e083258d17702bd29014c1)
## Description
### What changed
- Switch the arg0 helper root from `~/.codex/tmp/path` to
`~/.codex/tmp/path2`
- Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
`.lock` file alive for the process lifetime
- Add a startup janitor that scans `path2` and deletes only directories
whose lock can be acquired
### Tests
- `cargo clippy -p codex-arg0`
- `cargo clippy -p codex-core`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core`
Today, there is a single capability SID that allows the sandbox to write
to
* workspace (cwd)
* tmp directories if enabled
* additional writable roots
This change splits those up, so that each workspace has its own
capability SID, while tmp and additional roots, which are
installation-wide, are still governed by the "generic" capability SID
This isolates workspaces from each other in terms of sandbox write
access.
Also allows us to protect <cwd>/.codex when codex runs in a specific
<cwd>
Make Apps Gateway MCP blocking since otherwise app mentions may not work
when apps are not loaded. Messages sent before apps become available
will be queued.
This only affects when `apps` feature is enabled.