MCP actions take a long time to load for users with lots of apps
installed. Adding a cache for these actions with 1hr expiration, given
that they are almost always aren't going to change unless people install
another app, which means they also need to restart codex to pick it up.
This PR fixes jitter in the TUI apps menu by making the description
column stable during rendering and height measurement.
Added a `stable_desc_col` option to
`SelectionViewParams`/`ListSelectionView`, introduced stable variants of
the shared row render/measure helpers in `selection_popup_common`, and
enabled the stable mode for the apps/connectors picker in `chatwidget`.
With these changes, only the apps/connectors picker uses this new
option, though it could be used elsewhere in the future.
Why: previously, the description column was computed from only currently
visible rows, so as you scrolled or filtered, the column could shift and
cause wrapping/height changes that looked jumpy. Computing it from all
rows in this popup keeps alignment and layout consistent as users scroll
through avaialble apps.
**Before:**
https://github.com/user-attachments/assets/3856cb72-5465-4b90-a993-65a2ffb09113
**After:**
https://github.com/user-attachments/assets/37b9d626-0b21-4c0f-8bb8-244c9ef971ff
- Schema: thread_id (PK, FK to threads.id with cascade delete),
trace_summary, memory_summary, updated_at.
- Migration: creates the table and an index on (updated_at DESC,
thread_id DESC) for efficient recent-first reads.
- Runtime API (DB-only):
- `get_thread_memory(thread_id)`: fetch one memory row.
- `upsert_thread_memory(thread_id, trace_summary, memory_summary)`:
insert/update by thread id and always advance updated_at.
- `get_last_n_thread_memories_for_cwd(cwd, n)`: join thread_memory with
threads and return newest n rows for an exact cwd match.
- Model layer: introduced ThreadMemory and row conversion types to keep
query decoding typed and consistent with existing state models.
## Summary
This PR introduces a gated Bubblewrap (bwrap) Linux sandbox path. The
curent Linux sandbox path relies on in-process restrictions (including
Landlock). Bubblewrap gives us a more uniform filesystem isolation
model, especially explicit writable roots with the option to make some
directories read-only and granular network controls.
This is behind a feature flag so we can validate behavior safely before
making it the default.
- Added temporary rollout flag:
- `features.use_linux_sandbox_bwrap`
- Preserved existing default path when the flag is off.
- In Bubblewrap mode:
- Added internal retry without /proc when /proc mount is not permitted
by the host/container.
This PR adds a new approval option for app/MCP tool calls: “Allow and
remember” (session-scoped).
When selected, Codex stores a temporary approval and auto-approves
matching future calls for the rest of the session.
Added a session-scoped approval key (`server`, `connector_id`,
`tool_name`) and persisted it in `tool_approvals` as
`ApprovedForSession`.
On subsequent matching calls, approval is skipped and treated as
accepted.
- Updated the approval question options to conditionally include:
- Accept
- Allow and remember (conditional)
- Decline
- Cancel
The new “Allow and remember” option is only shown when all of these are
true:
1. The call is routed through the Codex Apps MCP server (codex_apps).
2. The tool requires approval based on annotations:
- read_only_hint == false, and
- destructive_hint == true or open_world_hint == true.
3. The tool includes a connector_id in metadata (used to build the
remembered approval key).
If no `connector_id` is present, the prompt still appears (when approval
is required), but only with the existing choices (Accept / Decline /
Cancel). Approval prompting in this path has an explicit early return
unless server == `codex_apps`.
Summary:
- replace the `sse_completed` fixture and related JSON template with
direct `responses::ev_completed` payload builders
- cascade the new SSE helpers through all affected core tests for
consistency and clarity
- remove legacy fixtures that were no longer needed once the helpers are
in place
Testing:
- Not run (not requested)
Summary
- add versioned state sqlite filename helpers and re-export them from
the state crate
- remove legacy state files when initializing the runtime and update
consumers/tests to use the new helpers
- tweak logs client description and database resolution to match the new
path
When communicating over websockets, we can't rely on headers to deliver
rate limit information. This PR adds a `codex.rate_limits` event that
the server can pass to the client to inform them about rate limit usage.
The client parses this data the same way we parse rate limit headers in
HTTP mode.
This PR also wires up the etag and reasoning headers for websockets
If we want to build `/debug-config`, we'll need to know the requirements
sources that supplied the values.
This PR adds those sources such that we can render them in the UI.
Summary
- add Cursor/ThreadsPage conversions so state DB listings can be mapped
back into the rollout list model
- make recorder list helpers query the state DB first (archived flag
included) and only fall back to file traversal if needed, along with
populating head bytes lazily
- add extensive tests to ensure the DB path is honored for active and
archived threads and that the fallback works
Testing
- Not run (not requested)
<img width="1196" height="693" alt="Screenshot 2026-02-03 at 20 42 33"
src="https://github.com/user-attachments/assets/826b3c7a-ef11-4b27-802a-3c343695794a"
/>
- add `thread/compact` as a trigger-only v2 RPC that submits
`Op::Compact` and returns `{}` immediately.
- add v2 compaction e2e coverage for success and invalid/unknown thread
ids, and update protocol schemas/docs.
## Summary
This PR updates the `request_user_input` TUI overlay so `Esc` is
context-aware:
- When notes are visible for an option question, `Esc` now clears notes
and exits notes mode.
- When notes are not visible (normal option selection UI), `Esc` still
interrupts as before.
It also updates footer guidance text to match behavior.
## Changes
- Added a shared notes-clear path for option questions:
- `Tab` and `Esc` now both clear notes and return focus to options when
notes are visible.
- Updated footer hint text in notes-visible state:
- from: `tab to clear notes | ... | esc to interrupt`
- to: `tab or esc to clear notes | ...`
- Hid `esc to interrupt` hint while notes are visible for option
questions.
- Kept `esc to interrupt` visible and functional in normal
option-selection mode.
- Updated tests to assert the new `Esc` behavior in notes mode.
- Updated snapshot output for the notes-visible footer row.
- Updated docs in `docs/tui-request-user-input.md` to reflect
mode-specific `Esc` behavior.
## Summary
- preserve baseline streaming behavior (smooth mode still commits one
line per 50ms tick)
- extract adaptive chunking policy and commit-tick orchestration from
ChatWidget into `streaming/chunking.rs` and `streaming/commit_tick.rs`
- add hysteresis-based catch-up behavior with bounded batch draining to
reduce queue lag without bursty single-frame jumps
- document policy behavior, tuning guidance, and debug flow in rustdoc +
docs
## Testing
- just fmt
- cargo test -p codex-tui
### Motivation
- Ensure `codex exec` exits when a running turn is interrupted (e.g.,
Ctrl-C) so the CLI is not "immortal" when websockets/streaming are used.
### Description
- Return `CodexStatus::InitiateShutdown` when handling
`EventMsg::TurnAborted` in
`exec/src/event_processor_with_human_output.rs` so human-output exec
mode shuts down after an interrupt.
- Treat `protocol::EventMsg::TurnAborted` as
`CodexStatus::InitiateShutdown` in
`exec/src/event_processor_with_jsonl_output.rs` so JSONL output mode
behaves the same.
- Applied formatting with `just fmt`.
### Testing
- Ran `just fmt` successfully.
- Ran `cargo test -p codex-exec`; many unit tests ran and the test
command completed, but the full test run in this environment produced
`35 passed, 11 failed` where the failures are due to Landlock sandbox
panics and 403 responses in the test harness (environmental/integration
issues) and are not caused by the interrupt/shutdown changes.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_698165cec4e083258d17702bd29014c1)
## Description
### What changed
- Switch the arg0 helper root from `~/.codex/tmp/path` to
`~/.codex/tmp/path2`
- Add `Arg0PathEntryGuard` to keep both the `TempDir` and an exclusive
`.lock` file alive for the process lifetime
- Add a startup janitor that scans `path2` and deletes only directories
whose lock can be acquired
### Tests
- `cargo clippy -p codex-arg0`
- `cargo clippy -p codex-core`
- `cargo test -p codex-arg0`
- `cargo test -p codex-core`
Today, there is a single capability SID that allows the sandbox to write
to
* workspace (cwd)
* tmp directories if enabled
* additional writable roots
This change splits those up, so that each workspace has its own
capability SID, while tmp and additional roots, which are
installation-wide, are still governed by the "generic" capability SID
This isolates workspaces from each other in terms of sandbox write
access.
Also allows us to protect <cwd>/.codex when codex runs in a specific
<cwd>
Make Apps Gateway MCP blocking since otherwise app mentions may not work
when apps are not loaded. Messages sent before apps become available
will be queued.
This only affects when `apps` feature is enabled.
## Summary
This PR updates `request_user_input` behavior and Default-mode guidance
to match current collaboration-mode semantics and reduce model
confusion.
## Why
- `request_user_input` should be explicitly documented as **Plan-only**.
- Tool description and runtime availability checks should be driven by
the **same centralized mode policy**.
- Default mode prompt needed stronger execution guidance and explicit
instruction that `request_user_input` is unavailable.
- Error messages should report the **actual mode name** (not aliases
that can read as misleading).
## What changed
- Centralized `request_user_input` mode policy in `core` handler logic:
- Added a single allowed-modes config (`Plan` only).
- Reused that policy for:
- runtime rejection messaging
- tool description text
- Updated tool description to include availability constraint:
- `"This tool is only available in Plan mode."`
- Updated runtime rejection behavior:
- `Default` -> `"request_user_input is unavailable in Default mode"`
- `Execute` -> `"request_user_input is unavailable in Execute mode"`
- `PairProgramming` -> `"request_user_input is unavailable in Pair
Programming mode"`
- Strengthened Default collaboration prompt:
- Added explicit execution-first behavior
- Added assumptions-first guidance
- Added explicit `request_user_input` unavailability instruction
- Added concise progress-reporting expectations
- Simplified formatting implementation:
- Inlined allowed-mode name collection into `format_allowed_modes()`
- Kept `format_allowed_modes()` output for 3+ modes as CSV style
(`modes: a,b,c`)
One of our partners flagged that they were seeing the wrong order of
events when running `review/start` with command exec approvals:
```
{"method":"item/commandExecution/requestApproval","id":0,"params":{"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0","itemId":"0","reason":"`/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'` requires approval: Xcode-required approval: Require explicit user confirmation for all commands.","proposedExecpolicyAmendment":null}}
{"method":"item/started","params":{"item":{"type":"commandExecution","id":"call_AEjlbHqLYNM7kbU3N6uw1CNi","command":"/bin/zsh -lc 'git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat'","cwd":"/Users/devingreen/Desktop/SampleProject","processId":null,"status":"inProgress","commandActions":[{"type":"unknown","command":"git show b7a92b4eacf262c575f26b1e1ed621a357642e55 --stat"}],"aggregatedOutput":null,"exitCode":null,"durationMs":null},"threadId":"019c0b6b-6a42-7c02-99c4-98c80e88ac27","turnId":"0"}}
```
**Key fix**: In the review sub‑agent delegate we were forwarding exec
(and patch) approvals using the parent turn id (`parent_ctx.sub_id`) as
the approval call_id. That made
`item/commandExecution/requestApproval.itemId` differ from the actual
`item/started` id. We now forward the sub‑agent’s `call_id` from the
approval event instead, so the approval item id matches the
commandExecution item id in review flows.
Here’s the expected event order for an inline `review/start` that
triggers an exec approval after this fix:
1. Response to review/start (JSON‑RPC response)
- Includes `turn` (status inProgress) and `review_thread_id` (same as
parent thread for inline).
2. `turn/started` notification
- turnId is the review turn id (e.g., "0").
3. `item/started` → EnteredReviewMode
- item.id == turnId, marks entry into review mode.
4. `item/started` → commandExecution
- item.id == <call_id> (e.g., "review-call-1"), status: inProgress.
5. `item/commandExecution/requestApproval` request
- JSON‑RPC request (not a notification).
- params.itemId == <call_id> and params.turnId == turnId.
6. Client replies to approval request (Approved / Declined / etc).
7. If approved:
- Optional `item/commandExecution/outputDelta` notifications.
- `item/completed` → commandExecution with status and exitCode.
8. Review finishes:
- `item/started` → ExitedReviewMode
- `item/completed` → ExitedReviewMode
- (Agent message items may also appear, depending on review output.)
9. `turn/completed` notification
The key being #4 and #5 are now in the proper order with the correct
item id.
This updates our generated TypeScript types to be more correct with how
the server actually behaves, **specifically for JSON-RPC requests**.
Before this PR, we'd generate `field: T | null`. After this PR, we will
have `field?: T | null`. The latter matches how the server actually
works, in that if an optional field is omitted, the server will treat it
as null. This also makes it less annoying in theory for clients to
upgrade to newer versions of Codex, since adding a new optional field to
a JSON-RPC request should not require a client change.
NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.
responses, notifications) will return `field: T | null` as usual.
In light of https://github.com/openai/codex/pull/10317, because
`.agents` can include resources that Codex can run in a privileged way,
it should be read-only by default just as `.codex` is.
Summary
- mark the shell-related tools as supporting parallel tool calls so
exec_command, shell_command, etc. can run concurrently
- update expectations in tool parallelism tests to reflect the new
parallel behavior
- drop the unused serial duration helper from the suite
Testing
- Not run (not requested)
## Summary
This PR simplifies collaboration modes to the visible set `default |
plan`, while preserving backward compatibility for older partners that
may still send legacy mode
names.
Specifically:
- Renames the old Code behavior to **Default**.
- Keeps **Plan** as-is.
- Removes **Custom** mode behavior (fallbacks now resolve to Default).
- Keeps `PairProgramming` and `Execute` internally for compatibility
plumbing, while removing them from schema/API and UI visibility.
- Adds legacy input aliasing so older clients can still send old mode
names.
## What Changed
1. Mode enum and compatibility
- `ModeKind` now uses `Plan` + `Default` as active/public modes.
- `ModeKind::Default` deserialization accepts legacy values:
- `code`
- `pair_programming`
- `execute`
- `custom`
- `PairProgramming` and `Execute` variants remain in code but are hidden
from protocol/schema generation.
- `Custom` variant is removed; previous custom fallbacks now map to
`Default`.
2. Collaboration presets and templates
- Built-in presets now return only:
- `Plan`
- `Default`
- Template rename:
- `core/templates/collaboration_mode/code.md` -> `default.md`
- `execute.md` and `pair_programming.md` remain on disk but are not
surfaced in visible preset lists.
3. TUI updates
- Updated user-facing naming and prompts from “Code” to “Default”.
- Updated mode-cycle and indicator behavior to reflect only visible
`Plan` and `Default`.
- Updated corresponding tests and snapshots.
4. request_user_input behavior
- `request_user_input` remains allowed only in `Plan` mode.
- Rejection messaging now consistently treats non-plan modes as
`Default`.
5. Schemas
- Regenerated config and app-server schemas.
- Public schema types now advertise mode values as:
- `plan`
- `default`
## Backward Compatibility Notes
- Incoming legacy mode names (`code`, `pair_programming`, `execute`,
`custom`) are accepted and coerced to `default`.
- Outgoing/public schema surfaces intentionally expose only `plan |
default`.
- This allows tolerant ingestion of older partner payloads while
standardizing new integrations on the reduced mode set.
## Codex author
`codex fork 019c1fae-693b-7840-b16e-9ad38ea0bd00`
type clash; app-server generated types were still using the v1
snake_case `WebSearchAction`, so there was a mismatch between the
camelCase emitted types and the snake_case types we were trying to
parse.
Updated v2 `WebSearchAction` to export into the `v2/` type set and
updated `ThreadItem` to use that.
### Tests
Ran new `just write-app-server-schema` to surface changes to schema, the
import looks correct now.
Summary
- remove the extra transaction guard that checked for existing dynamic
tools per thread before inserting new ones
- insert each tool record with `ON CONFLICT(thread_id, position) DO
NOTHING` to ignore duplicates instead of pre-querying
- simplify execution to use the shared pool directly and avoid unneeded
commits
Testing
- Not run (not requested)