Commit graph

1255 commits

Author SHA1 Message Date
Michael Bolin
bfff0c729f
config: enforce enterprise feature requirements (#13388)
## Why

Enterprises can already constrain approvals, sandboxing, and web search
through `requirements.toml` and MDM, but feature flags were still only
configurable as managed defaults. That meant an enterprise could suggest
feature values, but it could not actually pin them.

This change closes that gap and makes enterprise feature requirements
behave like the other constrained settings. The effective feature set
now stays consistent with enterprise requirements during config load,
when config writes are validated, and when runtime code mutates feature
flags later in the session.

It also tightens the runtime API for managed features. `ManagedFeatures`
now follows the same constraint-oriented shape as `Constrained<T>`
instead of exposing panic-prone mutation helpers, and production code
can no longer construct it through an unconstrained `From<Features>`
path.

The PR also hardens the `compact_resume_fork` integration coverage on
Windows. After the feature-management changes,
`compact_resume_after_second_compaction_preserves_history` was
overflowing the libtest/Tokio thread stacks on Windows, so the test now
uses an explicit larger-stack harness as a pragmatic mitigation. That
may not be the ideal root-cause fix, and it merits a parallel
investigation into whether part of the async future chain should be
boxed to reduce stack pressure instead.

## What Changed

Enterprises can now pin feature values in `requirements.toml` with the
requirements-side `features` table:

```toml
[features]
personality = true
unified_exec = false
```

Only canonical feature keys are allowed in the requirements `features`
table; omitted keys remain unconstrained.

- Added a requirements-side pinned feature map to
`ConfigRequirementsToml`, threaded it through source-preserving
requirements merge and normalization in `codex-config`, and made the
TOML surface use `[features]` (while still accepting legacy
`[feature_requirements]` for compatibility).
- Exposed `featureRequirements` from `configRequirements/read`,
regenerated the JSON/TypeScript schema artifacts, and updated the
app-server README.
- Wrapped the effective feature set in `ManagedFeatures`, backed by
`ConstrainedWithSource<Features>`, and changed its API to mirror
`Constrained<T>`: `can_set(...)`, `set(...) -> ConstraintResult<()>`,
and result-returning `enable` / `disable` / `set_enabled` helpers.
- Removed the legacy-usage and bulk-map passthroughs from
`ManagedFeatures`; callers that need those behaviors now mutate a plain
`Features` value and reapply it through `set(...)`, so the constrained
wrapper remains the enforcement boundary.
- Removed the production loophole for constructing unconstrained
`ManagedFeatures`. Non-test code now creates it through the configured
feature-loading path, and `impl From<Features> for ManagedFeatures` is
restricted to `#[cfg(test)]`.
- Rejected legacy feature aliases in enterprise feature requirements,
and return a load error when a pinned combination cannot survive
dependency normalization.
- Validated config writes against enterprise feature requirements before
persisting changes, including explicit conflicting writes and
profile-specific feature states that normalize into invalid
combinations.
- Updated runtime and TUI feature-toggle paths to use the constrained
setter API and to persist or apply the effective post-constraint value
rather than the requested value.
- Updated the `core_test_support` Bazel target to include the bundled
core model-catalog fixtures in its runtime data, so helper code that
resolves `core/models.json` through runfiles works in remote Bazel test
environments.
- Renamed the core config test coverage to emphasize that effective
feature values are normalized at runtime, while conflicting persisted
config writes are rejected.
- Ran `compact_resume_after_second_compaction_preserves_history` inside
an explicit 8 MiB test thread and Tokio runtime worker stack, following
the existing larger-stack integration-test pattern, to keep the Windows
`compact_resume_fork` test slice from aborting while a parallel
investigation continues into whether some of the underlying async
futures should be boxed.

## Verification

- `cargo test -p codex-config`
- `cargo test -p codex-core feature_requirements_ -- --nocapture`
- `cargo test -p codex-core
load_requirements_toml_produces_expected_constraints -- --nocapture`
- `cargo test -p codex-core
compact_resume_after_second_compaction_preserves_history -- --nocapture`
- `cargo test -p codex-core compact_resume_fork -- --nocapture`
- Re-ran the built `codex-core` `tests/all` binary with
`RUST_MIN_STACK=262144` for
`compact_resume_after_second_compaction_preserves_history` to confirm
the explicit-stack harness fixes the deterministic low-stack repro.
- `cargo test -p codex-core`
- This still fails locally in unrelated integration areas that expect
the `codex` / `test_stdio_server` binaries or hit existing `search_tool`
wiremock mismatches.

## Docs

`developers.openai.com/codex` should document the requirements-side
`[features]` table for enterprise and MDM-managed configuration,
including that it only accepts canonical feature keys and that
conflicting config writes are rejected.
2026-03-04 04:40:22 +00:00
Celia Chen
e6773f856c
Feat: Preserve network access on read-only sandbox policies (#13409)
## Summary

`PermissionProfile.network` could not be preserved when additional or
compiled permissions resolved to
`SandboxPolicy::ReadOnly`, because `ReadOnly` had no network_access
field. This change makes read-only + network
enabled representable directly and threads that through the protocol,
app-server v2 mirror, and permission-
  merging logic.

## What changed

- Added `network_access: bool` to `SandboxPolicy::ReadOnly` in the core
protocol and app-server v2 protocol.
- Kept backward compatibility by defaulting the new field to false, so
legacy read-only payloads still
    deserialize unchanged.
- Updated `has_full_network_access()` and sandbox summaries to respect
read-only network access.
  - Preserved PermissionProfile.network when:
      - compiling skill permission profiles into sandbox policies
      - normalizing additional permissions
      - merging additional permissions into existing sandbox policies
- Updated the approval overlay to show network in the rendered
permission rule when requested.
  - Regenerated app-server schema fixtures for the new v2 wire shape.
2026-03-04 02:41:57 +00:00
rhan-oai
e951ef4374
[feedback] diagnostics (#13292)
- added header logic to display diagnostics on cli
- added logic for collecting env vars

<img width="606" height="327" alt="Screenshot 2026-03-03 at 3 49 31 PM"
src="https://github.com/user-attachments/assets/05e78c56-8cb3-47fa-abaf-3e57f1fdd8e2"
/>

<img width="690" height="353" alt="Screenshot 2026-03-02 at 6 47 54 PM"
src="https://github.com/user-attachments/assets/e470b559-13f4-44d9-897f-bc398943c6d1"
/>
2026-03-03 16:34:11 -08:00
Charley Cunningham
299b8ac445
tui: align pending steers with core acceptance (#12868)
## Summary
- submit `Enter` steers immediately while a turn is already running
instead of routing them through `queued_user_messages`
- keep those submitted steers visible in the footer as `pending_steers`
until core records them as a user message or aborts the turn
- reconcile pending steers on `ItemCompleted(UserMessage)`, not
`RawResponseItem`
- emit user-message item lifecycle for leftover pending input at task
finish, then remove the TUI `TurnComplete` fallback
- keep `queued_user_messages` for actual queued drafts, rendered below
pending steers

## Problem
While the assistant was generating, pressing `Enter` could send the
input into `queued_user_messages`. That queue only drains after the turn
ends, so ordinary steers behaved like queued drafts instead of landing
at the next core sampling boundary.

The first version of this fix also used `RawResponseItem` to decide when
a steer had landed. Review feedback was that this is the wrong
abstraction for client behavior.

There was also a late edge case in core: if pending steer input was
accepted after the final sampling decision but before `TurnComplete`,
core would record that user message into history at task finish without
emitting `ItemStarted(UserMessage)` / `ItemCompleted(UserMessage)`. TUI
had a fallback to paper over that gap locally.

## Approach
- `Enter` during an active turn now submits a normal `Op::UserTurn`
immediately
- TUI keeps a local pending-steer preview instead of rendering that user
message into history immediately
- when core records the steer as `ItemCompleted(UserMessage)`, TUI
matches and removes the corresponding pending preview, then renders the
committed user message
- core now emits the same user-message lifecycle when
`on_task_finished(...)` drains leftover pending user input, before
`TurnComplete`
- with that lifecycle gap closed in core, TUI no longer needs to flush
pending steers into history on `TurnComplete`
- if the turn is interrupted, pending steers and queued drafts are both
restored into the composer, with pending steers first

## Notes
- `Tab` still uses the real queued-message path
- `queued_user_messages` and `pending_steers` are separate state with
separate semantics
- the pending-steer matching key is built directly from `UserInput`
- this removes the new TUI dependency on `RawResponseItem`

## Validation
- `just fmt`
- `cargo test -p codex-core
task_finish_emits_turn_item_lifecycle_for_leftover_pending_user_input --
--nocapture`
- `cargo test -p codex-tui`
2026-03-03 15:31:52 -08:00
Charley Cunningham
c4bd0aa3b9
app-server: source /feedback logs from sqlite at trace level (#12969)
## Summary
- write app-server SQLite logs at TRACE level when SQLite is enabled
- source app-server `/feedback` log attachments from SQLite for the
requested thread when available
- flush buffered SQLite log writes before `/feedback` queries them so
newly emitted events are not lost behind the async inserter
- include same-process threadless SQLite rows in those `/feedback` logs
so the attachment matches the process-wide feedback buffer more closely
- keep the existing in-memory ring buffer fallback unchanged, including
when the SQLite query returns no rows

## Details
- add a byte-bounded `query_feedback_logs` helper in `codex-state` so
`/feedback` does not fetch all rows before truncating
- scope SQLite feedback logs to the requested thread plus threadless
rows from the same `process_uuid`
- format exported SQLite feedback lines with the log level prefix to
better match the in-memory feedback formatter
- add an explicit `LogDbLayer::flush()` control path and await it in
app-server before querying SQLite for feedback logs
- pass optional SQLite log bytes through `codex-feedback` as the
`codex-logs.log` attachment override
- leave TUI behavior unchanged apart from the updated `upload_feedback`
call signature
- add regression coverage for:
  - newest-within-budget ordering
  - excluding oversized newest rows
  - including same-process threadless rows
  - keeping the newest suffix across mixed thread and threadless rows
  - matching the feedback formatter shape aside from span prefixes
  - falling back to the in-memory snapshot when SQLite returns no logs
  - flushing buffered SQLite rows before querying

## Follow-up
- SQLite feedback exports still do not reproduce span prefixes like
`feedback-thread{thread_id=...}:`; there is a `TODO(ccunningham)` in
`codex-rs/state/src/log_db.rs` for that follow-up.

## Testing
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-app-server`
- `cd codex-rs && just fmt`
2026-03-03 11:17:06 -08:00
pash-openai
07e532dcb9
app-server service tier plumbing (plus some cleanup) (#13334)
followup to https://github.com/openai/codex/pull/13212 to expose fast
tier controls to app server
(majority of this PR is generated schema jsons - actual code is +69 /
-35 and +24 tests )

- add service tier fields to the app-server protocol surfaces used by
thread lifecycle, turn start, config, and session configured events
- thread service tier through the app-server message processor and core
thread config snapshots
- allow runtime config overrides to carry service tier for app-server
callers

cleanup:
- Removing useless "legacy" code supporting "standard" - we moved to
None | "fast", so "standard" is not needed.
2026-03-03 02:35:09 -08:00
bwanner-oai
6deb72c04b
Renaming Team to Business plan during TUI onboarding (#13313)
Team is referred to as "Business"
2026-03-02 23:13:29 -08:00
pash-openai
2f5b01abd6
add fast mode toggle (#13212)
- add a local Fast mode setting in codex-core (similar to how model id
is currently stored on disk locally)
- send `service_tier=priority` on requests when Fast is enabled
- add `/fast` in the TUI and persist it locally
- feature flag
2026-03-02 20:29:33 -08:00
rakan-oai
56cc2c71f4
tui: preserve kill buffer across submit and slash-command clears (#12006)
## Problem

Before this change, composer paths that cleared the textarea after
submit or slash-command dispatch
also cleared the textarea kill buffer. That meant a user could `Ctrl+K`
part of a draft, trigger a
composer action that cleared the visible draft, and then lose the
ability to `Ctrl+Y` the killed
text back.

This was especially awkward for workflows where the user wants to
temporarily remove text, run a
composer action such as changing reasoning level or dispatching a slash
command, and then restore
the killed text into the now-empty draft.

## Mental model

This change separates visible draft state from editing-history state.

The visible draft includes the current textarea contents and text
elements that should be cleared
when the composer submits or dispatches a command. The kill buffer is
different: it represents the
most recent killed text and should survive those composer-driven clears
so the user can still yank
it back afterward.

After this change, submit and slash-command dispatch still clear the
visible textarea contents, but
they no longer erase the most recent kill.

## Non-goals

This does not implement a multi-entry kill ring or change the semantics
of `Ctrl+K` and `Ctrl+Y`
beyond preserving the existing yank target across these clears.

It also does not change how submit, slash-command parsing, prompt
expansion, or attachment handling
work, except that those flows no longer discard the textarea kill buffer
as a side effect of
clearing the draft.

## Tradeoffs

The main tradeoff is that clearing the visible textarea is no longer
equivalent to fully resetting
all editing state. That is intentional here, because submit and
slash-command dispatch are composer
actions, not requests to forget the user's most recent kill.

The benefit is better editing continuity. The cost is that callers must
understand that full-buffer
replacement resets visible draft state but not the kill buffer.

## Architecture

The behavioral change is in `TextArea`: full-buffer replacement now
rebuilds text and elements
without clearing `kill_buffer`.

`ChatComposer` already clears the textarea after successful submit and
slash-command dispatch by
calling into those textarea replacement paths. With this change, those
existing composer flows
inherit the new behavior automatically: the visible draft is cleared,
but the last killed text
remains available for `Ctrl+Y`.

The tests cover both layers:

- `TextArea` verifies that the kill buffer survives full-buffer
replacement.
- `ChatComposer` verifies that it survives submit.
- `ChatComposer` also verifies that it survives slash-command dispatch.

## Observability

There is no dedicated logging for kill-buffer preservation. The most
direct way to reason about the
behavior is to inspect textarea-wide replacement paths and confirm
whether they treat the kill
buffer as visible-buffer state or as editing-history state.

If this regresses in the future, the likely failure mode is simple and
user-visible: `Ctrl+Y` stops
restoring text after submit or slash-command clears even though ordinary
kill/yank still works
within a single uninterrupted draft.

## Tests

Added focused regression coverage for the new contract:

- `kill_buffer_persists_across_set_text`
- `kill_buffer_persists_after_submit`
- `kill_buffer_persists_after_slash_command_dispatch`

Local verification:
- `just fmt`
- `cargo test -p codex-tui`

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
2026-03-03 02:06:08 +00:00
Celia Chen
0bb152b01d
chore: remove SkillMetadata.permissions and derive skill sandboxing from permission_profile (#13061)
## Summary

This change removes the compiled permissions field from skill metadata
and keeps permission_profile as the single source of truth.

Skill loading no longer compiles skill permissions eagerly. Instead, the
zsh-fork skill escalation path compiles `skill.permission_profile` when
it needs to determine the sandbox to apply for a skill script.

  ## Behavior change

  For skills that declare:
```
  permissions: {}
```
we now treat that the same as having no skill permissions override,
instead of creating and using a default readonly sandbox. This change
makes the behavior more intuitive:

  - only non-empty skill permission profiles affect sandboxing
- omitting permissions and writing permissions: {} now mean the same
thing
- skill metadata keeps a single permissions representation instead of
storing derived state too

Overall, this makes skill sandbox behavior easier to understand and more
predictable.
2026-03-03 01:29:53 +00:00
Ahmed Ibrahim
b20b6aa46f
Update realtime websocket API (#13265)
- migrate the realtime websocket transport to the new session and
handoff flow
- make the realtime model configurable in config.toml and use API-key
auth for the websocket

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 16:05:40 -08:00
jif-oai
f8838fd6f3
feat: enable ma through /agent (#13246)
<img width="639" height="139" alt="Screenshot 2026-03-02 at 16 06 41"
src="https://github.com/user-attachments/assets/c006fcec-c1e7-41ce-bb84-c121d5ffb501"
/>

Then
<img width="372" height="37" alt="Screenshot 2026-03-02 at 16 06 49"
src="https://github.com/user-attachments/assets/aa4ad703-e7e7-4620-9032-f5cd4f48ff79"
/>
2026-03-02 18:37:29 +00:00
Charley Cunningham
7979ce453a
tui: restore draft footer hints (#13202)
## Summary
- restore `Tab to queue` when a draft is present and the agent is
running
- keep draft-idle footers passive by showing the normal footer or status
line instead of `? for shortcuts`
- align footer snapshot coverage with the updated draft footer behavior

## Codex author
`codex resume 019c7f1c-43aa-73e0-97c7-40f457396bb0`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 10:26:13 -08:00
jif-oai
9a42a56d8f
chore: /multiagent alias for /agent (#13249)
Add a `/mutli-agents` alias for `/agent` and update the wording
2026-03-02 16:51:54 +00:00
jif-oai
2a5bcc053f
fix: esc in /agent (#13131)
Fix https://github.com/openai/codex/issues/13093
2026-03-02 15:49:06 +00:00
jif-oai
b08bdd91e3
fix: /status when sub-agent (#13130)
Fix https://github.com/openai/codex/issues/13066
2026-03-02 11:57:15 +00:00
gabec-openai
9685e7d6d1
Improve subagent contrast in TUI (#13197)
## Summary
- raise contrast for subagent transcript labels and fallback states
- remove low-contrast dim styling from role tags and error details
- make the closed-agent picker dot readable in dark theme

## Validation
- just fmt
- just fix -p codex-tui
- cargo test -p codex-tui

Co-authored-by: Codex <noreply@openai.com>
2026-03-02 12:16:49 +01:00
Ahmed Ibrahim
0aeb55bf08
Record realtime close marker on replacement (#13058)
## Summary
- record a realtime close developer message when a new realtime session
replaces an active one
- assert the replacement marker through the mocked responses request
path

---------

Co-authored-by: Codex <noreply@openai.com>
Co-authored-by: Charles Cunningham <ccunningham@openai.com>
2026-03-01 13:54:12 -08:00
xl-openai
752402c4fe
feat: load from plugins (#12864)
Support loading plugins.

Plugins can now be enabled via [plugins.<name>] in config.toml. They are
loaded as first-class entities through PluginsManager, and their default
skills/ and .mcp.json contributions are integrated into the existing
skills and MCP flows.
2026-03-01 10:50:56 -08:00
jif-oai
74e5150b1e
fix: package models.json for Bazel tests (#13129) 2026-02-28 17:21:02 +01:00
jif-oai
3bfee6fcb5
nit: ignore `resume_startup_does_not_consume_model_availability_nux_c… (#13128) 2026-02-28 14:50:41 +01:00
jif-oai
2b38b4e03b
feat: approval for sub-agent in the TUI (#12995)
<img width="766" height="290" alt="Screenshot 2026-02-27 at 10 50 48"
src="https://github.com/user-attachments/assets/3bc96cd9-ed2c-4d67-a317-8f7b60abbbb1"
/>
2026-02-28 14:07:07 +01:00
Eric Traut
e6032eb0b7
Fix CLI feedback link (#13086)
Addresses #12967

About a month ago, I updated the Github bug report templates to
accommodate the (at the time) new Codex app. The `/feedback` code path
in the CLI was referencing one of the old templates, and I didn't
realize it at the time. This PR updates the link so users don't get an
empty bug template when using `/feedback`.
2026-02-27 19:02:40 -07:00
Ahmed Ibrahim
ec6f6aacbf
Add model availability NUX tooltips (#13021)
- override startup tooltips with model availability NUX and persist
per-model show counts in config
- stop showing each model after four exposures and fall back to normal
tooltips
2026-02-27 17:14:06 -08:00
Eric Traut
61c42396ab
Keep large-paste placeholders intact during file completion (#13070)
Addresses https://github.com/openai/codex/issues/13040

Fixes a regression in 0.106.0 introduced in
https://github.com/openai/codex/pull/9393

Summary
- replace only the active completion range so unrelated text elements
(e.g., large-paste placeholders) stay atomic and can still expand
- add a regression test verifying large paste placeholders persist
through completions and submit
- could not fetch issue details via GitHub API because network access is
disabled in this sandboxed environment
2026-02-27 17:19:11 -07:00
Felipe Coury
c3c75878e8
fix(tui): theme-aware diff backgrounds with fallback behavior (#13037)
## Problem

The TUI diff renderer uses hardcoded background palettes for
insert/delete lines that don't respect the user's chosen syntax theme.
When a theme defines `markup.inserted` / `markup.deleted` scope
backgrounds (the convention used by GitHub, Solarized, Monokai, and most
VS Code themes), those colors are ignored — the diff always renders with
the same green/red tints regardless of theme selection.

Separately, ANSI-16 terminals (and Windows Terminal sessions misreported
as ANSI-16) rendered diff backgrounds as full-saturation blocks that
obliterated syntax token colors, making highlighted diffs unreadable.

## Mental model

Diff backgrounds are resolved in three layers:

1. **Color level detection** — `diff_color_level_for_terminal()` maps
the raw `supports-color` probe + Windows Terminal heuristics to a
`DiffColorLevel` (TrueColor / Ansi256 / Ansi16). Windows Terminal gets
promoted from Ansi16 to TrueColor when `WT_SESSION` is present.

2. **Background resolution** — `resolve_diff_backgrounds()` queries the
active syntax theme for `markup.inserted`/`markup.deleted` (falling back
to `diff.inserted`/`diff.deleted`), then overlays those on top of the
hardcoded palette. For ANSI-256, theme RGB values are quantized to the
nearest xterm-256 index. For ANSI-16, backgrounds are `None`
(foreground-only).

3. **Style composition** — The resolved `ResolvedDiffBackgrounds` is
threaded through every call to `style_add`, `style_del`, `style_sign_*`,
and `style_line_bg_for`, which decide how to compose
foreground+background for each line kind and theme variant.

A new `RichDiffColorLevel` type (a subset of `DiffColorLevel` without
Ansi16) encodes the invariant "we have enough depth for tinted
backgrounds" at the type level, so background-producing functions have
exhaustive matches without unreachable arms.

## Non-goals

- No change to gutter (line number column) styling — gutter backgrounds
still use the hardcoded palette.
- No per-token scope background resolution — this is line-level
background only; syntax token colors come from the existing
`highlight_code_to_styled_spans` path.
- No dark/light theme auto-switching from scope backgrounds —
`DiffTheme` is still determined by querying the terminal's background
color.

## Tradeoffs

- **Theme trust vs. visual safety:** When a theme defines scope
backgrounds, we trust them unconditionally for rich color levels. A
badly authored theme could produce illegible combinations. The fallback
for `None` backgrounds (foreground-only) is intentionally conservative.
- **Quantization quality:** ANSI-256 quantization uses perceptual
distance across indices 16–255, skipping system colors. The result is
approximate — a subtle theme tint may land on a noticeably different
xterm index.
- **Single-query caching:** `resolve_diff_backgrounds` is called once
per `render_change` invocation (i.e., once per file in a diff). If the
theme changes mid-render (live preview), the next file picks up the new
backgrounds.

## Architecture

Files changed:

| File | Role |
|---|---|
| `tui/src/render/highlight.rs` | New: `DiffScopeBackgroundRgbs`,
`diff_scope_background_rgbs()`, scope extraction helpers |
| `tui/src/diff_render.rs` | New: `RichDiffColorLevel`,
`ResolvedDiffBackgrounds`, `resolve_diff_backgrounds*`,
`quantize_rgb_to_ansi256`, Windows Terminal promotion; modified: all
style helpers to accept/thread `ResolvedDiffBackgrounds` |

The scope-extraction code lives in `highlight.rs` because it uses
`syntect::highlighting::Highlighter` and the theme singleton. The
resolution and quantization logic lives in `diff_render.rs` because it
depends on diff-specific types (`DiffTheme`, `DiffColorLevel`, ratatui
`Color`).

## Observability

No runtime logging was added. The most useful debugging aid is the
`diff_color_level_for_terminal` function, which is pure and fully
unit-tested — to diagnose a color-depth mismatch, log its four inputs
(`StdoutColorLevel`, `TerminalName`, `WT_SESSION` presence,
`FORCE_COLOR` presence).

Scope resolution can be tested by loading a custom `.tmTheme` with known
`markup.inserted` / `markup.deleted` backgrounds and checking the diff
output in a truecolor terminal.

## Tests

- **Windows Terminal promotion:** 7 unit tests cover every branch of
`diff_color_level_for_terminal` (ANSI-16 promotion, `WT_SESSION`
unconditional promotion, `FORCE_COLOR` suppression, conservative
`Unknown` level).
- **ANSI-16 foreground-only:** Tests verify that `style_add`,
`style_del`, `style_sign_*`, `style_line_bg_for`, and `style_gutter_for`
all return `None` backgrounds on ANSI-16.
- **Scope resolution:** Tests verify `markup.*` preference over
`diff.*`, `None` when no scope matches, bundled theme resolution, and
custom `.tmTheme` round-trip.
- **Quantization:** Test verifies ANSI-256 quantization of a known RGB
triple.
- **Insta snapshots:** 2 new snapshot tests
(`ansi16_insert_delete_no_background`,
`theme_scope_background_resolution`) lock visual output.
2026-02-27 16:44:56 -07:00
Jeremy Rose
bc0a5843df
Align TUI voice transcription audio with 4o ASR (#13030)
## Summary
- switch TUI push-to-talk transcription requests to
`gpt-4o-mini-transcribe`
- prefer 24 kHz mono `i16` microphone configs and normalize voice input
to 24 kHz mono before upload/send
- add unit coverage for the new downmix/resample path

## Testing
- `just fmt`
- `cargo test -p codex-tui`
2026-02-27 18:22:48 +00:00
Felipe Coury
3b5996f988
fix(tui): promote windows terminal diff ansi16 to truecolor (#13016)
## Summary

- Promote ANSI-16 to truecolor for diff rendering when running inside
Windows Terminal
- Respect explicit `FORCE_COLOR` override, skipping promotion when set
- Extract a pure `diff_color_level_for_terminal` function for
testability
- Strip background tints from ANSI-16 diff output, rendering add/delete
lines with foreground color only
- Introduce `RichDiffColorLevel` to type-safely restrict background
fills to truecolor and ansi256

## Problem

Windows Terminal fully supports 24-bit (truecolor) rendering but often
does not provide the usual TERM metadata (`TERM`, `TERM_PROGRAM`,
`COLORTERM`) in `cmd.exe`/PowerShell sessions. In those environments,
`supports-color` can report only ANSI-16 support. The diff renderer
therefore falls back to a 16-color palette, producing washed-out,
hard-to-read diffs.

The screenshots below demonstrate that both PowerShell and cmd.exe don't
set any `*TERM*` environment variables.

| PowerShell | cmd.exe |
|---|---|
| <img width="2032" height="1162" alt="SCR-20260226-nfvy"
src="https://github.com/user-attachments/assets/59e968cc-4add-4c7b-a415-07163297e86a"
/> | <img width="2032" height="1162" alt="SCR-20260226-nfyc"
src="https://github.com/user-attachments/assets/d06b3e39-bf91-4ce3-9705-82bf9563a01b"
/> |


## Mental model

`StdoutColorLevel` (from `supports-color`) is the _detected_ capability.
`DiffColorLevel` is the _intended_ capability for diff rendering. A new
intermediary — `diff_color_level_for_terminal` — maps one to the other
and is the single place where terminal-specific overrides live.

Windows Terminal is detected two independent ways: the `TerminalName`
parsed by `terminal_info()` and the raw presence of `WT_SESSION`. When
`WT_SESSION` is present and `FORCE_COLOR` is not set, we promote
unconditionally to truecolor. When `WT_SESSION` is absent but
`TerminalName::WindowsTerminal` is detected, we promote only the ANSI-16
level (not `Unknown`).

A single override helper — `has_force_color_override()` — checks whether
`FORCE_COLOR` is set. When it is, both the `WT_SESSION` fast-path and
the `TerminalName`-based promotion are suppressed, preserving explicit
user intent.

| PowerShell | cmd.exe | WSL | Bash for Windows |
|---|---|---|---|
|
![SCR-20260226-msrh](https://github.com/user-attachments/assets/0f6297a6-4241-4dbf-b7ff-cf02da8941b0)
|
![SCR-20260226-nbao](https://github.com/user-attachments/assets/bb5ff8a9-903c-4677-a2de-1f6e1f34b18e)
|
![SCR-20260226-nbej](https://github.com/user-attachments/assets/26ecec2c-a7e9-410a-8702-f73995b490a6)
|
![SCR-20260226-nbkz](https://github.com/user-attachments/assets/80c4bf9a-3b41-40e1-bc87-f5c565f96075)
|

## Non-goals

- This does not change color detection for anything outside the diff
renderer (e.g. the chat widget, markdown rendering).
- This does not add a user-facing config knob; `FORCE_COLOR` already
serves that role.

## Tradeoffs

- The `has_wt_session` signal is intentionally kept separate from
`TerminalName::WindowsTerminal`. `terminal_info()` is derived with
`TERM_PROGRAM` precedence, so it can differ from raw `WT_SESSION`.
- Real-world validation in this issue: in both `cmd.exe` and PowerShell,
`TERM`/`TERM_PROGRAM`/`COLORTERM` were absent, so TERM-based capability
hints were unavailable in those sessions.
- Checking `FORCE_COLOR` for presence rather than parsing its value is a
simplification. In practice `supports-color` has already parsed it, so
our check is a coarse "did the user set _anything_?" gate. The effective
color level still comes from `supports-color`.
- When `WT_SESSION` is present without `FORCE_COLOR`, we promote to
truecolor regardless of `stdout_level` (including `Unknown`). This is
aggressive but correct: `WT_SESSION` is a strong signal that we're in
Windows Terminal.
- ANSI-16 add/delete backgrounds (bright green/red) overpower
syntax-highlighted token colors, making diffs harder to read.
Foreground-only cues (colored text, gutter signs) preserve readability
on low-color terminals.

## Architecture

```
stdout_color_level()  ──┐
terminal_info().name  ──┤
WT_SESSION presence   ──┼──▶ diff_color_level_for_terminal() ──▶ DiffColorLevel
FORCE_COLOR presence  ──┘                                            │
                                                                     ▼
                                                          RichDiffColorLevel::from_diff_color_level()
                                                                     │
                                                          ┌──────────┴──────────┐
                                                          │ Some(TrueColor|256) │ → bg tints
                                                          │ None (Ansi16)       │ → fg only
                                                          └─────────────────────┘
```

`diff_color_level()` is the environment-reading entry point; it gathers
the four runtime signals and delegates to the pure, testable
`diff_color_level_for_terminal()`.

## Observability

No new logs or metrics. Incorrect color selection is immediately visible
as broken diff rendering; the test suite covers the decision matrix
exhaustively.

## Tests

Six new unit tests exercise every branch of
`diff_color_level_for_terminal`:

| Test | Inputs | Expected |
|------|--------|----------|
| `windows_terminal_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WindowsTerminal name | TrueColor |
| `wt_session_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WT_SESSION only | TrueColor |
| `non_windows_terminal_keeps_ansi16_diff_palette` | Ansi16 + WezTerm |
Ansi16 |
| `wt_session_promotes_unknown_color_level_to_truecolor` | Unknown +
WT_SESSION | TrueColor |
| `explicit_force_override_keeps_ansi16_on_windows_terminal` | Ansi16 +
WindowsTerminal + FORCE_COLOR | Ansi16 |
| `explicit_force_override_keeps_ansi256_on_windows_terminal` | Ansi256
+ WT_SESSION + FORCE_COLOR | Ansi256 |
| `ansi16_add_style_uses_foreground_only` | Dark + Ansi16 | fg=Green,
bg=None |
| (and any other new snapshot/assertion tests from commits d757fee and
d7c78b3) | | |

## Test plan

- [x] Verify all new unit tests pass (`cargo test -p codex-tui --lib`)
- [x] On Windows Terminal: confirm diffs render with truecolor
backgrounds
- [x] On Windows Terminal with `FORCE_COLOR` set: confirm promotion is
disabled and output follows the forced `supports-color` level
- [x] On macOS/Linux terminals: confirm no behavior change

Fixes https://github.com/openai/codex/issues/12904 
Fixes https://github.com/openai/codex/issues/12890
Fixes https://github.com/openai/codex/issues/12912
Fixes https://github.com/openai/codex/issues/12840
2026-02-27 10:45:59 -07:00
Michael Bolin
d09a7535ed
fix: use AbsolutePathBuf for permission profile file roots (#12970)
## Why
`PermissionProfile` should describe filesystem roots as absolute paths
at the type level. Using `PathBuf` in `FileSystemPermissions` made the
shared type too permissive and blurred together three different
deserialization cases:

- skill metadata in `agents/openai.yaml`, where relative paths should
resolve against the skill directory
- app-server API payloads, where callers should have to send absolute
paths
- local tool-call payloads for commands like `shell_command` and
`exec_command`, where `additional_permissions.file_system` may
legitimately be relative to the command `workdir`

This change tightens the shared model without regressing the existing
local command flow.

## What Changed
- changed `protocol::models::FileSystemPermissions` and the app-server
`AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
- wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
relative permission roots in `agents/openai.yaml` resolve against the
containing skill directory
- kept app-server/API deserialization strict, so relative
`additionalPermissions.fileSystem.*` paths are rejected at the boundary
- restored cwd/workdir-relative deserialization for local tool-call
payloads by parsing `shell`, `shell_command`, and `exec_command`
arguments under an `AbsolutePathBufGuard` rooted at the resolved command
working directory
- simplified runtime additional-permission normalization so it only
canonicalizes and deduplicates absolute roots instead of trying to
recover relative ones later
- updated the app-server schema fixtures, `app-server/README.md`, and
the affected transport/TUI tests to match the final behavior
2026-02-27 17:42:52 +00:00
Michael Bolin
e6cd75a684
notify: include client in legacy hook payload (#12968)
## Why

The `notify` hook payload did not identify which Codex client started
the turn. That meant downstream notification hooks could not distinguish
between completions coming from the TUI and completions coming from
app-server clients such as VS Code or Xcode. Now that the Codex App
provides its own desktop notifications, it would be nice to be able to
filter those out.

This change adds that context without changing the existing payload
shape for callers that do not know the client name, and keeps the new
end-to-end test cross-platform.

## What changed

- added an optional top-level `client` field to the legacy `notify` JSON
payload
- threaded that value through `core` and `hooks`; the internal session
and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it
to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so
the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`

## Testing

- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server
suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
-- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
still has unrelated existing failures in this environment)

## Docs

The public config reference on `developers.openai.com/codex` should
mention that the legacy `notify` payload may include a top-level
`client` field. The TUI reports `codex-tui`, and the app server reports
`initialize.clientInfo.name` when it is available.
2026-02-26 22:27:34 -08:00
Ahmed Ibrahim
4d180ae428
Add model availability NUX metadata (#12972)
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
2026-02-26 22:02:57 -08:00
Eric Traut
cee009d117
Add oauth_resource handling for MCP login flows (#12866)
Addresses bug https://github.com/openai/codex/issues/12589

Builds on community PR #12763.

This adds `oauth_resource` support for MCP `streamable_http` servers and
wires it through the relevant config and login paths. It fixes the bug
where the configured OAuth resource was not reliably included in the
authorization request, causing MCP login to omit the expected
`resource` parameter.
2026-02-26 20:10:12 -08:00
Ahmed Ibrahim
f90e97e414
Add realtime audio device picker (#12850)
## Summary
- add a dedicated /audio picker for realtime microphone and speaker
selection
- persist realtime audio choices and prompt to restart only local audio
when voice is live
- add snapshot coverage for the new picker surfaces

## Validation
- cargo test -p codex-tui
- cargo insta accept
- just fix -p codex-tui
- just fmt
2026-02-26 17:27:44 -08:00
Celia Chen
90cc4e79a2
feat: add local date/timezone to turn environment context (#12947)
## Summary

This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.

  ## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string

## test

built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n  <shell>zsh</shell>\n  <current_date>2026-02-26</current_date>\n  <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
2026-02-26 23:17:35 +00:00
Ahmed Ibrahim
a0e86c69fe
Add realtime audio device config (#12849)
## Summary
- add top-level realtime audio config for microphone and speaker
selection
- apply configured devices when starting realtime capture and playback
- keep missing-device behavior on the system default fallback path

## Validation
- just write-config-schema
- cargo test -p codex-core realtime_audio
- cargo test -p codex-tui
- just fix -p codex-core
- just fix -p codex-tui
- just fmt

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 15:08:21 -08:00
pakrym-oai
951a389654
Allow clients not to send summary as an option (#12950)
Summary is a required parameter on UserTurn. Ideally we'd like the core
to decide the appropriate summary level.

Make the summary optional and don't send it when not needed.
2026-02-26 14:37:38 -08:00
Charley Cunningham
c1afb8815a
tui: use thread_id for resume/fork cwd resolution (#12727)
## Summary
- make resume/fork targets explicit and typed as `SessionTarget { path,
thread_id }` (non-optional `thread_id`)
- resolve `thread_id` centrally via `resolve_session_thread_id(...)`:
- use CLI input directly when it is a UUID (`--resume <uuid>` / `--fork
<uuid>`)
- otherwise read `thread_id` from rollout `SessionMeta` for path-based
selections (picker, `--resume-last`, name-based resume/fork)
- use `thread_id` to read cwd from SQLite first during resume/fork cwd
resolution
- keep rollout fallback for cwd resolution when SQLite is unavailable or
does not return thread metadata (`TurnContext` tail, then `SessionMeta`)
- keep the resume picker open when a selected row has unreadable session
metadata, and show an inline recoverable error instead of aborting the
TUI

## Why
This removes ad-hoc rollout filename parsing and makes resume/fork
target identity explicit. The resume/fork cwd check can use indexed
SQLite lookup by `thread_id` in the common path, while preserving
rollout-based fallback behavior. It also keeps malformed legacy rows
recoverable in the picker instead of letting a selection failure unwind
the app.

## Notes
- minimal TUI-only change; no schema/protocol changes
- includes TUI test coverage for SQLite cwd precedence when `thread_id`
is available
- includes TUI regression coverage for picker inline error rendering /
non-fatal unreadable session rows

## Codex author
`codex resume 019c9205-7f8b-7173-a2a2-f082d4df3de3`
2026-02-26 12:52:31 -08:00
pakrym-oai
717cbe354f
Remove noisy log (#12929)
This log message floods logs on windows
2026-02-26 11:34:14 -08:00
pakrym-oai
ba41e84a50
Use model catalog default for reasoning summary fallback (#12873)
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata

## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-26 09:31:13 -08:00
jif-oai
f0a85ded18
fix: ctrl c sub agent (#12911) 2026-02-26 17:06:20 +00:00
pash-openai
6acede5a28
tui: restore visible line numbers for hidden file links (#12870)
we recently changed file linking so the model uses markdown links when
it wants something to be clickable.

This works well across the GUI surfaces because they can render markdown
cleanly and use the full absolute path in the anchor target.

A previous pass hid the absolute path in the TUI (and only showed the
label), but that also meant we could lose useful location info when the
model put the line number or range in the anchor target instead of the
label.

This follow-up keeps the TUI behavior simple while making local file
links feel closer to the old TUI file reference style.

key changes:
- Local markdown file links in the TUI keep the old file-ref feel: code
styling, no underline, no visible absolute path.
- If the hidden local anchor target includes a location suffix and the
label does not already include one, we append that suffix to the visible
label.
- This works for single lines, line/column references, and ranges.
- If the label already includes the location, we leave it alone.
- normal web links keep the old TUI markdown-link behavior

some examples:
- `[foo.rs](/abs/path/foo.rs)` renders as `foo.rs`
- `[foo.rs](/abs/path/foo.rs:45)` renders as `foo.rs:45`
- `[foo.rs](/abs/path/foo.rs:45:3-48:9)` renders as `foo.rs:45:3-48:9`
- `[foo.rs:45](/abs/path/foo.rs:45)` stays `foo.rs:45`
- `[docs](https://example.com/docs)` still renders like a normal web
link

how it looks:
<img width="732" height="813" alt="Screenshot 2026-02-26 at 9 27 55 AM"
src="https://github.com/user-attachments/assets/d51bf236-653a-4e83-96e4-9427f0804471"
/>
2026-02-26 10:29:54 +00:00
Eric Traut
28bfbb8f2b
Enforce user input length cap (#12823)
Currently there is no bound on the length of a user message submitted in
the TUI or through the app server interface. That means users can paste
many megabytes of text, which can lead to bad performance, hangs, and
crashes. In extreme cases, it can lead to a [kernel
panic](https://github.com/openai/codex/issues/12323).

This PR limits the length of a user input to 2**20 (about 1M)
characters. This value was chosen because it fills the entire context
window on the latest models, so accepting longer inputs wouldn't make
sense anyway.

Summary
- add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
and surface it in TUI and app server code
- block oversized submissions in the TUI submit flow and emit error
history cells when validation fails
- reject heavy app-server requests with JSON-RPC `-32602` and structured
`input_too_large` data, plus document the behavior

Testing
- ran the IDE extension with this change and verified that when I
attempt to paste a user message that's several MB long, it correctly
reports an error instead of crashing or making my computer hot.
2026-02-25 22:23:51 -08:00
pash-openai
9a96b6f509
Hide local file link destinations in TUI markdown (#12705)
## Summary
- hide appended destinations for local path-style markdown links in the
TUI renderer
- keep web links rendering with their visible destination and style link
labels consistently
- add markdown renderer tests and a snapshot for the new file-link
output

## Testing
- just fmt
- cargo test -p codex-tui
<img width="1120" height="968" alt="image"
src="https://github.com/user-attachments/assets/490e8eda-ae47-4231-89fa-b254a1f83eed"
/>
2026-02-26 05:28:37 +00:00
Michael Bolin
14116ade8d
feat: include available decisions in command approval requests (#12758)
Command-approval clients currently infer which choices to show from
side-channel fields like `networkApprovalContext`,
`proposedExecpolicyAmendment`, and `additionalPermissions`. That makes
the request shape harder to evolve, and it forces each client to
replicate the server's heuristics instead of receiving the exact
decision list for the prompt.

This PR introduces a mapping between `CommandExecutionApprovalDecision`
and `codex_protocol::protocol::ReviewDecision`:

```rust
impl From<CoreReviewDecision> for CommandExecutionApprovalDecision {
    fn from(value: CoreReviewDecision) -> Self {
        match value {
            CoreReviewDecision::Approved => Self::Accept,
            CoreReviewDecision::ApprovedExecpolicyAmendment {
                proposed_execpolicy_amendment,
            } => Self::AcceptWithExecpolicyAmendment {
                execpolicy_amendment: proposed_execpolicy_amendment.into(),
            },
            CoreReviewDecision::ApprovedForSession => Self::AcceptForSession,
            CoreReviewDecision::NetworkPolicyAmendment {
                network_policy_amendment,
            } => Self::ApplyNetworkPolicyAmendment {
                network_policy_amendment: network_policy_amendment.into(),
            },
            CoreReviewDecision::Abort => Self::Cancel,
            CoreReviewDecision::Denied => Self::Decline,
        }
    }
}
```

And updates `CommandExecutionRequestApprovalParams` to have a new field:

```rust
available_decisions: Option<Vec<CommandExecutionApprovalDecision>>
```

when, if specified, should make it easier for clients to display an
appropriate list of options in the UI.

This makes it possible for `CoreShellActionProvider::prompt()` in
`unix_escalation.rs` to specify the `Vec<ReviewDecision>` directly,
adding support for `ApprovedForSession` when approving a skill script,
which was previously missing in the TUI.

Note this results in a significant change to `exec_options()` in
`approval_overlay.rs`, as the displayed options are now derived from
`available_decisions: &[ReviewDecision]`.

## What Changed

- Add `available_decisions` to
[`ExecApprovalRequestEvent`](de00e932dd/codex-rs/protocol/src/approvals.rs (L111-L175)),
including helpers to derive the legacy default choices when older
senders omit the field.
- Map `codex_protocol::protocol::ReviewDecision` to app-server
`CommandExecutionApprovalDecision` and expose the ordered list as
experimental `availableDecisions` in
[`CommandExecutionRequestApprovalParams`](de00e932dd/codex-rs/app-server-protocol/src/protocol/v2.rs (L3798-L3807)).
- Thread optional `available_decisions` through the core approval path
so Unix shell escalation can explicitly request `ApprovedForSession` for
session-scoped approvals instead of relying on client heuristics.
[`unix_escalation.rs`](de00e932dd/codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs (L194-L214))
- Update the TUI approval overlay to build its buttons from the ordered
decision list, while preserving the legacy fallback when
`available_decisions` is missing.
- Update the app-server README, test client output, and generated schema
artifacts to document and surface the new field.

## Testing

- Add `approval_overlay.rs` coverage for explicit decision lists,
including the generic `ApprovedForSession` path and network approval
options.
- Update `chatwidget/tests.rs` and app-server protocol tests to populate
the new optional field and keep older event shapes working.

## Developers Docs

- If we document `item/commandExecution/requestApproval` on
[developers.openai.com/codex](https://developers.openai.com/codex), add
experimental `availableDecisions` as the preferred source of approval
choices and note that older servers may omit it.
2026-02-26 01:10:46 +00:00
Celia Chen
4f45668106
Revert "Add skill approval event/response (#12633)" (#12811)
This reverts commit https://github.com/openai/codex/pull/12633. We no
longer need this PR, because we favor sending normal exec command
approval server request with `additional_permissions` of skill
permissions instead
2026-02-26 01:02:42 +00:00
Ahmed Ibrahim
e76b1a2853
Remove steer feature flag (#12026)
All code should go in the direction that steer is enabled

---------

Co-authored-by: Codex <noreply@openai.com>
2026-02-25 15:41:42 -08:00
Charley Cunningham
2f4d6ded1d
Enable request_user_input in Default mode (#12735)
## Summary
- allow `request_user_input` in Default collaboration mode as well as
Plan
- update the Default-mode instructions to prefer assumptions first and
use `request_user_input` only when a question is unavoidable
- update request_user_input and app-server tests to match the new
Default-mode behavior
- refactor collaboration-mode availability plumbing into
`CollaborationModesConfig` for future mode-related flags

## Codex author
`codex resume 019c9124-ed28-7c13-96c6-b916b1c97d49`
2026-02-25 15:20:46 -08:00
sayan-oai
d45ffd5830
make 5.3-codex visible in cli for api users (#12808)
5.3-codex released in api, mark it visible for API users via bundled
`models.json`.
2026-02-25 13:01:40 -08:00
Eric Traut
f6fdfbeb98
Update Codex docs success link (#12805)
Fix a stale documentation link in the sign-in flow
2026-02-25 12:02:41 -08:00
Owen Lin
a0fd94bde6
feat(app-server): add ThreadItem::DynamicToolCall (#12732)
Previously, clients would call `thread/start` with dynamic_tools set,
and when a model invokes a dynamic tool, it would just make the
server->client `item/tool/call` request and wait for the client's
response to complete the tool call. This works, but it doesn't have an
`item/started` or `item/completed` event.

Now we are doing this:
- [new] emit `item/started` with `DynamicToolCall` populated with the
call arguments
- send an `item/tool/call` server request
- [new] once the client responds, emit `item/completed` with
`DynamicToolCall` populated with the response.

Also, with `persistExtendedHistory: true`, dynamic tool calls are now
reconstructable in `thread/read` and `thread/resume` as
`ThreadItem::DynamicToolCall`.
2026-02-25 12:00:10 -08:00