Replay pending client requests after `thread/resume` and emit resolved
notifications when those requests clear so approval/input UI state stays
in sync after reconnects and across subscribed clients.
Affected RPCs:
- `item/commandExecution/requestApproval`
- `item/fileChange/requestApproval`
- `item/tool/requestUserInput`
Motivation:
- Resumed clients need to see pending approval/input requests that were
already outstanding before the reconnect.
- Clients also need an explicit signal when a pending request resolves
or is cleared so stale UI can be removed on turn start, completion, or
interruption.
Implementation notes:
- Use pending client requests from `OutgoingMessageSender` in order to
replay them after `thread/resume` attaches the connection, using
original request ids.
- Emit `serverRequest/resolved` when pending requests are answered
or cleared by lifecycle cleanup.
- Update the app-server protocol schema, generated TypeScript bindings,
and README docs for the replay/resolution flow.
High-level test plan:
- Added automated coverage for replaying pending command execution and
file change approval requests on `thread/resume`.
- Added automated coverage for resolved notifications in command
approval, file change approval, request_user_input, turn start, and turn
interrupt flows.
- Verified schema/docs updates in the relevant protocol and app-server
tests.
Manual testing:
- Tested reconnect/resume with multiple connections.
- Confirmed state stayed in sync between connections.
## Why
CI has been intermittently failing in
`suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
because these running-thread resume tests treated `turn/started` as
proof that the thread was already active.
That signal is too early for this path. `turn/started` is emitted
optimistically from
[`turn_start`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L5757-L5767)).
In `single_client_mode`, the listener skips `current_turn_history`
tracking in
[`codex_message_processor.rs`](1103d0037e/codex-rs/app-server/src/codex_message_processor.rs (L6461-L6465)),
so running-thread resume still depends on `ThreadWatchManager` observing
the core `TurnStarted` event in
[`bespoke_event_handling.rs`](1103d0037e/codex-rs/app-server/src/bespoke_event_handling.rs (L152-L156)).
If `thread/resume` lands in that window, the thread can still look
`Idle` and the assertion flakes.
## What
- Add a helper in `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
that waits for `thread/status/changed` to report `Active` for the target
thread.
- Use that public v2 notification as the synchronization barrier in the
four running-thread resume tests instead of relying on `turn/started`.
## Follow-up
This PR keeps the fix at the test layer so we can remove the flake
without changing server behavior. A broader runtime fix should still be
considered separately, for example:
- make `turn/start` eagerly transition the thread to `Active` so
`turn/started` and `thread/status/changed` are coherent
- or revisit the `single_client_mode` guard that skips current-turn
tracking for running-thread resume
## Testing
- `cargo test -p codex-app-server thread_resume -- --nocapture`
- `for i in $(seq 1 10); do cargo test -p codex-app-server
'suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch'
-- --exact --nocapture; done`
## Summary
- Promote ANSI-16 to truecolor for diff rendering when running inside
Windows Terminal
- Respect explicit `FORCE_COLOR` override, skipping promotion when set
- Extract a pure `diff_color_level_for_terminal` function for
testability
- Strip background tints from ANSI-16 diff output, rendering add/delete
lines with foreground color only
- Introduce `RichDiffColorLevel` to type-safely restrict background
fills to truecolor and ansi256
## Problem
Windows Terminal fully supports 24-bit (truecolor) rendering but often
does not provide the usual TERM metadata (`TERM`, `TERM_PROGRAM`,
`COLORTERM`) in `cmd.exe`/PowerShell sessions. In those environments,
`supports-color` can report only ANSI-16 support. The diff renderer
therefore falls back to a 16-color palette, producing washed-out,
hard-to-read diffs.
The screenshots below demonstrate that both PowerShell and cmd.exe don't
set any `*TERM*` environment variables.
| PowerShell | cmd.exe |
|---|---|
| <img width="2032" height="1162" alt="SCR-20260226-nfvy"
src="https://github.com/user-attachments/assets/59e968cc-4add-4c7b-a415-07163297e86a"
/> | <img width="2032" height="1162" alt="SCR-20260226-nfyc"
src="https://github.com/user-attachments/assets/d06b3e39-bf91-4ce3-9705-82bf9563a01b"
/> |
## Mental model
`StdoutColorLevel` (from `supports-color`) is the _detected_ capability.
`DiffColorLevel` is the _intended_ capability for diff rendering. A new
intermediary — `diff_color_level_for_terminal` — maps one to the other
and is the single place where terminal-specific overrides live.
Windows Terminal is detected two independent ways: the `TerminalName`
parsed by `terminal_info()` and the raw presence of `WT_SESSION`. When
`WT_SESSION` is present and `FORCE_COLOR` is not set, we promote
unconditionally to truecolor. When `WT_SESSION` is absent but
`TerminalName::WindowsTerminal` is detected, we promote only the ANSI-16
level (not `Unknown`).
A single override helper — `has_force_color_override()` — checks whether
`FORCE_COLOR` is set. When it is, both the `WT_SESSION` fast-path and
the `TerminalName`-based promotion are suppressed, preserving explicit
user intent.
| PowerShell | cmd.exe | WSL | Bash for Windows |
|---|---|---|---|
|

|

|

|

|
## Non-goals
- This does not change color detection for anything outside the diff
renderer (e.g. the chat widget, markdown rendering).
- This does not add a user-facing config knob; `FORCE_COLOR` already
serves that role.
## Tradeoffs
- The `has_wt_session` signal is intentionally kept separate from
`TerminalName::WindowsTerminal`. `terminal_info()` is derived with
`TERM_PROGRAM` precedence, so it can differ from raw `WT_SESSION`.
- Real-world validation in this issue: in both `cmd.exe` and PowerShell,
`TERM`/`TERM_PROGRAM`/`COLORTERM` were absent, so TERM-based capability
hints were unavailable in those sessions.
- Checking `FORCE_COLOR` for presence rather than parsing its value is a
simplification. In practice `supports-color` has already parsed it, so
our check is a coarse "did the user set _anything_?" gate. The effective
color level still comes from `supports-color`.
- When `WT_SESSION` is present without `FORCE_COLOR`, we promote to
truecolor regardless of `stdout_level` (including `Unknown`). This is
aggressive but correct: `WT_SESSION` is a strong signal that we're in
Windows Terminal.
- ANSI-16 add/delete backgrounds (bright green/red) overpower
syntax-highlighted token colors, making diffs harder to read.
Foreground-only cues (colored text, gutter signs) preserve readability
on low-color terminals.
## Architecture
```
stdout_color_level() ──┐
terminal_info().name ──┤
WT_SESSION presence ──┼──▶ diff_color_level_for_terminal() ──▶ DiffColorLevel
FORCE_COLOR presence ──┘ │
▼
RichDiffColorLevel::from_diff_color_level()
│
┌──────────┴──────────┐
│ Some(TrueColor|256) │ → bg tints
│ None (Ansi16) │ → fg only
└─────────────────────┘
```
`diff_color_level()` is the environment-reading entry point; it gathers
the four runtime signals and delegates to the pure, testable
`diff_color_level_for_terminal()`.
## Observability
No new logs or metrics. Incorrect color selection is immediately visible
as broken diff rendering; the test suite covers the decision matrix
exhaustively.
## Tests
Six new unit tests exercise every branch of
`diff_color_level_for_terminal`:
| Test | Inputs | Expected |
|------|--------|----------|
| `windows_terminal_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WindowsTerminal name | TrueColor |
| `wt_session_promotes_ansi16_to_truecolor_for_diffs` | Ansi16 +
WT_SESSION only | TrueColor |
| `non_windows_terminal_keeps_ansi16_diff_palette` | Ansi16 + WezTerm |
Ansi16 |
| `wt_session_promotes_unknown_color_level_to_truecolor` | Unknown +
WT_SESSION | TrueColor |
| `explicit_force_override_keeps_ansi16_on_windows_terminal` | Ansi16 +
WindowsTerminal + FORCE_COLOR | Ansi16 |
| `explicit_force_override_keeps_ansi256_on_windows_terminal` | Ansi256
+ WT_SESSION + FORCE_COLOR | Ansi256 |
| `ansi16_add_style_uses_foreground_only` | Dark + Ansi16 | fg=Green,
bg=None |
| (and any other new snapshot/assertion tests from commits d757fee and
d7c78b3) | | |
## Test plan
- [x] Verify all new unit tests pass (`cargo test -p codex-tui --lib`)
- [x] On Windows Terminal: confirm diffs render with truecolor
backgrounds
- [x] On Windows Terminal with `FORCE_COLOR` set: confirm promotion is
disabled and output follows the forced `supports-color` level
- [x] On macOS/Linux terminals: confirm no behavior change
Fixes https://github.com/openai/codex/issues/12904
Fixes https://github.com/openai/codex/issues/12890
Fixes https://github.com/openai/codex/issues/12912
Fixes https://github.com/openai/codex/issues/12840
## Why
`PermissionProfile` should describe filesystem roots as absolute paths
at the type level. Using `PathBuf` in `FileSystemPermissions` made the
shared type too permissive and blurred together three different
deserialization cases:
- skill metadata in `agents/openai.yaml`, where relative paths should
resolve against the skill directory
- app-server API payloads, where callers should have to send absolute
paths
- local tool-call payloads for commands like `shell_command` and
`exec_command`, where `additional_permissions.file_system` may
legitimately be relative to the command `workdir`
This change tightens the shared model without regressing the existing
local command flow.
## What Changed
- changed `protocol::models::FileSystemPermissions` and the app-server
`AdditionalFileSystemPermissions` mirror to use `AbsolutePathBuf`
- wrapped skill metadata deserialization in `AbsolutePathBufGuard`, so
relative permission roots in `agents/openai.yaml` resolve against the
containing skill directory
- kept app-server/API deserialization strict, so relative
`additionalPermissions.fileSystem.*` paths are rejected at the boundary
- restored cwd/workdir-relative deserialization for local tool-call
payloads by parsing `shell`, `shell_command`, and `exec_command`
arguments under an `AbsolutePathBufGuard` rooted at the resolved command
working directory
- simplified runtime additional-permission normalization so it only
canonicalizes and deduplicates absolute roots instead of trying to
recover relative ones later
- updated the app-server schema fixtures, `app-server/README.md`, and
the affected transport/TUI tests to match the final behavior
## Why
The `notify` hook payload did not identify which Codex client started
the turn. That meant downstream notification hooks could not distinguish
between completions coming from the TUI and completions coming from
app-server clients such as VS Code or Xcode. Now that the Codex App
provides its own desktop notifications, it would be nice to be able to
filter those out.
This change adds that context without changing the existing payload
shape for callers that do not know the client name, and keeps the new
end-to-end test cross-platform.
## What changed
- added an optional top-level `client` field to the legacy `notify` JSON
payload
- threaded that value through `core` and `hooks`; the internal session
and turn state now carries it as `app_server_client_name`
- set the field to `codex-tui` for TUI turns
- captured `initialize.clientInfo.name` in the app server and applied it
to subsequent turns before dispatching hooks
- replaced the notify integration test hook with a `python3` script so
the test does not rely on Unix shell permissions or `bash`
- documented the new field in `docs/config.md`
## Testing
- `cargo test -p codex-hooks`
- `cargo test -p codex-tui`
- `cargo test -p codex-app-server
suite::v2::initialize::turn_start_notify_payload_includes_initialize_client_name
-- --exact --nocapture`
- `cargo test -p codex-core` (`src/lib.rs` passed; `core/tests/all.rs`
still has unrelated existing failures in this environment)
## Docs
The public config reference on `developers.openai.com/codex` should
mention that the legacy `notify` payload may include a top-level
`client` field. The TUI reports `codex-tui`, and the app server reports
`initialize.clientInfo.name` when it is available.
- replace show_nux with structured availability_nux model metadata
- expose availability NUX data through the app-server model API
- update shared fixtures and tests for the new field
- Update the cloud requirements cache TTL to 30 minutes.
- Add a background job to refresh the cache every 5 minutes.
- Ensure there is only one refresh job per process.
Addresses bug https://github.com/openai/codex/issues/12589
Builds on community PR #12763.
This adds `oauth_resource` support for MCP `streamable_http` servers and
wires it through the relevant config and login paths. It fixes the bug
where the configured OAuth resource was not reliably included in the
authorization request, causing MCP login to omit the expected
`resource` parameter.
## Summary
This changes `custom_tool_call_output` to use the same output payload
shape as `function_call_output`, so freeform tools can return either
plain text or structured content items.
The main goal is to let `js_repl` return image content from nested
`view_image` calls in its own `custom_tool_call_output`, instead of
relying on a separate injected message.
## What changed
- Changed `custom_tool_call_output.output` from `string` to
`FunctionCallOutputPayload`
- Updated freeform tool plumbing to preserve structured output bodies
- Updated `js_repl` to aggregate nested tool content items and attach
them to the outer `js_repl` result
- Removed the old `js_repl` special case that injected `view_image`
results as a separate pending user image message
- Updated normalization/history/truncation paths to handle multimodal
`custom_tool_call_output`
- Regenerated app-server protocol schema artifacts
## Behavior
Direct `view_image` calls still return a `function_call_output` with
image content.
When `view_image` is called inside `js_repl`, the outer `js_repl`
`custom_tool_call_output` now carries:
- an `input_text` item if the JS produced text output
- one or more `input_image` items from nested tool results
So the nested image result now stays inside the `js_repl` tool output
instead of being injected as a separate message.
## Compatibility
This is intended to be backward-compatible for resumed conversations.
Older histories that stored `custom_tool_call_output.output` as a plain
string still deserialize correctly, and older histories that used the
previous injected-image-message flow also continue to resume.
Added regression coverage for resuming a pre-change rollout containing:
- string-valued `custom_tool_call_output`
- legacy injected image message history
#### [git stack](https://github.com/magus/git-stack-cli)
- 👉 `1` https://github.com/openai/codex/pull/12948
## Summary
- add a dedicated /audio picker for realtime microphone and speaker
selection
- persist realtime audio choices and prompt to restart only local audio
when voice is live
- add snapshot coverage for the new picker surfaces
## Validation
- cargo test -p codex-tui
- cargo insta accept
- just fix -p codex-tui
- just fmt
## Summary\n- add a websocket test-server request waiter so tests can
synchronize on recorded client messages\n- use that waiter in the
realtime delegation test instead of a fixed audio timeout\n- add
temporary timing logs in the test and websocket mock to inspect where
the flake stalls
## Summary
This PR includes the session's local date and timezone in the
model-visible environment context and persists that data in
`TurnContextItem`.
## What changed
- captures the current local date and IANA timezone when building a turn
context, with a UTC fallback if the timezone lookup fails
- includes current_date and timezone in the serialized
<environment_context> payload
- stores those fields on TurnContextItem so they survive rollout/history
handling, subagent review threads, and resume flows
- treats date/timezone changes as environment updates, so prompt caching
and context refresh logic do not silently reuse stale time context
- updates tests to validate the new environment fields without depending
on a single hardcoded environment-context string
## test
built a local build and saw it in the rollout file:
```
{"timestamp":"2026-02-26T21:39:50.737Z","type":"response_item","payload":{"type":"message","role":"user","content":[{"type":"input_text","text":"<environment_context>\n <shell>zsh</shell>\n <current_date>2026-02-26</current_date>\n <timezone>America/Los_Angeles</timezone>\n</environment_context>"}]}}
```
## Why
`unix_escalation.rs` had a large inline `mod tests` block that made the
implementation harder to scan. This change moves those tests into a
sibling file while keeping them as a child module, so they can still
exercise private items without widening visibility.
## What Changed
- replaced the inline `#[cfg(test)] mod tests` block in
`codex-rs/core/src/tools/runtimes/shell/unix_escalation.rs` with a
path-based test module declaration
- moved the existing unit tests into
`codex-rs/core/src/tools/runtimes/shell/unix_escalation_tests.rs`
- kept the extracted tests using `super::...` imports so they continue
to access private helpers and types from `unix_escalation.rs`
## Testing
- `cargo test -p codex-core unix_escalation::tests`
## Why
`codex features list` currently prints features in declaration order
from `codex_core::features::FEATURES`. That makes the output harder to
scan when looking for a specific flag, and the order can change for
reasons unrelated to the CLI.
## What changed
- Sort the `codex features list` rows by feature key before printing
them in `codex-rs/cli/src/main.rs`.
- Add an integration test in `codex-rs/cli/tests/features.rs` that runs
`codex features list` and asserts the feature-name column is
alphabetized.
## Verification
- Added `features_list_is_sorted_alphabetically_by_feature_name`.
- Ran `cargo test -p codex-cli`.
Summary is a required parameter on UserTurn. Ideally we'd like the core
to decide the appropriate summary level.
Make the summary optional and don't send it when not needed.
## Summary
- make resume/fork targets explicit and typed as `SessionTarget { path,
thread_id }` (non-optional `thread_id`)
- resolve `thread_id` centrally via `resolve_session_thread_id(...)`:
- use CLI input directly when it is a UUID (`--resume <uuid>` / `--fork
<uuid>`)
- otherwise read `thread_id` from rollout `SessionMeta` for path-based
selections (picker, `--resume-last`, name-based resume/fork)
- use `thread_id` to read cwd from SQLite first during resume/fork cwd
resolution
- keep rollout fallback for cwd resolution when SQLite is unavailable or
does not return thread metadata (`TurnContext` tail, then `SessionMeta`)
- keep the resume picker open when a selected row has unreadable session
metadata, and show an inline recoverable error instead of aborting the
TUI
## Why
This removes ad-hoc rollout filename parsing and makes resume/fork
target identity explicit. The resume/fork cwd check can use indexed
SQLite lookup by `thread_id` in the common path, while preserving
rollout-based fallback behavior. It also keeps malformed legacy rows
recoverable in the picker instead of letting a selection failure unwind
the app.
## Notes
- minimal TUI-only change; no schema/protocol changes
- includes TUI test coverage for SQLite cwd precedence when `thread_id`
is available
- includes TUI regression coverage for picker inline error rendering /
non-fatal unreadable session rows
## Codex author
`codex resume 019c9205-7f8b-7173-a2a2-f082d4df3de3`
## Why
Before this change, an escalation approval could say that a command
should be rerun, but it could not carry the sandbox configuration that
should still apply when the escalated command is actually spawned.
That left an unsafe gap in the `zsh-fork` skill path: skill scripts
under `scripts/` that did not declare permissions could be escalated
without a sandbox, and scripts that did declare permissions could lose
their bounded sandbox on rerun or cached session approval.
This PR extends the escalation protocol so approvals can optionally
carry sandbox configuration all the way through execution. That lets the
shell runtime preserve the intended sandbox instead of silently widening
access.
We likely want a single permissions type for this codepath eventually,
probably centered on `Permissions`. For now, the protocol needs to
represent both the existing `PermissionProfile` form and the fuller
`Permissions` form, so this introduces a temporary disjoint union,
`EscalationPermissions`, to carry either one.
Further, this means that today, a skill either:
- does not declare any permissions, in which case it is run using the
default sandbox for the turn
- specifies permissions, in which case the skill is run using that exact
sandbox, which might be more restrictive than the default sandbox for
the turn
We will likely change the skill's permissions to be additive to the
existing permissions for the turn.
## What Changed
- Added `EscalationPermissions` to `codex-protocol` so escalation
requests can carry either a `PermissionProfile` or a full `Permissions`
payload.
- Added an explicit `EscalationExecution` mode to the shell escalation
protocol so reruns distinguish between `Unsandboxed`, `TurnDefault`, and
`Permissions(...)` instead of overloading `None`.
- Updated `zsh-fork` shell reruns to resolve `TurnDefault` at execution
time, which keeps ordinary `UseDefault` commands on the turn sandbox and
preserves turn-level macOS seatbelt profile extensions.
- Updated the `zsh-fork` skill path so a skill with no declared
permissions inherits the conversation's effective sandbox instead of
escalating unsandboxed.
- Updated the `zsh-fork` skill path so a skill with declared permissions
reruns with exactly those permissions, including when a cached session
approval is reused.
## Testing
- Added unit coverage in
`core/src/tools/runtimes/shell/unix_escalation.rs` for the explicit
`UseDefault` / `RequireEscalated` / `WithAdditionalPermissions`
execution mapping.
- Added unit coverage in
`core/src/tools/runtimes/shell/unix_escalation.rs` for macOS seatbelt
extension preservation in both the `TurnDefault` and
explicit-permissions rerun paths.
- Added integration coverage in `core/tests/suite/skill_approval.rs` for
permissionless skills inheriting the turn sandbox and explicit skill
permissions remaining bounded across cached approval reuse.
## Summary
- add tracing-based diagnostics for nested `codex.tool(...)` calls made
from `js_repl`
- emit a bounded, sanitized summary at `info!`
- emit the exact raw serialized response object or error string seen by
JavaScript at `trace!`
- document how to enable these logs and where to find them, especially
for `codex app-server`
## Why
Nested `codex.tool(...)` calls inside `js_repl` are a debugging
boundary: JavaScript sees the tool result, but that result is otherwise
hard to inspect from outside the kernel.
This change adds explicit tracing for that path using the repo’s normal
observability pattern:
- `info` for compact summaries
- `trace` for exact raw payloads when deep debugging is needed
## What changed
- `js_repl` now summarizes nested tool-call results across the response
shapes it can receive:
- message content
- function-call outputs
- custom tool outputs
- MCP tool results and MCP error results
- direct error strings
- each nested `codex.tool(...)` completion logs:
- `exec_id`
- `tool_call_id`
- `tool_name`
- `ok`
- a bounded summary struct describing the payload shape
- at `trace`, the same path also logs the exact serialized response
object or error string that JavaScript received
- docs now include concrete logging examples for `codex app-server`
- unit coverage was added for multimodal function output summaries and
error summaries
## How to use it
### Summary-only logging
Set:
```sh
RUST_LOG=codex_core::tools::js_repl=info
```
For `codex app-server`, tracing output is written to the server process
`stderr`.
Example:
```sh
RUST_LOG=codex_core::tools::js_repl=info \
LOG_FORMAT=json \
codex app-server \
2> /tmp/codex-app-server.log
```
This emits bounded summary lines for nested `codex.tool(...)` calls.
### Full raw debugging
Set:
```sh
RUST_LOG=codex_core::tools::js_repl=trace
```
Example:
```sh
RUST_LOG=codex_core::tools::js_repl=trace \
LOG_FORMAT=json \
codex app-server \
2> /tmp/codex-app-server.log
```
At `trace`, you get:
- the same `info` summary line
- a `trace` line with the exact serialized response object seen by
JavaScript
- or the exact error string if the nested tool call failed
### Where the logs go
For `codex app-server`, these logs go to process `stderr`, so redirect
or capture `stderr` to inspect them.
Example:
```sh
RUST_LOG=codex_core::tools::js_repl=trace \
LOG_FORMAT=json \
/Users/fjord/code/codex/codex-rs/target/debug/codex app-server \
2> /tmp/codex-app-server.log
```
Then inspect:
```sh
rg "js_repl nested tool call" /tmp/codex-app-server.log
```
Without an explicit `RUST_LOG` override, these `js_repl` nested
tool-call logs are typically not visible.
## Summary
- make `Config.model_reasoning_summary` optional so unset means use
model default
- resolve the optional config value to a concrete summary when building
`TurnContext`
- add protocol support for `default_reasoning_summary` in model metadata
## Validation
- `cargo test -p codex-core --lib client::tests -- --nocapture`
---------
Co-authored-by: Codex <noreply@openai.com>
Attempt to reduce disk usage in mac ci.
>off - This is the default for platforms with ELF binaries and
windows-gnu (not Windows MSVC and not macOS). This typically means that
DWARF debug information can be found in the final artifact in sections
of the executable. This option is not supported on Windows MSVC. On
macOS this options prevents the final execution of dsymutil to generate
debuginfo.
we recently changed file linking so the model uses markdown links when
it wants something to be clickable.
This works well across the GUI surfaces because they can render markdown
cleanly and use the full absolute path in the anchor target.
A previous pass hid the absolute path in the TUI (and only showed the
label), but that also meant we could lose useful location info when the
model put the line number or range in the anchor target instead of the
label.
This follow-up keeps the TUI behavior simple while making local file
links feel closer to the old TUI file reference style.
key changes:
- Local markdown file links in the TUI keep the old file-ref feel: code
styling, no underline, no visible absolute path.
- If the hidden local anchor target includes a location suffix and the
label does not already include one, we append that suffix to the visible
label.
- This works for single lines, line/column references, and ranges.
- If the label already includes the location, we leave it alone.
- normal web links keep the old TUI markdown-link behavior
some examples:
- `[foo.rs](/abs/path/foo.rs)` renders as `foo.rs`
- `[foo.rs](/abs/path/foo.rs:45)` renders as `foo.rs:45`
- `[foo.rs](/abs/path/foo.rs:45:3-48:9)` renders as `foo.rs:45:3-48:9`
- `[foo.rs:45](/abs/path/foo.rs:45)` stays `foo.rs:45`
- `[docs](https://example.com/docs)` still renders like a normal web
link
how it looks:
<img width="732" height="813" alt="Screenshot 2026-02-26 at 9 27 55 AM"
src="https://github.com/user-attachments/assets/d51bf236-653a-4e83-96e4-9427f0804471"
/>
## Summary
- bundle contextual prompt injection into at most one developer message
plus one contextual user message in both:
- per-turn settings updates
- initial context insertion
- preserve `<model_switch>` across compaction by rebuilding it through
canonical initial-context injection, instead of relying on
strip/reattach hacks
- centralize contextual user fragment detection in one shared definition
table and reuse it for parsing/compaction logic
- keep `AGENTS.md` in its natural serialized format:
- `# AGENTS.md instructions for {dirname}`
- `<INSTRUCTIONS>...</INSTRUCTIONS>`
- simplify related tests/helpers and accept the expected snapshot/layout
updates from bundled multi-part messages
## Why
The goal is to converge toward a simpler, more intentional prompt shape
where contextual updates are consistently represented as one developer
envelope plus one contextual user envelope, while keeping parsing and
compaction behavior aligned with that representation.
## Notable details
- the temporary `SettingsUpdateEnvelope` wrapper was removed; these
paths now return `Vec<ResponseItem>` directly
- local/remote compaction no longer rely on model-switch strip/restore
helpers
- contextual user detection is now driven by shared fragment definitions
instead of ad hoc matcher assembly
- AGENTS/user instructions are still the same logical context; only the
synthetic `<user_instructions>` wrapper was replaced by the natural
AGENTS text format
## Testing
- `just fmt`
- `cargo test -p codex-app-server
codex_message_processor::tests::extract_conversation_summary_prefers_plain_user_messages
-- --exact`
- `cargo test -p codex-core
compact::tests::collect_user_messages_filters_session_prefix_entries
--lib -- --exact`
- `cargo test -p codex-core --test all
'suite::compact::snapshot_request_shape_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::compact_remote::snapshot_request_shape_remote_pre_turn_compaction_strips_incoming_model_switch'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_apps_guidance_as_developer_message_when_enabled'
-- --exact`
- `cargo test -p codex-core --test all
'suite::client::includes_developer_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::includes_user_instructions_message_in_request' --
--exact`
- `cargo test -p codex-core --test all
'suite::client::resume_includes_initial_messages_and_sends_prior_items'
-- --exact`
- `cargo test -p codex-core --test all
'suite::review::review_input_isolated_from_parent_history' -- --exact`
- `cargo test -p codex-exec --test all
'suite::resume::exec_resume_last_respects_cwd_filter_and_all_flag' --
--exact`
- `cargo test -p core_test_support
context_snapshot::tests::full_text_mode_preserves_unredacted_text --
--exact`
## Notes
- I also ran several targeted `compact`, `compact_remote`,
`prompt_caching`, `model_visible_layout`, and `event_mapping` tests
while iterating on prompt-shape changes.
- I have not claimed a clean full-workspace `cargo test` from this
environment because local sandbox/resource conditions have previously
produced unrelated failures in large workspace runs.
Currently there is no bound on the length of a user message submitted in
the TUI or through the app server interface. That means users can paste
many megabytes of text, which can lead to bad performance, hangs, and
crashes. In extreme cases, it can lead to a [kernel
panic](https://github.com/openai/codex/issues/12323).
This PR limits the length of a user input to 2**20 (about 1M)
characters. This value was chosen because it fills the entire context
window on the latest models, so accepting longer inputs wouldn't make
sense anyway.
Summary
- add a shared `MAX_USER_INPUT_TEXT_CHARS` constant in codex-protocol
and surface it in TUI and app server code
- block oversized submissions in the TUI submit flow and emit error
history cells when validation fails
- reject heavy app-server requests with JSON-RPC `-32602` and structured
`input_too_large` data, plus document the behavior
Testing
- ran the IDE extension with this change and verified that when I
attempt to paste a user message that's several MB long, it correctly
reports an error instead of crashing or making my computer hot.
## Summary
- hide appended destinations for local path-style markdown links in the
TUI renderer
- keep web links rendering with their visible destination and style link
labels consistently
- add markdown renderer tests and a snapshot for the new file-link
output
## Testing
- just fmt
- cargo test -p codex-tui
<img width="1120" height="968" alt="image"
src="https://github.com/user-attachments/assets/490e8eda-ae47-4231-89fa-b254a1f83eed"
/>