## Summary
- add device-code ChatGPT sign-in to `tui_app_server` onboarding and
reuse the existing `chatgptAuthTokens` login path
- fall back to browser login when device-code auth is unavailable on the
server
- treat `ChatgptAuthTokens` as an existing signed-in ChatGPT state
during onboarding
- add a local ChatGPT auth loader for handing local tokens to the app
server and serving refresh requests
- handle `account/chatgptAuthTokens/refresh` instead of marking it
unsupported, including workspace/account mismatch checks
- add focused coverage for onboarding success, existing auth handling,
local auth loading, and refresh request behavior
## Testing
- `cargo test -p codex-tui-app-server`
- `just fix -p codex-tui-app-server`
## Summary
This is PR 2 of the Windows sandbox runner split.
PR 1 introduced the framed IPC runner foundation and related Windows
sandbox infrastructure without changing the active elevated one-shot
execution path. This PR switches that elevated one-shot path over to the
new runner IPC transport and removes the old request-file bootstrap that
PR 1 intentionally left in place.
After this change, ordinary elevated Windows sandbox commands still
behave as one-shot executions, but they now run as the simple case of
the same helper/IPC transport that later unified_exec work will build
on.
## Why this is needed for unified_exec
Windows elevated sandboxed execution crosses a user boundary: the CLI
launches a helper as the sandbox user and has to manage command
execution from outside that security context. For one-shot commands, the
old request-file/bootstrap flow was sufficient. For unified_exec, it is
not.
Unified_exec needs a long-lived bidirectional channel so the parent can:
- send a spawn request
- receive structured spawn success/failure
- stream stdout and stderr incrementally
- eventually support stdin writes, termination, and other session
lifecycle events
This PR does not add long-lived sessions yet. It converts the existing
elevated one-shot path to use the same framed IPC transport so that PR 3
can add unified_exec session semantics on top of a transport that is
already exercised by normal elevated command execution.
## Scope
This PR:
- updates `windows-sandbox-rs/src/elevated_impl.rs` to launch the runner
with named pipes, send a framed `SpawnRequest`, wait for `SpawnReady`,
and collect framed `Output`/`Exit` messages
- removes the old `--request-file=...` execution path from
`windows-sandbox-rs/src/elevated/command_runner_win.rs`
- keeps the public behavior one-shot: no session reuse or interactive
unified_exec behavior is introduced here
This PR does not:
- add Windows unified_exec session support
- add background terminal reuse
- add PTY session lifecycle management
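
For readers unfamiliar with framed transports, here is a minimal sketch of what sending `SpawnRequest` and collecting `Output`/`Exit` over a byte stream can look like. The wire layout (one tag byte plus a 4-byte little-endian length), the `Frame` enum, and the helper names are assumptions for illustration only; they are not the actual runner protocol in `windows-sandbox-rs`.

```rust
use std::convert::{TryFrom, TryInto};
use std::io::{self, Read, Write};

/// Hypothetical message kinds, named after the messages mentioned above.
#[derive(Debug, PartialEq)]
pub enum Frame {
    SpawnRequest(Vec<u8>),
    SpawnReady,
    Output(Vec<u8>),
    Exit(i32),
}

/// Writes one frame as: tag byte, 4-byte little-endian payload length, payload.
pub fn write_frame<W: Write>(w: &mut W, frame: &Frame) -> io::Result<()> {
    let (tag, payload): (u8, Vec<u8>) = match frame {
        Frame::SpawnRequest(body) => (0, body.clone()),
        Frame::SpawnReady => (1, Vec::new()),
        Frame::Output(bytes) => (2, bytes.clone()),
        Frame::Exit(code) => (3, code.to_le_bytes().to_vec()),
    };
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    w.write_all(&[tag])?;
    w.write_all(&len.to_le_bytes())?;
    w.write_all(&payload)
}

/// Reads exactly one frame, failing on EOF, an unknown tag, or a bad payload.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Frame> {
    let mut header = [0u8; 5];
    r.read_exact(&mut header)?;
    let len = u32::from_le_bytes([header[1], header[2], header[3], header[4]]) as usize;
    let mut payload = vec![0u8; len];
    r.read_exact(&mut payload)?;
    match header[0] {
        0 => Ok(Frame::SpawnRequest(payload)),
        1 => Ok(Frame::SpawnReady),
        2 => Ok(Frame::Output(payload)),
        3 => {
            let bytes: [u8; 4] = payload
                .as_slice()
                .try_into()
                .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "bad exit payload"))?;
            Ok(Frame::Exit(i32::from_le_bytes(bytes)))
        }
        other => Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("unknown tag {other}"),
        )),
    }
}
```

In this sketch a one-shot command is just the shortest possible conversation: one `SpawnRequest`, one `SpawnReady`, some `Output` frames, then `Exit`.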
## Why Windows needs this and Linux/macOS do not
On Linux and macOS, the existing sandbox/process model composes much
more directly with long-lived process control. The parent can generally
spawn and own the child process (or PTY) directly inside the sandbox
model we already use.
Windows elevated sandboxing is different. The parent is not directly
managing the sandboxed process in the same way; it launches across a
different user/security context. That means long-lived control requires
an explicit helper process plus IPC for spawn, output, exit, and later
stdin/session control.
So the extra machinery here is not because unified_exec is conceptually
different on Windows. It is because the elevated Windows sandbox
boundary requires a helper-mediated transport to support it cleanly.
## Validation
- `cargo test -p codex-windows-sandbox`
### Why
i'm working on something that parses and analyzes codex rollout logs,
and i'd like to have a schema for generating a parser/validator.
`codex app-server generate-internal-json-schema` writes a
`RolloutLine.json` file
while doing this, i noticed we have a writer <> reader mismatch issue on
`FunctionCallOutputPayload` and reasoning item ID -- added some schemars
annotations to fix those
### Test
```
$ just codex app-server generate-internal-json-schema --out ./foo
```
generates a `RolloutLine.json` file, which i validated against jsonl
files on disk
`just codex app-server --help` doesn't expose the
`generate-internal-json-schema` option by default, but you can do `just
codex app-server generate-internal-json-schema --help` if you know the
command
everything else still works
---------
Co-authored-by: Codex <noreply@openai.com>
## What is flaky
`codex-rs/app-server/tests/suite/fuzzy_file_search.rs` intermittently
loses the expected `fuzzyFileSearch/sessionUpdated` and
`fuzzyFileSearch/sessionCompleted` notifications when multiple
fuzzy-search sessions are active and CI delivers notifications out of
order.
## Why it was flaky
The wait helpers were keyed only by JSON-RPC method name.
- `wait_for_session_updated` consumed the next
`fuzzyFileSearch/sessionUpdated` notification even when it belonged to a
different search session.
- `wait_for_session_completed` did the same for
`fuzzyFileSearch/sessionCompleted`.
- Once an unmatched notification was read, it was dropped permanently
instead of buffered.
- That meant a valid completion for the target search could arrive
slightly early, be consumed by the wrong waiter, and disappear before
the test started waiting for it.
The result depended on notification ordering and runner scheduling
instead of on the actual product behavior.
## How this PR fixes it
- Add a buffered notification reader in
`codex-rs/app-server/tests/common/mcp_process.rs`.
- Match fuzzy-search notifications on the identifying payload fields
instead of matching only on method name.
- Preserve unmatched notifications in the in-process queue so later
waiters can still consume them.
- Include pending notification methods in timeout failures to make
future diagnosis concrete.
## Why this fix fixes the flakiness
The test now behaves like a real consumer of an out-of-order event
stream: notifications for other sessions stay buffered until the correct
waiter asks for them. Reordering no longer loses the target event, so
the test result is determined by whether the server emitted the right
notifications, not by which one happened to be read first.
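
A minimal sketch of the buffering idea, assuming a simplified `Notification` with only a method and a session id. The type and method names here are hypothetical and do not mirror the helper in `mcp_process.rs`.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a JSON-RPC notification (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Notification {
    pub method: String,
    pub session_id: String,
}

pub struct BufferedReader<I: Iterator<Item = Notification>> {
    incoming: I,
    pending: VecDeque<Notification>,
}

impl<I: Iterator<Item = Notification>> BufferedReader<I> {
    pub fn new(incoming: I) -> Self {
        Self { incoming, pending: VecDeque::new() }
    }

    /// Returns the first notification accepted by `is_match`.
    /// Anything read along the way that does not match stays buffered.
    pub fn wait_for<F>(&mut self, is_match: F) -> Option<Notification>
    where
        F: Fn(&Notification) -> bool,
    {
        // Check notifications that earlier waiters skipped over.
        if let Some(pos) = self.pending.iter().position(|n| is_match(n)) {
            return self.pending.remove(pos);
        }
        // Otherwise keep reading, buffering everything that does not match.
        while let Some(n) = self.incoming.next() {
            if is_match(&n) {
                return Some(n);
            }
            self.pending.push_back(n);
        }
        None
    }

    /// Methods still buffered; useful in a timeout failure message.
    pub fn pending_methods(&self) -> Vec<&str> {
        self.pending.iter().map(|n| n.method.as_str()).collect()
    }
}
```

The iterator stands in for the real notification stream; a real helper would also apply a timeout while reading.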
Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com>
Co-authored-by: Codex <noreply@openai.com>
## Problem
When the TUI connects to a **remote** app-server (via WebSocket), resume
and fork operations lost all conversation history.
`AppServerStartedThread` carried only the `SessionConfigured` event, not
the full `Thread` snapshot. After resume or fork, the chat transcript
was empty — prior turns were silently discarded.
A secondary issue: `primary_session_configured` was not cleared on
reset, causing stale session state after reconnection.
## Approach: TUI-side only, zero app-server changes
The app-server **already returns** the full `Thread` object (with
populated `turns: Vec<Turn>`) in its `ThreadStartResponse`,
`ThreadResumeResponse`, and `ThreadForkResponse`. The data was always
there — the TUI was simply throwing it away. The old
`AppServerStartedThread` struct only kept the `SessionConfiguredEvent`,
discarding the rich turn history that the server had already provided.
This PR fixes the problem entirely within `tui_app_server` (3 files
changed, 0 changes to `app-server`, `app-server-protocol`, or any other
crate). Rather than modifying the server to send history in a different
format or adding a new endpoint, the fix preserves the existing `Thread`
snapshot and replays it through the TUI's standard event pipeline —
making restored sessions indistinguishable from live ones.
## Solution
Add a **thread snapshot replay** path. When the server hands back a
`Thread` object (on start, resume, or fork),
`restore_started_app_server_thread` converts its historical turns into
the same core `Event` sequence the TUI already processes for live
interactions, then replays them into the event store so the chat widget
renders them.
Key changes:
- **`AppServerStartedThread` now carries the full `Thread`** —
`started_thread_from_{start,resume,fork}_response` clone the thread into
the struct alongside the existing `SessionConfiguredEvent`.
- **`thread_snapshot_events()`** walks the thread's turns and items,
producing `TurnStarted` → `ItemCompleted`* →
`TurnComplete`/`TurnAborted` event sequences that the TUI already knows
how to render.
- **`restore_started_app_server_thread()`** pushes the session event +
history events into the thread channel's store, activates the channel,
and replays the snapshot — used for initial startup, resume, and fork.
- **`primary_session_configured` cleared on reset** to prevent stale
session state after reconnection.
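
To illustrate the replay shape, here is a minimal sketch of turning historical turns into the `TurnStarted` → `ItemCompleted`* → `TurnComplete`/`TurnAborted` sequence. The `Turn`, `TurnStatus`, and `Event` types below are simplified stand-ins, not the real protocol types, and `snapshot_events` is a hypothetical name.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum TurnStatus {
    Completed,
    Interrupted,
}

/// Simplified turn: items are plain strings in this sketch.
#[derive(Debug, Clone)]
pub struct Turn {
    pub id: String,
    pub items: Vec<String>,
    pub status: TurnStatus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    TurnStarted { turn_id: String },
    ItemCompleted { turn_id: String, item: String },
    TurnComplete { turn_id: String },
    TurnAborted { turn_id: String },
}

/// Emits, per turn: one start event, one event per item, one terminal event.
pub fn snapshot_events(turns: &[Turn]) -> Vec<Event> {
    let mut events = Vec::new();
    for turn in turns {
        events.push(Event::TurnStarted { turn_id: turn.id.clone() });
        for item in &turn.items {
            events.push(Event::ItemCompleted {
                turn_id: turn.id.clone(),
                item: item.clone(),
            });
        }
        events.push(match turn.status {
            TurnStatus::Completed => Event::TurnComplete { turn_id: turn.id.clone() },
            TurnStatus::Interrupted => Event::TurnAborted { turn_id: turn.id.clone() },
        });
    }
    events
}
```

Because the output is an ordinary event sequence, the consumer needs no special case for restored history.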
## Tradeoffs
- **`Thread` is cloned into `AppServerStartedThread`**: The full thread
snapshot (including all historical turns) is cloned at startup. For
long-lived threads this could be large, but it's a one-time cost and
avoids lifetime gymnastics with the response.
## Tests
- `restore_started_app_server_thread_replays_remote_history` —
end-to-end: constructs a `Thread` with one completed turn, restores it,
and asserts user/agent messages appear in the transcript.
- `bridges_thread_snapshot_turns_for_resume_restore` — unit: verifies
`thread_snapshot_events` produces the correct event sequence for
completed and interrupted turns.
## Test plan
- [ ] Verify `cargo check -p codex-tui-app-server` passes
- [ ] Verify `cargo test -p codex-tui-app-server` passes
- [ ] Manual: connect to a remote app-server, resume an existing thread,
confirm history renders in the chat widget
- [ ] Manual: fork a thread via remote, confirm prior turns appear
### Summary
The goal is to use the latest turn's model and reasoning effort on
thread/resume if no override is provided in the thread/resume call.
This is part 1, which writes the model and reasoning effort for a
thread to the sqlite db; a follow-up PR will consume the
two new fields on thread/resume.
[The part 2 PR is currently WIP](https://github.com/openai/codex/pull/14888)
and this one can be merged independently.
## Description
This PR fixes a bad first-turn failure mode in app-server when the
startup websocket prewarm hangs. Before this change, `initialize ->
thread/start -> turn/start` could sit behind the prewarm for up to five
minutes, so the client would not see `turn/started`, and even
`turn/interrupt` would block because the turn had not actually started
yet.
Now, we:
- set a (configurable) timeout of 15s for websocket startup time,
exposed as `websocket_connect_timeout_ms` in config.toml
- send `turn/started` immediately on `turn/start`, even if the
websocket is still connecting
- allow `turn/interrupt` to cancel a turn that is still waiting on
the websocket warmup
- have the turn task wait for the full 15s websocket warmup timeout
before falling back
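
A minimal sketch of the waiting logic described above, using a std channel in place of the real async machinery. `Signal`, `StartOutcome`, and `wait_for_prewarm` are hypothetical names; the real implementation is async and lives in app-server/core.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// What can arrive while a turn waits on the websocket warmup (assumed).
pub enum Signal {
    WebsocketReady,
    Interrupt,
}

#[derive(Debug, PartialEq)]
pub enum StartOutcome {
    UseWebsocket,
    FallBack,
    Interrupted,
}

/// Waits up to `timeout` for the warmup to finish or for an interrupt.
/// A timeout (or a dropped sender) means the turn falls back.
pub fn wait_for_prewarm(signals: &Receiver<Signal>, timeout: Duration) -> StartOutcome {
    match signals.recv_timeout(timeout) {
        Ok(Signal::WebsocketReady) => StartOutcome::UseWebsocket,
        Ok(Signal::Interrupt) => StartOutcome::Interrupted,
        Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
            StartOutcome::FallBack
        }
    }
}
```

The point of the sketch is that the wait is bounded and cancellable, so the turn lifecycle no longer depends on the warmup completing.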
## Why
The old behavior made app-server feel stuck at exactly the moment the
client expects turn lifecycle events to start flowing. That was
especially painful for external clients, because from their point of
view the server had accepted the request but then went silent for
minutes.
## Configuring the websocket startup timeout
Set it in `config.toml` like this:
```
[model_providers.openai]
supports_websockets = true
websocket_connect_timeout_ms = 15000
```
## Summary
- make `report_agent_job_result` atomically transition an item from
running to completed while storing `result_json`
- remove brittle finalization grace-sleep logic and make finished-item
cleanup idempotent
- replace blind fixed-interval waiting with status-subscription-based
waiting for active worker threads
- add state runtime tests for atomic completion and late-report
rejection
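
A minimal sketch of the atomic running-to-completed transition, using an in-memory map behind a mutex as a stand-in for the state database. `JobStore`, `ItemStatus`, and the method names are assumptions for illustration; the real change is in `codex-state`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub enum ItemStatus {
    Running,
    Completed { result_json: String },
}

/// In-memory stand-in for the persisted agent-job state.
pub struct JobStore {
    items: Mutex<HashMap<String, ItemStatus>>,
}

impl JobStore {
    pub fn new() -> Self {
        Self { items: Mutex::new(HashMap::new()) }
    }

    pub fn start(&self, id: &str) {
        self.items.lock().unwrap().insert(id.to_string(), ItemStatus::Running);
    }

    /// Stores the result only if the item is still running.
    /// The check and the write happen under one lock, so a late or
    /// duplicate report cannot overwrite a completed item.
    pub fn report_result(&self, id: &str, result_json: &str) -> bool {
        let mut items = self.items.lock().unwrap();
        match items.get_mut(id) {
            Some(status) if *status == ItemStatus::Running => {
                *status = ItemStatus::Completed { result_json: result_json.to_string() };
                true
            }
            _ => false,
        }
    }

    pub fn status(&self, id: &str) -> Option<ItemStatus> {
        self.items.lock().unwrap().get(id).cloned()
    }
}
```

With a SQL store, the same idea is usually expressed as a single conditional update that only matches rows still in the running state.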
## Why
This addresses the race and polling concerns in #13948 by removing
timing-based correctness assumptions and reducing unnecessary status
polling churn.
## Validation
- `cd codex-rs && just fmt`
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-core --test all suite::agent_jobs`
- `cd codex-rs && cargo test`
- fails in an unrelated app-server tracing test:
`message_processor::tracing_tests::thread_start_jsonrpc_span_exports_server_span_and_parents_children`
timed out waiting for response
## Notes
- This PR supersedes #14129 with the same agent-jobs fix on a clean
branch from `main`.
- The earlier PR branch was stacked on unrelated history, which made the
review diff include unrelated commits.
Fixes #13948
## Summary
- skip nonexistent `workspace-write` writable roots in the Linux
bubblewrap mount builder instead of aborting sandbox startup
- keep existing writable roots mounted normally so mixed Windows/WSL
configs continue to work
- add unit and Linux integration regression coverage for the
missing-root case
## Context
This addresses regression A from #14875. Regression B will be handled in
a separate PR.
The old bubblewrap integration added `ensure_mount_targets_exist` as a
preflight guard because bubblewrap bind targets must exist, and failing
early let Codex return a clearer error than a lower-level mount failure.
That policy turned out to be too strict once bubblewrap became the
default Linux sandbox: shared Windows/WSL or mixed-platform configs can
legitimately contain a well-formed writable root that does not exist on
the current machine. This PR keeps bubblewrap's existing-target
requirement, but changes Codex to skip missing writable roots instead of
treating them as fatal configuration errors.
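
A minimal sketch of the skip-instead-of-fail policy, assuming the writable roots are already available as a list of paths. `mountable_writable_roots` is a hypothetical name, not the function in the bubblewrap mount builder.

```rust
use std::path::PathBuf;

/// Keeps only the writable roots that exist on this machine.
/// Missing roots are dropped rather than reported as an error,
/// since bind-mount targets must exist.
pub fn mountable_writable_roots(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots.iter().filter(|root| root.exists()).cloned().collect()
}
```

Existing roots pass through unchanged and in their original order, so a mixed Windows/WSL config mounts whatever is actually present.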
PR https://github.com/openai/codex/pull/14512 added an in-process app
server and started to wire up the tui to use it. We were originally
planning to modify the `tui` code in place, converting it to use the app
server a bit at a time using a hybrid adapter. We've since decided to
create an entirely new parallel `tui_app_server` implementation and do
the conversion all at once but retain the existing `tui` while we work
the bugs out of the new implementation.
This PR undoes the changes to the `tui` made in PR #14512 and
restores the old initialization to its previous state. This allows us to
modify the `tui_app_server` without the risk of regressing the old `tui`
code. For example, we can start to remove support for all legacy core
events, like the ones that PR https://github.com/openai/codex/pull/14892
needed to ignore.
Testing:
* I manually verified that the old `tui` starts and shuts down without a
problem.
The in-process app-server currently emits both typed
`ServerNotification`s and legacy `codex/event/*` notifications for the
same live turn updates. `tui_app_server` was consuming both paths, so
message deltas and completed items could be enqueued twice and rendered
as duplicated output in the transcript.
Ignore legacy notifications for event types that already have typed (app
server) notification handling, while keeping legacy fallback behavior
for events that still only arrive on the old path. This preserves
compatibility without duplicating streamed commentary or final agent
output.
We will remove all of the legacy event handlers over time; they're here
only during the short window where we're moving the tui to use the app
server.
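
A minimal sketch of the filtering rule, assuming each incoming notification is tagged with its source and an event kind. The `Source` enum and the event-kind strings are hypothetical placeholders, not the real notification names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Typed,
    Legacy,
}

/// Event kinds that already have typed handling (placeholder names).
pub fn has_typed_handler(kind: &str) -> bool {
    matches!(kind, "agent_message_delta" | "item_completed")
}

/// Typed notifications are always handled. Legacy ones are handled only
/// when no typed handler exists, so each update is enqueued once.
pub fn should_enqueue(source: Source, kind: &str) -> bool {
    match source {
        Source::Typed => true,
        Source::Legacy => !has_typed_handler(kind),
    }
}
```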
## Problem
On Linux, Codex can be launched from a workspace path that is a symlink
(for example, a symlinked checkout or a symlinked parent directory).
Our sandbox policy intentionally canonicalizes writable/readable roots
to the real filesystem path before building the bubblewrap mounts. That
part is correct and needed for safety.
The remaining bug was that bubblewrap could still inherit the helper
process's logical cwd, which might be the symlinked alias instead of the
mounted canonical path. In that case, the sandbox starts in a cwd that
does not exist inside the sandbox namespace even though the real
workspace is mounted. This can cause sandboxed commands to fail in
symlinked workspaces.
## Fix
This PR keeps the sandbox policy behavior the same, but separates two
concepts that were previously conflated:
- the canonical cwd used to define sandbox mounts and permissions
- the caller's logical cwd used when launching the command
On the Linux bubblewrap path, we now thread the logical command cwd
through the helper explicitly and only add `--chdir <canonical path>`
when the logical cwd differs from the mounted canonical path.
That means:
- permissions are still computed from canonical paths
- bubblewrap starts the command from a cwd that definitely exists inside
the sandbox
- we do not widen filesystem access or undo the earlier symlink
hardening
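
A minimal sketch of the conditional `--chdir` rule, assuming both paths are already known to the caller. `chdir_args` is a hypothetical helper name; only the `--chdir` flag itself comes from bubblewrap.

```rust
use std::path::Path;

/// Returns the extra bubblewrap arguments needed for the working directory.
/// Nothing is added when the logical cwd already is the canonical path.
pub fn chdir_args(logical_cwd: &Path, canonical_cwd: &Path) -> Vec<String> {
    if logical_cwd == canonical_cwd {
        Vec::new()
    } else {
        vec!["--chdir".to_string(), canonical_cwd.display().to_string()]
    }
}
```

Only the starting directory changes; the mounts and permissions are still derived from the canonical path alone.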
## Why This Is Safe
This is a narrow Linux-only launch fix, not a policy change.
- Writable/readable root canonicalization stays intact.
- Protected metadata carveouts still operate on canonical roots.
- We only override bubblewrap's inherited cwd when the logical path
would otherwise point at a symlink alias that is not mounted in the
sandbox.
## Tests
- kept the existing protocol/core regression coverage for symlink
canonicalization
- added regression coverage for symlinked cwd handling in the Linux
bubblewrap builder/helper path
Local validation:
- `just fmt`
- `cargo test -p codex-protocol`
- `cargo test -p codex-core
normalize_additional_permissions_canonicalizes_symlinked_write_paths`
- `cargo clippy -p codex-linux-sandbox -p codex-protocol -p codex-core
--tests -- -D warnings`
- `cargo build --bin codex`
## Context
This is related to #14694. The earlier writable-root symlink fix
addressed the mount/permission side; this PR fixes the remaining
symlinked-cwd launch mismatch in the Linux sandbox path.
## Stack Position
4/4. Top-of-stack sibling built on #14830.
## Base
- #14830
## Sibling
- #14829
## Scope
- Gate low-level mic chunks while speaker playback is active, while
still allowing spoken barge-in.
---------
Co-authored-by: Codex <noreply@openai.com>
## Stack Position
3/4. Top-of-stack sibling built on #14830.
## Base
- #14830
## Sibling
- #14827
## Scope
- Extend the realtime startup context with a bounded summary of the
latest thread turns for continuity.
---------
Co-authored-by: Codex <noreply@openai.com>
This change adds Jason to codex-core's built-in subagent nickname pool
so spawned agents can pick it without any custom role configuration. The
default list was simply missing that predefined name (a grave mistake).
## Stack Position
2/4. Built on top of #14828.
## Base
- #14828
## Unblocks
- #14829
- #14827
## Scope
- Port the realtime v2 wire parsing, session, app-server, and
conversation runtime behavior onto the split websocket-method base.
- Branch runtime behavior directly on the current realtime session kind
instead of parser-derived flow flags.
- Keep regression coverage in the existing e2e suites.
---------
Co-authored-by: Codex <noreply@openai.com>
- Added forceRemoteSync to plugin/install and plugin/uninstall.
- With forceRemoteSync=true, we update the remote plugin status first,
then apply the local change only if the backend call succeeds.
- Kept plugin/list(forceRemoteSync=true) as the main reconciliation path; for
now it treats remote enabled=false as uninstall. We will eventually migrate
to plugin/installed for more precise state handling.
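
A minimal sketch of the remote-first ordering for install, assuming the local state is a plain list of plugin names and the backend call is passed in as a closure. `install_plugin` and its parameters are hypothetical and do not reflect the real request handler.

```rust
/// Installs `plugin` locally. With `force_remote_sync`, the remote update
/// runs first and a failure leaves the local state untouched.
pub fn install_plugin<R>(
    installed: &mut Vec<String>,
    plugin: &str,
    force_remote_sync: bool,
    remote_update: R,
) -> Result<(), String>
where
    R: FnOnce(&str) -> Result<(), String>,
{
    if force_remote_sync {
        // Propagate the backend error before any local change is made.
        remote_update(plugin)?;
    }
    if !installed.iter().any(|p| p == plugin) {
        installed.push(plugin.to_string());
    }
    Ok(())
}
```

Uninstall would follow the same order: remote first, then the local removal only on success.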
### Motivation
- Prevent newly-created skills from being placed in unexpected locations
by prompting for an install path and defaulting to a discoverable
location so skills are usable immediately.
- Make the `skill-creator` instructions explicit about the recommended
default (`~/.codex/skills` / `$CODEX_HOME/skills`) so the agent and
users follow a consistent, discoverable convention.
### Description
- Updated `codex-rs/skills/src/assets/samples/skill-creator/SKILL.md` to
add a user prompt: "Where should I create this skill? If you do not have
a preference, I will place it in ~/.codex/skills so Codex can discover
it automatically.".
- Added guidance before running `init_skill.py` that if the user does
not specify a location, the agent should default to `~/.codex/skills`
(equivalently `$CODEX_HOME/skills`) for auto-discovery.
- Updated the `init_skill.py` examples in the same `SKILL.md` to use
`~/.codex/skills` as the recommended default while keeping one custom
path example.
### Testing
- Ran `cargo test -p codex-skills` and the crate's unit test suite
passed (`1 passed; 0 failed`).
- Verified relevant discovery behavior in code by checking
`codex-rs/utils/home-dir/src/lib.rs` (`find_codex_home` defaults to
`~/.codex`) and `codex-rs/core/src/skills/loader.rs` (user skill roots
include `$CODEX_HOME/skills`).
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b75a50bb008322a278e55eb0ddccd6)
## Why
Once the repo-local lint exists, `codex-rs` needs to follow the
checked-in convention and CI needs to keep it from drifting. This commit
applies the fallback `/*param*/` style consistently across existing
positional literal call sites without changing those APIs.
The longer-term preference is still to avoid APIs that require comments
by choosing clearer parameter types and call shapes. This PR is
intentionally the mechanical follow-through for the places where the
existing signatures stay in place.
After rebasing onto newer `main`, the rollout also had to cover newly
introduced `tui_app_server` call sites. That made it clear the first cut
of the CI job was too expensive for the common path: it was spending
almost as much time installing `cargo-dylint` and re-testing the lint
crate as a representative test job spends running product tests. The CI
update keeps the full workspace enforcement but trims that extra
overhead from ordinary `codex-rs` PRs.
## What changed
- keep a dedicated `argument_comment_lint` job in `rust-ci`
- mechanically annotate remaining opaque positional literals across
`codex-rs` with exact `/*param*/` comments, including the rebased
`tui_app_server` call sites that now fall under the lint
- keep the checked-in style aligned with the lint policy by using
`/*param*/` and leaving string and char literals uncommented
- cache `cargo-dylint`, `dylint-link`, and the relevant Cargo
registry/git metadata in the lint job
- split changed-path detection so the lint crate's own `cargo test` step
runs only when `tools/argument-comment-lint/*` or `rust-ci.yml` changes
- continue to run the repo wrapper over the `codex-rs` workspace, so
product-code enforcement is unchanged
Most of the code changes in this commit are intentionally mechanical
comment rewrites or insertions driven by the lint itself.
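
For reference, a small sketch of what the `/*param*/` style looks like at a call site. `spawn_worker` and its parameters are made up for this example; only the comment convention comes from the lint policy described above.

```rust
/// Hypothetical function with positional bool and integer parameters.
pub fn spawn_worker(name: &str, interactive: bool, max_retries: u32) -> String {
    format!("{name}:{interactive}:{max_retries}")
}

pub fn example() -> String {
    // Opaque positional literals carry the exact parameter name.
    // The string literal is left uncommented, matching the policy.
    spawn_worker("indexer", /*interactive*/ false, /*max_retries*/ 3)
}
```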
## Verification
- `./tools/argument-comment-lint/run.sh --workspace`
- `cargo test -p codex-tui-app-server -p codex-tui`
- parsed `.github/workflows/rust-ci.yml` locally with PyYAML
---
* -> #14652
* #14651
## Stack Position
1/4. Base PR in the realtime stack.
## Base
- `main`
## Unblocks
- #14830
## Scope
- Split the realtime websocket request builders into `common`, `v1`, and
`v2` modules.
- Keep runtime behavior unchanged in this PR.
---------
Co-authored-by: Codex <noreply@openai.com>
- **Summary**
- expose `exit` through the code mode bridge and module so scripts can
stop mid-flight
- surface the helper in the description documentation
- add a regression test ensuring `exit()` terminates execution cleanly
- **Testing**
- Not run (not requested)
# Summary
This PR introduces the Windows sandbox runner IPC foundation that later
unified_exec work will build on.
The key point is that this is intentionally infrastructure-only. The new
IPC transport, runner plumbing, and ConPTY helpers are added here, but
the active elevated Windows sandbox path still uses the existing
request-file bootstrap. In other words, this change prepares the
transport and module layout we need for unified_exec without switching
production behavior over yet.
Part of this PR is also a source-layout cleanup: some Windows sandbox
files are moved into more explicit `elevated/`, `conpty/`, and shared
locations so it is clearer which code is for the elevated sandbox flow,
which code is legacy/direct-spawn behavior, and which helpers are shared
between them. That reorganization is intentional in this first PR so
later behavioral changes do not also have to carry a large amount of
file-move churn.
# Why This Is Needed For unified_exec
Windows elevated sandboxed unified_exec needs a long-lived,
bidirectional control channel between the CLI and a helper process
running under the sandbox user. That channel has to support:
- starting a process and reporting structured spawn success/failure
- streaming stdout/stderr back incrementally
- forwarding stdin over time
- terminating or polling a long-lived process
- supporting both pipe-backed and PTY-backed sessions
The existing elevated one-shot path is built around a request-file
bootstrap and does not provide those primitives cleanly. Before we can
turn on Windows sandbox unified_exec, we need the underlying runner
protocol and transport layer that can carry those lifecycle events and
streams.
# Why Windows Needs More Machinery Than Linux Or macOS
Linux and macOS can generally build unified_exec on top of the existing
sandbox/process model: the parent can spawn the child directly, retain
normal ownership of stdio or PTY handles, and manage the lifetime of the
sandboxed process without introducing a second control process.
Windows elevated sandboxing is different. To run inside the sandbox
boundary, we cross into a different user/security context and then need
to manage a long-lived process from outside that boundary. That means we
need an explicit helper process plus an IPC transport to carry spawn,
stdin, output, and exit events back and forth. The extra code here is
mostly that missing Windows sandbox infrastructure, not a conceptual
difference in unified_exec itself.
# What This PR Adds
- the framed IPC message types and transport helpers for parent <->
runner communication
- the renamed Windows command runner with both the existing request-file
bootstrap and the dormant IPC bootstrap
- named-pipe helpers for the elevated runner path
- ConPTY helpers and process-thread attribute plumbing needed for
PTY-backed sessions
- shared sandbox/process helpers that later PRs will reuse when
switching live execution paths over
- early file/module moves so later PRs can focus on behavior rather than
layout churn
# What This PR Does Not Yet Do
- it does not switch the active elevated one-shot path over to IPC yet
- it does not enable Windows sandbox unified_exec yet
- it does not remove the existing request-file bootstrap yet
So while this code compiles and the new path has basic validation, it is
not yet the exercised production path. That is intentional for this
first PR: the goal here is to land the transport and runner foundation
cleanly before later PRs start routing real command execution through
it.
# Follow-Ups
Planned follow-up PRs will:
1. switch elevated one-shot Windows sandbox execution to the new runner
IPC path
2. layer Windows sandbox unified_exec sessions on top of the same
transport
3. remove the legacy request-file path once the IPC-based path is live
# Validation
- `cargo build -p codex-windows-sandbox`
###### Why/Context/Summary
- Exclude injected AGENTS.md instructions and standalone skill payloads
from memory stage 1 inputs so memory generation focuses on conversation
content instead of prompt scaffolding.
- Strip only the AGENTS fragment from mixed contextual user messages
during stage-1 serialization, which preserves environment context in the
same message.
- Keep subagent notifications in the memory input, and add focused unit
coverage for the fragment classifier, rollout policy, and stage-1
serialization path.
###### Test plan
- `just fmt`
- `cargo test -p codex-core --lib contextual_user_message`
- `cargo test -p codex-core --lib rollout::policy`
- `cargo test -p codex-core --lib memories::phase1`
This PR replicates the `tui` code directory and creates a temporary
parallel `tui_app_server` directory. It also implements a new feature
flag `tui_app_server` to select between the two tui implementations.
Once the new app-server-based TUI is stabilized, we'll delete the old
`tui` directory and feature flag.
The issue was caused by a circular `Drop` dependency: the embedded
app-server waits for some listeners that themselves wait for this
app-server.
The fix is an explicit cleanup.
**Repro:**
* Start codex
* Ask it to spawn a sub-agent
* Close Codex
* It takes 5s to exit
Make `interrupted` an agent state and make it non-final. As a result, a
`wait` won't return on an interrupted agent and no notification will be
sent to the parent agent.
The rationale:
* If a user interrupts a sub-agent for any reason, you don't want the
parent agent to immediately ask the sub-agent to restart
* If a parent agent interrupts a sub-agent, there is no need to add a noisy
notification in the parent agent
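
A minimal sketch of what a non-final `interrupted` state means for `wait`. The `AgentStatus` variants other than `Interrupted` and the `first_final` helper are assumptions for illustration, not the real status enum.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentStatus {
    Running,
    Interrupted,
    Completed,
    Errored,
}

impl AgentStatus {
    /// Only final states wake a waiter or notify the parent.
    pub fn is_final(self) -> bool {
        matches!(self, AgentStatus::Completed | AgentStatus::Errored)
    }
}

/// Models a `wait`: scans observed status updates and returns the first
/// final one, skipping over `Running` and `Interrupted`.
pub fn first_final(updates: &[AgentStatus]) -> Option<AgentStatus> {
    updates.iter().copied().find(|status| status.is_final())
}
```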
Fix https://github.com/openai/codex/issues/14161
This fixes sub-agent [[skills.config]] overrides being ignored when
parent and child share the same cwd. The root cause was that turn skill
loading rebuilt from cwd-only state and reused a cwd-scoped cache, so
role-local skill enable/disable overrides did not reliably affect the
spawned agent's effective skill set.
This change switches turn construction to use the effective per-turn
config and adds a config-aware skills cache keyed by skill roots plus
final disabled paths.
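
A minimal sketch of why the cache key needs more than the cwd: two agents sharing the same roots but with different disabled paths must not share an entry. `SkillsCacheKey`, `SkillsCache`, and the loader closure are hypothetical stand-ins for the real skills cache.

```rust
use std::collections::{BTreeSet, HashMap};
use std::path::PathBuf;

/// Key built from the skill roots plus the final set of disabled paths.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct SkillsCacheKey {
    roots: Vec<PathBuf>,
    disabled: BTreeSet<PathBuf>,
}

impl SkillsCacheKey {
    pub fn new(roots: Vec<PathBuf>, disabled: Vec<PathBuf>) -> Self {
        Self { roots, disabled: disabled.into_iter().collect() }
    }
}

pub struct SkillsCache {
    entries: HashMap<SkillsCacheKey, Vec<String>>,
}

impl SkillsCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Returns the cached skill names for `key`, loading them on a miss.
    pub fn get_or_load<F>(&mut self, key: SkillsCacheKey, load: F) -> Vec<String>
    where
        F: FnOnce(&SkillsCacheKey) -> Vec<String>,
    {
        if !self.entries.contains_key(&key) {
            let loaded = load(&key);
            self.entries.insert(key.clone(), loaded);
        }
        self.entries[&key].clone()
    }
}
```

With a cwd-only key, the parent and child in this scenario would collide on one entry, which is the failure mode described above.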
## Summary
- reuse a guardian subagent session across approvals so reviews keep a
stable prompt cache key and avoid one-shot startup overhead
- clear the guardian child history before each review so prior guardian
decisions do not leak into later approvals
- include the `smart_approvals` -> `guardian_approval` feature flag
rename in the same PR to minimize release latency on a very tight
timeline
- add regression coverage for prompt-cache-key reuse without
prior-review prompt bleed
## Request
- Bug/enhancement request: internal guardian prompt-cache and latency
improvement request
---------
Co-authored-by: Codex <noreply@openai.com>
### Motivation
- Interrupting a running turn (Ctrl+C / Esc) currently also terminates
long‑running background shells, which is surprising for workflows like
local dev servers or file watchers.
- The existing cleanup command name was confusing; callers expect an
explicit command to stop background terminals rather than a UI clear
action.
- Make background‑shell termination explicit and surface a clearer
command name while preserving backward compatibility.
### Description
- Renamed the background‑terminal cleanup slash command from `Clean`
(`/clean`) to `Stop` (`/stop`) and kept `clean` as an alias in the
command parsing/visibility layer, updated the user descriptions and
command popup wiring accordingly.
- Updated the unified‑exec footer text and snapshots to point to `/stop`
(and trimmed corresponding snapshot output to match the new label).
- Changed interrupt behavior so `Op::Interrupt` (Ctrl+C / Esc interrupt)
no longer closes or clears tracked unified exec / background terminal
processes in the TUI or core cleanup path; background shells are now
preserved after an interrupt.
- Updated protocol/docs to clarify that `turn/interrupt` (or
`Op::Interrupt`) interrupts the active turn but does not terminate
background terminals, and that `thread/backgroundTerminals/clean` is the
explicit API to stop those shells.
- Updated unit/integration tests and insta snapshots in the TUI and core
unified‑exec suites to reflect the new semantics and command name.
### Testing
- Ran formatting with `just fmt` in `codex-rs` (succeeded).
- Ran `cargo test -p codex-protocol` (succeeded).
- Attempted `cargo test -p codex-tui` but the build could not complete
in this environment due to a native build dependency that requires
`libcap` development headers (the `codex-linux-sandbox` vendored build
step); install `libcap-dev` / make `libcap.pc` available in
`PKG_CONFIG_PATH` to run the TUI test suite locally.
- Updated and accepted the affected `insta` snapshots for the TUI
changes so visual diffs reflect the new `/stop` wording and preserved
interrupt behavior.
------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_69b39c44b6dc8323bd133ae206310fae)
- [x] Bypass tool search and place tool specs directly into the model
context when either (a) tool search is not available for the model or
(b) there are only a few tools to search.
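The decision above can be sketched as a small predicate. This is only an illustration: `ToolExposure`, `choose_tool_exposure`, and the `MAX_TOOLS_WITHOUT_SEARCH` cutoff are assumptions, and the PR does not state the real threshold.

```rust
/// Hypothetical cutoff; the actual value used by the change is not stated.
const MAX_TOOLS_WITHOUT_SEARCH: usize = 20;

#[derive(Debug, PartialEq, Eq)]
pub enum ToolExposure {
    /// Put every tool spec directly into the model context.
    Inline,
    /// Expose tools through tool search.
    Search,
}

/// Bypasses tool search when the model cannot use it or when the tool
/// list is small enough to inline.
pub fn choose_tool_exposure(model_supports_tool_search: bool, tool_count: usize) -> ToolExposure {
    if !model_supports_tool_search || tool_count <= MAX_TOOLS_WITHOUT_SEARCH {
        ToolExposure::Inline
    } else {
        ToolExposure::Search
    }
}
```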
CXC-392
[With
401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
<img width="1909" height="555" alt="401 auth tags in Sentry"
src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
/>
- auth_401_*: preserved facts from the latest unauthorized response snapshot
- auth_*: latest auth-related facts from the latest request attempt
- auth_recovery_*: unauthorized recovery state and follow-up result
Without 401
<img width="1917" height="522" alt="happy-path auth tags in Sentry"
src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
/>
###### Summary
- Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
- Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
- Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
- Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.
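One way to picture the separation between `auth_401_*` and `auth_*` tags is the sketch below. It assumes a simple string map; the `AuthTags` type, its methods, and the `request_id` key are hypothetical and only illustrate that a follow-up attempt does not overwrite the unauthorized snapshot.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag store for illustration only.
#[derive(Default)]
pub struct AuthTags {
    tags: BTreeMap<String, String>,
}

impl AuthTags {
    /// Latest-attempt facts: overwritten on every request attempt.
    pub fn record_attempt(&mut self, key: &str, value: &str) {
        self.tags.insert(format!("auth_{key}"), value.to_string());
    }

    /// Facts from the latest unauthorized response, kept under `auth_401_*`
    /// so later attempts cannot clobber them.
    pub fn record_unauthorized(&mut self, key: &str, value: &str) {
        self.tags.insert(format!("auth_401_{key}"), value.to_string());
    }

    /// Looks up a tag by its full name.
    pub fn get(&self, name: &str) -> Option<&str> {
        self.tags.get(name).map(String::as_str)
    }
}
```

After a 401 on one request and a retry with a new request id, `auth_401_request_id` would still hold the original id while `auth_request_id` reflects the retry.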
###### Rationale (from spec findings)
- The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
- Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
- Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
- The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.
###### Scope
- Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
- Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.
###### Trade-offs
- This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
- This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
- Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.
###### Client follow-up
- PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
- A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
- `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.
###### Testing
- `cargo test -p codex-core refresh_available_models_sorts_by_priority`
- `cargo test -p codex-core emit_feedback_request_tags_`
- `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
- `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
- `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
- `cargo test -p codex-core identity_auth_details`
- `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
- `cargo test -p codex-core --all-features --no-run`
- `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
- `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
## Summary
- normalize effective readable, writable, and unreadable sandbox roots
after resolving special paths so symlinked roots use canonical runtime
paths
- add a protocol regression test for a symlinked writable root with a
denied child and update protocol expectations to canonicalized effective
paths
- update macOS seatbelt tests to assert against effective normalized
roots produced by the shared policy helpers
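A minimal sketch of the normalization idea, using only `std::fs::canonicalize`. The helper names `normalize_root` and `normalize_roots` are hypothetical, and falling back to the original path when canonicalization fails is an assumption of this example, not something the PR states.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Resolves symlinks in `root` so the policy sees the runtime path.
/// Assumption: if the path cannot be canonicalized (for example, it does
/// not exist), the original path is kept unchanged.
pub fn normalize_root(root: &Path) -> PathBuf {
    fs::canonicalize(root).unwrap_or_else(|_| root.to_path_buf())
}

/// Normalizes every root, preserving the input order.
pub fn normalize_roots(roots: &[PathBuf]) -> Vec<PathBuf> {
    roots.iter().map(|root| normalize_root(root)).collect()
}
```

With a symlinked `TMPDIR`, a writable root given as the symlink would be reported as its resolved target, which is the path a sandbox bind mount actually needs.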
## Testing
- just fmt
- cargo test -p codex-protocol
- cargo test -p codex-core explicit_unreadable_paths_are_excluded_
- cargo clippy -p codex-protocol -p codex-core --tests -- -D warnings
## Notes
- This is intended to fix the symlinked TMPDIR bind failure in
bubblewrap described in #14672.
Fixes #14672
This extends dynamic_tool_calls to allow us to hide a tool from the
model context but still use it as part of the general tool calling
runtime (e.g., from js_repl/code_mode).
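The split between model visibility and runtime availability might look like the sketch below. `DynamicTool`, the `expose_to_model` field, and both helper functions are hypothetical names chosen for this example.

```rust
/// Hypothetical tool record; `expose_to_model` is an assumed field name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DynamicTool {
    pub name: String,
    pub expose_to_model: bool,
}

/// Names of the tools whose specs are placed in the model context.
pub fn model_visible<'a>(tools: &'a [DynamicTool]) -> Vec<&'a str> {
    tools
        .iter()
        .filter(|tool| tool.expose_to_model)
        .map(|tool| tool.name.as_str())
        .collect()
}

/// Runtime lookup ignores visibility, so hidden tools stay callable.
pub fn find_callable<'a>(tools: &'a [DynamicTool], name: &str) -> Option<&'a DynamicTool> {
    tools.iter().find(|tool| tool.name == name)
}
```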
Make plugins' `defaultPrompt` an array, but keep backward compatibility
for strings. The array is limited by app-server to 3 entries of up to
128 chars each (it drops extra entries and `None`s out ones that are too
long) without erroring if those invariants are violated.
Added tests; tested locally.
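The limits described above can be sketched as a small normalization step. The `DefaultPromptInput` enum and `normalize_default_prompt` function are hypothetical, and counting Unicode scalar values (rather than bytes) for the 128-char limit is an assumption of this example.

```rust
const MAX_PROMPTS: usize = 3;
const MAX_PROMPT_CHARS: usize = 128;

/// Hypothetical input shape: the legacy single string or the new array form.
pub enum DefaultPromptInput {
    Single(String),
    Many(Vec<String>),
}

/// Keeps at most three entries and replaces over-long ones with `None`
/// instead of returning an error.
pub fn normalize_default_prompt(input: DefaultPromptInput) -> Vec<Option<String>> {
    let entries = match input {
        DefaultPromptInput::Single(prompt) => vec![prompt],
        DefaultPromptInput::Many(prompts) => prompts,
    };
    entries
        .into_iter()
        .take(MAX_PROMPTS) // drop extra entries
        .map(|prompt| (prompt.chars().count() <= MAX_PROMPT_CHARS).then_some(prompt))
        .collect()
}
```

Treating a legacy string as a one-element array keeps a single code path for the length check.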
## Summary
- regenerate `sdk/python` protocol-derived artifacts on latest
`origin/main`
- update `notification_registry.py` to match the regenerated
notification set
- fix the stale SDK test expectation for `GranularAskForApproval`
## Validation
- `cd sdk/python && python scripts/update_sdk_artifacts.py
generate-types`
- `cd sdk/python && python -m pytest`