## Summary
- add the guardian reviewer flow for `on-request` approvals in command,
patch, sandbox-retry, and managed-network approval paths
- keep guardian behind `features.guardian_approval` instead of exposing
a public `approval_policy = guardian` mode
- route ordinary `OnRequest` approvals to the guardian subagent when the
feature is enabled, without changing the public approval-mode surface
## Public model
- public approval modes stay unchanged
- guardian is enabled via `features.guardian_approval`
- when that feature is on, `approval_policy = on-request` keeps the same
approval boundaries but sends those approval requests to the guardian
reviewer instead of the user
- `/experimental` only persists the feature flag; it does not rewrite
`approval_policy`
- CLI and app-server no longer expose a separate `guardian` approval
mode in this PR
## Guardian reviewer
- the reviewer runs as a normal subagent and reuses the existing
subagent/thread machinery
- it is locked to a read-only sandbox and `approval_policy = never`
- it does not inherit user/project exec-policy rules
- it prefers `gpt-5.4` when the current provider exposes it, otherwise
falls back to the parent turn's active model
- it fail-closes on timeout, startup failure, malformed output, or any
other review error
- it currently auto-approves only when `risk_score < 80`
## Review context and policy
- guardian mirrors `OnRequest` approval semantics rather than
introducing a separate approval policy
- explicit `require_escalated` requests follow the same approval surface
as `OnRequest`; the difference is only who reviews them
- managed-network allowlist misses that enter the approval flow are also
reviewed by guardian
- the review prompt includes bounded recent transcript history plus
recent tool call/result evidence
- transcript entries and planned-action strings are truncated with
explicit `<guardian_truncated ... />` markers so large payloads stay
bounded
- apply-patch reviews include the full patch content (without
duplicating the structured `changes` payload)
- the guardian request layout is snapshot-tested using the same
model-visible Responses request formatter used elsewhere in core
## Guardian network behavior
- the guardian subagent inherits the parent session's managed-network
allowlist when one exists, so it can use the same approved network
surface while reviewing
- exact session-scoped network approvals are copied into the guardian
session with protocol/port scope preserved
- those copied approvals are now seeded before the guardian's first turn
is submitted, so inherited approvals are available during any immediate
review-time checks
## Out of scope / follow-ups
- the sandbox-permission validation split was pulled into a separate PR
and is not part of this diff
- a future follow-up can enable `serde_json` preserve-order in
`codex-core` and then simplify the guardian action rendering further
---------
Co-authored-by: Codex <noreply@openai.com>
## Why
`apply_patch` safety approval was still checking writable paths through
the legacy `SandboxPolicy` projection.
That can hide explicit `none` carveouts when a split filesystem policy
projects back to compatibility `ExternalSandbox`, which leaves one more
approval path that can auto-approve writes inside paths that are
intentionally blocked.
## What changed
- passed `turn.file_system_sandbox_policy` into `assess_patch_safety`
- changed writable-path checks to derive effective access from
`FileSystemSandboxPolicy` instead of the legacy `SandboxPolicy`
- made those checks reject explicit unreadable roots before considering
broad write access or writable roots
- added regression coverage showing that an `ExternalSandbox`
compatibility projection still asks for approval when the split
filesystem policy blocks a subpath
## Verification
- `cargo test -p codex-core safety::tests::`
- `cargo test -p codex-core test_sandbox_config_parsing`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13445).
* #13453
* #13452
* #13451
* #13449
* #13448
* __->__ #13445
* #13440
* #13439
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
Addresses feature request #13660
Adds new option to `/statusline` so the status line can display "fast
on" or "fast off"
Summary
- introduce a `FastMode` status-line item so `/statusline` can render
explicit `Fast on`/`Fast off` text for the service tier
- wire the item into the picker metadata and resolve its string from
`ChatWidget` without adding any unrelated `thread-name` logic or storage
changes
- ensure the refresh paths keep the cached footer in sync when the
service tier (fast mode) changes
Testing
- Manually tested
Here's what it looks like when enabled:
<img width="366" height="75" alt="image"
src="https://github.com/user-attachments/assets/7f992d2b-6dab-49ed-aa43-ad496f56f193"
/>
## Summary
- require windowsSandbox/setupStart.cwd to be an AbsolutePathBuf
- reject relative cwd values at request parsing instead of normalizing
them later in the setup flow
- add RPC-layer coverage for relative cwd rejection and update the
checked-in protocol schemas/docs
## Why
windowsSandbox/setupStart was carrying the client-provided cwd as a raw
PathBuf for command_cwd while config derivation normalized the same
value into an absolute policy_cwd.
That left room for relative-path ambiguity in the setup path, especially
for inputs like cwd: "repo". Making the RPC accept only absolute paths
removes that split entirely: the handler now receives one
already-validated absolute path and uses it for both config derivation
and setup.
This keeps the trust model unchanged. Trusted clients could already
choose the session cwd; this change is only about making the setup RPC
reject relative paths so command_cwd and policy_cwd cannot diverge.
## Testing
- cargo test -p codex-app-server windows_sandbox_setup (run locally by
user)
- cargo test -p codex-app-server-protocol windows_sandbox (run locally
by user)
## Summary
- distinguish reject-policy handling for prefix-rule approvals versus
sandbox approvals in Unix shell escalation
- keep prompting for skill-script execution when `rules=true` but
`sandbox_approval=false`, instead of denying the command up front
- add regression coverage for both skill-script reject-policy paths in
`codex-rs/core/tests/suite/skill_approval.rs`
## Why
`#13434` and `#13439` introduce split filesystem and network policies,
but the only code that could answer basic filesystem questions like "is
access effectively unrestricted?" or "which roots are readable and
writable for this cwd?" still lived on the legacy `SandboxPolicy` path.
That would force later backends to either keep projecting through
`SandboxPolicy` or duplicate path-resolution logic. This PR moves those
queries onto `FileSystemSandboxPolicy` itself so later runtime and
platform changes can consume the split policy directly.
## What changed
- added `FileSystemSandboxPolicy` helpers for full-read/full-write
checks, platform-default reads, readable roots, writable roots, and
explicit unreadable roots resolved against a cwd
- added a shared helper for the default read-only carveouts under
writable roots so the legacy and split-policy paths stay aligned
- added protocol coverage for full-access detection and derived
readable, writable, and unreadable roots
## Verification
- added protocol coverage in `protocol/src/protocol.rs` and
`protocol/src/permissions.rs` for full-root access and derived
filesystem roots
- verified the current PR state with `just clippy`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13440).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* __->__ #13440
* #13439
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
## Why
`#13434` introduces split `FileSystemSandboxPolicy` and
`NetworkSandboxPolicy`, but the runtime still made most execution-time
sandbox decisions from the legacy `SandboxPolicy` projection.
That projection loses information about combinations like unrestricted
filesystem access with restricted network access. In practice, that
means the runtime can choose the wrong platform sandbox behavior or set
the wrong network-restriction environment for a command even when config
has already separated those concerns.
This PR carries the split policies through the runtime so sandbox
selection, process spawning, and exec handling can consult the policy
that actually matters.
## What changed
- threaded `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` through
`TurnContext`, `ExecRequest`, sandbox attempts, shell escalation state,
unified exec, and app-server exec overrides
- updated sandbox selection in `core/src/sandboxing/mod.rs` and
`core/src/exec.rs` to key off `FileSystemSandboxPolicy.kind` plus
`NetworkSandboxPolicy`, rather than inferring behavior only from the
legacy `SandboxPolicy`
- updated process spawning in `core/src/spawn.rs` and the platform
wrappers to use `NetworkSandboxPolicy` when deciding whether to set
`CODEX_SANDBOX_NETWORK_DISABLED`
- kept additional-permissions handling and legacy `ExternalSandbox`
compatibility projections aligned with the split policies, including
explicit user-shell execution and Windows restricted-token routing
- updated callers across `core`, `app-server`, and `linux-sandbox` to
pass the split policies explicitly
## Verification
- added regression coverage in `core/tests/suite/user_shell_cmd.rs` to
verify `RunUserShellCommand` does not inherit
`CODEX_SANDBOX_NETWORK_DISABLED` from the active turn
- added coverage in `core/src/exec.rs` for Windows restricted-token
sandbox selection when the legacy projection is `ExternalSandbox`
- updated Linux sandbox coverage in
`linux-sandbox/tests/suite/landlock.rs` to exercise the split-policy
exec path
- verified the current PR state with `just clippy`
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13439).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* __->__ #13439
---------
Co-authored-by: viyatb-oai <viyatb@openai.com>
## Summary
- treat `requirements.toml` `allowed_domains` and `denied_domains` as
managed network baselines for the proxy
- in restricted modes by default, build the effective runtime policy
from the managed baseline plus user-configured allowlist and denylist
entries, so common hosts can be pre-approved without blocking later user
expansion
- add `experimental_network.managed_allowed_domains_only = true` to pin
the effective allowlist to managed entries, ignore user allowlist
additions, and hard-deny non-managed domains without prompting
- apply `managed_allowed_domains_only` anywhere managed network
enforcement is active, including full access, while continuing to
respect denied domains from all sources
- add regression coverage for merged-baseline behavior, managed-only
behavior, and full-access managed-only enforcement
## Behavior
Assuming `requirements.toml` defines both
`experimental_network.allowed_domains` and
`experimental_network.denied_domains`.
### Default mode
- By default, the effective allowlist is
`experimental_network.allowed_domains` plus user or persisted allowlist
additions.
- By default, the effective denylist is
`experimental_network.denied_domains` plus user or persisted denylist
additions.
- Allowlist misses can go through the network approval flow.
- Explicit denylist hits and local or private-network blocks are still
hard-denied.
- When `experimental_network.managed_allowed_domains_only = true`, only
managed `allowed_domains` are respected, user allowlist additions are
ignored, and non-managed domains are hard-denied without prompting.
- Denied domains continue to be respected from all sources.
### Full access
- With managed requirements present, the effective allowlist is pinned
to `experimental_network.allowed_domains`.
- With managed requirements present, the effective denylist is pinned to
`experimental_network.denied_domains`.
- There is no allowlist-miss approval path in full access.
- Explicit denylist hits are hard-denied.
- `experimental_network.managed_allowed_domains_only = true` now also
applies in full access, so managed-only behavior remains in effect
anywhere managed network enforcement is active.
## Summary
- resolve trust roots by inspecting `.git` entries on disk instead of
spawning `git rev-parse --git-common-dir`
- keep regular repo and linked-worktree trust inheritance behavior
intact
- add a synthetic regression test that proves worktree trust resolution
works without a real git command
## Testing
- `just fmt`
- `cargo test -p codex-core resolve_root_git_project_for_trust`
- `cargo clippy -p codex-core --all-targets -- -D warnings`
- `cargo test -p codex-core` (fails in this environment on unrelated
managed-config `DangerFullAccess` tests in `codex::tests`,
`tools::js_repl::tests`, and `unified_exec::tests`)
This fixes a schema export bug where two different `WebSearchAction`
types were getting merged under the same name in the app-server v2 JSON
schema bundle.
The problem was that v2 thread items use the app-server API's
`WebSearchAction` with camelCase variants like `openPage`, while
`ThreadResumeParams.history` and
`RawResponseItemCompletedNotification.item` pull in the upstream
`ResponseItem` graph, which uses the Responses API snake_case shape like
`open_page`. During bundle generation we were flattening nested
definitions into the v2 namespace by plain name, so the later definition
could silently overwrite the earlier one.
That meant clients generating code from the bundled schema could end up
with the wrong `WebSearchAction` definition for v2 thread history. In
practice this shows up on web search items reconstructed from rollout
files with persisted extended history.
This change does two things:
- Gives the upstream Responses API schema a distinct JSON schema name:
`ResponsesApiWebSearchAction`
- Makes namespace-level schema definition collisions fail loudly instead
of silently overwriting
* Add an ability to stream stdin, stdout, and stderr
* Streaming of stdout and stderr has a configurable cap for total amount
of transmitted bytes (with an ability to disable it)
* Add support for overriding environment variables
* Add an ability to terminate running applications (using
`command/exec/terminate`)
* Add TTY/PTY support, with an ability to resize the terminal (using
`command/exec/resize`)
Previously, we could only configure whether web search was on/off.
This PR enables sending along a web search config, which includes all
the stuff responsesapi supports: filters, location, etc.
## Summary
- Treat skill scripts with no permission profile, or an explicitly empty
one, as permissionless and run them with the turn's existing sandbox
instead of forcing an exec approval prompt.
- Keep the approval flow unchanged for skills that do declare additional
permissions.
- Update the skill approval tests to assert that permissionless skill
scripts do not prompt on either the initial run or a rerun.
## Why
Permissionless skills should inherit the current turn sandbox directly.
Prompting for exec approval in that case adds friction without granting
any additional capability.
1. Add a synced curated plugin marketplace and include it in marketplace
discovery.
2. Expose optional plugin.json interface metadata in plugin/list
3. Tighten plugin and marketplace path handling using validated absolute
paths.
4. Let manifests override skill, MCP, and app config paths.
5. Restrict plugin enablement/config loading to the user config layer so
plugin enablement is at global level
## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.
## Why
`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).
`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.
## Validation
- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
I mainly use the devcontainer to be able to run `cargo clippy --tests`
locally for Linux.
We still need to make it possible to run clippy from Bazel so I don't
need to do this!
- add experimental_realtime_ws_startup_context to override or disable
realtime websocket startup context
- preserve generated startup context when unset and cover the new
override paths in tests
## Why
`SandboxPolicy` currently mixes together three separate concerns:
- parsing layered config from `config.toml`
- representing filesystem sandbox state
- carrying basic network policy alongside filesystem choices
That makes the existing config awkward to extend and blocks the new TOML
proposal where `[permissions]` becomes a table of named permission
profiles selected by `default_permissions`. (The idea is that if
`default_permissions` is not specified, we assume the user is opting
into the "traditional" way to configure the sandbox.)
This PR adds the config-side plumbing for those profiles while still
projecting back to the legacy `SandboxPolicy` shape that the current
macOS and Linux sandbox backends consume.
It also tightens the filesystem profile model so scoped entries only
exist for `:project_roots`, and so nested keys must stay within a
project root instead of using `.` or `..` traversal.
This drops support for the short-lived `[permissions.network]` in
`config.toml` because now that would be interpreted as a profile named
`network` within `[permissions]`.
## What Changed
- added `PermissionsToml`, `PermissionProfileToml`,
`FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config
can parse named profiles under `[permissions.<profile>.filesystem]`
- added top-level `default_permissions` selection, validation for
missing or unknown profiles, and compilation from a named profile into
split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values
- taught config loading to choose between the legacy `sandbox_mode` path
and the profile-based path without breaking legacy users
- introduced `codex-protocol::permissions` for the split filesystem and
network sandbox types, and stored those alongside the legacy projected
`sandbox_policy` in runtime `Permissions`
- modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a
nested `subpath`, matching the intended config syntax instead of
allowing invalid states for other special paths
- restricted scoped filesystem maps to `:project_roots`, with validation
that nested entries are non-empty descendant paths and cannot use `.` or
`..` to escape the project root
- kept existing runtime consumers working by projecting
`FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit
error for profiles that request writes outside the workspace root
- loaded proxy settings from top-level `[network]`
- regenerated `core/config.schema.json`
## Verification
- added config coverage for profile deserialization,
`default_permissions` selection, top-level `[network]` loading, network
enablement, rejection of writes outside the workspace root, rejection of
nested entries for non-`:project_roots` special paths, and rejection of
parent-directory traversal in `:project_roots` maps
- added protocol coverage for the legacy bridge rejecting non-workspace
writes
## Docs
- update the Codex config docs on developers.openai.com/codex to
document named `[permissions.<profile>]` entries, `default_permissions`,
scoped `:project_roots` syntax, the descendant-path restriction for
nested `:project_roots` entries, and top-level `[network]` proxy
configuration
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* #13439
* __->__ #13434
## Summary
Remove `docs/auth-login-logging-plan.md`.
## Why
The document was a temporary planning artifact. The durable rationale
for the
auth-login diagnostics work now lives in the code comments, tests, PR
context,
and existing implementation notes, so keeping the standalone plan doc
adds
duplicate maintenance surface.
## Testing
- not run (docs-only deletion)
Co-authored-by: Codex <noreply@openai.com>
## Summary
Clarify the `js_repl` prompt guidance around persistent bindings and
redeclaration recovery.
This updates the generated `js_repl` instructions in
`core/src/project_doc.rs` to prefer this order when a name is already
bound:
1. Reuse the existing binding
2. Reassign a previously declared `let`
3. Pick a new descriptive name
4. Use `{ ... }` only for short-lived scratch scope
5. Reset the kernel only when a clean state is actually needed
The prompt now also explicitly warns against wrapping an entire cell in
block scope when the goal is to reuse names across later cells.
## Why
The previous wording still left too much room for low-value workarounds
like whole-cell block wrapping. In downstream browser rollouts, that
pattern was adding tokens and preventing useful state reuse across
`js_repl` cells.
This change makes the preferred behavior more explicit without changing
runtime semantics.
## Scope
- Prompt/documentation change only
- No runtime behavior changes
- Updates the matching string-backed `project_doc` tests
Enhance pty utils:
* Support closing stdin
* Separate stderr and stdout streams to allow consumers differentiate them
* Provide compatibility helper to merge both streams back into combined one
* Support specifying terminal size for pty, including on-demand resizes while process is already running
* Support terminating the process while still consuming its outputs
## Problem
Browser login failures historically leave support with an incomplete
picture. HARs can show that the browser completed OAuth and reached the
localhost callback, but they do not explain why the native client failed
on the final `/oauth/token` exchange. Direct `codex login` also relied
mostly on terminal stderr and the browser error page, so even when the
login crate emitted better sign-in diagnostics through TUI or app-server
flows, the one-shot CLI path still did not leave behind an easy artifact
to collect.
## Mental model
This implementation treats the browser page, the returned `io::Error`,
and the normal structured log as separate surfaces with different safety
requirements. The browser page and returned error preserve the detail
that operators need to diagnose failures. The structured log stays
narrower: it records reviewed lifecycle events, parsed safe fields, and
redacted transport errors without becoming a sink for secrets or
arbitrary backend bodies.
Direct `codex login` now adds a fourth support surface: a small
file-backed log at `codex-login.log` under the configured `log_dir`.
That artifact carries the same login-target events as the other
entrypoints without changing the existing stderr/browser UX.
## Non-goals
This does not add auth logging to normal runtime requests, and it does
not try to infer precise transport root causes from brittle string
matching. The scope remains the browser-login callback flow in the
`login` crate plus a direct-CLI wrapper that persists those events to
disk.
This also does not try to reuse the TUI logging stack wholesale. The TUI
path initializes feedback, OpenTelemetry, and other session-oriented
layers that are useful for an interactive app but unnecessary for a
one-shot login command.
## Tradeoffs
The implementation favors fidelity for caller-visible errors and
restraint for persistent logs. Parsed JSON token-endpoint errors are
logged safely by field. Non-JSON token-endpoint bodies remain available
to the returned error so CLI and browser surfaces still show backend
detail. Transport errors keep their real `reqwest` message, but attached
URLs are surgically redacted. Custom issuer URLs are sanitized before
logging.
On the CLI side, the code intentionally duplicates a narrow slice of the
TUI file-logging setup instead of sharing the full initializer. That
keeps `codex login` easy to reason about and avoids coupling it to
interactive-session layers that the command does not need.
## Architecture
The core auth behavior lives in `codex-rs/login/src/server.rs`. The
callback path now logs callback receipt, callback validation,
token-exchange start, token-exchange success, token-endpoint non-2xx
responses, and transport failures. App-server consumers still use this
same login-server path via `run_login_server(...)`, so the same
instrumentation benefits TUI, Electron, and VS Code extension flows.
The direct CLI path in `codex-rs/cli/src/login.rs` now installs a small
file-backed tracing layer for login commands only. That writes
`codex-login.log` under `log_dir` with login-specific targets such as
`codex_cli::login` and `codex_login::server`.
## Observability
The main signals come from the `login` crate target and are
intentionally scoped to sign-in. Structured logs include redacted issuer
URLs, redacted transport errors, HTTP status, and parsed token-endpoint
fields when available. The callback-layer log intentionally avoids
`%err` on token-endpoint failures so arbitrary backend bodies do not get
copied into the normal log file.
Direct `codex login` now leaves a durable artifact for both failure and
success cases. Example output from the new file-backed CLI path:
Failing callback:
```text
2026-03-06T22:08:54.143612Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:03.431699Z INFO codex_login::server: received login callback path=/auth/callback has_code=false has_state=true has_error=true state_valid=true
2026-03-06T22:09:03.431745Z WARN codex_login::server: oauth callback returned error error_code="access_denied" has_error_description=true
```
Succeeded callback and token exchange:
```text
2026-03-06T22:09:14.065559Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:36.431678Z INFO codex_login::server: received login callback path=/auth/callback has_code=true has_state=true has_error=false state_valid=true
2026-03-06T22:09:36.436977Z INFO codex_login::server: starting oauth token exchange issuer=https://auth.openai.com/ redirect_uri=http://localhost:1455/auth/callback
2026-03-06T22:09:36.685438Z INFO codex_login::server: oauth token exchange succeeded status=200 OK
```
## Tests
- `cargo test -p codex-login`
- `cargo clippy -p codex-login --tests -- -D warnings`
- `cargo test -p codex-cli`
- `just bazel-lock-update`
- `just bazel-lock-check`
- manual direct `codex login` smoke tests for both a failing callback
and a successful browser login
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
This is a structural cleanup of `codex-otel` to make the ownership
boundaries a lot clearer.
For example, previously it was quite confusing that `OtelManager` which
emits log + trace event telemetry lived under
`codex-rs/otel/src/traces/`. Also, there were two places that defined
methods on OtelManager via `impl OtelManager` (`lib.rs` and
`otel_manager.rs`).
What changed:
- move the `OtelProvider` implementation into `src/provider.rs`
- move `OtelManager` and session-scoped event emission into
`src/events/otel_manager.rs`
- collapse the shared log/trace event helpers into
`src/events/shared.rs`
- pull target classification into `src/targets.rs`
- move `traceparent_context_from_env()` into `src/trace_context.rs`
- keep `src/otel_provider.rs` as a compatibility shim for existing
imports
- update the `codex-otel` README to reflect the new layout
## Why
`lib.rs` and `otel_provider.rs` were doing too many different jobs at
once: provider setup, export routing, trace-context helpers, and session
event emission all lived together.
This refactor separates those concerns without trying to change the
behavior of the crate. The goal is to make future OTEL work easier to
reason about and easier to review.
## Notes
- no intended behavior change
- `OtelManager` remains the session-scoped event emitter in this PR
- the `otel_provider` shim keeps downstream churn low while the
internals move around
## Validation
- `just fmt`
- `cargo test -p codex-otel`
- `just fix -p codex-otel`
## Summary
- reject the global `*` domain pattern in proxy allow/deny lists and
managed constraints introduced for testing earlier
- keep exact hosts plus scoped wildcards like `*.example.com` and
`**.example.com`
- update docs and regression tests for the new invalid-config behavior
At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit
unwieldy.
This PR does the same type of move as
https://github.com/openai/codex/pull/12957 to put unit tests in their
own file, though I decided `config_tests.rs` is a more intuitive name
than `mod_tests.rs`.
Ultimately, I'll codemod the rest of the codebase to follow suit, but I
want to do it in stages to reduce merge conflicts for people.
## Summary
- reduce the SQLite-backed log retention window from 90 days to 10 days
## Testing
- just fmt
- cargo test -p codex-state
Co-authored-by: Codex <noreply@openai.com>
#### What
Add structured `@plugin` parsing and TUI support for plugin mentions.
- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to a sigil parameter, still defaults to `$`
<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>
Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.
####Tests
Added tests, verified locally.
This branch:
* Avoid flushing DB when not necessary
* Filter events for which we perfom an `upsert` into the DB
* Add a dedicated update function of the `thread:updated_at` that is
lighter
This should significantly reduce the DB lock contention. If it is not
sufficient, we can de-sync the flush of the DB for `updated_at`
## Summary
- move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
database to reduce lock contention with the main state DB
- add a dedicated logs migrator and route `codex-state-logs` to the new
database path
- leave the old `logs` table in the existing state DB untouched for now
## Testing
- just fmt
- cargo test -p codex-state
---------
Co-authored-by: Codex <noreply@openai.com>
### Summary
This adds turn-level latency metrics for the first model output and the
first completed agent message.
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.
### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.
I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and record each metric once per turn.
I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test.
The interrupt path can race with the follow-up `/responses` request that
reports the aborted tool call, so the test now allows that extra no-op
response instead of assuming there will only ever be one request. The
assertions still stay focused on the behavior the test actually cares
about: starting the zsh-forked command correctly.
Testing:
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2
-- --exact --nocapture`
## Summary
Today `SandboxPermissions::requires_additional_permissions()` does not
actually mean "is `WithAdditionalPermissions`". It returns `true` for
any non-default sandbox override, including `RequireEscalated`. That
broad behavior is relied on in multiple `main` callsites.
The naming is security-sensitive because `SandboxPermissions` is used on
shell-like tool calls to tell the executor how a single command should
relate to the turn sandbox:
- `UseDefault`: run with the turn sandbox unchanged
- `RequireEscalated`: request execution outside the sandbox
- `WithAdditionalPermissions`: stay sandboxed but widen permissions for
that command only
## Problem
The old helper name reads as if it only applies to the
`WithAdditionalPermissions` variant. In practice it means "this command
requested any explicit sandbox override."
That ambiguity made it easy to read production checks incorrectly and
made the guardian change look like a standalone `main` fix when it is
not.
On `main` today:
- `shell` and `unified_exec` intentionally reject any explicit
`sandbox_permissions` request unless approval policy is `OnRequest`
- `exec_policy` intentionally treats any explicit sandbox override as
prompt-worthy in restricted sandboxes
- tests intentionally serialize both `RequireEscalated` and
`WithAdditionalPermissions` as explicit sandbox override requests
So changing those callsites from the broad helper to a narrow
`WithAdditionalPermissions` check would be a behavior change, not a pure
cleanup.
## What This PR Does
- documents `SandboxPermissions` as a per-command sandbox override, not
a generic permissions bag
- adds `requests_sandbox_override()` for the broad meaning: anything
except `UseDefault`
- adds `uses_additional_permissions()` for the narrow meaning: only
`WithAdditionalPermissions`
- keeps `requires_additional_permissions()` as a compatibility alias to
the broad meaning for now
- updates the current broad callsites to use the accurately named broad
helper
- adds unit coverage that locks in the semantics of all three helpers
## What This PR Does Not Do
This PR does not change runtime behavior. That is intentional.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add one-time session recovery in `RmcpClient` for streamable HTTP MCP
`404` session expiry
- rebuild the transport and retry the failed operation once after
reinitializing the client state
- extend the test server and integration coverage for `404`, `401`,
single-retry, and non-session failure scenarios
## Testing
- just fmt
- cargo test -p codex-rmcp-client (the post-rebase run lost its final
summary in the terminal; the suite had passed earlier before the rebase)
- just fix -p codex-rmcp-client
`/feedback` uploads can include `codex-logs.log` from the in-memory
feedback logger path. That logger was emitting level + message without a
timestamp, which made some uploaded logs much harder to inspect. This
change makes the feedback logger use an explicit timer so
feedback-captured log lines include timestamps consistently.
This is not Windows-specific code. The bug showed up in Windows reports
because those uploads were hitting the feedback-buffer path more often,
while Linux/macOS reports were typically coming from the SQLite feedback
export, which already prefixes timestamps.
Here's an example of a log that is missing the timestamps:
```
TRACE app-server request: getAuthStatus
TRACE app-server request: model/list
INFO models cache: evaluating cache eligibility
INFO models cache: attempting load_fresh
INFO models cache: loaded cache file
INFO models cache: cache version mismatch
INFO models cache: no usable cache entry
DEBUG
INFO models cache: cache miss, fetching remote models
TRACE windows::current_platform is called
TRACE Returning Info { os_type: Windows, version: Semantic(10, 0, 26200), edition: Some("Windows 11 Professional"), codename: None, bitness: X64, architecture: Some("x86_64") }
```
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.
checks are best effort based on `codex_apps` loading, much like
`app/list`.
#### Tests
Added integration tests, tested locally.
## Summary
Simplify the trusted directory flow. This logic was originally designed
several months ago, to determine if codex should start in read-only or
workspace-write mode. However, that's no longer the purpose of directory
trust - and therefore we should get rid of this logic.
## Testing
- [x] Unit tests pass