## Why
`SandboxPolicy` currently mixes together three separate concerns:
- parsing layered config from `config.toml`
- representing filesystem sandbox state
- carrying basic network policy alongside filesystem choices
That makes the existing config awkward to extend and blocks the new TOML
proposal where `[permissions]` becomes a table of named permission
profiles selected by `default_permissions`. (The idea is that if
`default_permissions` is not specified, we assume the user is opting
into the "traditional" way to configure the sandbox.)
This PR adds the config-side plumbing for those profiles while still
projecting back to the legacy `SandboxPolicy` shape that the current
macOS and Linux sandbox backends consume.
It also tightens the filesystem profile model so scoped entries only
exist for `:project_roots`, and so nested keys must stay within a
project root instead of using `.` or `..` traversal.
This drops support for the short-lived `[permissions.network]` in
`config.toml` because now that would be interpreted as a profile named
`network` within `[permissions]`.
## What Changed
- added `PermissionsToml`, `PermissionProfileToml`,
`FilesystemPermissionsToml`, and `FilesystemPermissionToml` so config
can parse named profiles under `[permissions.<profile>.filesystem]`
- added top-level `default_permissions` selection, validation for
missing or unknown profiles, and compilation from a named profile into
split `FileSystemSandboxPolicy` and `NetworkSandboxPolicy` values
- taught config loading to choose between the legacy `sandbox_mode` path
and the profile-based path without breaking legacy users
- introduced `codex-protocol::permissions` for the split filesystem and
network sandbox types, and stored those alongside the legacy projected
`sandbox_policy` in runtime `Permissions`
- modeled `FileSystemSpecialPath` so only `ProjectRoots` can carry a
nested `subpath`, matching the intended config syntax instead of
allowing invalid states for other special paths
- restricted scoped filesystem maps to `:project_roots`, with validation
that nested entries are non-empty descendant paths and cannot use `.` or
`..` to escape the project root
- kept existing runtime consumers working by projecting
`FileSystemSandboxPolicy` back into `SandboxPolicy`, with an explicit
error for profiles that request writes outside the workspace root
- loaded proxy settings from top-level `[network]`
- regenerated `core/config.schema.json`
## Verification
- added config coverage for profile deserialization,
`default_permissions` selection, top-level `[network]` loading, network
enablement, rejection of writes outside the workspace root, rejection of
nested entries for non-`:project_roots` special paths, and rejection of
parent-directory traversal in `:project_roots` maps
- added protocol coverage for the legacy bridge rejecting non-workspace
writes
## Docs
- update the Codex config docs on developers.openai.com/codex to
document named `[permissions.<profile>]` entries, `default_permissions`,
scoped `:project_roots` syntax, the descendant-path restriction for
nested `:project_roots` entries, and top-level `[network]` proxy
configuration
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/13434).
* #13453
* #13452
* #13451
* #13449
* #13448
* #13445
* #13440
* #13439
* __->__ #13434
## Summary
Remove `docs/auth-login-logging-plan.md`.
## Why
The document was a temporary planning artifact. The durable rationale
for the
auth-login diagnostics work now lives in the code comments, tests, PR
context,
and existing implementation notes, so keeping the standalone plan doc
adds
duplicate maintenance surface.
## Testing
- not run (docs-only deletion)
Co-authored-by: Codex <noreply@openai.com>
## Summary
Clarify the `js_repl` prompt guidance around persistent bindings and
redeclaration recovery.
This updates the generated `js_repl` instructions in
`core/src/project_doc.rs` to prefer this order when a name is already
bound:
1. Reuse the existing binding
2. Reassign a previously declared `let`
3. Pick a new descriptive name
4. Use `{ ... }` only for short-lived scratch scope
5. Reset the kernel only when a clean state is actually needed
The prompt now also explicitly warns against wrapping an entire cell in
block scope when the goal is to reuse names across later cells.
## Why
The previous wording still left too much room for low-value workarounds
like whole-cell block wrapping. In downstream browser rollouts, that
pattern was adding tokens and preventing useful state reuse across
`js_repl` cells.
This change makes the preferred behavior more explicit without changing
runtime semantics.
## Scope
- Prompt/documentation change only
- No runtime behavior changes
- Updates the matching string-backed `project_doc` tests
Enhance pty utils:
* Support closing stdin
* Separate stderr and stdout streams to allow consumers differentiate them
* Provide compatibility helper to merge both streams back into combined one
* Support specifying terminal size for pty, including on-demand resizes while process is already running
* Support terminating the process while still consuming its outputs
## Problem
Browser login failures historically leave support with an incomplete
picture. HARs can show that the browser completed OAuth and reached the
localhost callback, but they do not explain why the native client failed
on the final `/oauth/token` exchange. Direct `codex login` also relied
mostly on terminal stderr and the browser error page, so even when the
login crate emitted better sign-in diagnostics through TUI or app-server
flows, the one-shot CLI path still did not leave behind an easy artifact
to collect.
## Mental model
This implementation treats the browser page, the returned `io::Error`,
and the normal structured log as separate surfaces with different safety
requirements. The browser page and returned error preserve the detail
that operators need to diagnose failures. The structured log stays
narrower: it records reviewed lifecycle events, parsed safe fields, and
redacted transport errors without becoming a sink for secrets or
arbitrary backend bodies.
Direct `codex login` now adds a fourth support surface: a small
file-backed log at `codex-login.log` under the configured `log_dir`.
That artifact carries the same login-target events as the other
entrypoints without changing the existing stderr/browser UX.
## Non-goals
This does not add auth logging to normal runtime requests, and it does
not try to infer precise transport root causes from brittle string
matching. The scope remains the browser-login callback flow in the
`login` crate plus a direct-CLI wrapper that persists those events to
disk.
This also does not try to reuse the TUI logging stack wholesale. The TUI
path initializes feedback, OpenTelemetry, and other session-oriented
layers that are useful for an interactive app but unnecessary for a
one-shot login command.
## Tradeoffs
The implementation favors fidelity for caller-visible errors and
restraint for persistent logs. Parsed JSON token-endpoint errors are
logged safely by field. Non-JSON token-endpoint bodies remain available
to the returned error so CLI and browser surfaces still show backend
detail. Transport errors keep their real `reqwest` message, but attached
URLs are surgically redacted. Custom issuer URLs are sanitized before
logging.
On the CLI side, the code intentionally duplicates a narrow slice of the
TUI file-logging setup instead of sharing the full initializer. That
keeps `codex login` easy to reason about and avoids coupling it to
interactive-session layers that the command does not need.
## Architecture
The core auth behavior lives in `codex-rs/login/src/server.rs`. The
callback path now logs callback receipt, callback validation,
token-exchange start, token-exchange success, token-endpoint non-2xx
responses, and transport failures. App-server consumers still use this
same login-server path via `run_login_server(...)`, so the same
instrumentation benefits TUI, Electron, and VS Code extension flows.
The direct CLI path in `codex-rs/cli/src/login.rs` now installs a small
file-backed tracing layer for login commands only. That writes
`codex-login.log` under `log_dir` with login-specific targets such as
`codex_cli::login` and `codex_login::server`.
## Observability
The main signals come from the `login` crate target and are
intentionally scoped to sign-in. Structured logs include redacted issuer
URLs, redacted transport errors, HTTP status, and parsed token-endpoint
fields when available. The callback-layer log intentionally avoids
`%err` on token-endpoint failures so arbitrary backend bodies do not get
copied into the normal log file.
Direct `codex login` now leaves a durable artifact for both failure and
success cases. Example output from the new file-backed CLI path:
Failing callback:
```text
2026-03-06T22:08:54.143612Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:03.431699Z INFO codex_login::server: received login callback path=/auth/callback has_code=false has_state=true has_error=true state_valid=true
2026-03-06T22:09:03.431745Z WARN codex_login::server: oauth callback returned error error_code="access_denied" has_error_description=true
```
Succeeded callback and token exchange:
```text
2026-03-06T22:09:14.065559Z INFO codex_cli::login: starting browser login flow
2026-03-06T22:09:36.431678Z INFO codex_login::server: received login callback path=/auth/callback has_code=true has_state=true has_error=false state_valid=true
2026-03-06T22:09:36.436977Z INFO codex_login::server: starting oauth token exchange issuer=https://auth.openai.com/ redirect_uri=http://localhost:1455/auth/callback
2026-03-06T22:09:36.685438Z INFO codex_login::server: oauth token exchange succeeded status=200 OK
```
## Tests
- `cargo test -p codex-login`
- `cargo clippy -p codex-login --tests -- -D warnings`
- `cargo test -p codex-cli`
- `just bazel-lock-update`
- `just bazel-lock-check`
- manual direct `codex login` smoke tests for both a failing callback
and a successful browser login
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
This is a structural cleanup of `codex-otel` to make the ownership
boundaries a lot clearer.
For example, previously it was quite confusing that `OtelManager` which
emits log + trace event telemetry lived under
`codex-rs/otel/src/traces/`. Also, there were two places that defined
methods on OtelManager via `impl OtelManager` (`lib.rs` and
`otel_manager.rs`).
What changed:
- move the `OtelProvider` implementation into `src/provider.rs`
- move `OtelManager` and session-scoped event emission into
`src/events/otel_manager.rs`
- collapse the shared log/trace event helpers into
`src/events/shared.rs`
- pull target classification into `src/targets.rs`
- move `traceparent_context_from_env()` into `src/trace_context.rs`
- keep `src/otel_provider.rs` as a compatibility shim for existing
imports
- update the `codex-otel` README to reflect the new layout
## Why
`lib.rs` and `otel_provider.rs` were doing too many different jobs at
once: provider setup, export routing, trace-context helpers, and session
event emission all lived together.
This refactor separates those concerns without trying to change the
behavior of the crate. The goal is to make future OTEL work easier to
reason about and easier to review.
## Notes
- no intended behavior change
- `OtelManager` remains the session-scoped event emitter in this PR
- the `otel_provider` shim keeps downstream churn low while the
internals move around
## Validation
- `just fmt`
- `cargo test -p codex-otel`
- `just fix -p codex-otel`
## Summary
- reject the global `*` domain pattern in proxy allow/deny lists and
managed constraints introduced for testing earlier
- keep exact hosts plus scoped wildcards like `*.example.com` and
`**.example.com`
- update docs and regression tests for the new invalid-config behavior
At over 7,000 lines, `codex-rs/core/src/config/mod.rs` was getting a bit
unwieldy.
This PR does the same type of move as
https://github.com/openai/codex/pull/12957 to put unit tests in their
own file, though I decided `config_tests.rs` is a more intuitive name
than `mod_tests.rs`.
Ultimately, I'll codemod the rest of the codebase to follow suit, but I
want to do it in stages to reduce merge conflicts for people.
## Summary
- reduce the SQLite-backed log retention window from 90 days to 10 days
## Testing
- just fmt
- cargo test -p codex-state
Co-authored-by: Codex <noreply@openai.com>
#### What
Add structured `@plugin` parsing and TUI support for plugin mentions.
- Core: switch from plain-text `@display_name` parsing to structured
`plugin://...` mentions via `UserInput::Mention` and
`[$...](plugin://...)` links in text, same pattern as apps/skills.
- TUI: add plugin mention popup, autocomplete, and chips when typing
`$`. Load plugin capability summaries and feed them into the composer;
plugin mentions appear alongside skills and apps.
- Generalize mention parsing to a sigil parameter, still defaults to `$`
<img width="797" height="119" alt="image"
src="https://github.com/user-attachments/assets/f0fe2658-d908-4927-9139-73f850805ceb"
/>
Builds on #13510. Currently clients have to build their own `id` via
`plugin@marketplace` and filter plugins to show by `enabled`, but we
will add `id` and `available` as fields returned from `plugin/list`
soon.
####Tests
Added tests, verified locally.
This branch:
* Avoid flushing DB when not necessary
* Filter events for which we perfom an `upsert` into the DB
* Add a dedicated update function of the `thread:updated_at` that is
lighter
This should significantly reduce the DB lock contention. If it is not
sufficient, we can de-sync the flush of the DB for `updated_at`
## Summary
- move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
database to reduce lock contention with the main state DB
- add a dedicated logs migrator and route `codex-state-logs` to the new
database path
- leave the old `logs` table in the existing state DB untouched for now
## Testing
- just fmt
- cargo test -p codex-state
---------
Co-authored-by: Codex <noreply@openai.com>
### Summary
This adds turn-level latency metrics for the first model output and the
first completed agent message.
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.
### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.
I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and record each metric once per turn.
I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
This fixes a flaky `turn_start_shell_zsh_fork_executes_command_v2` test.
The interrupt path can race with the follow-up `/responses` request that
reports the aborted tool call, so the test now allows that extra no-op
response instead of assuming there will only ever be one request. The
assertions still stay focused on the behavior the test actually cares
about: starting the zsh-forked command correctly.
Testing:
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::turn_start_zsh_fork::turn_start_shell_zsh_fork_executes_command_v2
-- --exact --nocapture`
## Summary
Today `SandboxPermissions::requires_additional_permissions()` does not
actually mean "is `WithAdditionalPermissions`". It returns `true` for
any non-default sandbox override, including `RequireEscalated`. That
broad behavior is relied on in multiple `main` callsites.
The naming is security-sensitive because `SandboxPermissions` is used on
shell-like tool calls to tell the executor how a single command should
relate to the turn sandbox:
- `UseDefault`: run with the turn sandbox unchanged
- `RequireEscalated`: request execution outside the sandbox
- `WithAdditionalPermissions`: stay sandboxed but widen permissions for
that command only
## Problem
The old helper name reads as if it only applies to the
`WithAdditionalPermissions` variant. In practice it means "this command
requested any explicit sandbox override."
That ambiguity made it easy to read production checks incorrectly and
made the guardian change look like a standalone `main` fix when it is
not.
On `main` today:
- `shell` and `unified_exec` intentionally reject any explicit
`sandbox_permissions` request unless approval policy is `OnRequest`
- `exec_policy` intentionally treats any explicit sandbox override as
prompt-worthy in restricted sandboxes
- tests intentionally serialize both `RequireEscalated` and
`WithAdditionalPermissions` as explicit sandbox override requests
So changing those callsites from the broad helper to a narrow
`WithAdditionalPermissions` check would be a behavior change, not a pure
cleanup.
## What This PR Does
- documents `SandboxPermissions` as a per-command sandbox override, not
a generic permissions bag
- adds `requests_sandbox_override()` for the broad meaning: anything
except `UseDefault`
- adds `uses_additional_permissions()` for the narrow meaning: only
`WithAdditionalPermissions`
- keeps `requires_additional_permissions()` as a compatibility alias to
the broad meaning for now
- updates the current broad callsites to use the accurately named broad
helper
- adds unit coverage that locks in the semantics of all three helpers
## What This PR Does Not Do
This PR does not change runtime behavior. That is intentional.
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add one-time session recovery in `RmcpClient` for streamable HTTP MCP
`404` session expiry
- rebuild the transport and retry the failed operation once after
reinitializing the client state
- extend the test server and integration coverage for `404`, `401`,
single-retry, and non-session failure scenarios
## Testing
- just fmt
- cargo test -p codex-rmcp-client (the post-rebase run lost its final
summary in the terminal; the suite had passed earlier before the rebase)
- just fix -p codex-rmcp-client
`/feedback` uploads can include `codex-logs.log` from the in-memory
feedback logger path. That logger was emitting level + message without a
timestamp, which made some uploaded logs much harder to inspect. This
change makes the feedback logger use an explicit timer so
feedback-captured log lines include timestamps consistently.
This is not Windows-specific code. The bug showed up in Windows reports
because those uploads were hitting the feedback-buffer path more often,
while Linux/macOS reports were typically coming from the SQLite feedback
export, which already prefixes timestamps.
Here's an example of a log that is missing the timestamps:
```
TRACE app-server request: getAuthStatus
TRACE app-server request: model/list
INFO models cache: evaluating cache eligibility
INFO models cache: attempting load_fresh
INFO models cache: loaded cache file
INFO models cache: cache version mismatch
INFO models cache: no usable cache entry
DEBUG
INFO models cache: cache miss, fetching remote models
TRACE windows::current_platform is called
TRACE Returning Info { os_type: Windows, version: Semantic(10, 0, 26200), edition: Some("Windows 11 Professional"), codename: None, bitness: X64, architecture: Some("x86_64") }
```
# External (non-OpenAI) Pull Request Requirements
Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md
If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.
Include a link to a bug report or enhancement request.
#### What
on `plugin/install`, check if installed apps are already authed on
chatgpt, and return list of all apps that are not. clients can use this
list to trigger auth workflows as needed.
checks are best effort based on `codex_apps` loading, much like
`app/list`.
#### Tests
Added integration tests, tested locally.
## Summary
Simplify the trusted directory flow. This logic was originally designed
several months ago, to determine if codex should start in read-only or
workspace-write mode. However, that's no longer the purpose of directory
trust - and therefore we should get rid of this logic.
## Testing
- [x] Unit tests pass
We do this for codex-command-runner.exe as well for the same reason.
Windows sandbox users cannot execute binaries in the WindowsApp/
installed directory for the Codex App. This causes apply-patch to fail
because it tries to execute codex.exe as the sandbox user.
## Summary
- delete the network proxy admin server and its runtime listener/task
plumbing
- remove the admin endpoint config, runtime, requirement, protocol,
schema, and debug-surface fields
- update proxy docs to reflect the remaining HTTP and SOCKS listeners
only
## Summary
This PR:
1. fixes a deserialization mismatch for macOS automation permissions in
approval payloads by making core parsing accept both supported wire
shapes for bundle IDs.
2. added `#[serde(default)]` to `MacOsSeatbeltProfileExtensions` so
omitted fields deserialize to secure defaults.
## Why this change is needed
`MacOsAutomationPermission` uses `#[serde(try_from =
"MacOsAutomationPermissionDe")]`, so deserialization is controlled by
`MacOsAutomationPermissionDe`. After we aligned v2
`additionalPermissions.macos.automations` to the core shape, approval
payloads started including `{ "bundle_ids": [...] }` in some paths.
`MacOsAutomationPermissionDe` previously accepted only `"none" | "all"`
or a plain array, so object-shaped bundle IDs failed with `data did not
match any variant of untagged enum MacOsAutomationPermissionDe`. This
change restores compatibility by accepting both forms while preserving
existing normalization behavior (trim values and map empty bundle lists
to `None`).
## Validation
saw this error went away when running
```
cargo run -p codex-app-server-test-client -- \
--codex-bin ./target/debug/codex \
-c 'approval_policy="on-request"' \
-c 'features.shell_zsh_fork=true' \
-c 'zsh_path="/tmp/codex-zsh-fork/package/vendor/aarch64-apple-darwin/zsh/macos-15/zsh"' \
send-message-v2 --experimental-api \
'Use $apple-notes and run scripts/notes_info now.'
```
:
```
Error: failed to deserialize ServerRequest from JSONRPCRequest
Caused by:
data did not match any variant of untagged enum MacOsAutomationPermissionDe
```
## Summary
This PR removes legacy macOS permission model types from
`codex-rs/protocol/src/models.rs`:
- `MacOsPermissions`
- `MacOsPreferencesValue`
- `MacOsAutomationValue`
The protocol now relies on the current `MacOsSeatbeltProfileExtensions`
model for macOS permission data.
## Summary
- default the resume picker sort key to UpdatedAt instead of CreatedAt
- keep Tab sort toggling behavior and update the test expectation for
the new default
## Testing
- just fmt
- cargo test -p codex-tui
Co-authored-by: Codex <noreply@openai.com>
## Summary
- keep the SQLite schema unchanged (no migrations)
- add timestamps to SQLite-backed `/feedback` log exports
- keep the existing SQL-side byte cap behavior and newline handling
- document the remaining fidelity gap (span prefixes + structured
fields) with TODOs
## Details
- update `query_feedback_logs` to format each exported line as:
- `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}`
- continue scoping rows to requested-thread + same-process threadless
logs
- continue capping in SQL before returning rows
- keep the existing fallback behavior unchanged when SQLite returns no
rows
- update parity tests to normalize away the new timestamp prefix while
we still only store `message`
## Follow-up
- TODO already in code: persist enough span/event metadata in SQLite to
reproduce span prefixes and structured fields in `/feedback` exports
## Testing
- `cargo test -p codex-state`
- `just fmt`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- render pending steer previews with a single `pending steer:` prefix
instead of repeating it for each source line
- reuse the same truncation path for pending steers and queued drafts so
multiline previews behave consistently
- add snapshot coverage for the multiline pending steer case
Before
<img width="969" height="219" alt="Screenshot 2026-03-05 at 3 55 11 PM"
src="https://github.com/user-attachments/assets/b062c9c8-43d3-4a52-98e0-3c7643d1697b"
/>
After
<img width="965" height="203" alt="Screenshot 2026-03-05 at 3 56 08 PM"
src="https://github.com/user-attachments/assets/40935863-55b3-444f-9e14-1ac63126b2e1"
/>
## Codex author
`codex resume 019cc054-385e-79a3-bb85-ec9499623bd8`
Co-authored-by: Codex <noreply@openai.com>
- Replay thread rollback from the persisted rollout history instead of
truncating in-memory state.\n- Add rollback coverage, including
rollback-behind-compaction snapshot coverage.
## Summary
- group recent work by git repo when available, otherwise by directory
- render recent work as bounded user asks with per-thread cwd context
- exclude hidden files and directories from workspace trees
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go
Those often (pretty much always?) serve different purposes.
For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.
The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”
The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.
Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.
### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
pub fn user_prompt(&self, items: &[UserInput]) {
let prompt = items
.iter()
.flat_map(|item| match item {
UserInput::Text { text, .. } => Some(text.as_str()),
_ => None,
})
.collect::<String>();
let prompt_to_log = if self.metadata.log_user_prompts {
prompt.as_str()
} else {
"[REDACTED]"
};
tracing::event!(
tracing::Level::INFO,
event.name = "codex.user_prompt",
event.timestamp = %timestamp(),
// ...
prompt = %prompt_to_log,
);
}
```
Instead of `tracing::event!`, we should now be using `log_event!` and
`trace_event!` instead to more clearly indicate which sink (logs vs.
traces) that event should be exported to.
### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.
On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.
On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only
It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.
That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.
This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.
### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span
What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
- Update `models.json` to surface the new model entry.
- Refresh the TUI model picker snapshot to match the updated catalog
ordering.
---------
Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
## Note-- added plugin mentions via @, but that conflicts with file
mentions
depends and builds upon #13433.
- introduces explicit `@plugin` mentions. this injects the plugin's mcp
servers, app names, and skill name format into turn context as a dev
message.
- we do not yet have UI for these mentions, so we currently parse raw
text (as opposed to skills and apps which have UI chips, autocomplete,
etc.) this depends on a `plugins/list` app-server endpoint we can feed
the UI with, which is upcoming
- also annotate mcp and app tool descriptions with the plugin(s) they
come from. this gives the model a first class way of understanding what
tools come from which plugins, which will help implicit invocation.
### Tests
Added and updated tests, unit and integration. Also confirmed locally a
raw `@plugin` injects the dev message, and the model knows about its
apps, mcps, and skills.
## Summary
This updates the `js_repl` prompt and docs to make the image guidance
less confusing.
## What changed
- Clarified that `codex.emitImage(...)` adds one image per call and can
be called multiple times to emit multiple images.
- Reworded the image-encoding guidance to be general `js_repl` advice
instead of `ImageDetailOriginal`-specific behavior.
- Updated the guidance to recommend JPEG at about quality 85 when lossy
compression is acceptable, and PNG when transparency or lossless detail
matters.
- Mirrored the same wording in the public `js_repl` docs.
This improves macOS Seatbelt handling for sandboxed tool processes.
## Changes
- Allow dual-stack local binding in proxy-managed sessions, while still
keeping traffic limited to loopback and configured proxy endpoints.
- Replace the old generic unix-socket path rule with explicit AF_UNIX
permissions for socket creation, bind, and outbound connect.
- Keep explicitly approved wrapper sockets connect-only.
Local helper servers are less likely to fail when binding on macOS.
Tools using local unix-socket IPC should work more reliably under the
sandbox.
Full-network sessions, proxy fail-closed behavior, and proxy lifecycle
are unchanged.
## Summary
- always pass `--unshare-user` in the Linux bubblewrap argv builders
- stop relying on bubblewrap's auto-userns behavior, which is skipped
for `uid 0`
- update argv expectations in tests and document the explicit user
namespace behavior
The installed Codex binary reproduced the same issue with:
- `codex -c features.use_linux_sandbox_bwrap=true sandbox linux -- true`
- `bwrap: Creating new namespace failed: Operation not permitted`
This happens because Codex asked bubblewrap for mount/pid/network
namespaces without explicitly asking for a user namespace. In a
root-inside-container environment without ambient `CAP_SYS_ADMIN`, that
fails. Adding `--unshare-user` makes bubblewrap create the user
namespace first and then the remaining namespaces succeed.
This PR adds a durable trace linkage for each turn by storing the active
trace ID on the rollout TurnContext record stored in session rollout
files.
Before this change, we propagated trace context at runtime but didn’t
persist a stable per-turn trace key in rollout history. That made
after-the-fact debugging harder (for example, mapping a historical turn
to the corresponding trace in datadog). This sets us up for much easier
debugging in the future.
### What changed
- Added an optional `trace_id` to TurnContextItem (rollout schema).
- Added a small OTEL helper to read the current span trace ID.
- Captured `trace_id` when creating `TurnContext` and included it in
`to_turn_context_item()`.
- Updated tests and fixtures that construct TurnContextItem so
older/no-trace cases still work.
### Why this approach
TurnContext is already the canonical durable per-turn metadata in
rollout. This keeps ownership clean: trace linkage lives with other
persisted turn metadata.