core-agent-ide

Author	SHA1	Message	Date
Ahmed Ibrahim	2aa4873802	Move auth code into login crate (#15150 ) - Move the auth implementation and token data into codex-login. - Keep codex-core re-exporting that surface from codex-login for existing callers. --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-19 18:58:17 -07:00
gabec-openai	fe287ac467	Log automated reviewer approval sources distinctly (#15201 ) ## Summary - log guardian-reviewed tool approvals as `source=automated_reviewer` in `codex.tool_decision` - keep direct user approvals as `source=user` and config-driven approvals as `source=config` ## Testing - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-fmt-quiet.sh` - `/Users/gabec/.codex/skills/codex-oss-fastdev/scripts/codex-rs-test-quiet.sh -p codex-otel` (fails in sandboxed loopback bind tests under `otel/tests/suite/otlp_http_loopback.rs`) - `cargo test -p codex-core guardian -- --nocapture` (original-tree run reached Guardian tests and only hit sandbox-related listener/proxy failures) Co-authored-by: Codex <noreply@openai.com>	2026-03-19 12:10:41 -07:00
jif-oai	2cf4d5ef35	chore: add metrics for profile (#15180 )	2026-03-19 15:48:02 +00:00
xl-openai	86982ca1f9	Revert "fix: harden plugin feature gating" (#15102 ) Reverts openai/codex#15020 I messed up the commit in my PR and accidentally merged changes that were still under review.	2026-03-18 15:19:29 -07:00
xl-openai	580f32ad2a	fix: harden plugin feature gating (#15020 ) 1. Use requirement-resolved config.features as the plugin gate. 2. Guard plugin/list, plugin/read, and related flows behind that gate. 3. Skip bad marketplace.json files instead of failing the whole list. 4. Simplify plugin state and caching.	2026-03-18 10:11:43 -07:00
Colin Young	0d2ff40a58	Add auth env observability (#14905 ) CXC-410 Emit Env Var Status with `/feedback` report Add more observability on top of #14611 [Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream) [Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream) <img width="1063" height="610" alt="image" src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e" /> ###### Summary - Adds auth-env telemetry that records whether key auth-related env overrides were present on session start and request paths. - Threads those auth-env fields through `/responses`, websocket, and `/models` telemetry and feedback metadata. - Buckets custom provider `env_key` configuration to a safe `"configured"` value instead of emitting raw config text. - Keeps the slice observability-only: no raw token values or raw URLs are emitted. ###### Rationale (from spec findings) - 401 and auth-path debugging needs a way to distinguish env-driven auth paths from sessions with no auth env override. - Startup and model-refresh failures need the same auth-env diagnostics as normal request failures. - Feedback and Sentry tags need the same auth-env signal as OTel events so reports can be triaged consistently. - Custom provider config is user-controlled text, so the telemetry contract must stay presence-only / bucketed. ###### Scope - Adds a small `AuthEnvTelemetry` bundle for env presence collection and threads it through the main request/session telemetry paths. - Does not add endpoint/base-url/provider-header/geo routing attribution or broader telemetry API redesign. ###### Trade-offs - `provider_env_key_name` is bucketed to `"configured"` instead of preserving the literal configured env var name. - `/models` is included because startup/model-refresh auth failures need the same diagnostics, but broader parity work remains out of scope. - This slice keeps the existing telemetry APIs and layers auth-env fields onto them rather than redesigning the metadata model. ###### Client follow-up - Add the separate endpoint/base-url attribution slice if routing-source diagnosis is still needed. - Add provider-header or residency attribution only if auth-env presence proves insufficient in real reports. - Revisit whether any additional auth-related env inputs need safe bucketing after more 401 triage data. ###### Testing - `cargo test -p codex-core emit_feedback_request_tags -- --nocapture` - `cargo test -p codex-core collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture` - `cargo test -p codex-core models_request_telemetry_emits_auth_env_feedback_tags_on_failure -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability -- --nocapture` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability -- --nocapture` - `cargo test -p codex-core --no-run --message-format short` - `cargo test -p codex-otel --no-run --message-format short` --------- Co-authored-by: Codex <noreply@openai.com>	2026-03-17 14:26:27 -07:00
Owen Lin	6ea041032b	fix(core): prevent hanging turn/start due to websocket warming issues (#14838 ) ## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ```	2026-03-17 10:07:46 -07:00
Michael Bolin	b77fe8fefe	Apply argument comment lint across codex-rs (#14652 ) ## Why Once the repo-local lint exists, `codex-rs` needs to follow the checked-in convention and CI needs to keep it from drifting. This commit applies the fallback `/param/` style consistently across existing positional literal call sites without changing those APIs. The longer-term preference is still to avoid APIs that require comments by choosing clearer parameter types and call shapes. This PR is intentionally the mechanical follow-through for the places where the existing signatures stay in place. After rebasing onto newer `main`, the rollout also had to cover newly introduced `tui_app_server` call sites. That made it clear the first cut of the CI job was too expensive for the common path: it was spending almost as much time installing `cargo-dylint` and re-testing the lint crate as a representative test job spends running product tests. The CI update keeps the full workspace enforcement but trims that extra overhead from ordinary `codex-rs` PRs. ## What changed - keep a dedicated `argument_comment_lint` job in `rust-ci` - mechanically annotate remaining opaque positional literals across `codex-rs` with exact `/param/` comments, including the rebased `tui_app_server` call sites that now fall under the lint - keep the checked-in style aligned with the lint policy by using `/param/` and leaving string and char literals uncommented - cache `cargo-dylint`, `dylint-link`, and the relevant Cargo registry/git metadata in the lint job - split changed-path detection so the lint crate's own `cargo test` step runs only when `tools/argument-comment-lint/` or `rust-ci.yml` changes - continue to run the repo wrapper over the `codex-rs` workspace, so product-code enforcement is unchanged Most of the code changes in this commit are intentionally mechanical comment rewrites or insertions driven by the lint itself. ## Verification - `./tools/argument-comment-lint/run.sh --workspace` - `cargo test -p codex-tui-app-server -p codex-tui` - parsed `.github/workflows/rust-ci.yml` locally with PyYAML --- -> #14652 * #14651	2026-03-16 16:48:15 -07:00
Colin Young	d692b74007	Add auth 401 observability to client bug reports (#14611 ) CXC-392 [With 401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream) <img width="1909" height="555" alt="401 auth tags in Sentry" src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3" /> - auth_401_: preserved facts from the latest unauthorized response snapshot - auth_: latest auth-related facts from the latest request attempt - auth_recovery_: unauthorized recovery state and follow-up result Without 401 <img width="1917" height="522" alt="happy-path auth tags in Sentry" src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f" /> ###### Summary - Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation. - Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior. - Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry. - Preserve original unauthorized evidence under `auth_401_` while keeping follow-up result tags separate. ###### Rationale (from spec findings) - The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth. - Client uploads needed to show whether unauthorized recovery ran and what the client tried next. - Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate. - The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful. ###### Scope - Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing. - Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR. ###### Trade-offs - This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies. - This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects. - Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket. ###### Client follow-up - PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing. - A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence. - `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path. ###### Testing - `cargo test -p codex-core refresh_available_models_sorts_by_priority` - `cargo test -p codex-core emit_feedback_request_tags_` - `cargo test -p codex-core emit_feedback_auth_recovery_tags_` - `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase` - `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers` - `cargo test -p codex-core identity_auth_details` - `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details` - `cargo test -p codex-core --all-features --no-run` - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability` - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`	2026-03-14 15:38:51 -07:00
Anton Panasenko	77b0c75267	feat: search_tool migrate to bring you own tool of Responses API (#14274 ) ## Why to support a new bring your own search tool in Responses API(https://developers.openai.com/api/docs/guides/tools-tool-search#client-executed-tool-search) we migrating our bm25 search tool to use official way to execute search on client and communicate additional tools to the model. ## What - replace the legacy `search_tool_bm25` flow with client-executed `tool_search` - add protocol, SSE, history, and normalization support for `tool_search_call` and `tool_search_output` - return namespaced Codex Apps search results and wire namespaced follow-up tool calls back into MCP dispatch	2026-03-11 17:51:51 -07:00
viyatb-oai	52a3bde6cc	feat(core): emit turn metric for network proxy state (#14250 ) ## Summary - add a per-turn `codex.turn.network_proxy` metric constant - emit the metric from turn completion using the live managed proxy enabled state - add focused tests for active and inactive tag emission	2026-03-11 12:33:10 -07:00
Owen Lin	fa1242c83b	fix(otel): make HTTP trace export survive app-server runtimes (#14300 ) ## Summary This PR fixes OTLP HTTP trace export in runtimes where the previous exporter setup was unreliable, especially around app-server usage. It also removes the old `codex_otel::otel_provider` compatibility shim and switches remaining call sites over to the crate-root `codex_otel::OtelProvider` export. ## What changed - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes. - Add an async HTTP client path for trace export when we are already inside a multi-thread Tokio runtime. - Make provider shutdown flush traces before tearing down the tracer provider. - Add loopback coverage that verifies traces are actually sent to `/v1/traces`: - outside Tokio - inside a multi-thread Tokio runtime - inside a current-thread Tokio runtime - Remove the `codex_otel::otel_provider` shim and update remaining imports. ## Why I hit cases where spans were being created correctly but never made it to the collector. The issue turned out to be in exporter/runtime behavior rather than the span plumbing itself. This PR narrows that gap and gives us regression coverage for the actual export path.	2026-03-11 12:33:10 -07:00
Owen Lin	da991bdf3a	feat(otel): Centralize OTEL metric names and shared tag builders (#14117 ) This cleans up a bunch of metric plumbing that had started to drift. The main change is making `codex-otel` the canonical home for shared metric definitions and metric tag helpers. I moved the `turn/thread` metric names that were still duplicated into the OTEL metric registry, added a shared `metrics::tags` module for common tag keys and session tag construction, and updated `SessionTelemetry` to build its metadata tags through that shared path. On the codex-core side, TTFT/TTFM now use the shared metric-name constants instead of local string definitions. I also switched the obvious remaining turn/thread metric callsites over to the shared constants, and added a small helper so TTFT/TTFM can attach an optional sanitized client.name tag from TurnContext. This should make follow-on telemetry work less ad hoc: - one canonical place for metric names - one canonical place for common metric tag keys/builders - less duplication between `codex-core` and `codex-otel`	2026-03-09 12:46:42 -07:00
Owen Lin	289ed549cf	chore(otel): rename OtelManager to SessionTelemetry (#13808 ) ## Summary This is a purely mechanical refactor of `OtelManager` -> `SessionTelemetry` to better convey what the struct is doing. No behavior change. ## Why `OtelManager` ended up sounding much broader than what this type actually does. It doesn't manage OTEL globally; it's the session-scoped telemetry surface for emitting log/trace events and recording metrics with consistent session metadata (`app_version`, `model`, `slug`, `originator`, etc.). `SessionTelemetry` is a more accurate name, and updating the call sites makes that boundary a lot easier to follow. ## Validation - `just fmt` - `cargo test -p codex-otel` - `cargo test -p codex-core`	2026-03-06 16:23:30 -08:00
Owen Lin	dd4a5216c9	chore(otel): reorganize codex-otel crate (#13800 ) ## Summary This is a structural cleanup of `codex-otel` to make the ownership boundaries a lot clearer. For example, previously it was quite confusing that `OtelManager` which emits log + trace event telemetry lived under `codex-rs/otel/src/traces/`. Also, there were two places that defined methods on OtelManager via `impl OtelManager` (`lib.rs` and `otel_manager.rs`). What changed: - move the `OtelProvider` implementation into `src/provider.rs` - move `OtelManager` and session-scoped event emission into `src/events/otel_manager.rs` - collapse the shared log/trace event helpers into `src/events/shared.rs` - pull target classification into `src/targets.rs` - move `traceparent_context_from_env()` into `src/trace_context.rs` - keep `src/otel_provider.rs` as a compatibility shim for existing imports - update the `codex-otel` README to reflect the new layout ## Why `lib.rs` and `otel_provider.rs` were doing too many different jobs at once: provider setup, export routing, trace-context helpers, and session event emission all lived together. This refactor separates those concerns without trying to change the behavior of the crate. The goal is to make future OTEL work easier to reason about and easier to review. ## Notes - no intended behavior change - `OtelManager` remains the session-scoped event emitter in this PR - the `otel_provider` shim keeps downstream churn low while the internals move around ## Validation - `just fmt` - `cargo test -p codex-otel` - `just fix -p codex-otel`	2026-03-06 14:58:18 -08:00
Owen Lin	3449e00bc9	feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630 ) ### Summary This adds turn-level latency metrics for the first model output and the first completed agent message. - `codex.turn.ttft.duration_ms` starts at turn start and records on the first output signal we see from the model. That includes normal assistant text, reasoning deltas, and non-text outputs like tool-call items. - `codex.turn.ttfm.duration_ms` also starts at turn start, but it records when the first agent message finishes streaming rather than when its first delta arrives. ### Implementation notes The timing is tracked in codex-core, not app-server, so the definition stays consistent across CLI, TUI, and app-server clients. I reused the existing turn lifecycle boundary that already drives `codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn state, and record each metric once per turn. I also wired the new metric names into the OTEL runtime metrics summary so they show up in the same in-memory/debug snapshot path as the existing timing metrics.	2026-03-06 10:23:48 -08:00
Owen Lin	c3736cff0a	feat(otel): safe tracing (#13626 ) ### Motivation Today config.toml has three different OTEL knobs under `[otel]`: - `exporter` controls where OTEL logs go - `trace_exporter` controls where OTEL traces go - `metrics_exporter` controls where metrics go Those often (pretty much always?) serve different purposes. For example, for OpenAI internal usage, the log exporter is already being used for IT/security telemetry, and that use case is intentionally content-rich: tool calls, arguments, outputs, MCP payloads, and in some cases user content are all useful there. `log_user_prompt` is a good example of that distinction. When it’s enabled, we include raw prompt text in OTEL logs, which is acceptable for the security use case. The trace exporter is a different story. The goal there is to give OpenAI engineers visibility into latency and request behavior when they run Codex locally, without sending sensitive prompt or tool data as trace event data. In other words, traces should help answer “what was slow?” or “where did time go?”, not “what did the user say?” or “what did the tool return?” The complication is that Rust’s `tracing` crate does not make a hard distinction between “logs” and “trace events.” It gives us one instrumentation API for logs and trace events (via `tracing::event!`), and subscribers decide what gets treated as logs, trace events, or both. Before this change, our OTEL trace layer was effectively attached to the general tracing stream, which meant turning on `trace_exporter` could pick up content-rich events that were originally written with logging (and the `log_exporter`) in mind. That made it too easy for sensitive data to end up in exported traces by accident. ### Concrete example In `otel_manager.rs`, this `tracing::event!` call would be exported in both logs AND traces (as a trace event). ``` pub fn user_prompt(&self, items: &[UserInput]) { let prompt = items .iter() .flat_map(\|item\| match item { UserInput::Text { text, .. } => Some(text.as_str()), _ => None, }) .collect::<String>(); let prompt_to_log = if self.metadata.log_user_prompts { prompt.as_str() } else { "[REDACTED]" }; tracing::event!( tracing::Level::INFO, event.name = "codex.user_prompt", event.timestamp = %timestamp(), // ... prompt = %prompt_to_log, ); } ``` Instead of `tracing::event!`, we should now be using `log_event!` and `trace_event!` instead to more clearly indicate which sink (logs vs. traces) that event should be exported to. ### What changed This PR makes the log and trace export distinct instead of treating them as two sinks for the same data. On the provider side, OTEL logs and traces now have separate routing/filtering policy. The log exporter keeps receiving the existing `codex_otel` events, while trace export is limited to spans and trace events. On the event side, `OtelManager` now emits two flavors of telemetry where needed: - a log-only event with the current rich payloads - a tracing-safe event with summaries only It also has a convenience `log_and_trace_event!` macro for emitting to both logs and traces when it's safe to do so, as well as log- and trace-specific fields. That means prompts, tool args, tool output, account email, MCP metadata, and similar content stay in the log lane, while traces get the pieces that are actually useful for performance work: durations, counts, sizes, status, token counts, tool origin, and normalized error classes. This preserves current IT/security logging behavior while making it safe to turn on trace export for employees. ### Full list of things removed from trace export - raw user prompt text from `codex.user_prompt` - raw tool arguments and output from `codex.tool_result` - MCP server metadata from `codex.tool_result` (mcp_server, mcp_server_origin) - account identity fields like `user.email` and `user.account_id` from trace-safe OTEL events - `host.name` from trace resources - generic `codex.tool_decision` events from traces - generic `codex.sse_event` events from traces - the full ToolCall debug payload from the `handle_tool_call` span What traces now keep instead is mostly: - spans - trace-safe OTEL events - counts, lengths, durations, status, token counts, and tool origin summaries	2026-03-05 16:30:53 -08:00
Owen Lin	aa3fe8abf8	feat(core): persist trace_id for turns in RolloutItem::TurnContext (#13602 ) This PR adds a durable trace linkage for each turn by storing the active trace ID on the rollout TurnContext record stored in session rollout files. Before this change, we propagated trace context at runtime but didn’t persist a stable per-turn trace key in rollout history. That made after-the-fact debugging harder (for example, mapping a historical turn to the corresponding trace in datadog). This sets us up for much easier debugging in the future. ### What changed - Added an optional `trace_id` to TurnContextItem (rollout schema). - Added a small OTEL helper to read the current span trace ID. - Captured `trace_id` when creating `TurnContext` and included it in `to_turn_context_item()`. - Updated tests and fixtures that construct TurnContextItem so older/no-trace cases still work. ### Why this approach TurnContext is already the canonical durable per-turn metadata in rollout. This keeps ownership clean: trace linkage lives with other persisted turn metadata.	2026-03-05 13:26:48 -08:00
Won Park	fa2306b303	image-gen-core (#13290 ) Core tool-calling for image-gen, handles requesting and receiving logic for images using response API	2026-03-03 23:11:28 -08:00
Owen Lin	52521a5e40	feat(app-server): propagate app-server trace context into core (#13368 ) ### Summary Propagate trace context originating at app-server RPC method handlers -> codex core submission loop (so this includes spans such as `run_turn`!). This implements PR 2 of the app-server tracing rollout. This also removes the old lower-level env-based reparenting in core so explicit request/submission ancestry wins instead of being overridden by ambient `TRACEPARENT` state. ### What changed - Added `trace: Option<W3cTraceContext>` to codex_protocol::Submission - Taught `Codex::submit()` / `submit_with_id()` to automatically capture the current span context when constructing or forwarding a submission - Wrapped the core submission loop in a submission_dispatch span parented from Submission.trace - Warn on invalid submission trace carriers and ignore them cleanly - Removed the old env-based downstream reparenting path in core task execution - Stopped OTEL provider init from implicitly attaching env trace context process-wide - Updated mcp-server Submission call sites for the new field Added focused unit tests for: - capturing trace context into Submission - preferring `Submission.trace` when building the core dispatch span ### Why PR 1 gave us consistent inbound request spans in app-server, but that only covered the transport boundary. For long-running work like turns and reviews, the important missing piece was preserving ancestry after the request handler returns and core continues work on a different async path. This change makes that handoff explicit and keeps the parentage rules simple: - app-server request span sets the current context - `Submission.trace` snapshots that context - core restores it once, at the submission boundary - deeper core spans inherit naturally That also lets us stop relying on env-based reparenting for this path, which was too ambient and could override explicit ancestry.	2026-03-04 01:03:45 +00:00
Owen Lin	d473e8d56d	feat(app-server): add tracing to all app-server APIs (#13285 ) ### Overview This PR adds the first piece of tracing for app-server JSON-RPC requests. There are two main changes: - JSON-RPC requests can now take an optional W3C trace context at the top level via a `trace` field (`traceparent` / `tracestate`). - app-server now creates a dedicated request span for every inbound JSON-RPC request in `MessageProcessor`, and uses the request-level trace context as the parent when present. For compatibility with existing flows, app-server still falls back to the TRACEPARENT env var when there is no request-level traceparent. This PR is intentionally scoped to the app-server boundary. In a followup, we'll actually propagate trace context through the async handoff into core execution spans like run_turn, which will make app-server traces much more useful. ### Spans A few details on the app-server span shape: - each inbound request gets its own server span - span/resource names are based on the JSON-RPC method (`initialize`, `thread/start`, `turn/start`, etc.) - spans record transport (stdio vs websocket), request id, connection id, and client name/version when available - `initialize` stores client metadata in session state so later requests on the same connection can reuse it	2026-03-02 16:01:41 -08:00
mcgrew-oai	bccce0d75f	otel: add host.name resource attribute to logs/traces via gethostname (#12352 ) PR Summary This PR adds the OpenTelemetry `host.name` resource attribute to Codex OTEL exports so every OTEL log (and trace, via the shared resource) carries the machine hostname. What changed - Added `host.name` to the shared OTEL `Resource` in `/Users/michael.mcgrew/code/codex/codex-rs/otel/src/otel_provider.rs` - This applies to both: - OTEL logs (`SdkLoggerProvider`) - OTEL traces (`SdkTracerProvider`) - Hostname is now resolved via `gethostname::gethostname()` (best-effort) - Value is trimmed - Empty values are omitted (non-fatal) - Added focused unit tests for: - including `host.name` when present - omitting `host.name` when missing/empty Why - `host.name` is host/process metadata and belongs on the OTEL `resource`, not per-event attributes. - Attaching it in the shared resource is the smallest change that guarantees coverage across all exported OTEL logs/traces. Scope / Non-goals - No public API changes - No changes to metrics behavior (this PR only updates log/trace resource metadata) Dependency updates - Added `gethostname` as a workspace dependency and `codex-otel` dependency - `Cargo.lock` updated accordingly - `MODULE.bazel.lock` unchanged after refresh/check Validation - `just fmt` - `cargo test -p codex-otel` - `just bazel-lock-update` - `just bazel-lock-check`	2026-02-25 09:54:45 -05:00
jif-oai	10c04e11b8	feat: add service name to app-server (#12319 ) Add service name to the app-server so that the app can use it's own service name This is on thread level because later we might plan the app-server to become a singleton on the computer	2026-02-25 09:51:42 +00:00
colby-oai	2036a5f5e0	Add MCP server context to otel tool_result logs (#12267 ) Summary - capture the origin for each configured MCP server and expose it via the connection manager - plumb MCP server name/origin into tool logging and emit codex.tool_result events with those fields - add unit coverage for origin parsing and extend OTEL tests to assert empty MCP fields for non-MCP tools - currently not logging full urls or url paths to prevent logging potentially sensitive data Testing - Not run (not requested)	2026-02-20 10:26:19 -05:00
Fouad Matin	02e9006547	add(core): safety check downgrade warning (#11964 ) Add per-turn notice when a request is downgraded to a fallback model due to cyber safety checks. Changes - codex-api: Emit a ServerModel event based on the openai-model response header and/or response payload (SSE + WebSocket), including when the model changes mid-stream. - core: When the server-reported model differs from the requested model, emit a single per-turn warning explaining the reroute to gpt-5.2 and directing users to Trusted Access verification and the cyber safety explainer. - app-server (v2): Surface these cyber model-routing warnings as synthetic userMessage items with text prefixed by Warning: (and document this behavior).	2026-02-16 22:13:36 -08:00
alexsong-oai	373f5467ef	Add originator to otel metadata tags (#11232 )	2026-02-09 14:29:19 -08:00
alexsong-oai	daeef06bec	add originator to otel (#10826 )	2026-02-06 15:13:56 -08:00
Anton Panasenko	4ee039744e	feat: expose detailed metrics to runtime metrics (#10699 )	2026-02-05 18:22:30 -08:00
iceweasel-oai	901d5b8fd6	add sandbox policy and sandbox name to codex.tool.call metrics (#10711 ) This will give visibility into the comparative success rate of the Windows sandbox implementations compared to other platforms.	2026-02-05 11:42:12 -08:00
Owen Lin	3582b74d01	fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423 ) So that the rest of the codebase (like TUI) don't need to be concerned whether ChatGPT auth was handled by Codex itself or passed in via app-server's external auth mode.	2026-02-05 10:46:06 -08:00
iceweasel-oai	f2ffc4e5d0	Include real OS info in metrics. (#10425 ) calculated a hashed user ID from either auth user id or API key Also correctly populates OS. These will make our metrics more useful and powerful for analysis.	2026-02-05 06:30:31 -08:00
Anton Panasenko	fcaed4cb88	feat: log webscocket timing into runtime metrics (#10577 )	2026-02-03 18:04:07 -08:00
Anton Panasenko	101d359cd7	Add websocket telemetry metrics and labels (#10316 ) Summary - expose websocket telemetry hooks through the responses client so request durations and event processing can be reported - record websocket request/event metrics and emit runtime telemetry events that the history UI now surfaces - improve tests to cover websocket telemetry reporting and guard runtime summary updates <img width="824" height="79" alt="Screenshot 2026-01-31 at 5 28 12 PM" src="https://github.com/user-attachments/assets/ea9a7965-d8b4-4e3c-a984-ef4fdc44c81d" />	2026-01-31 19:16:44 -08:00
Anton Panasenko	8660ad6c64	feat: show runtime metrics in console (#10278 ) Summary of changes: - Adds a new feature flag: runtime_metrics - Declared in core/src/features.rs - Added to core/config.schema.json - Wired into OTEL init in core/src/otel_init.rs - Enables on-demand runtime metric snapshots in OTEL - Adds runtime_metrics: bool to otel/src/config.rs - Enables experimental custom reader features in otel/Cargo.toml - Adds snapshot/reset/summary APIs in: - otel/src/lib.rs - otel/src/metrics/client.rs - otel/src/metrics/config.rs - otel/src/metrics/error.rs - Defines metric names and a runtime summary builder - New files: - otel/src/metrics/names.rs - otel/src/metrics/runtime_metrics.rs - Summarizes totals for: - Tool calls - API requests - SSE/streaming events - Instruments metrics collection in OTEL manager - otel/src/traces/otel_manager.rs now records: - API call counts + durations - SSE event counts + durations (success/failure) - Tool call metrics now use shared constants - Surfaces runtime metrics in the TUI - Resets runtime metrics at turn start in tui/src/chatwidget.rs - Displays metrics in the final separator line in tui/src/history_cell.rs - Adds tests - New OTEL tests: - otel/tests/suite/snapshot.rs - otel/tests/suite/runtime_summary.rs - New TUI test: - final_message_separator_includes_runtime_metrics in tui/src/history_cell.rs Scope: - 19 files changed - ~652 insertions, 38 deletions <img width="922" height="169" alt="Screenshot 2026-01-30 at 4 11 34 PM" src="https://github.com/user-attachments/assets/1efd754d-a16d-4564-83a5-f4442fd2f998" />	2026-01-30 22:20:02 -08:00
jif-oai	129787493f	feat: backfill timing metric (#10218 ) 1. Add a metric to measure the backfill time 2. Add a unit to the timing histogram	2026-01-30 10:19:41 +01:00
jif-oai	3878c3dc7c	feat: sqlite 1 (#10004 ) Add a `.sqlite` database to be used to store rollout metatdata (and later logs) This PR is phase 1: * Add the database and the required infrastructure * Add a backfill of the database * Persist the newly created rollout both in files and in the DB * When we need to get metadata or a rollout, consider the `JSONL` as the source of truth but compare the results with the DB and show any errors	2026-01-28 15:29:14 +01:00
alexsong-oai	0fa45fbca4	feat: add session source as otel metadata tag (#9720 ) Add session.source and user.account_id as global OTEL metric tags to identify client surface and user.	2026-01-22 18:46:14 -08:00
pakrym-oai	4d48d4e0c2	Revert "feat: support proxy for ws connection" (#9693 ) Reverts openai/codex#9409	2026-01-22 15:57:18 +00:00
Anton Panasenko	7b27aa7707	feat: support proxy for ws connection (#9409 ) unfortunately tokio-tungstenite doesn't support proxy configuration outbox, while https://github.com/snapview/tokio-tungstenite/pull/370 is in review, we can depend on source code for now.	2026-01-20 09:36:30 -08:00
jif-oai	6bbf506120	feat: metrics on remote models (#9528 )	2026-01-20 13:02:55 +00:00
jif-oai	a3a97f3ea9	feat: record timer with additional tags (#9529 )	2026-01-20 13:01:55 +00:00
Ahmed Ibrahim	b11e96fb04	Act on reasoning-included per turn (#9402 ) - Reset reasoning-included flag each turn and update compaction test	2026-01-19 11:23:25 -08:00
jif-oai	7ebe13f692	feat: timer total turn metrics (#9382 )	2026-01-19 10:44:31 +00:00
jif-oai	e650d4b02c	feat: tool call duration metric (#9364 )	2026-01-16 18:33:14 +01:00
charley-oai	4a9c2bcc5a	Add text element metadata to types (#9235 ) Initial type tweaking PR to make the diff of https://github.com/openai/codex/pull/9116 smaller This should not change any behavior, just adds some fields to types	2026-01-14 16:41:50 -08:00
Anton Panasenko	51d75bb80a	fix: drop session span at end of the session (#9126 )	2026-01-13 11:36:00 -08:00
zbarsky-openai	2a06d64bc9	feat: add support for building with Bazel (#8875 ) This PR configures Codex CLI so it can be built with [Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc` includes configuration so that remote builds can be done using [BuildBuddy](https://www.buildbuddy.io). If you are familiar with Bazel, things should work as you expect, e.g., run `bazel test //... --keep-going` to run all the tests in the repo, but we have also added some new aliases in the `justfile` for convenience: - `just bazel-test` to run tests locally - `just bazel-remote-test` to run tests remotely (currently, the remote build is for x86_64 Linux regardless of your host platform). Note we are currently seeing the following test failures in the remote build, so we still need to figure out what is happening here: ``` failures: suite::compact::manual_compact_twice_preserves_latest_user_messages suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view ``` - `just build-for-release` to build release binaries for all platforms/architectures remotely To setup remote execution: - [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI employees should also request org access at https://openai.buildbuddy.io/join/ with their `@openai.com` email address.) - [Copy your API key](https://app.buildbuddy.io/docs/setup/) to `~/.bazelrc` (add the line `build --remote_header=x-buildbuddy-api-key=YOUR_KEY`) - Use `--config=remote` in your `bazel` invocations (or add `common --config=remote` to your `~/.bazelrc`, or use the `just` commands) ## CI In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners (we are working on supporting Windows, but that is not ready yet). Note that the failures we are seeing in `just bazel-remote-test` do not occur on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml` is green right now. The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so that macOS CI jobs build _remotely_ on Linux hosts (using the `docker://docker.io/mbolin491/codex-bazel` Docker image declared in the root `BUILD.bazel`) using cross-compilation to build the macOS artifacts. Then these artifacts are downloaded locally to GitHub's macOS runner so the tests can be executed natively. This is the relevant config that enables this: ``` common:macos --config=remote common:macos --strategy=remote common:macos --strategy=TestRunner=darwin-sandbox,local ``` Because of the remote caching benefits we get from BuildBuddy, these new CI jobs can be extremely fast! For example, consider these two jobs that ran all the tests on Linux x86_64: - Bazel 1m37s https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875 - Cargo 9m20s https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875 For now, we will continue to run both the Bazel and Cargo jobs for PRs, but once we add support for Windows and running Clippy, we should be able to cutover to using Bazel exclusively for PRs, which should still speed things up considerably. We will probably continue to run the Cargo jobs post-merge for commits that land on `main` as a sanity check. Release builds will also continue to be done by Cargo for now. Earlier attempt at this PR: https://github.com/openai/codex/pull/8832 Earlier attempt to add support for Buck2, now abandoned: https://github.com/openai/codex/pull/8504 --------- Co-authored-by: David Zbarsky <dzbarsky@gmail.com> Co-authored-by: Michael Bolin <mbolin@openai.com>	2026-01-09 11:09:43 -08:00
jif-oai	bc92dc5cf0	chore: update metrics temporality (#8901 )	2026-01-09 14:57:42 +00:00
jif-oai	7e5b3e069e	chore: metrics tool call (#8975 )	2026-01-09 13:28:43 +00:00
jif-oai	16c66c37eb	chore: move otel provider outside of trace module (#8968 )	2026-01-09 12:42:54 +00:00

1 2

75 commits