Commit graph

20 commits

Author SHA1 Message Date
Colin Young
0d2ff40a58
Add auth env observability (#14905)
CXC-410 Emit Env Var Status with `/feedback` report

Add more observability on top of #14611 

[Unset](https://openai.sentry.io/issues/7340419168/?project=4510195390611458&query=019cfa8d-c1ba-7002-96fa-e35fc340551d&referrer=issue-stream)

[Set](https://openai.sentry.io/issues/7340426331/?project=4510195390611458&query=019cfa91-aba1-7823-ab7e-762edfbc0ed4&referrer=issue-stream)
<img width="1063" height="610" alt="image"
src="https://github.com/user-attachments/assets/937ab026-1c2d-4757-81d5-5f31b853113e"
/>


###### Summary
- Adds auth-env telemetry that records whether key auth-related env
overrides were present on session start and request paths.
- Threads those auth-env fields through `/responses`, websocket, and
`/models` telemetry and feedback metadata.
- Buckets custom provider `env_key` configuration to a safe
`"configured"` value instead of emitting raw config text.
- Keeps the slice observability-only: no raw token values or raw URLs
are emitted.

###### Rationale (from spec findings)
- 401 and auth-path debugging needs a way to distinguish env-driven auth
paths from sessions with no auth env override.
- Startup and model-refresh failures need the same auth-env diagnostics
as normal request failures.
- Feedback and Sentry tags need the same auth-env signal as OTel events
so reports can be triaged consistently.
- Custom provider config is user-controlled text, so the telemetry
contract must stay presence-only / bucketed.

###### Scope
- Adds a small `AuthEnvTelemetry` bundle for env presence collection and
threads it through the main request/session telemetry paths.
- Does not add endpoint/base-url/provider-header/geo routing attribution
or broader telemetry API redesign.

###### Trade-offs
- `provider_env_key_name` is bucketed to `"configured"` instead of
preserving the literal configured env var name.
- `/models` is included because startup/model-refresh auth failures need
the same diagnostics, but broader parity work remains out of scope.
- This slice keeps the existing telemetry APIs and layers auth-env
fields onto them rather than redesigning the metadata model.

###### Client follow-up
- Add the separate endpoint/base-url attribution slice if routing-source
diagnosis is still needed.
- Add provider-header or residency attribution only if auth-env presence
proves insufficient in real reports.
- Revisit whether any additional auth-related env inputs need safe
bucketing after more 401 triage data.

###### Testing
- `cargo test -p codex-core emit_feedback_request_tags -- --nocapture`
- `cargo test -p codex-core
collect_auth_env_telemetry_buckets_provider_env_key_name -- --nocapture`
- `cargo test -p codex-core
models_request_telemetry_emits_auth_env_feedback_tags_on_failure --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_api_request_auth_observability --
--nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_connect_auth_observability
-- --nocapture`
- `cargo test -p codex-otel
otel_export_routing_policy_routes_websocket_request_transport_observability
-- --nocapture`
- `cargo test -p codex-core --no-run --message-format short`
- `cargo test -p codex-otel --no-run --message-format short`

---------

Co-authored-by: Codex <noreply@openai.com>
2026-03-17 14:26:27 -07:00
Colin Young
d692b74007
Add auth 401 observability to client bug reports (#14611)
CXC-392

  [With
  401](https://openai.sentry.io/issues/7333870443/?project=4510195390611458&query=019ce8f8-560c-7f10-a00a-c59553740674&referrer=issue-stream)
  <img width="1909" height="555" alt="401 auth tags in Sentry"
  src="https://github.com/user-attachments/assets/412ea950-61c4-4780-9697-15c270971ee3"
  />


  - auth_401_*: preserved facts from the latest unauthorized response snapshot
  - auth_*: latest auth-related facts from the latest request attempt
  - auth_recovery_*: unauthorized recovery state and follow-up result


  Without 401
  <img width="1917" height="522" alt="happy-path auth tags in Sentry"
  src="https://github.com/user-attachments/assets/3381ed28-8022-43b0-b6c0-623a630e679f"
  />

  ###### Summary
  - Add client-visible 401 diagnostics for auth attachment, upstream auth classification, and 401 request id / cf-ray correlation.
  - Record unauthorized recovery mode, phase, outcome, and retry/follow-up status without changing auth behavior.
  - Surface the highest-signal auth and recovery fields on uploaded client bug reports so they are usable in Sentry.
  - Preserve original unauthorized evidence under `auth_401_*` while keeping follow-up result tags separate.

  ###### Rationale (from spec findings)
  - The dominant bucket needed proof of whether the client attached auth before send or upstream still classified the request as missing auth.
  - Client uploads needed to show whether unauthorized recovery ran and what the client tried next.
  - Request id and cf-ray needed to be preserved on the unauthorized response so server-side correlation is immediate.
  - The bug-report path needed the same auth evidence as the request telemetry path, otherwise the observability would not be operationally useful.

  ###### Scope
  - Add auth 401 and unauthorized-recovery observability in `codex-rs/core`, `codex-rs/codex-api`, and `codex-rs/otel`, including feedback-tag surfacing.
  - Keep auth semantics, refresh behavior, retry behavior, endpoint classification, and geo-denial follow-up work out of this PR.

  ###### Trade-offs
  - This exports only safe auth evidence: header presence/name, upstream auth classification, request ids, and recovery state. It does not export token values or raw upstream bodies.
  - This keeps websocket connection reuse as a transport clue because it can help distinguish stale reused sessions from fresh reconnects.
  - Misroute/base-url classification and geo-denial are intentionally deferred to a separate follow-up PR so this review stays focused on the dominant auth 401 bucket.

  ###### Client follow-up
  - PR 2 will add misroute/provider and geo-denial observability plus the matching feedback-tag surfacing.
  - A separate host/app-server PR should log auth-decision inputs so pre-send host auth state can be correlated with client request evidence.
  - `device_id` remains intentionally separate until there is a safe existing source on the feedback upload path.

  ###### Testing
  - `cargo test -p codex-core refresh_available_models_sorts_by_priority`
  - `cargo test -p codex-core emit_feedback_request_tags_`
  - `cargo test -p codex-core emit_feedback_auth_recovery_tags_`
  - `cargo test -p codex-core auth_request_telemetry_context_tracks_attached_auth_and_retry_phase`
  - `cargo test -p codex-core extract_response_debug_context_decodes_identity_headers`
  - `cargo test -p codex-core identity_auth_details`
  - `cargo test -p codex-core telemetry_error_messages_preserve_non_http_details`
  - `cargo test -p codex-core --all-features --no-run`
  - `cargo test -p codex-otel otel_export_routing_policy_routes_api_request_auth_observability`
  - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_connect_auth_observability`
  - `cargo test -p codex-otel otel_export_routing_policy_routes_websocket_request_transport_observability`
2026-03-14 15:38:51 -07:00
Owen Lin
fa1242c83b fix(otel): make HTTP trace export survive app-server runtimes (#14300)
## Summary

This PR fixes OTLP HTTP trace export in runtimes where the previous
exporter setup was unreliable, especially around app-server usage. It
also removes the old `codex_otel::otel_provider` compatibility shim and
switches remaining call sites over to the crate-root
`codex_otel::OtelProvider` export.

## What changed

- Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
- Add an async HTTP client path for trace export when we are already
inside a multi-thread Tokio runtime.
- Make provider shutdown flush traces before tearing down the tracer
provider.
- Add loopback coverage that verifies traces are actually sent to
`/v1/traces`:
  - outside Tokio
  - inside a multi-thread Tokio runtime
  - inside a current-thread Tokio runtime
- Remove the `codex_otel::otel_provider` shim and update remaining
imports.

## Why

I hit cases where spans were being created correctly but never made it
to the collector. The issue turned out to be in exporter/runtime
behavior rather than the span plumbing itself. This PR narrows that gap
and gives us regression coverage for the actual export path.
2026-03-11 12:33:10 -07:00
Owen Lin
289ed549cf
chore(otel): rename OtelManager to SessionTelemetry (#13808)
## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.

## Why

`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).

`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.

## Validation

- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
2026-03-06 16:23:30 -08:00
Owen Lin
3449e00bc9
feat(otel, core): record turn TTFT and TTFM metrics in codex-core (#13630)
### Summary
This adds turn-level latency metrics for the first model output and the
first completed agent message.
- `codex.turn.ttft.duration_ms` starts at turn start and records on the
first output signal we see from the model. That includes normal
assistant text, reasoning deltas, and non-text outputs like tool-call
items.
- `codex.turn.ttfm.duration_ms` also starts at turn start, but it
records when the first agent message finishes streaming rather than when
its first delta arrives.

### Implementation notes
The timing is tracked in codex-core, not app-server, so the definition
stays consistent across CLI, TUI, and app-server clients.

I reused the existing turn lifecycle boundary that already drives
`codex.turn.e2e_duration_ms`, stored the turn start timestamp in turn
state, and record each metric once per turn.

I also wired the new metric names into the OTEL runtime metrics summary
so they show up in the same in-memory/debug snapshot path as the
existing timing metrics.
2026-03-06 10:23:48 -08:00
Owen Lin
c3736cff0a
feat(otel): safe tracing (#13626)
### Motivation
Today config.toml has three different OTEL knobs under `[otel]`:
- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go

Those often (pretty much always?) serve different purposes.

For example, for OpenAI internal usage, the **log exporter** is already
being used for IT/security telemetry, and that use case is intentionally
content-rich: tool calls, arguments, outputs, MCP payloads, and in some
cases user content are all useful there. `log_user_prompt` is a good
example of that distinction. When it’s enabled, we include raw prompt
text in OTEL logs, which is acceptable for the security use case.

The **trace exporter** is a different story. The goal there is to give
OpenAI engineers visibility into latency and request behavior when they
run Codex locally, without sending sensitive prompt or tool data as
trace event data. In other words, traces should help answer “what was
slow?” or “where did time go?”, not “what did the user say?” or “what
did the tool return?”

The complication is that Rust’s `tracing` crate does not make a hard
distinction between “logs” and “trace events.” It gives us one
instrumentation API for logs and trace events (via `tracing::event!`),
and subscribers decide what gets treated as logs, trace events, or both.

Before this change, our OTEL trace layer was effectively attached to the
general tracing stream, which meant turning on `trace_exporter` could
pick up content-rich events that were originally written with logging
(and the `log_exporter`) in mind. That made it too easy for sensitive
data to end up in exported traces by accident.

### Concrete example
In `otel_manager.rs`, this `tracing::event!` call would be exported in
both logs AND traces (as a trace event).
```
    pub fn user_prompt(&self, items: &[UserInput]) {
        let prompt = items
            .iter()
            .flat_map(|item| match item {
                UserInput::Text { text, .. } => Some(text.as_str()),
                _ => None,
            })
            .collect::<String>();

        let prompt_to_log = if self.metadata.log_user_prompts {
            prompt.as_str()
        } else {
            "[REDACTED]"
        };

        tracing::event!(
            tracing::Level::INFO,
            event.name = "codex.user_prompt",
            event.timestamp = %timestamp(),
            // ...
            prompt = %prompt_to_log,
        );
    }
```

Instead of `tracing::event!`, we should now be using `log_event!` and
`trace_event!` instead to more clearly indicate which sink (logs vs.
traces) that event should be exported to.

### What changed
This PR makes the log and trace export distinct instead of treating them
as two sinks for the same data.

On the provider side, OTEL logs and traces now have separate
routing/filtering policy. The log exporter keeps receiving the existing
`codex_otel` events, while trace export is limited to spans and trace
events.

On the event side, `OtelManager` now emits two flavors of telemetry
where needed:
- a log-only event with the current rich payloads
- a tracing-safe event with summaries only

It also has a convenience `log_and_trace_event!` macro for emitting to
both logs and traces when it's safe to do so, as well as log- and
trace-specific fields.

That means prompts, tool args, tool output, account email, MCP metadata,
and similar content stay in the log lane, while traces get the pieces
that are actually useful for performance work: durations, counts, sizes,
status, token counts, tool origin, and normalized error classes.

This preserves current IT/security logging behavior while making it safe
to turn on trace export for employees.

### Full list of things removed from trace export
- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server,
mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from
trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span

What traces now keep instead is mostly:
- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin
summaries
2026-03-05 16:30:53 -08:00
jif-oai
10c04e11b8
feat: add service name to app-server (#12319)
Add service name to the app-server so that the app can use it's own
service name

This is on thread level because later we might plan the app-server to
become a singleton on the computer
2026-02-25 09:51:42 +00:00
colby-oai
2036a5f5e0
Add MCP server context to otel tool_result logs (#12267)
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full urls or url paths to prevent logging
potentially sensitive data

Testing
- Not run (not requested)
2026-02-20 10:26:19 -05:00
alexsong-oai
373f5467ef
Add originator to otel metadata tags (#11232) 2026-02-09 14:29:19 -08:00
alexsong-oai
daeef06bec
add originator to otel (#10826) 2026-02-06 15:13:56 -08:00
Anton Panasenko
4ee039744e
feat: expose detailed metrics to runtime metrics (#10699) 2026-02-05 18:22:30 -08:00
iceweasel-oai
901d5b8fd6
add sandbox policy and sandbox name to codex.tool.call metrics (#10711)
This will give visibility into the comparative success rate of the
Windows sandbox implementations compared to other platforms.
2026-02-05 11:42:12 -08:00
Owen Lin
3582b74d01
fix(auth): isolate chatgptAuthTokens concept to auth manager and app-server (#10423)
So that the rest of the codebase (like TUI) don't need to be concerned
whether ChatGPT auth was handled by Codex itself or passed in via
app-server's external auth mode.
2026-02-05 10:46:06 -08:00
Anton Panasenko
fcaed4cb88
feat: log webscocket timing into runtime metrics (#10577) 2026-02-03 18:04:07 -08:00
Anton Panasenko
101d359cd7
Add websocket telemetry metrics and labels (#10316)
Summary
- expose websocket telemetry hooks through the responses client so
request durations and event processing can be reported
- record websocket request/event metrics and emit runtime telemetry
events that the history UI now surfaces
- improve tests to cover websocket telemetry reporting and guard runtime
summary updates


<img width="824" height="79" alt="Screenshot 2026-01-31 at 5 28 12 PM"
src="https://github.com/user-attachments/assets/ea9a7965-d8b4-4e3c-a984-ef4fdc44c81d"
/>
2026-01-31 19:16:44 -08:00
Anton Panasenko
8660ad6c64
feat: show runtime metrics in console (#10278)
Summary of changes:

- Adds a new feature flag: runtime_metrics
  - Declared in core/src/features.rs
  - Added to core/config.schema.json
  - Wired into OTEL init in core/src/otel_init.rs

- Enables on-demand runtime metric snapshots in OTEL
  - Adds runtime_metrics: bool to otel/src/config.rs
  - Enables experimental custom reader features in otel/Cargo.toml
  - Adds snapshot/reset/summary APIs in:
    - otel/src/lib.rs
    - otel/src/metrics/client.rs
    - otel/src/metrics/config.rs
    - otel/src/metrics/error.rs

- Defines metric names and a runtime summary builder
  - New files:
    - otel/src/metrics/names.rs
    - otel/src/metrics/runtime_metrics.rs
  - Summarizes totals for:
    - Tool calls
    - API requests
    - SSE/streaming events

- Instruments metrics collection in OTEL manager
  - otel/src/traces/otel_manager.rs now records:
    - API call counts + durations
    - SSE event counts + durations (success/failure)
    - Tool call metrics now use shared constants

- Surfaces runtime metrics in the TUI
  - Resets runtime metrics at turn start in tui/src/chatwidget.rs
- Displays metrics in the final separator line in
tui/src/history_cell.rs

- Adds tests
  - New OTEL tests:
    - otel/tests/suite/snapshot.rs
    - otel/tests/suite/runtime_summary.rs
  - New TUI test:
- final_message_separator_includes_runtime_metrics in
tui/src/history_cell.rs

Scope:
- 19 files changed
- ~652 insertions, 38 deletions


<img width="922" height="169" alt="Screenshot 2026-01-30 at 4 11 34 PM"
src="https://github.com/user-attachments/assets/1efd754d-a16d-4564-83a5-f4442fd2f998"
/>
2026-01-30 22:20:02 -08:00
jif-oai
129787493f
feat: backfill timing metric (#10218)
1. Add a metric to measure the backfill time
2. Add a unit to the timing histogram
2026-01-30 10:19:41 +01:00
alexsong-oai
0fa45fbca4
feat: add session source as otel metadata tag (#9720)
Add session.source and user.account_id as global OTEL metric tags to
identify client surface and user.
2026-01-22 18:46:14 -08:00
gt-oai
93dec9045e
otel test: retry WouldBlock errors (#8915)
This test looks flaky on Windows:

```
        FAIL [   0.034s] (1442/2802) codex-otel::tests suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector
  stdout ───

    running 1 test
    test suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector ... FAILED

    failures:

    failures:
        suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector

    test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 14 filtered out; finished in 0.02s
    
  stderr ───
    Error: ProviderShutdown { source: InternalFailure("[InternalFailure(\"Failed to shutdown\")]") }

────────────
     Summary [ 175.360s] 2802 tests run: 2801 passed, 1 failed, 15 skipped
        FAIL [   0.034s] (1442/2802) codex-otel::tests suite::otlp_http_loopback::otlp_http_exporter_sends_metrics_to_collector
```
2026-01-08 18:18:49 +00:00
jif-oai
634650dd25
feat: metrics capabilities (#8318)
Add metrics capabilities to Codex. The `README.md` is up to date.

This will not be merged with the metrics before this PR of course:
https://github.com/openai/codex/pull/8350
2026-01-08 11:47:36 +00:00