Owen Lin c3736cff0a feat(otel): safe tracing (#13626)

### Motivation

Today config.toml has three different OTEL knobs under `[otel]`:

- `exporter` controls where OTEL logs go
- `trace_exporter` controls where OTEL traces go
- `metrics_exporter` controls where metrics go

Those often (pretty much always?) serve different purposes.

For example, for OpenAI internal usage, the log exporter is already being used for IT/security telemetry, and that use case is intentionally content-rich: tool calls, arguments, outputs, MCP payloads, and in some cases user content are all useful there. `log_user_prompt` is a good example of that distinction. When it’s enabled, we include raw prompt text in OTEL logs, which is acceptable for the security use case.

The trace exporter is a different story. The goal there is to give OpenAI engineers visibility into latency and request behavior when they run Codex locally, without sending sensitive prompt or tool data as trace event data. In other words, traces should help answer “what was slow?” or “where did time go?”, not “what did the user say?” or “what did the tool return?”

The complication is that Rust’s `tracing` crate does not make a hard distinction between “logs” and “trace events.” It gives us one instrumentation API for logs and trace events (via `tracing::event!`), and subscribers decide what gets treated as logs, trace events, or both.

Before this change, our OTEL trace layer was effectively attached to the general tracing stream, which meant turning on `trace_exporter` could pick up content-rich events that were originally written with logging (and the `log_exporter`) in mind. That made it too easy for sensitive data to end up in exported traces by accident.

### Concrete example

In `otel_manager.rs`, this `tracing::event!` call would be exported in both logs AND traces (as a trace event).

```
pub fn user_prompt(&self, items: &[UserInput]) {
    let prompt = items
        .iter()
        .flat_map(|item| match item {
            UserInput::Text { text, .. } => Some(text.as_str()),
            _ => None,
        })
        .collect::<String>();

    let prompt_to_log = if self.metadata.log_user_prompts {
        prompt.as_str()
    } else {
        "[REDACTED]"
    };

    tracing::event!(
        tracing::Level::INFO,
        event.name = "codex.user_prompt",
        event.timestamp = %timestamp(),
        // ...
        prompt = %prompt_to_log,
    );
}
```

Instead of `tracing::event!`, we should now use `log_event!` and `trace_event!` to indicate more clearly which sink (logs vs. traces) that event should be exported to.

### What changed

This PR makes the log and trace export distinct instead of treating them as two sinks for the same data.

On the provider side, OTEL logs and traces now have separate routing/filtering policy. The log exporter keeps receiving the existing `codex_otel` events, while trace export is limited to spans and trace events.

On the event side, `OtelManager` now emits two flavors of telemetry where needed:

- a log-only event with the current rich payloads
- a tracing-safe event with summaries only

It also has a convenience `log_and_trace_event!` macro for emitting to both logs and traces when it's safe to do so, as well as log- and trace-specific fields.

That means prompts, tool args, tool output, account email, MCP metadata, and similar content stay in the log lane, while traces get the pieces that are actually useful for performance work: durations, counts, sizes, status, token counts, tool origin, and normalized error classes.

This preserves current IT/security logging behavior while making it safe to turn on trace export for employees.

### Full list of things removed from trace export

- raw user prompt text from `codex.user_prompt`
- raw tool arguments and output from `codex.tool_result`
- MCP server metadata from `codex.tool_result` (mcp_server, mcp_server_origin)
- account identity fields like `user.email` and `user.account_id` from trace-safe OTEL events
- `host.name` from trace resources
- generic `codex.tool_decision` events from traces
- generic `codex.sse_event` events from traces
- the full ToolCall debug payload from the `handle_tool_call` span

What traces now keep instead is mostly:

- spans
- trace-safe OTEL events
- counts, lengths, durations, status, token counts, and tool origin summaries

2026-03-05 16:30:53 -08:00
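The split between a content-rich log lane and a summary-only trace lane can be illustrated with a small std-only model. This is a sketch of the idea, not the crate's implementation: `Lane`, `Event`, `user_prompt_events`, `route`, and the `prompt_length` field are hypothetical names chosen for illustration, and the real code uses the `log_event!` / `trace_event!` macros on top of `tracing`.

```rust
use std::collections::BTreeMap;

/// Which export lane an event may reach (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Log,
    Trace,
}

/// A flattened telemetry event: a name plus string fields (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub lane: Lane,
    pub name: &'static str,
    pub fields: BTreeMap<&'static str, String>,
}

/// Build both flavors of a user-prompt event.
/// The log event carries the prompt text (or a redaction marker);
/// the trace event carries only a size summary, never the text.
pub fn user_prompt_events(prompt: &str, log_user_prompt: bool) -> [Event; 2] {
    let logged = if log_user_prompt { prompt } else { "[REDACTED]" };

    let mut log_fields = BTreeMap::new();
    log_fields.insert("prompt", logged.to_string());

    let mut trace_fields = BTreeMap::new();
    // `prompt_length` is an assumed field name for the trace-safe summary.
    trace_fields.insert("prompt_length", prompt.chars().count().to_string());

    [
        Event { lane: Lane::Log, name: "codex.user_prompt", fields: log_fields },
        Event { lane: Lane::Trace, name: "codex.user_prompt", fields: trace_fields },
    ]
}

/// Keep only the events destined for one lane, as an exporter-side filter would.
pub fn route(events: &[Event], lane: Lane) -> Vec<&Event> {
    events.iter().filter(|event| event.lane == lane).collect()
}
```

Under these assumptions, `route(&user_prompt_events("hello", true), Lane::Trace)` yields a single event whose only field is `prompt_length`, so enabling the trace lane cannot pick up the prompt text regardless of the `log_user_prompt` setting.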
src	feat(otel): safe tracing (#13626)	2026-03-05 16:30:53 -08:00
tests	feat(otel): safe tracing (#13626)	2026-03-05 16:30:53 -08:00
BUILD.bazel	feat: add support for building with Bazel (#8875)	2026-01-09 11:09:43 -08:00
Cargo.toml	otel: add host.name resource attribute to logs/traces via gethostname (#12352)	2026-02-25 09:54:45 -05:00
README.md	chore: move otel provider outside of trace module (#8968)	2026-01-09 12:42:54 +00:00

README.md

codex-otel

codex-otel is the OpenTelemetry integration crate for Codex. It provides:

- Trace/log/metrics exporters and tracing subscriber layers (codex_otel::otel_provider).
- A structured event helper (codex_otel::OtelManager).
- OpenTelemetry metrics support via OTLP exporters (codex_otel::metrics).
- A metrics facade on OtelManager so tracing + metrics share metadata.

Tracing and logs

Create an OTEL provider from OtelSettings; the provider also configures metrics when they are enabled. Then attach its layers to your tracing_subscriber registry:

use codex_otel::config::OtelExporter;
use codex_otel::config::OtelHttpProtocol;
use codex_otel::config::OtelSettings;
use codex_otel::otel_provider::OtelProvider;
use tracing_subscriber::prelude::*;

let settings = OtelSettings {
    environment: "dev".to_string(),
    service_name: "codex-cli".to_string(),
    service_version: env!("CARGO_PKG_VERSION").to_string(),
    codex_home: std::path::PathBuf::from("/tmp"),
    exporter: OtelExporter::OtlpHttp {
        endpoint: "https://otlp.example.com".to_string(),
        headers: std::collections::HashMap::new(),
        protocol: OtelHttpProtocol::Binary,
        tls: None,
    },
    trace_exporter: OtelExporter::OtlpHttp {
        endpoint: "https://otlp.example.com".to_string(),
        headers: std::collections::HashMap::new(),
        protocol: OtelHttpProtocol::Binary,
        tls: None,
    },
    metrics_exporter: OtelExporter::None,
};

if let Some(provider) = OtelProvider::from(&settings)? {
    let registry = tracing_subscriber::registry()
        .with(provider.logger_layer())
        .with(provider.tracing_layer());
    registry.init();
}

OtelManager (events)

OtelManager adds consistent metadata to tracing events and helps record Codex-specific events.

use codex_otel::OtelManager;

let manager = OtelManager::new(
    conversation_id,
    model,
    slug,
    account_id,
    account_email,
    auth_mode,
    log_user_prompts,
    terminal_type,
    session_source,
);

manager.user_prompt(&prompt_items);

Metrics (OTLP or in-memory)

Modes:

- OTLP: exports metrics via the OpenTelemetry OTLP exporter (HTTP or gRPC).
- In-memory: records via opentelemetry_sdk::metrics::InMemoryMetricExporter for tests/assertions; call shutdown() to flush.

codex-otel also provides OtelExporter::Statsig, a shorthand for exporting OTLP/HTTP JSON metrics to Statsig using Codex-internal defaults.

Statsig ingestion (OTLP/HTTP JSON) example:

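use codex_otel::config::{OtelExporter, OtelHttpProtocol};
// Assumed import path: this README places metrics support under codex_otel::metrics.
use codex_otel::metrics::{MetricsClient, MetricsConfig};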

let metrics = MetricsClient::new(MetricsConfig::otlp(
    "dev",
    "codex-cli",
    env!("CARGO_PKG_VERSION"),
    OtelExporter::OtlpHttp {
        endpoint: "https://api.statsig.com/otlp".to_string(),
        headers: std::collections::HashMap::from([(
            "statsig-api-key".to_string(),
            std::env::var("STATSIG_SERVER_SDK_SECRET")?,
        )]),
        protocol: OtelHttpProtocol::Json,
        tls: None,
    },
))?;

metrics.counter("codex.session_started", 1, &[("source", "tui")])?;
metrics.histogram("codex.request_latency", 83, &[("route", "chat")])?;

In-memory (tests):

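// Assumed import path for the client types; the exporter path is the one named under Modes.
use codex_otel::metrics::{MetricsClient, MetricsConfig};
use opentelemetry_sdk::metrics::InMemoryMetricExporter;

let exporter = InMemoryMetricExporter::default();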
let metrics = MetricsClient::new(MetricsConfig::in_memory(
    "test",
    "codex-cli",
    env!("CARGO_PKG_VERSION"),
    exporter.clone(),
))?;
metrics.counter("codex.turns", 1, &[("model", "gpt-5.1")])?;
metrics.shutdown()?; // flushes in-memory exporter

Shutdown

- OtelProvider::shutdown() stops the OTEL exporter.
- OtelManager::shutdown_metrics() flushes and shuts down the metrics provider.

Both are optional because drop performs best-effort shutdown, but calling them explicitly gives deterministic flushing (or a shutdown error if flushing does not complete in time).
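The trade-off between explicit shutdown and relying on drop can be sketched with a std-only model. This is an illustration under stated assumptions, not the crate's code: `BufferingExporter`, `FlushError`, and the `sink_available` flag are hypothetical stand-ins for an exporter that buffers records and may fail to flush them.

```rust
use std::sync::{Arc, Mutex};

/// Error reported when an explicit shutdown cannot flush (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct FlushError {
    pub unflushed: usize,
}

/// Hypothetical exporter that buffers records until they are flushed to a sink.
pub struct BufferingExporter {
    pending: Vec<String>,
    sink: Arc<Mutex<Vec<String>>>,
    sink_available: bool,
    shut_down: bool,
}

impl BufferingExporter {
    pub fn new(sink: Arc<Mutex<Vec<String>>>, sink_available: bool) -> Self {
        Self { pending: Vec::new(), sink, sink_available, shut_down: false }
    }

    pub fn record(&mut self, item: &str) {
        self.pending.push(item.to_string());
    }

    /// Move every buffered record into the sink, or report how many are stuck.
    fn flush(&mut self) -> Result<(), FlushError> {
        let unflushed = self.pending.len();
        if !self.sink_available {
            return Err(FlushError { unflushed });
        }
        let mut sink = self.sink.lock().map_err(|_| FlushError { unflushed })?;
        sink.extend(self.pending.drain(..));
        Ok(())
    }

    /// Explicit shutdown: flush now and hand any failure back to the caller.
    pub fn shutdown(&mut self) -> Result<(), FlushError> {
        self.shut_down = true;
        self.flush()
    }
}

impl Drop for BufferingExporter {
    fn drop(&mut self) {
        if !self.shut_down {
            // Best effort only: drop has no caller to return an error to.
            let _ = self.flush();
        }
    }
}
```

In this model, dropping an exporter without calling `shutdown()` still flushes when the sink is reachable, but a failure is silently discarded; calling `shutdown()` first is the only way for the caller to observe a `FlushError`.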