core-agent-ide/codex-rs/otel/README.md
Owen Lin fa1242c83b fix(otel): make HTTP trace export survive app-server runtimes (#14300)
## Summary

This PR fixes OTLP HTTP trace export in runtimes where the previous
exporter setup was unreliable, especially around app-server usage. It
also removes the old `codex_otel::otel_provider` compatibility shim and
switches remaining call sites over to the crate-root
`codex_otel::OtelProvider` export.

## What changed

- Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes.
- Add an async HTTP client path for trace export when we are already
inside a multi-thread Tokio runtime.
- Make provider shutdown flush traces before tearing down the tracer
provider.
- Add loopback coverage that verifies traces are actually sent to
`/v1/traces`:
  - outside Tokio
  - inside a multi-thread Tokio runtime
  - inside a current-thread Tokio runtime
- Remove the `codex_otel::otel_provider` shim and update remaining
imports.

## Why

I hit cases where spans were being created correctly but never made it
to the collector. The issue turned out to be in exporter/runtime
behavior rather than the span plumbing itself. This PR narrows that gap
and gives us regression coverage for the actual export path.
2026-03-11 12:33:10 -07:00

141 lines
4.1 KiB
Markdown

# codex-otel
`codex-otel` is the OpenTelemetry integration crate for Codex. It provides:
- Provider wiring for log/trace/metric exporters (`codex_otel::OtelProvider`
and `codex_otel::provider`).
- Session-scoped business event emission via `codex_otel::SessionTelemetry`.
- Low-level metrics APIs via `codex_otel::metrics`.
- Trace-context helpers via `codex_otel::trace_context` and crate-root re-exports.
## Tracing and logs
Create an OTEL provider from `OtelSettings`. The provider also configures
metrics (when enabled), then attach its layers to your `tracing_subscriber`
registry:
```rust
use codex_otel::config::OtelExporter;
use codex_otel::config::OtelHttpProtocol;
use codex_otel::config::OtelSettings;
use codex_otel::OtelProvider;
use tracing_subscriber::prelude::*;
let settings = OtelSettings {
environment: "dev".to_string(),
service_name: "codex-cli".to_string(),
service_version: env!("CARGO_PKG_VERSION").to_string(),
codex_home: std::path::PathBuf::from("/tmp"),
exporter: OtelExporter::OtlpHttp {
endpoint: "https://otlp.example.com".to_string(),
headers: std::collections::HashMap::new(),
protocol: OtelHttpProtocol::Binary,
tls: None,
},
trace_exporter: OtelExporter::OtlpHttp {
endpoint: "https://otlp.example.com".to_string(),
headers: std::collections::HashMap::new(),
protocol: OtelHttpProtocol::Binary,
tls: None,
},
metrics_exporter: OtelExporter::None,
};
if let Some(provider) = OtelProvider::from(&settings)? {
let registry = tracing_subscriber::registry()
.with(provider.logger_layer())
.with(provider.tracing_layer());
registry.init();
}
```
## SessionTelemetry (events)
`SessionTelemetry` adds consistent metadata to tracing events and helps record
Codex-specific session events. Rich session/business events should go through
`SessionTelemetry`; subsystem-owned audit events can stay with the owning subsystem.
```rust
use codex_otel::SessionTelemetry;
let manager = SessionTelemetry::new(
conversation_id,
model,
slug,
account_id,
account_email,
auth_mode,
originator,
log_user_prompts,
terminal_type,
session_source,
);
manager.user_prompt(&prompt_items);
```
## Metrics (OTLP or in-memory)
Modes:
- OTLP: exports metrics via the OpenTelemetry OTLP exporter (HTTP or gRPC).
- In-memory: records via `opentelemetry_sdk::metrics::InMemoryMetricExporter` for tests/assertions; call `shutdown()` to flush.
`codex-otel` also provides `OtelExporter::Statsig`, a shorthand for exporting OTLP/HTTP JSON metrics
to Statsig using Codex-internal defaults.
Statsig ingestion (OTLP/HTTP JSON) example:
```rust
use codex_otel::config::{OtelExporter, OtelHttpProtocol};
let metrics = MetricsClient::new(MetricsConfig::otlp(
"dev",
"codex-cli",
env!("CARGO_PKG_VERSION"),
OtelExporter::OtlpHttp {
endpoint: "https://api.statsig.com/otlp".to_string(),
headers: std::collections::HashMap::from([(
"statsig-api-key".to_string(),
std::env::var("STATSIG_SERVER_SDK_SECRET")?,
)]),
protocol: OtelHttpProtocol::Json,
tls: None,
},
))?;
metrics.counter("codex.session_started", 1, &[("source", "tui")])?;
metrics.histogram("codex.request_latency", 83, &[("route", "chat")])?;
```
In-memory (tests):
```rust
let exporter = InMemoryMetricExporter::default();
let metrics = MetricsClient::new(MetricsConfig::in_memory(
"test",
"codex-cli",
env!("CARGO_PKG_VERSION"),
exporter.clone(),
))?;
metrics.counter("codex.turns", 1, &[("model", "gpt-5.1")])?;
metrics.shutdown()?; // flushes in-memory exporter
```
## Trace context
Trace propagation helpers remain separate from the session event emitter:
```rust
use codex_otel::current_span_w3c_trace_context;
use codex_otel::set_parent_from_w3c_trace_context;
```
## Shutdown
- `OtelProvider::shutdown()` stops the OTEL exporter.
- `SessionTelemetry::shutdown_metrics()` flushes and shuts down the metrics provider.
Both are optional because drop performs best-effort shutdown, but calling them
explicitly gives deterministic flushing (or a shutdown error if flushing does
not complete in time).