## Summary This PR fixes OTLP HTTP trace export in runtimes where the previous exporter setup was unreliable, especially around app-server usage. It also removes the old `codex_otel::otel_provider` compatibility shim and switches remaining call sites over to the crate-root `codex_otel::OtelProvider` export. ## What changed - Use a runtime-safe OTLP HTTP trace exporter path for Tokio runtimes. - Add an async HTTP client path for trace export when we are already inside a multi-thread Tokio runtime. - Make provider shutdown flush traces before tearing down the tracer provider. - Add loopback coverage that verifies traces are actually sent to `/v1/traces`: - outside Tokio - inside a multi-thread Tokio runtime - inside a current-thread Tokio runtime - Remove the `codex_otel::otel_provider` shim and update remaining imports. ## Why I hit cases where spans were being created correctly but never made it to the collector. The issue turned out to be in exporter/runtime behavior rather than the span plumbing itself. This PR narrows that gap and gives us regression coverage for the actual export path.
141 lines
4.1 KiB
Markdown
141 lines
4.1 KiB
Markdown
# codex-otel
|
|
|
|
`codex-otel` is the OpenTelemetry integration crate for Codex. It provides:
|
|
|
|
- Provider wiring for log/trace/metric exporters (`codex_otel::OtelProvider`
|
|
and `codex_otel::provider`).
|
|
- Session-scoped business event emission via `codex_otel::SessionTelemetry`.
|
|
- Low-level metrics APIs via `codex_otel::metrics`.
|
|
- Trace-context helpers via `codex_otel::trace_context` and crate-root re-exports.
|
|
|
|
## Tracing and logs
|
|
|
|
Create an OTEL provider from `OtelSettings`. The provider also configures
|
|
metrics (when enabled), then attach its layers to your `tracing_subscriber`
|
|
registry:
|
|
|
|
```rust
|
|
use codex_otel::config::OtelExporter;
|
|
use codex_otel::config::OtelHttpProtocol;
|
|
use codex_otel::config::OtelSettings;
|
|
use codex_otel::OtelProvider;
|
|
use tracing_subscriber::prelude::*;
|
|
|
|
let settings = OtelSettings {
|
|
environment: "dev".to_string(),
|
|
service_name: "codex-cli".to_string(),
|
|
service_version: env!("CARGO_PKG_VERSION").to_string(),
|
|
codex_home: std::path::PathBuf::from("/tmp"),
|
|
exporter: OtelExporter::OtlpHttp {
|
|
endpoint: "https://otlp.example.com".to_string(),
|
|
headers: std::collections::HashMap::new(),
|
|
protocol: OtelHttpProtocol::Binary,
|
|
tls: None,
|
|
},
|
|
trace_exporter: OtelExporter::OtlpHttp {
|
|
endpoint: "https://otlp.example.com".to_string(),
|
|
headers: std::collections::HashMap::new(),
|
|
protocol: OtelHttpProtocol::Binary,
|
|
tls: None,
|
|
},
|
|
metrics_exporter: OtelExporter::None,
|
|
};
|
|
|
|
if let Some(provider) = OtelProvider::from(&settings)? {
|
|
let registry = tracing_subscriber::registry()
|
|
.with(provider.logger_layer())
|
|
.with(provider.tracing_layer());
|
|
registry.init();
|
|
}
|
|
```
|
|
|
|
## SessionTelemetry (events)
|
|
|
|
`SessionTelemetry` adds consistent metadata to tracing events and helps record
|
|
Codex-specific session events. Rich session/business events should go through
|
|
`SessionTelemetry`; subsystem-owned audit events can stay with the owning subsystem.
|
|
|
|
```rust
|
|
use codex_otel::SessionTelemetry;
|
|
|
|
let manager = SessionTelemetry::new(
|
|
conversation_id,
|
|
model,
|
|
slug,
|
|
account_id,
|
|
account_email,
|
|
auth_mode,
|
|
originator,
|
|
log_user_prompts,
|
|
terminal_type,
|
|
session_source,
|
|
);
|
|
|
|
manager.user_prompt(&prompt_items);
|
|
```
|
|
|
|
## Metrics (OTLP or in-memory)
|
|
|
|
Modes:
|
|
|
|
- OTLP: exports metrics via the OpenTelemetry OTLP exporter (HTTP or gRPC).
|
|
- In-memory: records via `opentelemetry_sdk::metrics::InMemoryMetricExporter` for tests/assertions; call `shutdown()` to flush.
|
|
|
|
`codex-otel` also provides `OtelExporter::Statsig`, a shorthand for exporting OTLP/HTTP JSON metrics
|
|
to Statsig using Codex-internal defaults.
|
|
|
|
Statsig ingestion (OTLP/HTTP JSON) example:
|
|
|
|
```rust
|
|
use codex_otel::config::{OtelExporter, OtelHttpProtocol};
|
|
|
|
let metrics = MetricsClient::new(MetricsConfig::otlp(
|
|
"dev",
|
|
"codex-cli",
|
|
env!("CARGO_PKG_VERSION"),
|
|
OtelExporter::OtlpHttp {
|
|
endpoint: "https://api.statsig.com/otlp".to_string(),
|
|
headers: std::collections::HashMap::from([(
|
|
"statsig-api-key".to_string(),
|
|
std::env::var("STATSIG_SERVER_SDK_SECRET")?,
|
|
)]),
|
|
protocol: OtelHttpProtocol::Json,
|
|
tls: None,
|
|
},
|
|
))?;
|
|
|
|
metrics.counter("codex.session_started", 1, &[("source", "tui")])?;
|
|
metrics.histogram("codex.request_latency", 83, &[("route", "chat")])?;
|
|
```
|
|
|
|
In-memory (tests):
|
|
|
|
```rust
|
|
let exporter = InMemoryMetricExporter::default();
|
|
let metrics = MetricsClient::new(MetricsConfig::in_memory(
|
|
"test",
|
|
"codex-cli",
|
|
env!("CARGO_PKG_VERSION"),
|
|
exporter.clone(),
|
|
))?;
|
|
metrics.counter("codex.turns", 1, &[("model", "gpt-5.1")])?;
|
|
metrics.shutdown()?; // flushes in-memory exporter
|
|
```
|
|
|
|
## Trace context
|
|
|
|
Trace propagation helpers remain separate from the session event emitter:
|
|
|
|
```rust
|
|
use codex_otel::current_span_w3c_trace_context;
|
|
use codex_otel::set_parent_from_w3c_trace_context;
|
|
```
|
|
|
|
## Shutdown
|
|
|
|
- `OtelProvider::shutdown()` stops the OTEL exporter.
|
|
- `SessionTelemetry::shutdown_metrics()` flushes and shuts down the metrics provider.
|
|
|
|
Both are optional because drop performs best-effort shutdown, but calling them
|
|
explicitly gives deterministic flushing (or a shutdown error if flushing does
|
|
not complete in time).
|