## Description This PR fixes a bad first-turn failure mode in app-server when the startup websocket prewarm hangs. Before this change, `initialize -> thread/start -> turn/start` could sit behind the prewarm for up to five minutes, so the client would not see `turn/started`, and even `turn/interrupt` would block because the turn had not actually started yet. Now, we: - set a (configurable) timeout of 15s for websocket startup time, exposed as `websocket_startup_timeout_ms` in config.toml - `turn/started` is sent immediately on `turn/start` even if the websocket is still connecting - `turn/interrupt` can be used to cancel a turn that is still waiting on the websocket warmup - the turn task will wait for the full 15s websocket warming timeout before falling back ## Why The old behavior made app-server feel stuck at exactly the moment the client expects turn lifecycle events to start flowing. That was especially painful for external clients, because from their point of view the server had accepted the request but then went silent for minutes. ## Configuring the websocket startup timeout Can set it in config.toml like this: ``` [model_providers.openai] supports_websockets = true websocket_connect_timeout_ms = 15000 ``` |
||
|---|---|---|
| .. | ||
| src | ||
| tests | ||
| BUILD.bazel | ||
| Cargo.toml | ||
| README.md | ||
codex-otel
codex-otel is the OpenTelemetry integration crate for Codex. It provides:
- Provider wiring for log/trace/metric exporters (
codex_otel::OtelProviderandcodex_otel::provider). - Session-scoped business event emission via
codex_otel::SessionTelemetry. - Low-level metrics APIs via
codex_otel::metrics. - Trace-context helpers via
codex_otel::trace_contextand crate-root re-exports.
Tracing and logs
Create an OTEL provider from OtelSettings. The provider also configures
metrics (when enabled), then attach its layers to your tracing_subscriber
registry:
use codex_otel::config::OtelExporter;
use codex_otel::config::OtelHttpProtocol;
use codex_otel::config::OtelSettings;
use codex_otel::OtelProvider;
use tracing_subscriber::prelude::*;
let settings = OtelSettings {
environment: "dev".to_string(),
service_name: "codex-cli".to_string(),
service_version: env!("CARGO_PKG_VERSION").to_string(),
codex_home: std::path::PathBuf::from("/tmp"),
exporter: OtelExporter::OtlpHttp {
endpoint: "https://otlp.example.com".to_string(),
headers: std::collections::HashMap::new(),
protocol: OtelHttpProtocol::Binary,
tls: None,
},
trace_exporter: OtelExporter::OtlpHttp {
endpoint: "https://otlp.example.com".to_string(),
headers: std::collections::HashMap::new(),
protocol: OtelHttpProtocol::Binary,
tls: None,
},
metrics_exporter: OtelExporter::None,
};
if let Some(provider) = OtelProvider::from(&settings)? {
let registry = tracing_subscriber::registry()
.with(provider.logger_layer())
.with(provider.tracing_layer());
registry.init();
}
SessionTelemetry (events)
SessionTelemetry adds consistent metadata to tracing events and helps record
Codex-specific session events. Rich session/business events should go through
SessionTelemetry; subsystem-owned audit events can stay with the owning subsystem.
use codex_otel::SessionTelemetry;
let manager = SessionTelemetry::new(
conversation_id,
model,
slug,
account_id,
account_email,
auth_mode,
originator,
log_user_prompts,
terminal_type,
session_source,
);
manager.user_prompt(&prompt_items);
Metrics (OTLP or in-memory)
Modes:
- OTLP: exports metrics via the OpenTelemetry OTLP exporter (HTTP or gRPC).
- In-memory: records via
opentelemetry_sdk::metrics::InMemoryMetricExporterfor tests/assertions; callshutdown()to flush.
codex-otel also provides OtelExporter::Statsig, a shorthand for exporting OTLP/HTTP JSON metrics
to Statsig using Codex-internal defaults.
Statsig ingestion (OTLP/HTTP JSON) example:
use codex_otel::config::{OtelExporter, OtelHttpProtocol};
let metrics = MetricsClient::new(MetricsConfig::otlp(
"dev",
"codex-cli",
env!("CARGO_PKG_VERSION"),
OtelExporter::OtlpHttp {
endpoint: "https://api.statsig.com/otlp".to_string(),
headers: std::collections::HashMap::from([(
"statsig-api-key".to_string(),
std::env::var("STATSIG_SERVER_SDK_SECRET")?,
)]),
protocol: OtelHttpProtocol::Json,
tls: None,
},
))?;
metrics.counter("codex.session_started", 1, &[("source", "tui")])?;
metrics.histogram("codex.request_latency", 83, &[("route", "chat")])?;
In-memory (tests):
let exporter = InMemoryMetricExporter::default();
let metrics = MetricsClient::new(MetricsConfig::in_memory(
"test",
"codex-cli",
env!("CARGO_PKG_VERSION"),
exporter.clone(),
))?;
metrics.counter("codex.turns", 1, &[("model", "gpt-5.1")])?;
metrics.shutdown()?; // flushes in-memory exporter
Trace context
Trace propagation helpers remain separate from the session event emitter:
use codex_otel::current_span_w3c_trace_context;
use codex_otel::set_parent_from_w3c_trace_context;
Shutdown
OtelProvider::shutdown()stops the OTEL exporter.SessionTelemetry::shutdown_metrics()flushes and shuts down the metrics provider.
Both are optional because drop performs best-effort shutdown, but calling them explicitly gives deterministic flushing (or a shutdown error if flushing does not complete in time).