diff --git a/codex-rs/client.md b/codex-rs/client.md deleted file mode 100644 index 4c9027ccb..000000000 --- a/codex-rs/client.md +++ /dev/null @@ -1,206 +0,0 @@ -# Client Extraction Plan - -## Goals -- Split the HTTP transport/client code out of `codex-core` into a reusable crate that is agnostic of Codex/OpenAI business logic and API schemas. -- Create a separate API library crate that houses typed requests/responses for well-known APIs (Responses, Chat Completions, Compact) and plugs into the transport crate via minimal traits. -- Preserve current behaviour (auth headers, retries, SSE handling, rate-limit parsing, compaction, fixtures) while making the APIs symmetric and avoiding code duplication. -- Keep existing consumers (`codex-core`, tests, and tools) stable by providing a small compatibility layer during the transition. - -## Snapshot of Today -- `core/src/client.rs (ModelClient)` owns config/auth/session state, chooses wire API, builds payloads, drives retries, parses SSE, compaction, and rate-limit headers. -- `core/src/chat_completions.rs` implements the Chat Completions call + SSE parser + aggregation helper. -- `core/src/client_common.rs` holds `Prompt`, tool specs, shared request structs (`ResponsesApiRequest`, `TextControls`), and `ResponseEvent`/`ResponseStream`. -- `core/src/default_client.rs` wraps `reqwest` with Codex UA/originator defaults. -- `core/src/model_provider_info.rs` models providers (base URL, headers, env keys, retry/timeout tuning) and builds `CodexRequestBuilder`s. - - Current retry logic is co-located with API handling; streaming SSE parsing is duplicated across Responses/Chat. - -## Target Crates (with interfaces) - -- `codex-client` (generic transport) - - Owns the generic HTTP machinery: a `CodexHttpClient`/`CodexRequestBuilder`-style wrapper, retry/backoff hooks, streaming connector (SSE framing + idle timeout), header injection, and optional telemetry callbacks. - - Does **not** know about OpenAI/Codex-specific paths, headers, or error codes; it only exposes HTTP-level concepts (status, headers, bodies, connection errors). - - Minimal surface: - ```rust - pub trait HttpTransport { - fn execute(&self, req: Request) -> Result; - fn stream(&self, req: Request) -> Result; - } - - pub struct Request { - pub method: Method, - pub url: String, - pub headers: HeaderMap, - pub body: Option, - pub timeout: Option, - } - ``` - - Generic client traits (request/response/chunk are abstract over the transport): - ```rust - #[async_trait::async_trait] - pub trait UnaryClient { - async fn run(&self, req: Req) -> Result; - } - - #[async_trait::async_trait] - pub trait StreamClient { - async fn run(&self, req: Req) -> Result, TransportError>; - } - - pub struct RetryPolicy { - pub max_attempts: u64, - pub base_delay: Duration, - pub retry_on: RetryOn, // e.g., transport errors + 429/5xx - } - ``` - - `RetryOn` lives in `codex-client` and captures HTTP status classes and transport failures that qualify for retry. - - Implementations in `codex-api` plug in their own request types, parsers, and retry policies while reusing the transport’s backoff and error types. - - Planned runtime helper: - ```rust - pub async fn run_with_retry( - policy: RetryPolicy, - make_req: impl Fn() -> Request, - op: F, - ) -> Result - where - F: Fn(Request) -> Fut, - Fut: Future>, - { - for attempt in 0..=policy.max_attempts { - let req = make_req(); - match op(req).await { - Ok(resp) => return Ok(resp), - Err(err) if policy.retry_on.should_retry(&err, attempt) => { - tokio::time::sleep(backoff(policy.base_delay, attempt + 1)).await; - } - Err(err) => return Err(err), - } - } - Err(TransportError::RetryLimit) - } - ``` - - Unary clients wrap `transport.execute` with this helper and then deserialize. - - Stream clients wrap the **initial** `transport.stream` call with this helper. Mid-stream disconnects are surfaced as `StreamError`s; automatic resume/reconnect can be added later on top of this primitive if we introduce cursor support. - - Common helpers: `retry::backoff(attempt)`, `errors::{TransportError, StreamError}`. - - Streaming utility (SSE framing only): - ```rust - pub fn sse_stream( - bytes: S, - idle_timeout: Duration, - tx: mpsc::Sender>, - telemetry: Option>, - ) - where - S: Stream> + Unpin + Send + 'static; - ``` - - `sse_stream` is responsible for timeouts, connection-level errors, and emitting raw `data:` chunks as UTF-8 strings; parsing those strings into structured events is done in `codex-api`. - -- `codex-api` (OpenAI/Codex API library) - - Owns typed models for Responses/Chat/Compact plus shared helpers (`Prompt`, tool specs, text controls, `ResponsesApiRequest`, etc.). - - Knows about OpenAI/Codex semantics: - - URL shapes (`/v1/responses`, `/v1/chat/completions`, `/responses/compact`). - - Provider configuration (`WireApi`, base URLs, query params, per-provider retry knobs). - - Rate-limit headers (`x-codex-*`) and their mapping into `RateLimitSnapshot` / `CreditsSnapshot`. - - Error body formats (`{ error: { type, code, message, plan_type, resets_at } }`) and how they become API errors (context window exceeded, quota/usage limit, etc.). - - SSE event names (`response.output_item.done`, `response.completed`, `response.failed`, etc.) and their mapping into high-level events. - - Provides a provider abstraction (conceptually similar to `ModelProviderInfo`): - ```rust - pub struct Provider { - pub name: String, - pub base_url: String, - pub wire: WireApi, // Responses | Chat - pub headers: HeaderMap, - pub retry: RetryConfig, - pub stream_idle_timeout: Duration, - } - - pub trait AuthProvider { - /// Returns a bearer token to use for this request (if any). - /// Implementations are expected to be cheap and to surface already-refreshed tokens; - /// higher layers (`codex-core`) remain responsible for token refresh flows. - fn bearer_token(&self) -> Option; - - /// Optional ChatGPT account id header for Chat mode. - fn account_id(&self) -> Option; - } - ``` - - Ready-made clients built on `HttpTransport`: - ```rust - pub struct ResponsesClient { /* ... */ } - impl ResponsesClient { - pub async fn stream(&self, prompt: &Prompt) -> ApiResult>; - pub async fn compact(&self, prompt: &Prompt) -> ApiResult>; - } - - pub struct ChatClient { /* ... */ } - impl ChatClient { - pub async fn stream(&self, prompt: &Prompt) -> ApiResult>; - } - - pub struct CompactClient { /* ... */ } - impl CompactClient { - pub async fn compact(&self, prompt: &Prompt) -> ApiResult>; - } - ``` - - Streaming events unified across wire APIs (this can closely mirror `ResponseEvent` today, and we may type-alias one to the other during migration): - ```rust - pub enum ApiEvent { - Created, - OutputItemAdded(ResponseItem), - OutputItemDone(ResponseItem), - OutputTextDelta(String), - ReasoningContentDelta { delta: String, content_index: i64 }, - ReasoningSummaryDelta { delta: String, summary_index: i64 }, - RateLimits(RateLimitSnapshot), - Completed { response_id: String, token_usage: Option }, - } - ``` - - Error layering: - - `codex-client`: defines `TransportError` / `StreamError` (status codes, IO, timeouts). - - `codex-api`: defines `ApiError` that wraps `TransportError` plus API-specific errors parsed from bodies and headers. - - `codex-core`: maps `ApiError` into existing `CodexErr` variants so downstream callers remain unchanged. - - Aggregation strategies (today’s `AggregateStreamExt`) live here as adapters (`Aggregated`, `Streaming`) that transform `ResponseStream` into the higher-level views used by `codex-core`. - -## Implementation Steps - -1. **Create crates**: add `codex-client` and `codex-api` (names keep the `codex-` prefix). Stub lib files with feature flags/tests wired into the workspace; wire them into `Cargo.toml`. -2. **Extract API-level SSE + rate limits into `codex-api`**: - - Move the Responses SSE parser (`process_sse`), rate-limit parsing, and related tests from `core/src/client.rs` into `codex-api`, keeping the behavior identical. - - Introduce `ApiEvent` (initially equivalent to `ResponseEvent`) and `ApiError`, and adjust the parser to emit those. - - Provide test-only helpers for fixture streams (replacement for `CODEX_RS_SSE_FIXTURE`) in `codex-api`. -3. **Lift transport layer into `codex-client`**: - - Move `CodexHttpClient`/`CodexRequestBuilder`, UA/originator plumbing, and backoff helpers from `core/src/default_client.rs` into `codex-client` (or a thin wrapper on top of it). - - Introduce `HttpTransport`, `Request`, `RetryPolicy`, `RetryOn`, and `run_with_retry` as described above. - - Keep sandbox/no-proxy toggles behind injected configuration so `codex-client` stays generic and does not depend on Codex-specific env vars. -4. **Model provider abstraction in `codex-api`**: - - Relocate `ModelProviderInfo` (base URL, env/header resolution, retry knobs, wire API enum) into `codex-api`, expressed in terms of `Provider` and `AuthProvider`. - - Ensure provider logic handles: - - URL building for Responses/Chat/Compact (including Azure special cases). - - Static and env-based headers. - - Per-provider retry and idle-timeout settings that map cleanly into `RetryPolicy`/`RetryOn`. -5. **API crate wiring**: - - Move `Prompt`, tool specs, `ResponsesApiRequest`, `TextControls`, and `ResponseEvent/ResponseStream` into `codex-api` under modules (`common`, `responses`, `chat`, `compact`), keeping public types stable or re-exported through `codex-core` as needed. - - Rebuild Responses and Chat clients on top of `HttpTransport` + `StreamClient`, reusing shared retry + SSE helpers; keep aggregation adapters as reusable strategies instead of `ModelClient`-local logic. - - Implement Compact on top of `UnaryClient` and the unary `execute` path with JSON deserialization, sharing the same retry policy. - - Keep request builders symmetric: each client prepares a `Request`, attaches headers/auth via `AuthProvider`, and plugs in its parser (streaming clients) or deserializer (unary) while sharing retry/backoff configuration derived from `Provider`. -6. **Core integration layer**: - - Replace `core::ModelClient` internals with thin adapters that construct `codex-api` clients using `Config`, `AuthManager`, and `OtelEventManager`. - - Keep the public `ModelClient` API and `ResponseEvent`/`ResponseStream` types stable by re-exporting `codex-api` types or providing type aliases. - - Preserve existing auth flows (including ChatGPT token refresh) inside `codex-core` or a thin adapter, using `AuthProvider` to surface bearer tokens to `codex-api` and handling 401/refresh semantics at this layer. -7. **Tests/migration**: - - Move unit tests for SSE parsing, retry/backoff decisions, and provider/header behavior into the new crates; keep integration tests in `core` using the compatibility layer. - - Update fixtures to be consumed via test-only adapters in `codex-api`. - - Run targeted `just fmt`, `just fix -p` for the touched crates, and scoped `cargo test -p codex-client`, `-p codex-api`, and existing `codex-core` suites. - -## Design Decisions - -- **UA construction** - - `codex-client` exposes an optional UA suffix/provider hook (tiny feature) and remains unaware of the CLI; `codex-core` / the CLI compute the full UA (including `terminal::user_agent()`) and pass the suffix or builder down. -- **Config vs provider** - - Most configuration stays in `codex-core`. `codex-api::Provider` only contains what is strictly required for HTTP (base URLs, query params, retry/timeout knobs, wire API), while higher-level knobs (reasoning defaults, verbosity flags, etc.) remain core concerns. -- **Auth flow ownership** - - Auth flows (including ChatGPT token refresh) remain in `codex-core`. `AuthProvider` simply exposes already-fresh tokens/account IDs; 401 handling and refresh retries stay in the existing auth layer. -- **Error enums** - - `codex-client` continues to define `TransportError` / `StreamError`. `codex-api` defines an `ApiError` (deriving `thiserror::Error`) that wraps `TransportError` and API-specific failures, and `codex-core` maps `ApiError` into existing `CodexErr` variants for callers. -- **Streaming reconnection semantics** - - For now, mid-stream SSE failures are surfaced as errors and only the initial connection is retried via `run_with_retry`. We will revisit mid-stream reconnect/resume once the underlying APIs support cursor/idempotent event semantics. -