core-agent-ide/AGENTS.md
Josh McKinney 6912da84a8
client: extend custom CA handling across HTTPS and websocket clients (#14239)
## Stacked PRs

This work is now effectively split across two steps:

- #14178: add custom CA support for browser and device-code login flows,
docs, and hermetic subprocess tests
- #14239: extend that shared custom CA handling across Codex HTTPS
clients and secure websocket TLS

Note: #14240 was merged into this branch while it was stacked on top of
this PR. This PR now subsumes that websocket follow-up and should be
treated as the combined change.

Builds on top of #14178.

## Problem

Custom CA support landed first in the login path, but the real
requirement is broader. Codex constructs outbound TLS clients in
multiple places, and both HTTPS and secure websocket paths can fail
behind enterprise TLS interception if they do not honor
`CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE` consistently.

This PR broadens the shared custom-CA logic beyond login and applies the
same policy to websocket TLS, so the enterprise-proxy story is no longer
split between “HTTPS works” and “websockets still fail”.

## What This Delivers

Custom CA support is no longer limited to login. Codex outbound HTTPS
clients and secure websocket connections can now honor the same
`CODEX_CA_CERTIFICATE` / `SSL_CERT_FILE` configuration, so enterprise
proxy/intercept setups work more consistently end-to-end.

For users and operators, nothing new needs to be configured beyond the
same CA env vars introduced in #14178. The change is that more of Codex
now respects them, including websocket-backed flows that were previously
still using default trust roots.

I also manually validated the proxy path locally with mitmproxy using:
`CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem
HTTPS_PROXY=http://127.0.0.1:8080 just codex`
with mitmproxy installed via `brew install mitmproxy` and configured as
the macOS system proxy.

## Mental model

`codex-client` is now the owner of shared custom-CA policy for outbound
TLS client construction. Reqwest callers start from the builder
configuration they already need, then pass that builder through
`build_reqwest_client_with_custom_ca(...)`. Websocket callers ask the
same module for a rustls client config when a custom CA bundle is
configured.

The env precedence is the same everywhere:
- `CODEX_CA_CERTIFICATE` wins
- otherwise fall back to `SSL_CERT_FILE`
- otherwise use system roots

The helper is intentionally narrow. It loads every usable certificate
from the configured PEM bundle into the appropriate root store and
returns either a configured transport or a typed error that explains
what went wrong.

## Non-goals

This does not add handshake-level integration tests against a live TLS
endpoint. It does not validate that the configured bundle forms a
meaningful certificate chain. It also does not try to force every
transport in the repo through one abstraction; it extends the shared CA
policy across the reqwest and websocket paths that actually needed it.

## Tradeoffs

The main tradeoff is centralizing CA behavior in `codex-client` while
still leaving adoption up to call sites. That keeps the implementation
additive and reviewable, but it means the rule "outbound Codex TLS that
should honor enterprise roots must use the shared helper" is still
partly enforced socially rather than by types.

For websockets, the shared helper only builds an explicit rustls config
when a custom CA bundle is configured. When no override env var is set,
websocket callers still use their ordinary default connector path.

## Architecture

`codex-client::custom_ca` now owns CA bundle selection, PEM
normalization, mixed-section parsing, certificate extraction, typed
CA-loading errors, and optional rustls client-config construction for
websocket TLS.

The affected consumers now call into that shared helper directly rather
than carrying login-local CA behavior:
- backend-client
- cloud-tasks
- RMCP client paths that use `reqwest`
- TUI voice HTTP paths
- `codex-core` default reqwest client construction
- `codex-api` websocket clients for both responses and realtime
websocket connections

The subprocess CA probe, env-sensitive integration tests, and shared PEM
fixtures also live in `codex-client`, which is now the actual owner of
the behavior they exercise.

## Observability

The shared CA path logs:
- which environment variable selected the bundle
- which path was loaded
- how many certificates were accepted
- when `TRUSTED CERTIFICATE` labels were normalized
- when CRLs were ignored
- where client construction failed

Returned errors remain user-facing and include the relevant env var,
path, and remediation hint. That same error model now applies whether
the failure surfaced while building a reqwest client or websocket TLS
configuration.

## Tests

Pure unit tests in `codex-client` cover env precedence and PEM
normalization behavior. Real client construction remains in subprocess
tests so the suite can control process env and avoid the macOS seatbelt
panic path that motivated the hermetic test split.

The subprocess coverage verifies:
- `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
- fallback to `SSL_CERT_FILE`
- single-cert and multi-cert bundles
- malformed and empty-file errors
- OpenSSL `TRUSTED CERTIFICATE` handling
- CRL tolerance for well-formed CRL sections

The websocket side is covered by the existing `codex-api` / `codex-core`
websocket test suites plus the manual mitmproxy validation above.

---------

Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
Co-authored-by: Codex <noreply@openai.com>
2026-03-13 00:59:26 +00:00

14 KiB
Raw Blame History

Rust/codex-rs

In the codex-rs folder where the rust code lives:

  • Crate names are prefixed with codex-. For example, the core folder's crate is named codex-core
  • When using format! and you can inline variables into {}, always do that.
  • Install any commands the repo relies on (for example just, rg, or cargo-insta) if they aren't already available before running instructions here.
  • Never add or modify any code related to CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR or CODEX_SANDBOX_ENV_VAR.
    • You operate in a sandbox where CODEX_SANDBOX_NETWORK_DISABLED=1 will be set whenever you use the shell tool. Any existing code that uses CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR was authored with this fact in mind. It is often used to early exit out of tests that the author knew you would not be able to run given your sandbox limitations.
    • Similarly, when you spawn a process using Seatbelt (/usr/bin/sandbox-exec), CODEX_SANDBOX=seatbelt will be set on the child process. Integration tests that want to run Seatbelt themselves cannot be run under Seatbelt, so checks for CODEX_SANDBOX=seatbelt are also often used to early exit out of tests, as appropriate.
  • Always collapse if statements per https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_if
  • Always inline format! args when possible per https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
  • Use method references over closures when possible per https://rust-lang.github.io/rust-clippy/master/index.html#redundant_closure_for_method_calls
  • When possible, make match statements exhaustive and avoid wildcard arms.
  • When writing tests, prefer comparing the equality of entire objects over fields one by one.
  • When making a change that adds or changes an API, ensure that the documentation in the docs/ folder is up to date if applicable.
  • If you change ConfigToml or nested config types, run just write-config-schema to update codex-rs/core/config.schema.json.
  • If you change Rust dependencies (Cargo.toml or Cargo.lock), run just bazel-lock-update from the repo root to refresh MODULE.bazel.lock, and include that lockfile update in the same change.
  • After dependency changes, run just bazel-lock-check from the repo root so lockfile drift is caught locally before CI.
  • Bazel does not automatically make source-tree files available to compile-time Rust file access. If you add include_str!, include_bytes!, sqlx::migrate!, or similar build-time file or directory reads, update the crate's BUILD.bazel (compile_data, build_script_data, or test data) or Bazel may fail even when Cargo passes.
  • Do not create small helper methods that are referenced only once.
  • Avoid large modules:
    • Prefer adding new modules instead of growing existing ones.
    • Target Rust modules under 500 LoC, excluding tests.
    • If a file exceeds roughly 800 LoC, add new functionality in a new module instead of extending the existing file unless there is a strong documented reason not to.
    • This rule applies especially to high-touch files that already attract unrelated changes, such as codex-rs/tui/src/app.rs, codex-rs/tui/src/bottom_pane/chat_composer.rs, codex-rs/tui/src/bottom_pane/footer.rs, codex-rs/tui/src/chatwidget.rs, codex-rs/tui/src/bottom_pane/mod.rs, and similarly central orchestration modules.
    • When extracting code from a large module, move the related tests and module/type docs toward the new implementation so the invariants stay close to the code that owns them.

Run just fmt (in codex-rs directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:

  1. Run the test for the specific project that was changed. For example, if changes were made in codex-rs/tui, run cargo test -p codex-tui.
  2. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with cargo test (or just test if cargo-nextest is installed). Avoid --all-features for routine local runs because it expands the build matrix and can significantly increase target/ disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.

Before finalizing a large change to codex-rs, run just fix -p <project> (in codex-rs directory) to fix any linter issues in the code. Prefer scoping with -p to avoid slow workspacewide Clippy builds; only run just fix without -p if you changed shared crates. Do not re-run tests after running fix or fmt.

TUI style conventions

See codex-rs/tui/styles.md.

TUI code conventions

  • Use concise styling helpers from ratatuis Stylize trait.
    • Basic spans: use "text".into()
    • Styled spans: use "text".red(), "text".green(), "text".magenta(), "text".dim(), etc.
    • Prefer these over constructing styles with Span::styled and Style directly.
    • Example: patch summary file lines
      • Desired: vec![" └ ".into(), "M".red(), " ".dim(), "tui/src/app.rs".dim()]

TUI Styling (ratatui)

  • Prefer Stylize helpers: use "text".dim(), .bold(), .cyan(), .italic(), .underlined() instead of manual Style where possible.
  • Prefer simple conversions: use "text".into() for spans and vec![…].into() for lines; when inference is ambiguous (e.g., Paragraph::new/Cell::from), use Line::from(spans) or Span::from(text).
  • Computed styles: if the Style is computed at runtime, using Span::styled is OK (Span::from(text).set_style(style) is also acceptable).
  • Avoid hardcoded white: do not use .white(); prefer the default foreground (no color).
  • Chaining: combine helpers by chaining for readability (e.g., url.cyan().underlined()).
  • Single items: prefer "text".into(); use Line::from(text) or Span::from(text) only when the target type isnt obvious from context, or when using .into() would require extra type annotations.
  • Building lines: use vec![…].into() to construct a Line when the target type is obvious and no extra type annotations are needed; otherwise use Line::from(vec![…]).
  • Avoid churn: dont refactor between equivalent forms (Span::styled ↔ set_style, Line::from ↔ .into()) without a clear readability or functional gain; follow filelocal conventions and do not introduce type annotations solely to satisfy .into().
  • Compactness: prefer the form that stays on one line after rustfmt; if only one of Line::from(vec![…]) or vec![…].into() avoids wrapping, choose that. If both wrap, pick the one with fewer wrapped lines.

Text wrapping

  • Always use textwrap::wrap to wrap plain strings.
  • If you have a ratatui Line and you want to wrap it, use the helpers in tui/src/wrapping.rs, e.g. word_wrap_lines / word_wrap_line.
  • If you need to indent wrapped lines, use the initial_indent / subsequent_indent options from RtOptions if you can, rather than writing custom logic.
  • If you have a list of lines and you need to prefix them all with some prefix (optionally different on the first vs subsequent lines), use the prefix_lines helper from line_utils.

Tests

Snapshot tests

This repo uses snapshot tests (via insta), especially in codex-rs/tui, to validate rendered output.

Requirement: any change that affects user-visible UI (including adding new UI) must include corresponding insta snapshot coverage (add a new snapshot test if one doesn't exist yet, or update the existing snapshot). Review and accept snapshot updates as part of the PR so UI impact is easy to review and future diffs stay visual.

When UI or text output changes intentionally, update the snapshots as follows:

  • Run tests to generate any updated snapshots:
    • cargo test -p codex-tui
  • Check whats pending:
    • cargo insta pending-snapshots -p codex-tui
  • Review changes by reading the generated *.snap.new files directly in the repo, or preview a specific file:
    • cargo insta show -p codex-tui path/to/file.snap.new
  • Only if you intend to accept all new snapshots in this crate, run:
    • cargo insta accept -p codex-tui

If you dont have the tool:

  • cargo install cargo-insta

Test assertions

  • Tests should use pretty_assertions::assert_eq for clearer diffs. Import this at the top of the test module if it isn't already.
  • Prefer deep equals comparisons whenever possible. Perform assert_eq!() on entire objects, rather than individual fields.
  • Avoid mutating process environment in tests; prefer passing environment-derived flags or dependencies from above.

Spawning workspace binaries in tests (Cargo vs Bazel)

  • Prefer codex_utils_cargo_bin::cargo_bin("...") over assert_cmd::Command::cargo_bin(...) or escargot when tests need to spawn first-party binaries.
    • Under Bazel, binaries and resources may live under runfiles; use codex_utils_cargo_bin::cargo_bin to resolve absolute paths that remain stable after chdir.
  • When locating fixture files or test resources under Bazel, avoid env!("CARGO_MANIFEST_DIR"). Prefer codex_utils_cargo_bin::find_resource! so paths resolve correctly under both Cargo and Bazel runfiles.

Integration tests (core)

  • Prefer the utilities in core_test_support::responses when writing end-to-end Codex tests.

  • All mount_sse* helpers return a ResponseMock; hold onto it so you can assert against outbound /responses POST bodies.

  • Use ResponseMock::single_request() when a test should only issue one POST, or ResponseMock::requests() to inspect every captured ResponsesRequest.

  • ResponsesRequest exposes helpers (body_json, input, function_call_output, custom_tool_call_output, call_output, header, path, query_param) so assertions can target structured payloads instead of manual JSON digging.

  • Build SSE payloads with the provided ev_* constructors and the sse(...).

  • Prefer wait_for_event over wait_for_event_with_timeout.

  • Prefer mount_sse_once over mount_sse_once_match or mount_sse_sequence

  • Typical pattern:

    let mock = responses::mount_sse_once(&server, responses::sse(vec![
        responses::ev_response_created("resp-1"),
        responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
        responses::ev_completed("resp-1"),
    ])).await;
    
    codex.submit(Op::UserTurn { ... }).await?;
    
    // Assert request body if needed.
    let request = mock.single_request();
    // assert using request.function_call_output(call_id) or request.json_body() or other helpers.
    

App-server API Development Best Practices

These guidelines apply to app-server protocol work in codex-rs, especially:

  • app-server-protocol/src/protocol/common.rs
  • app-server-protocol/src/protocol/v2.rs
  • app-server/README.md

Core Rules

  • All active API development should happen in app-server v2. Do not add new API surface area to v1.
  • Follow payload naming consistently: *Params for request payloads, *Response for responses, and *Notification for notifications.
  • Expose RPC methods as <resource>/<method> and keep <resource> singular (for example, thread/read, app/list).
  • Always expose fields as camelCase on the wire with #[serde(rename_all = "camelCase")] unless a tagged union or explicit compatibility requirement needs a targeted rename.
  • Exception: config RPC payloads are expected to use snake_case to mirror config.toml keys (see the config read/write/list APIs in app-server-protocol/src/protocol/v2.rs).
  • Always set #[ts(export_to = "v2/")] on v2 request/response/notification types so generated TypeScript lands in the correct namespace.
  • Never use #[serde(skip_serializing_if = "Option::is_none")] for v2 API payload fields. Exception: client->server requests that intentionally have no params may use: params: #[ts(type = "undefined")] #[serde(skip_serializing_if = "Option::is_none")] Option<()>.
  • Keep Rust and TS wire renames aligned. If a field or variant uses #[serde(rename = "...")], add matching #[ts(rename = "...")].
  • For discriminated unions, use explicit tagging in both serializers: #[serde(tag = "type", ...)] and #[ts(tag = "type", ...)].
  • Prefer plain String IDs at the API boundary (do UUID parsing/conversion internally if needed).
  • Timestamps should be integer Unix seconds (i64) and named *_at (for example, created_at, updated_at, resets_at).
  • For experimental API surface area: use #[experimental("method/or/field")], derive ExperimentalApi when field-level gating is needed, and use inspect_params: true in common.rs when only some fields of a method are experimental.

Client->server request payloads (*Params)

  • Every optional field must be annotated with #[ts(optional = nullable)]. Do not use #[ts(optional = nullable)] outside client->server request payloads (*Params).
  • Optional collection fields (for example Vec, HashMap) must use Option<...> + #[ts(optional = nullable)]. Do not use #[serde(default)] to model optional collections, and do not use skip_serializing_if on v2 payload fields.
  • When you want omission to mean false for boolean fields, use #[serde(default, skip_serializing_if = "std::ops::Not::not")] pub field: bool over Option<bool>.
  • For new list methods, implement cursor pagination by default: request fields pub cursor: Option<String> and pub limit: Option<u32>, response fields pub data: Vec<...> and pub next_cursor: Option<String>.

Development Workflow

  • Update docs/examples when API behavior changes (at minimum app-server/README.md).
  • Regenerate schema fixtures when API shapes change: just write-app-server-schema (and just write-app-server-schema --experimental when experimental API fixtures are affected).
  • Validate with cargo test -p codex-app-server-protocol.
  • Avoid boilerplate tests that only assert experimental field markers for individual request fields in common.rs; rely on schema generation/tests and behavioral coverage instead.