core-agent-ide/AGENTS.md

# Rust/codex-rs

In the codex-rs folder where the rust code lives:

- Crate names are prefixed with `codex-`. For example, the `core` folder's crate is named `codex-core`
- When using format! and you can inline variables into {}, always do that.
- Install any commands the repo relies on (for example `just`, `rg`, or `cargo-insta`) if they aren't already available before running instructions here.
- Never add or modify any code related to `CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR` or `CODEX_SANDBOX_ENV_VAR`.
  - You operate in a sandbox where `CODEX_SANDBOX_NETWORK_DISABLED=1` will be set whenever you use the `shell` tool. Any existing code that uses `CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR` was authored with this fact in mind. It is often used to early exit out of tests that the author knew you would not be able to run given your sandbox limitations.
  - Similarly, when you spawn a process using Seatbelt (`/usr/bin/sandbox-exec`), `CODEX_SANDBOX=seatbelt` will be set on the child process. Integration tests that want to run Seatbelt themselves cannot be run under Seatbelt, so checks for `CODEX_SANDBOX=seatbelt` are also often used to early exit out of tests, as appropriate.
- Always collapse if statements per https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_if
- Always inline format! args when possible per https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
- Use method references over closures when possible per https://rust-lang.github.io/rust-clippy/master/index.html#redundant_closure_for_method_calls
- Avoid bool or ambiguous `Option` parameters that force callers to write hard-to-read code such as `foo(false)` or `bar(None)`. Prefer enums, named methods, newtypes, or other idiomatic Rust API shapes when they keep the callsite self-documenting.
- When you cannot make that API change and still need a small positional-literal callsite in Rust, follow the `argument_comment_lint` convention:
  - Use an exact `/*param_name*/` comment before opaque literal arguments such as `None`, booleans, and numeric literals when passing them by position.
  - Do not add these comments for string or char literals unless the comment adds real clarity; those literals are intentionally exempt from the lint.
  - If you add one of these comments, the parameter name must exactly match the callee signature.
- When possible, make `match` statements exhaustive and avoid wildcard arms.
- When writing tests, prefer comparing the equality of entire objects over fields one by one.
- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
- If you change `ConfigToml` or nested config types, run `just write-config-schema` to update `codex-rs/core/config.schema.json`.
- If you change Rust dependencies (`Cargo.toml` or `Cargo.lock`), run `just bazel-lock-update` from the
  repo root to refresh `MODULE.bazel.lock`, and include that lockfile update in the same change.
- After dependency changes, run `just bazel-lock-check` from the repo root so lockfile drift is caught
  locally before CI.
- Bazel does not automatically make source-tree files available to compile-time Rust file access. If
  you add `include_str!`, `include_bytes!`, `sqlx::migrate!`, or similar build-time file or
  directory reads, update the crate's `BUILD.bazel` (`compile_data`, `build_script_data`, or test
  data) or Bazel may fail even when Cargo passes.
- Do not create small helper methods that are referenced only once.
- Avoid large modules:
  - Prefer adding new modules instead of growing existing ones.
  - Target Rust modules under 500 LoC, excluding tests.
  - If a file exceeds roughly 800 LoC, add new functionality in a new module instead of extending
    the existing file unless there is a strong documented reason not to.
  - This rule applies especially to high-touch files that already attract unrelated changes, such
    as `codex-rs/tui/src/app.rs`, `codex-rs/tui/src/bottom_pane/chat_composer.rs`,
    `codex-rs/tui/src/bottom_pane/footer.rs`, `codex-rs/tui/src/chatwidget.rs`,
    `codex-rs/tui/src/bottom_pane/mod.rs`, and similarly central orchestration modules.
  - When extracting code from a large module, move the related tests and module/type docs toward
    the new implementation so the invariants stay close to the code that owns them.

Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:

1. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`.
2. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test` (or `just test` if `cargo-nextest` is installed). Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.

Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.

Also run `just argument-comment-lint` to ensure the codebase is clean of comment lint errors.

## TUI style conventions

See `codex-rs/tui/styles.md`.

## TUI code conventions

- When a change lands in `codex-rs/tui` and `codex-rs/tui_app_server` has a parallel implementation of the same behavior, reflect the change in `codex-rs/tui_app_server` too unless there is a documented reason not to.

- Use concise styling helpers from ratatui’s Stylize trait.
  - Basic spans: use "text".into()
  - Styled spans: use "text".red(), "text".green(), "text".magenta(), "text".dim(), etc.
  - Prefer these over constructing styles with `Span::styled` and `Style` directly.
  - Example: patch summary file lines
    - Desired: vec!["  └ ".into(), "M".red(), " ".dim(), "tui/src/app.rs".dim()]

### TUI Styling (ratatui)

- Prefer Stylize helpers: use "text".dim(), .bold(), .cyan(), .italic(), .underlined() instead of manual Style where possible.
- Prefer simple conversions: use "text".into() for spans and vec![…].into() for lines; when inference is ambiguous (e.g., Paragraph::new/Cell::from), use Line::from(spans) or Span::from(text).
- Computed styles: if the Style is computed at runtime, using `Span::styled` is OK (`Span::from(text).set_style(style)` is also acceptable).
- Avoid hardcoded white: do not use `.white()`; prefer the default foreground (no color).
- Chaining: combine helpers by chaining for readability (e.g., url.cyan().underlined()).
- Single items: prefer "text".into(); use Line::from(text) or Span::from(text) only when the target type isn’t obvious from context, or when using .into() would require extra type annotations.
- Building lines: use vec![…].into() to construct a Line when the target type is obvious and no extra type annotations are needed; otherwise use Line::from(vec![…]).
- Avoid churn: don’t refactor between equivalent forms (Span::styled ↔ set_style, Line::from ↔ .into()) without a clear readability or functional gain; follow file‑local conventions and do not introduce type annotations solely to satisfy .into().
- Compactness: prefer the form that stays on one line after rustfmt; if only one of Line::from(vec![…]) or vec![…].into() avoids wrapping, choose that. If both wrap, pick the one with fewer wrapped lines.

### Text wrapping

- Always use textwrap::wrap to wrap plain strings.
- If you have a ratatui Line and you want to wrap it, use the helpers in tui/src/wrapping.rs, e.g. word_wrap_lines / word_wrap_line.
- If you need to indent wrapped lines, use the initial_indent / subsequent_indent options from RtOptions if you can, rather than writing custom logic.
- If you have a list of lines and you need to prefix them all with some prefix (optionally different on the first vs subsequent lines), use the `prefix_lines` helper from line_utils.

## Tests

### Snapshot tests

This repo uses snapshot tests (via `insta`), especially in `codex-rs/tui`, to validate rendered output.

**Requirement:** any change that affects user-visible UI (including adding new UI) must include
corresponding `insta` snapshot coverage (add a new snapshot test if one doesn't exist yet, or
update the existing snapshot). Review and accept snapshot updates as part of the PR so UI impact
is easy to review and future diffs stay visual.

When UI or text output changes intentionally, update the snapshots as follows:

- Run tests to generate any updated snapshots:
  - `cargo test -p codex-tui`
- Check what’s pending:
  - `cargo insta pending-snapshots -p codex-tui`
- Review changes by reading the generated `*.snap.new` files directly in the repo, or preview a specific file:
  - `cargo insta show -p codex-tui path/to/file.snap.new`
- Only if you intend to accept all new snapshots in this crate, run:
  - `cargo insta accept -p codex-tui`

If you don’t have the tool:

- `cargo install cargo-insta`

### Test assertions

- Tests should use pretty_assertions::assert_eq for clearer diffs. Import this at the top of the test module if it isn't already.
- Prefer deep equals comparisons whenever possible. Perform `assert_eq!()` on entire objects, rather than individual fields.
- Avoid mutating process environment in tests; prefer passing environment-derived flags or dependencies from above.

### Spawning workspace binaries in tests (Cargo vs Bazel)

- Prefer `codex_utils_cargo_bin::cargo_bin("...")` over `assert_cmd::Command::cargo_bin(...)` or `escargot` when tests need to spawn first-party binaries.
  - Under Bazel, binaries and resources may live under runfiles; use `codex_utils_cargo_bin::cargo_bin` to resolve absolute paths that remain stable after `chdir`.
- When locating fixture files or test resources under Bazel, avoid `env!("CARGO_MANIFEST_DIR")`. Prefer `codex_utils_cargo_bin::find_resource!` so paths resolve correctly under both Cargo and Bazel runfiles.

### Integration tests (core)

- Prefer the utilities in `core_test_support::responses` when writing end-to-end Codex tests.

- All `mount_sse*` helpers return a `ResponseMock`; hold onto it so you can assert against outbound `/responses` POST bodies.
- Use `ResponseMock::single_request()` when a test should only issue one POST, or `ResponseMock::requests()` to inspect every captured `ResponsesRequest`.
- `ResponsesRequest` exposes helpers (`body_json`, `input`, `function_call_output`, `custom_tool_call_output`, `call_output`, `header`, `path`, `query_param`) so assertions can target structured payloads instead of manual JSON digging.
- Build SSE payloads with the provided `ev_*` constructors and the `sse(...)`.
- Prefer `wait_for_event` over `wait_for_event_with_timeout`.
- Prefer `mount_sse_once` over `mount_sse_once_match` or `mount_sse_sequence`

- Typical pattern:

  ```rust
  let mock = responses::mount_sse_once(&server, responses::sse(vec![
      responses::ev_response_created("resp-1"),
      responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
      responses::ev_completed("resp-1"),
  ])).await;

  codex.submit(Op::UserTurn { ... }).await?;

  // Assert request body if needed.
  let request = mock.single_request();
  // assert using request.function_call_output(call_id) or request.json_body() or other helpers.
  ```

## App-server API Development Best Practices

These guidelines apply to app-server protocol work in `codex-rs`, especially:

- `app-server-protocol/src/protocol/common.rs`
- `app-server-protocol/src/protocol/v2.rs`
- `app-server/README.md`

### Core Rules

- All active API development should happen in app-server v2. Do not add new API surface area to v1.
- Follow payload naming consistently:
  `*Params` for request payloads, `*Response` for responses, and `*Notification` for notifications.
- Expose RPC methods as `<resource>/<method>` and keep `<resource>` singular (for example, `thread/read`, `app/list`).
- Always expose fields as camelCase on the wire with `#[serde(rename_all = "camelCase")]` unless a tagged union or explicit compatibility requirement needs a targeted rename.
- Exception: config RPC payloads are expected to use snake_case to mirror config.toml keys (see the config read/write/list APIs in `app-server-protocol/src/protocol/v2.rs`).
- Always set `#[ts(export_to = "v2/")]` on v2 request/response/notification types so generated TypeScript lands in the correct namespace.
- Never use `#[serde(skip_serializing_if = "Option::is_none")]` for v2 API payload fields.
  Exception: client->server requests that intentionally have no params may use:
  `params: #[ts(type = "undefined")] #[serde(skip_serializing_if = "Option::is_none")] Option<()>`.
- Keep Rust and TS wire renames aligned. If a field or variant uses `#[serde(rename = "...")]`, add matching `#[ts(rename = "...")]`.
- For discriminated unions, use explicit tagging in both serializers:
  `#[serde(tag = "type", ...)]` and `#[ts(tag = "type", ...)]`.
- Prefer plain `String` IDs at the API boundary (do UUID parsing/conversion internally if needed).
- Timestamps should be integer Unix seconds (`i64`) and named `*_at` (for example, `created_at`, `updated_at`, `resets_at`).
- For experimental API surface area:
  use `#[experimental("method/or/field")]`, derive `ExperimentalApi` when field-level gating is needed, and use `inspect_params: true` in `common.rs` when only some fields of a method are experimental.

### Client->server request payloads (`*Params`)

- Every optional field must be annotated with `#[ts(optional = nullable)]`. Do not use `#[ts(optional = nullable)]` outside client->server request payloads (`*Params`).
- Optional collection fields (for example `Vec`, `HashMap`) must use `Option<...>` + `#[ts(optional = nullable)]`. Do not use `#[serde(default)]` to model optional collections, and do not use `skip_serializing_if` on v2 payload fields.
- When you want omission to mean `false` for boolean fields, use `#[serde(default, skip_serializing_if = "std::ops::Not::not")] pub field: bool` over `Option<bool>`.
- For new list methods, implement cursor pagination by default:
  request fields `pub cursor: Option<String>` and `pub limit: Option<u32>`,
  response fields `pub data: Vec<...>` and `pub next_cursor: Option<String>`.

### Development Workflow

- Update docs/examples when API behavior changes (at minimum `app-server/README.md`).
- Regenerate schema fixtures when API shapes change:
  `just write-app-server-schema`
  (and `just write-app-server-schema --experimental` when experimental API fixtures are affected).
- Validate with `cargo test -p codex-app-server-protocol`.
- Avoid boilerplate tests that only assert experimental field markers for individual
  request fields in `common.rs`; rely on schema generation/tests and behavioral coverage instead.
-												feat: add support for AGENTS.md in Rust CLI (#885)

The TypeScript CLI already has support for including the contents of
`AGENTS.md` in the instructions sent with the first turn of a
conversation. This PR brings this functionality to the Rust CLI.

To be considered, `AGENTS.md` must be in the `cwd` of the session, or in
one of the parent folders up to a Git/filesystem root (whichever is
encountered first).

By default, a maximum of 32 KiB of `AGENTS.md` will be included, though
this is configurable using the new-in-this-PR `project_doc_max_bytes`
option in `config.toml`.
											
										
										
											2025-05-10 17:52:59 -07:00
+								# Rust/codex-rs
 								In the codex-rs folder where the rust code lives:
-												Fix typo in AGENTS.md (#2518)

- Change `examole` to `example`
											
										
										
											2025-08-22 16:05:39 -07:00
+								- Crate names are prefixed with `codex-`. For example, the `core` folder's crate is named `codex-core`
-												[1/3] Parse exec commands and format them more nicely in the UI (#2095)

# Note for reviewers
The bulk of this PR is in in the new file, `parse_command.rs`. This file
is designed to be written TDD and implemented with Codex. Do not worry
about reviewing the code, just review the unit tests (if you want). If
any cases are missing, we'll add more tests and have Codex fix them.

I think the best approach will be to land and iterate. I have some
follow-ups I want to do after this lands. The next PR after this will
let us merge (and dedupe) multiple sequential cells of the same such as
multiple read commands. The deduping will also be important because the
model often reads the same file multiple times in a row in chunks

===

This PR formats common commands like reading, formatting, testing, etc
more nicely:

It tries to extract things like file names, tests and falls back to the
cmd if it doesn't. It also only shows stdout/err if the command failed.

<img width="770" height="238" alt="CleanShot 2025-08-09 at 16 05 15"
src="https://github.com/user-attachments/assets/0ead179a-8910-486b-aa3d-7d26264d751e"
/>
<img width="348" height="158" alt="CleanShot 2025-08-09 at 16 05 32"
src="https://github.com/user-attachments/assets/4302681b-5e87-4ff3-85b4-0252c6c485a9"
/>
<img width="834" height="324" alt="CleanShot 2025-08-09 at 16 05 56 2"
src="https://github.com/user-attachments/assets/09fb3517-7bd6-40f6-a126-4172106b700f"
/>

Part 2: https://github.com/openai/codex/pull/2097
Part 3: https://github.com/openai/codex/pull/2110
											
										
										
											2025-08-11 11:26:15 -07:00
+								- When using format! and you can inline variables into {}, always do that.
-												AGENTS.md: Add instruction to install missing commands (#3807)

This change instructs the model to install any missing command. Else
tokens are wasted when it tries to run
commands that aren't available multiple times before installing them.
											
										
										
											2025-09-17 11:06:59 -07:00
+								- Install any commands the repo relies on (for example `just`, `rg`, or `cargo-insta`) if they aren't already available before running instructions here.
-												feat: make .git read-only within a writable root when using Seatbelt (#1765)

To make `--full-auto` safer, this PR updates the Seatbelt policy so that
a `SandboxPolicy` with a `writable_root` that contains a `.git/`
_directory_ will make `.git/` _read-only_ (though as a follow-up, we
should also consider the case where `.git` is a _file_ with a `gitdir:
/path/to/actual/repo/.git` entry that should also be protected).

The two major changes in this PR:

- Updating `SandboxPolicy::get_writable_roots_with_cwd()` to return a
`Vec<WritableRoot>` instead of a `Vec<PathBuf>` where a `WritableRoot`
can specify a list of read-only subpaths.
- Updating `create_seatbelt_command_args()` to honor the read-only
subpaths in `WritableRoot`.

The logic to update the policy is a fairly straightforward update to
`create_seatbelt_command_args()`, but perhaps the more interesting part
of this PR is the introduction of an integration test in
`tests/sandbox.rs`. Leveraging the new API in #1785, we test
`SandboxPolicy` under various conditions, including ones where `$TMPDIR`
is not readable, which is critical for verifying the new behavior.

To ensure that Codex can run its own tests, e.g.:

```
just codex debug seatbelt --full-auto -- cargo test if_git_repo_is_writable_root_then_dot_git_folder_is_read_only
```

I had to introduce the use of `CODEX_SANDBOX=sandbox`, which is
comparable to how `CODEX_SANDBOX_NETWORK_DISABLED=1` was already being
used.

Adding a comparable change for Landlock will be done in a subsequent PR.
											
										
										
											2025-08-01 16:11:24 -07:00
+								- Never add or modify any code related to `CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR` or `CODEX_SANDBOX_ENV_VAR`.
 								  - You operate in a sandbox where `CODEX_SANDBOX_NETWORK_DISABLED=1` will be set whenever you use the `shell` tool. Any existing code that uses `CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR` was authored with this fact in mind. It is often used to early exit out of tests that the author knew you would not be able to run given your sandbox limitations.
 								  - Similarly, when you spawn a process using Seatbelt (`/usr/bin/sandbox-exec`), `CODEX_SANDBOX=seatbelt` will be set on the child process. Integration tests that want to run Seatbelt themselves cannot be run under Seatbelt, so checks for `CODEX_SANDBOX=seatbelt` are also often used to early exit out of tests, as appropriate.
-												[MCP] Add support for MCP Oauth credentials (#4517)

This PR adds oauth login support to streamable http servers when
`experimental_use_rmcp_client` is enabled.

This PR is large but represents the minimal amount of work required for
this to work. To keep this PR smaller, login can only be done with
`codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
or `codex mcp list` yet. Fingers crossed that this is the last large MCP
PR and that subsequent PRs can be smaller.

Under the hood, credentials are stored using platform credential
managers using the [keyring crate](https://crates.io/crates/keyring).
When the keyring isn't available, it falls back to storing credentials
in `CODEX_HOME/.credentials.json` which is consistent with how other
coding agents handle authentication.

I tested this on macOS, Windows, WSL (ubuntu), and Linux. I wasn't able
to test the dbus store on linux but did verify that the fallback works.

One quirk is that if you have credentials, during development, every
build will have its own ad-hoc binary so the keyring won't recognize the
reader as being the same as the write so it may ask for the user's
password. I may add an override to disable this or allow
users/enterprises to opt-out of the keyring storage if it causes issues.

<img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
/>
<img width="745" height="486" alt="image"
src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
/>
											
										
										
											2025-10-03 10:43:12 -07:00
+								- Always collapse if statements per https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_if
 								- Always inline format! args when possible per https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
 								- Use method references over closures when possible per https://rust-lang.github.io/rust-clippy/master/index.html#redundant_closure_for_method_calls
-												Add argument-comment Dylint runner (#14651)
											
										
										
											2026-03-14 08:18:04 -07:00
+								- Avoid bool or ambiguous `Option` parameters that force callers to write hard-to-read code such as `foo(false)` or `bar(None)`. Prefer enums, named methods, newtypes, or other idiomatic Rust API shapes when they keep the callsite self-documenting.
 								- When you cannot make that API change and still need a small positional-literal callsite in Rust, follow the `argument_comment_lint` convention:
 								  - Use an exact `/*param_name*/` comment before opaque literal arguments such as `None`, booleans, and numeric literals when passing them by position.
 								  - Do not add these comments for string or char literals unless the comment adds real clarity; those literals are intentionally exempt from the lint.
 								  - If you add one of these comments, the parameter name must exactly match the callee signature.
-												Reject request_user_input outside Plan/Pair (#9955)

## Context

Previous work in https://github.com/openai/codex/pull/9560 only rejected
`request_user_input` in Execute and Custom modes. Since then, additional
modes
(e.g., Code) were added, so the guard should be mode-agnostic.

## What changed

- Switch the handler to an allowlist: only Plan and PairProgramming are
allowed
- Return the same error for any other mode (including Code)
- Add a Code-mode rejection test alongside the existing Execute/Custom
tests

## Why

This prevents `request_user_input` from being used in modes where it is
not
intended, even as new modes are introduced.
											
										
										
											2026-01-26 17:12:17 -08:00
+								- When possible, make `match` statements exhaustive and avoid wildcard arms.
-												[MCP] Add support for MCP Oauth credentials (#4517)

This PR adds oauth login support to streamable http servers when
`experimental_use_rmcp_client` is enabled.

This PR is large but represents the minimal amount of work required for
this to work. To keep this PR smaller, login can only be done with
`codex mcp login` and `codex mcp logout` but it doesn't appear in `/mcp`
or `codex mcp list` yet. Fingers crossed that this is the last large MCP
PR and that subsequent PRs can be smaller.

Under the hood, credentials are stored using platform credential
managers using the [keyring crate](https://crates.io/crates/keyring).
When the keyring isn't available, it falls back to storing credentials
in `CODEX_HOME/.credentials.json` which is consistent with how other
coding agents handle authentication.

I tested this on macOS, Windows, WSL (ubuntu), and Linux. I wasn't able
to test the dbus store on linux but did verify that the fallback works.

One quirk is that if you have credentials, during development, every
build will have its own ad-hoc binary so the keyring won't recognize the
reader as being the same as the write so it may ask for the user's
password. I may add an override to disable this or allow
users/enterprises to opt-out of the keyring storage if it causes issues.

<img width="5064" height="686" alt="CleanShot 2025-09-30 at 19 31 40"
src="https://github.com/user-attachments/assets/9573f9b4-07f1-4160-83b8-2920db287e2d"
/>
<img width="745" height="486" alt="image"
src="https://github.com/user-attachments/assets/9562649b-ea5f-4f22-ace2-d0cb438b143e"
/>
											
										
										
											2025-10-03 10:43:12 -07:00
+								- When writing tests, prefer comparing the equality of entire objects over fields one by one.
-												[MCP] Add support for resources (#5239)

This PR adds support for [MCP
resources](https://modelcontextprotocol.io/specification/2025-06-18/server/resources)
by adding three new tools for the model:
1. `list_resources`
2. `list_resource_templates`
3. `read_resource`

These 3 tools correspond to the [three primary MCP resource protocol
messages](https://modelcontextprotocol.io/specification/2025-06-18/server/resources#protocol-messages).

Example of listing and reading a GitHub resource tempalte
<img width="2984" height="804" alt="CleanShot 2025-10-15 at 17 31 10"
src="https://github.com/user-attachments/assets/89b7f215-2e2a-41c5-90dd-b932ac84a585"
/>

`/mcp` with Figma configured
<img width="2984" height="442" alt="CleanShot 2025-10-15 at 18 29 35"
src="https://github.com/user-attachments/assets/a7578080-2ed2-4c59-b9b4-d8461f90d8ee"
/>

Fixes #4956
											
										
										
											2025-10-16 22:05:15 -07:00
+								- When making a change that adds or changes an API, ensure that the documentation in the `docs/` folder is up to date if applicable.
-												add generated jsonschema for config.toml (#8956)

### What
Add JSON Schema generation for `config.toml`, with checked‑in
`docs/config.schema.json`. We can move the schema elsewhere if preferred
(and host it if there's demand).

Add fixture test to prevent drift and `just write-config-schema` to
regenerate on schema changes.

Generate MCP config schema from `RawMcpServerConfig` instead of
`McpServerConfig` because that is the runtime type used for
deserialization.

Populate feature flag values into generated schema so they can be
autocompleted.

### Tests
Added tests + regenerate script to prevent drift. Tested autocompletions
using generated jsonschema locally with Even Better TOML.



https://github.com/user-attachments/assets/5aa7cd39-520c-4a63-96fb-63798183d0bc
											
										
										
											2026-01-13 10:22:51 -08:00
+								- If you change `ConfigToml` or nested config types, run `just write-config-schema` to update `codex-rs/core/config.schema.json`.
-												bazel: enforce MODULE.bazel.lock sync with Cargo.lock (#11790)

## Why this change

When Cargo dependencies change, it is easy to end up with an unexpected
local diff in
`MODULE.bazel.lock` after running Bazel. That creates noisy working
copies and pushes lockfile fixes
later in the cycle. This change addresses that pain point directly.

## What this change enforces

The expected invariant is: after dependency updates, `MODULE.bazel.lock`
is already in sync with
Cargo resolution. In practice, running `bazel mod deps` should not
mutate the lockfile in a clean
state. If it does, the dependency update is incomplete.

## How this is enforced

This change adds a single lockfile check script that snapshots
`MODULE.bazel.lock`, runs
`bazel mod deps`, and fails if the file changes. The same check is wired
into local workflow
commands (`just bazel-lock-update` and `just bazel-lock-check`) and into
Bazel CI (Linux x86_64 job)
so drift is caught early and consistently. The developer documentation
is updated in
`codex-rs/docs/bazel.md` and `AGENTS.md` to make the expected flow
explicit.

`MODULE.bazel.lock` is also refreshed in this PR to match the current
Cargo dependency resolution.

## Expected developer workflow

After changing `Cargo.toml` or `Cargo.lock`, run `just
bazel-lock-update`, then run
`just bazel-lock-check`, and include any resulting `MODULE.bazel.lock`
update in the same change.

## Testing

Ran `just bazel-lock-check` locally.
											
										
										
											2026-02-13 18:11:19 -08:00
+								- If you change Rust dependencies (`Cargo.toml` or `Cargo.lock`), run `just bazel-lock-update` from the
 								  repo root to refresh `MODULE.bazel.lock`, and include that lockfile update in the same change.
 								- After dependency changes, run `just bazel-lock-check` from the repo root so lockfile drift is caught
 								  locally before CI.
-												client: extend custom CA handling across HTTPS and websocket clients (#14239)

## Stacked PRs

This work is now effectively split across two steps:

- #14178: add custom CA support for browser and device-code login flows,
docs, and hermetic subprocess tests
- #14239: extend that shared custom CA handling across Codex HTTPS
clients and secure websocket TLS

Note: #14240 was merged into this branch while it was stacked on top of
this PR. This PR now subsumes that websocket follow-up and should be
treated as the combined change.

Builds on top of #14178.

## Problem

Custom CA support landed first in the login path, but the real
requirement is broader. Codex constructs outbound TLS clients in
multiple places, and both HTTPS and secure websocket paths can fail
behind enterprise TLS interception if they do not honor
`CODEX_CA_CERTIFICATE` or `SSL_CERT_FILE` consistently.

This PR broadens the shared custom-CA logic beyond login and applies the
same policy to websocket TLS, so the enterprise-proxy story is no longer
split between “HTTPS works” and “websockets still fail”.

## What This Delivers

Custom CA support is no longer limited to login. Codex outbound HTTPS
clients and secure websocket connections can now honor the same
`CODEX_CA_CERTIFICATE` / `SSL_CERT_FILE` configuration, so enterprise
proxy/intercept setups work more consistently end-to-end.

For users and operators, nothing new needs to be configured beyond the
same CA env vars introduced in #14178. The change is that more of Codex
now respects them, including websocket-backed flows that were previously
still using default trust roots.

I also manually validated the proxy path locally with mitmproxy using:
`CODEX_CA_CERTIFICATE=~/.mitmproxy/mitmproxy-ca-cert.pem
HTTPS_PROXY=http://127.0.0.1:8080 just codex`
with mitmproxy installed via `brew install mitmproxy` and configured as
the macOS system proxy.

## Mental model

`codex-client` is now the owner of shared custom-CA policy for outbound
TLS client construction. Reqwest callers start from the builder
configuration they already need, then pass that builder through
`build_reqwest_client_with_custom_ca(...)`. Websocket callers ask the
same module for a rustls client config when a custom CA bundle is
configured.

The env precedence is the same everywhere:
- `CODEX_CA_CERTIFICATE` wins
- otherwise fall back to `SSL_CERT_FILE`
- otherwise use system roots

The helper is intentionally narrow. It loads every usable certificate
from the configured PEM bundle into the appropriate root store and
returns either a configured transport or a typed error that explains
what went wrong.

## Non-goals

This does not add handshake-level integration tests against a live TLS
endpoint. It does not validate that the configured bundle forms a
meaningful certificate chain. It also does not try to force every
transport in the repo through one abstraction; it extends the shared CA
policy across the reqwest and websocket paths that actually needed it.

## Tradeoffs

The main tradeoff is centralizing CA behavior in `codex-client` while
still leaving adoption up to call sites. That keeps the implementation
additive and reviewable, but it means the rule "outbound Codex TLS that
should honor enterprise roots must use the shared helper" is still
partly enforced socially rather than by types.

For websockets, the shared helper only builds an explicit rustls config
when a custom CA bundle is configured. When no override env var is set,
websocket callers still use their ordinary default connector path.

## Architecture

`codex-client::custom_ca` now owns CA bundle selection, PEM
normalization, mixed-section parsing, certificate extraction, typed
CA-loading errors, and optional rustls client-config construction for
websocket TLS.

The affected consumers now call into that shared helper directly rather
than carrying login-local CA behavior:
- backend-client
- cloud-tasks
- RMCP client paths that use `reqwest`
- TUI voice HTTP paths
- `codex-core` default reqwest client construction
- `codex-api` websocket clients for both responses and realtime
websocket connections

The subprocess CA probe, env-sensitive integration tests, and shared PEM
fixtures also live in `codex-client`, which is now the actual owner of
the behavior they exercise.

## Observability

The shared CA path logs:
- which environment variable selected the bundle
- which path was loaded
- how many certificates were accepted
- when `TRUSTED CERTIFICATE` labels were normalized
- when CRLs were ignored
- where client construction failed

Returned errors remain user-facing and include the relevant env var,
path, and remediation hint. That same error model now applies whether
the failure surfaced while building a reqwest client or websocket TLS
configuration.

## Tests

Pure unit tests in `codex-client` cover env precedence and PEM
normalization behavior. Real client construction remains in subprocess
tests so the suite can control process env and avoid the macOS seatbelt
panic path that motivated the hermetic test split.

The subprocess coverage verifies:
- `CODEX_CA_CERTIFICATE` precedence over `SSL_CERT_FILE`
- fallback to `SSL_CERT_FILE`
- single-cert and multi-cert bundles
- malformed and empty-file errors
- OpenSSL `TRUSTED CERTIFICATE` handling
- CRL tolerance for well-formed CRL sections

The websocket side is covered by the existing `codex-api` / `codex-core`
websocket test suites plus the manual mitmproxy validation above.

---------

Co-authored-by: Ivan Zakharchanka <3axap4eHko@gmail.com>
Co-authored-by: Codex <noreply@openai.com>
											
										
										
											2026-03-12 17:59:26 -07:00
+								- Bazel does not automatically make source-tree files available to compile-time Rust file access. If
 								  you add `include_str!`, `include_bytes!`, `sqlx::migrate!`, or similar build-time file or
 								  directory reads, update the crate's `BUILD.bazel` (`compile_data`, `build_script_data`, or test
 								  data) or Bazel may fail even when Cargo passes.
-												Try to stop small helper methods (#11203)


											
										
										
											2026-02-09 12:01:30 -08:00
+								- Do not create small helper methods that are referenced only once.
-												Add keyboard based fast switching between agents in TUI (#13923)
											
										
										
											2026-03-10 19:41:51 -07:00
+								- Avoid large modules:
 								  - Prefer adding new modules instead of growing existing ones.
 								  - Target Rust modules under 500 LoC, excluding tests.
 								  - If a file exceeds roughly 800 LoC, add new functionality in a new module instead of extending
 								    the existing file unless there is a strong documented reason not to.
 								  - This rule applies especially to high-touch files that already attract unrelated changes, such
 								    as `codex-rs/tui/src/app.rs`, `codex-rs/tui/src/bottom_pane/chat_composer.rs`,
 								    `codex-rs/tui/src/bottom_pane/footer.rs`, `codex-rs/tui/src/chatwidget.rs`,
 								    `codex-rs/tui/src/bottom_pane/mod.rs`, and similarly central orchestration modules.
 								  - When extracting code from a large module, move the related tests and module/type docs toward
 								    the new implementation so the invariants stay close to the code that owns them.
-												chore: auto format code on save and add more details to AGENTS.md (#1582)

Adds a default vscode config with generally applicable settings.
Adds more entrypoints to justfile both  for environment setup and to help
agents better verify changes.
											
										
										
											2025-07-17 11:40:00 -07:00
-												chore: tweak AGENTS.md (#9650)

## Summary
Update AGENTS.md to improve testing flow

## Testing
- [x] Tested locally, much faster
											
										
										
											2026-01-21 20:20:45 -08:00
+								Run `just fmt` (in `codex-rs` directory) automatically after you have finished making Rust code changes; do not ask for approval to run it. Additionally, run the tests:
-												chore: subject docs/*.md to Prettier checks (#4645)

Apparently we were not running our `pnpm run prettier` check in CI, so
many files that were covered by the existing Prettier check were not
well-formatted.

This updates CI and formats the files.
											
										
										
											2025-10-03 11:35:48 -07:00
-												AGENTS.md more strongly suggests running targeted tests first (#2306)


											
										
										
											2025-08-14 20:51:32 -04:00
+. Run the test for the specific project that was changed. For example, if changes were made in `codex-rs/tui`, run `cargo test -p codex-tui`.
-												feat: discourage the use of the --all-features flag (#12429)

## Why

Developers are frequently running low on disk space, and routine use of
`--all-features` contributes to larger Cargo build caches in `target/`
by compiling additional feature combinations.

This change updates local workflow guidance to avoid `--all-features` by
default and reserve it for cases where full feature coverage is
specifically needed.

## What Changed

- Updated `AGENTS.md` guidance for `codex-rs` to recommend `cargo test`
/ `just test` for full-suite local runs, and to call out the disk-usage
cost of routine `--all-features` usage.
- Updated the root `justfile` so `just fix` and `just clippy` no longer
pass `--all-features` by default.
- Updated `docs/install.md` to explicitly describe `cargo test
--all-features` as an optional heavier-weight run (more build time and
`target/` disk usage).

## Verification

- Confirmed the `justfile` parses and the recipes list successfully with
`just --list`.
											
										
										
											2026-02-20 23:02:24 -08:00
+. Once those pass, if any changes were made in common, core, or protocol, run the complete test suite with `cargo test` (or `just test` if `cargo-nextest` is installed). Avoid `--all-features` for routine local runs because it expands the build matrix and can significantly increase `target/` disk usage; use it only when you specifically need full feature coverage. project-specific or individual tests can be run without asking the user, but do ask the user before running the complete test suite.
-												tui: align diff display by always showing sign char and keeping fixed gutter (#2353)

diff lines without a sign char were misaligned.
											
										
										
											2025-08-15 12:32:45 -04:00
-												Try to stop small helper methods (#11203)


											
										
										
											2026-02-09 12:01:30 -08:00
+								Before finalizing a large change to `codex-rs`, run `just fix -p <project>` (in `codex-rs` directory) to fix any linter issues in the code. Prefer scoping with `-p` to avoid slow workspace‑wide Clippy builds; only run `just fix` without `-p` if you changed shared crates. Do not re-run tests after running `fix` or `fmt`.
-												chore: tweak AGENTS.md (#9650)

## Summary
Update AGENTS.md to improve testing flow

## Testing
- [x] Tested locally, much faster
											
										
										
											2026-01-21 20:20:45 -08:00
-												Use released DotSlash package for argument-comment lint (#15199)

## Why
The argument-comment lint now has a packaged DotSlash artifact from
[#15198](https://github.com/openai/codex/pull/15198), so the normal repo
lint path should use that released payload instead of rebuilding the
lint from source every time.

That keeps `just clippy` and CI aligned with the shipped artifact while
preserving a separate source-build path for people actively hacking on
the lint crate.

The current alpha package also exposed two integration wrinkles that the
repo-side prebuilt wrapper needs to smooth over:
- the bundled Dylint library filename includes the host triple, for
example `@nightly-2025-09-18-aarch64-apple-darwin`, and Dylint derives
`RUSTUP_TOOLCHAIN` from that filename
- on Windows, Dylint's driver path also expects `RUSTUP_HOME` to be
present in the environment

Without those adjustments, the prebuilt CI jobs fail during `cargo
metadata` or driver setup. This change makes the checked-in prebuilt
wrapper normalize the packaged library name to the plain
`nightly-2025-09-18` channel before invoking `cargo-dylint`, and it
teaches both the wrapper and the packaged runner source to infer
`RUSTUP_HOME` from `rustup show home` when the environment does not
already provide it.

After the prebuilt Windows lint job started running successfully, it
also surfaced a handful of existing anonymous literal callsites in
`windows-sandbox-rs`. This PR now annotates those callsites so the new
cross-platform lint job is green on the current tree.

## What Changed
- checked in the current
`tools/argument-comment-lint/argument-comment-lint` DotSlash manifest
- kept `tools/argument-comment-lint/run.sh` as the source-build wrapper
for lint development
- added `tools/argument-comment-lint/run-prebuilt-linter.sh` as the
normal enforcement path, using the checked-in DotSlash package and
bundled `cargo-dylint`
- updated `just clippy` and `just argument-comment-lint` to use the
prebuilt wrapper
- split `.github/workflows/rust-ci.yml` so source-package checks live in
a dedicated `argument_comment_lint_package` job, while the released lint
runs in an `argument_comment_lint_prebuilt` matrix on Linux, macOS, and
Windows
- kept the pinned `nightly-2025-09-18` toolchain install in the prebuilt
CI matrix, since the prebuilt package still relies on rustup-provided
toolchain components
- updated `tools/argument-comment-lint/run-prebuilt-linter.sh` to
normalize host-qualified nightly library filenames, keep the `rustup`
shim directory ahead of direct toolchain `cargo` binaries, and export
`RUSTUP_HOME` when needed for Windows Dylint driver setup
- updated `tools/argument-comment-lint/src/bin/argument-comment-lint.rs`
so future published DotSlash artifacts apply the same nightly-filename
normalization and `RUSTUP_HOME` inference internally
- fixed the remaining Windows lint violations in
`codex-rs/windows-sandbox-rs` by adding the required `/*param*/`
comments at the reported callsites
- documented the checked-in DotSlash file, wrapper split, archive
layout, nightly prerequisite, and Windows `RUSTUP_HOME` requirement in
`tools/argument-comment-lint/README.md`
											
										
										
											2026-03-19 20:19:22 -07:00
+								Also run `just argument-comment-lint` to ensure the codebase is clean of comment lint errors.
-												fix: clean up styles & colors and define in styles.md (#2401)

New style guide:

  # Headers, primary, and secondary text
  
- **Headers:** Use `bold`. For markdown with various header levels,
leave in the `#` signs.
  - **Primary text:** Default.
  - **Secondary text:** Use `dim`.
  
  # Foreground colors
  
- **Default:** Most of the time, just use the default foreground color.
`reset` can help get it back.
- **Selection:** Use ANSI `blue`. (Ed & AE want to make this cyan too,
but we'll do that in a followup since it's riskier in different themes.)
  - **User input tips and status indicators:** Use ANSI `cyan`.
  - **Success and additions:** Use ANSI `green`.
  - **Errors, failures and deletions:** Use ANSI `red`.
  - **Codex:** Use ANSI `magenta`.
  
  # Avoid
  
- Avoid custom colors because there's no guarantee that they'll contrast
well or look good on various terminal color themes.
- Avoid ANSI `black`, `white`, `yellow` as foreground colors because the
terminal theme will do a better job. (Use `reset` if you need to in
order to get those.) The exception is if you need contrast rendering
over a manually colored background.
  
  (There are some rules to try to catch this in `clippy.toml`.)

# Testing

Tested in a variety of light and dark color themes in Terminal, iTerm2, and Ghostty.
											
										
										
											2025-08-18 08:26:29 -07:00
+								## TUI style conventions
 								See `codex-rs/tui/styles.md`.
-												color the status letter in apply patch summary (#2337)

<img width="440" height="77" alt="Screenshot 2025-08-14 at 8 30 30 PM"
src="https://github.com/user-attachments/assets/c6169a3a-2e98-4ace-b7ee-918cf4368b7a"
/>
											
										
										
											2025-08-15 16:25:48 -04:00
+								## TUI code conventions
-												Move TUI on top of app server (parallel code) (#14717)

This PR replicates the `tui` code directory and creates a temporary
parallel `tui_app_server` directory. It also implements a new feature
flag `tui_app_server` to select between the two tui implementations.

Once the new app-server-based TUI is stabilized, we'll delete the old
`tui` directory and feature flag.
											
										
										
											2026-03-16 10:49:19 -06:00
+								- When a change lands in `codex-rs/tui` and `codex-rs/tui_app_server` has a parallel implementation of the same behavior, reflect the change in `codex-rs/tui_app_server` too unless there is a documented reason not to.
-												color the status letter in apply patch summary (#2337)

<img width="440" height="77" alt="Screenshot 2025-08-14 at 8 30 30 PM"
src="https://github.com/user-attachments/assets/c6169a3a-2e98-4ace-b7ee-918cf4368b7a"
/>
											
										
										
											2025-08-15 16:25:48 -04:00
+								- Use concise styling helpers from ratatui’s Stylize trait.
 								  - Basic spans: use "text".into()
 								  - Styled spans: use "text".red(), "text".green(), "text".magenta(), "text".dim(), etc.
 								  - Prefer these over constructing styles with `Span::styled` and `Style` directly.
 								  - Example: patch summary file lines
 								    - Desired: vec!["  └ ".into(), "M".red(), " ".dim(), "tui/src/app.rs".dim()]
-												prefer ratatui Stylized for constructing lines/spans (#3068)

no functional change, just simplifying ratatui styling and adding
guidance in AGENTS.md for future.
											
										
										
											2025-09-02 16:19:54 -07:00
+								### TUI Styling (ratatui)
-												chore: subject docs/*.md to Prettier checks (#4645)

Apparently we were not running our `pnpm run prettier` check in CI, so
many files that were covered by the existing Prettier check were not
well-formatted.

This updates CI and formats the files.
											
										
										
											2025-10-03 11:35:48 -07:00
-												prefer ratatui Stylized for constructing lines/spans (#3068)

no functional change, just simplifying ratatui styling and adding
guidance in AGENTS.md for future.
											
										
										
											2025-09-02 16:19:54 -07:00
+								- Prefer Stylize helpers: use "text".dim(), .bold(), .cyan(), .italic(), .underlined() instead of manual Style where possible.
 								- Prefer simple conversions: use "text".into() for spans and vec![…].into() for lines; when inference is ambiguous (e.g., Paragraph::new/Cell::from), use Line::from(spans) or Span::from(text).
 								- Computed styles: if the Style is computed at runtime, using `Span::styled` is OK (`Span::from(text).set_style(style)` is also acceptable).
 								- Avoid hardcoded white: do not use `.white()`; prefer the default foreground (no color).
 								- Chaining: combine helpers by chaining for readability (e.g., url.cyan().underlined()).
 								- Single items: prefer "text".into(); use Line::from(text) or Span::from(text) only when the target type isn’t obvious from context, or when using .into() would require extra type annotations.
 								- Building lines: use vec![…].into() to construct a Line when the target type is obvious and no extra type annotations are needed; otherwise use Line::from(vec![…]).
 								- Avoid churn: don’t refactor between equivalent forms (Span::styled ↔ set_style, Line::from ↔ .into()) without a clear readability or functional gain; follow file‑local conventions and do not introduce type annotations solely to satisfy .into().
 								- Compactness: prefer the form that stays on one line after rustfmt; if only one of Line::from(vec![…]) or vec![…].into() avoids wrapping, choose that. If both wrap, pick the one with fewer wrapped lines.
-												syntax-highlight bash lines (#3142)

i'm not yet convinced i have the best heuristics for what to highlight,
but this feels like a useful step towards something a bit easier to
read, esp. when the model is producing large commands.

<img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
/>

also a fairly significant refactor of our line wrapping logic.
											
										
										
											2025-09-05 07:10:32 -07:00
+								### Text wrapping
-												chore: subject docs/*.md to Prettier checks (#4645)

Apparently we were not running our `pnpm run prettier` check in CI, so
many files that were covered by the existing Prettier check were not
well-formatted.

This updates CI and formats the files.
											
										
										
											2025-10-03 11:35:48 -07:00
-												syntax-highlight bash lines (#3142)

i'm not yet convinced i have the best heuristics for what to highlight,
but this feels like a useful step towards something a bit easier to
read, esp. when the model is producing large commands.

<img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
/>

also a fairly significant refactor of our line wrapping logic.
											
										
										
											2025-09-05 07:10:32 -07:00
+								- Always use textwrap::wrap to wrap plain strings.
 								- If you have a ratatui Line and you want to wrap it, use the helpers in tui/src/wrapping.rs, e.g. word_wrap_lines / word_wrap_line.
 								- If you need to indent wrapped lines, use the initial_indent / subsequent_indent options from RtOptions if you can, rather than writing custom logic.
 								- If you have a list of lines and you need to prefix them all with some prefix (optionally different on the first vs subsequent lines), use the `prefix_lines` helper from line_utils.
 								## Tests
 								### Snapshot tests
-												tui: align diff display by always showing sign char and keeping fixed gutter (#2353)

diff lines without a sign char were misaligned.
											
										
										
											2025-08-15 12:32:45 -04:00
-												docs: require insta snapshot coverage for UI changes (#10669)

Adds an explicit requirement in AGENTS.md that any user-visible UI
change includes corresponding insta snapshot coverage and that snapshots
are reviewed/accepted in the PR.

Tests: N/A (docs only)
											
										
										
											2026-02-12 14:47:09 -08:00
+								This repo uses snapshot tests (via `insta`), especially in `codex-rs/tui`, to validate rendered output.
 								**Requirement:** any change that affects user-visible UI (including adding new UI) must include
 								corresponding `insta` snapshot coverage (add a new snapshot test if one doesn't exist yet, or
 								update the existing snapshot). Review and accept snapshot updates as part of the PR so UI impact
 								is easy to review and future diffs stay visual.
 								When UI or text output changes intentionally, update the snapshots as follows:
-												tui: align diff display by always showing sign char and keeping fixed gutter (#2353)

diff lines without a sign char were misaligned.
											
										
										
											2025-08-15 12:32:45 -04:00
 								- Run tests to generate any updated snapshots:
 								  - `cargo test -p codex-tui`
 								- Check what’s pending:
 								  - `cargo insta pending-snapshots -p codex-tui`
 								- Review changes by reading the generated `*.snap.new` files directly in the repo, or preview a specific file:
 								  - `cargo insta show -p codex-tui path/to/file.snap.new`
 								- Only if you intend to accept all new snapshots in this crate, run:
 								  - `cargo insta accept -p codex-tui`
 								If you don’t have the tool:
-												chore: subject docs/*.md to Prettier checks (#4645)

Apparently we were not running our `pnpm run prettier` check in CI, so
many files that were covered by the existing Prettier check were not
well-formatted.

This updates CI and formats the files.
											
										
										
											2025-10-03 11:35:48 -07:00
-												tui: align diff display by always showing sign char and keeping fixed gutter (#2353)

diff lines without a sign char were misaligned.
											
										
										
											2025-08-15 12:32:45 -04:00
+								- `cargo install cargo-insta`
-												syntax-highlight bash lines (#3142)

i'm not yet convinced i have the best heuristics for what to highlight,
but this feels like a useful step towards something a bit easier to
read, esp. when the model is producing large commands.

<img width="669" height="589" alt="Screenshot 2025-09-03 at 8 21 56 PM"
src="https://github.com/user-attachments/assets/b9cbcc43-80e8-4d41-93c8-daa74b84b331"
/>

also a fairly significant refactor of our line wrapping logic.
											
										
										
											2025-09-05 07:10:32 -07:00
 								### Test assertions
 								- Tests should use pretty_assertions::assert_eq for clearer diffs. Import this at the top of the test module if it isn't already.
-												feat(core) Add login to shell_command tool (#6846)

## Summary
Adds the `login` parameter to the `shell_command` tool - optional,
defaults to true.

## Testing
- [x] Tested locally
											
										
										
											2025-12-05 11:03:25 -08:00
+								- Prefer deep equals comparisons whenever possible. Perform `assert_eq!()` on entire objects, rather than individual fields.
-												Fixed resume matching to respect case insensitivity when using WSL mount points (#8000)

This fixes #7995
											
										
										
											2025-12-16 18:27:38 -06:00
+								- Avoid mutating process environment in tests; prefer passing environment-derived flags or dependencies from above.
-												Simplify request body assertions (#4845)

We'll have a lot more test like these
											
										
										
											2025-10-07 01:56:39 -07:00
-												feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)

To support Bazelification in https://github.com/openai/codex/pull/8875,
this PR introduces a new `find_resource!` macro that we use in place of
our existing logic in tests that looks for resources relative to the
compile-time `CARGO_MANIFEST_DIR` env var.

To make this work, we plan to add the following to all `rust_library()`
and `rust_test()` Bazel rules in the project:

```
rustc_env = {
    "BAZEL_PACKAGE": native.package_name(),
},
```

Our new `find_resource!` macro reads this value via
`option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
using `find_resource!`_ is injected into the code expanded from the
macro. (If `find_resource()` were a function, then
`option_env!("BAZEL_PACKAGE")` would always be
`codex-rs/utils/cargo-bin`, which is not what we want.)

Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
environment variable is set at runtime, indicating that the test is
being run by Bazel. In this case, we have to concatenate the runtime
`RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
path to the resource.

In testing this change, I discovered one funky edge case in
`codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
(but not canonicalize!) the result from `find_resource!` because the
path contains a `common/..` component that does not exist on disk when
the test is run under Bazel, so it must be semantically normalized using
the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
before it is passed to `dotslash fetch`.

Because this new behavior may be non-obvious, this PR also updates
`AGENTS.md` to make humans/Codex aware that this API is preferred.
											
										
										
											2026-01-07 18:06:08 -08:00
+								### Spawning workspace binaries in tests (Cargo vs Bazel)
-												feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)

This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.

As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.

See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.

Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.

My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
											
										
										
											2025-12-23 19:29:32 -08:00
 								- Prefer `codex_utils_cargo_bin::cargo_bin("...")` over `assert_cmd::Command::cargo_bin(...)` or `escargot` when tests need to spawn first-party binaries.
-												feat: introduce find_resource! macro that works with Cargo or Bazel (#8879)

To support Bazelification in https://github.com/openai/codex/pull/8875,
this PR introduces a new `find_resource!` macro that we use in place of
our existing logic in tests that looks for resources relative to the
compile-time `CARGO_MANIFEST_DIR` env var.

To make this work, we plan to add the following to all `rust_library()`
and `rust_test()` Bazel rules in the project:

```
rustc_env = {
    "BAZEL_PACKAGE": native.package_name(),
},
```

Our new `find_resource!` macro reads this value via
`option_env!("BAZEL_PACKAGE")` so that the Bazel package _of the code
using `find_resource!`_ is injected into the code expanded from the
macro. (If `find_resource()` were a function, then
`option_env!("BAZEL_PACKAGE")` would always be
`codex-rs/utils/cargo-bin`, which is not what we want.)

Note we only consider the `BAZEL_PACKAGE` value when the `RUNFILES_DIR`
environment variable is set at runtime, indicating that the test is
being run by Bazel. In this case, we have to concatenate the runtime
`RUNFILES_DIR` with the compile-time `BAZEL_PACKAGE` value to build the
path to the resource.

In testing this change, I discovered one funky edge case in
`codex-rs/exec-server/tests/common/lib.rs` where we have to _normalize_
(but not canonicalize!) the result from `find_resource!` because the
path contains a `common/..` component that does not exist on disk when
the test is run under Bazel, so it must be semantically normalized using
the [`path-absolutize`](https://crates.io/crates/path-absolutize) crate
before it is passed to `dotslash fetch`.

Because this new behavior may be non-obvious, this PR also updates
`AGENTS.md` to make humans/Codex aware that this API is preferred.
											
										
										
											2026-01-07 18:06:08 -08:00
+								  - Under Bazel, binaries and resources may live under runfiles; use `codex_utils_cargo_bin::cargo_bin` to resolve absolute paths that remain stable after `chdir`.
 								- When locating fixture files or test resources under Bazel, avoid `env!("CARGO_MANIFEST_DIR")`. Prefer `codex_utils_cargo_bin::find_resource!` so paths resolve correctly under both Cargo and Bazel runfiles.
-												feat: introduce codex-utils-cargo-bin as an alternative to assert_cmd::Command (#8496)

This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.

As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.

See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.

Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.

My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496
											
										
										
											2025-12-23 19:29:32 -08:00
-												Simplify request body assertions (#4845)

We'll have a lot more test like these
											
										
										
											2025-10-07 01:56:39 -07:00
+								### Integration tests (core)
 								- Prefer the utilities in `core_test_support::responses` when writing end-to-end Codex tests.
 								- All `mount_sse*` helpers return a `ResponseMock`; hold onto it so you can assert against outbound `/responses` POST bodies.
 								- Use `ResponseMock::single_request()` when a test should only issue one POST, or `ResponseMock::requests()` to inspect every captured `ResponsesRequest`.
 								- `ResponsesRequest` exposes helpers (`body_json`, `input`, `function_call_output`, `custom_tool_call_output`, `call_output`, `header`, `path`, `query_param`) so assertions can target structured payloads instead of manual JSON digging.
 								- Build SSE payloads with the provided `ev_*` constructors and the `sse(...)`.
-												Prefer `wait_for_event` over `wait_for_event_with_timeout`. (#6346)

No need to specify the timeout in most cases.
											
										
										
											2025-11-06 16:14:43 -08:00
+								- Prefer `wait_for_event` over `wait_for_event_with_timeout`.
-												tests: replace mount_sse_once_match with mount_sse_once for SSE mocking (#6640)


											
										
										
											2025-11-13 18:04:05 -08:00
+								- Prefer `mount_sse_once` over `mount_sse_once_match` or `mount_sse_sequence`
-												Simplify request body assertions (#4845)

We'll have a lot more test like these
											
										
										
											2025-10-07 01:56:39 -07:00
 								- Typical pattern:
 								  ```rust
 								  let mock = responses::mount_sse_once(&server, responses::sse(vec![
 								      responses::ev_response_created("resp-1"),
 								      responses::ev_function_call(call_id, "shell", &serde_json::to_string(&args)?),
 								      responses::ev_completed("resp-1"),
 								  ])).await;
 								  codex.submit(Op::UserTurn { ... }).await?;
 								  // Assert request body if needed.
 								  let request = mock.single_request();
 								  // assert using request.function_call_output(call_id) or request.json_body() or other helpers.
 								  ```
-												fix(app-server): fix TS annotations for optional fields on requests (#10412)

This updates our generated TypeScript types to be more correct with how
the server actually behaves, **specifically for JSON-RPC requests**.

Before this PR, we'd generate `field: T | null`. After this PR, we will
have `field?: T | null`. The latter matches how the server actually
works, in that if an optional field is omitted, the server will treat it
as null. This also makes it less annoying in theory for clients to
upgrade to newer versions of Codex, since adding a new optional field to
a JSON-RPC request should not require a client change.

NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.
responses, notifications) will return `field: T | null` as usual.
											
										
										
											2026-02-03 11:51:37 -08:00
 								## App-server API Development Best Practices
 								These guidelines apply to app-server protocol work in `codex-rs`, especially:
 								- `app-server-protocol/src/protocol/common.rs`
 								- `app-server-protocol/src/protocol/v2.rs`
 								- `app-server/README.md`
 								### Core Rules
 								- All active API development should happen in app-server v2. Do not add new API surface area to v1.
 								- Follow payload naming consistently:
 								  `*Params` for request payloads, `*Response` for responses, and `*Notification` for notifications.
 								- Expose RPC methods as `<resource>/<method>` and keep `<resource>` singular (for example, `thread/read`, `app/list`).
 								- Always expose fields as camelCase on the wire with `#[serde(rename_all = "camelCase")]` unless a tagged union or explicit compatibility requirement needs a targeted rename.
-												chore(app-server): update AGENTS.md for config + optional collection guidance (#10914)

Based on recent app-server PRs
											
										
										
											2026-02-06 12:45:27 -08:00
+								- Exception: config RPC payloads are expected to use snake_case to mirror config.toml keys (see the config read/write/list APIs in `app-server-protocol/src/protocol/v2.rs`).
-												fix(app-server): fix TS annotations for optional fields on requests (#10412)

This updates our generated TypeScript types to be more correct with how
the server actually behaves, **specifically for JSON-RPC requests**.

Before this PR, we'd generate `field: T | null`. After this PR, we will
have `field?: T | null`. The latter matches how the server actually
works, in that if an optional field is omitted, the server will treat it
as null. This also makes it less annoying in theory for clients to
upgrade to newer versions of Codex, since adding a new optional field to
a JSON-RPC request should not require a client change.

NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.
responses, notifications) will return `field: T | null` as usual.
											
										
										
											2026-02-03 11:51:37 -08:00
+								- Always set `#[ts(export_to = "v2/")]` on v2 request/response/notification types so generated TypeScript lands in the correct namespace.
 								- Never use `#[serde(skip_serializing_if = "Option::is_none")]` for v2 API payload fields.
 								  Exception: client->server requests that intentionally have no params may use:
 								  `params: #[ts(type = "undefined")] #[serde(skip_serializing_if = "Option::is_none")] Option<()>`.
 								- Keep Rust and TS wire renames aligned. If a field or variant uses `#[serde(rename = "...")]`, add matching `#[ts(rename = "...")]`.
 								- For discriminated unions, use explicit tagging in both serializers:
 								  `#[serde(tag = "type", ...)]` and `#[ts(tag = "type", ...)]`.
 								- Prefer plain `String` IDs at the API boundary (do UUID parsing/conversion internally if needed).
 								- Timestamps should be integer Unix seconds (`i64`) and named `*_at` (for example, `created_at`, `updated_at`, `resets_at`).
 								- For experimental API surface area:
 								  use `#[experimental("method/or/field")]`, derive `ExperimentalApi` when field-level gating is needed, and use `inspect_params: true` in `common.rs` when only some fields of a method are experimental.
-												chore(app-server): update AGENTS.md for config + optional collection guidance (#10914)

Based on recent app-server PRs
											
										
										
											2026-02-06 12:45:27 -08:00
+								### Client->server request payloads (`*Params`)
 								- Every optional field must be annotated with `#[ts(optional = nullable)]`. Do not use `#[ts(optional = nullable)]` outside client->server request payloads (`*Params`).
 								- Optional collection fields (for example `Vec`, `HashMap`) must use `Option<...>` + `#[ts(optional = nullable)]`. Do not use `#[serde(default)]` to model optional collections, and do not use `skip_serializing_if` on v2 payload fields.
 								- When you want omission to mean `false` for boolean fields, use `#[serde(default, skip_serializing_if = "std::ops::Not::not")] pub field: bool` over `Option<bool>`.
 								- For new list methods, implement cursor pagination by default:
 								  request fields `pub cursor: Option<String>` and `pub limit: Option<u32>`,
 								  response fields `pub data: Vec<...>` and `pub next_cursor: Option<String>`.
-												fix(app-server): fix TS annotations for optional fields on requests (#10412)

This updates our generated TypeScript types to be more correct with how
the server actually behaves, **specifically for JSON-RPC requests**.

Before this PR, we'd generate `field: T | null`. After this PR, we will
have `field?: T | null`. The latter matches how the server actually
works, in that if an optional field is omitted, the server will treat it
as null. This also makes it less annoying in theory for clients to
upgrade to newer versions of Codex, since adding a new optional field to
a JSON-RPC request should not require a client change.

NOTE: This only applies to JSON-RPC requests. All other payloads (i.e.
responses, notifications) will return `field: T | null` as usual.
											
										
										
											2026-02-03 11:51:37 -08:00
+								### Development Workflow
 								- Update docs/examples when API behavior changes (at minimum `app-server/README.md`).
 								- Regenerate schema fixtures when API shapes change:
 								  `just write-app-server-schema`
 								  (and `just write-app-server-schema --experimental` when experimental API fixtures are affected).
 								- Validate with `cargo test -p codex-app-server-protocol`.
-												feat(app-server): experimental flag to persist extended history (#11227)

This PR adds an experimental `persist_extended_history` bool flag to
app-server thread APIs so rollout logs can retain a richer set of
EventMsgs for non-lossy Thread > Turn > ThreadItems reconstruction (i.e.
on `thread/resume`).

### Motivation
Today, our rollout recorder only persists a small subset (e.g. user
message, reasoning, assistant message) of `EventMsg` types, dropping a
good number (like command exec, file change, etc.) that are important
for reconstructing full item history for `thread/resume`, `thread/read`,
and `thread/fork`.

Some clients want to be able to resume a thread without lossiness. This
lossiness is primarily a UI thing, since what the model sees are
`ResponseItem` and not `EventMsg`.

### Approach
This change introduces an opt-in `persist_full_history` flag to preserve
those events when you start/resume/fork a thread (defaults to `false`).

This is done by adding an `EventPersistenceMode` to the rollout
recorder:
- `Limited` (existing behavior, default)
- `Extended` (new opt-in behavior)

In `Extended` mode, persist additional `EventMsg` variants needed for
non-lossy app-server `ThreadItem` reconstruction. We now store the
following ThreadItems that we didn't before:
- web search
- command execution
- patch/file changes
- MCP tool calls
- image view calls
- collab tool outcomes
- context compaction
- review mode enter/exit

For **command executions** in particular, we truncate the output using
the existing `truncate_text` from core to store an upper bound of 10,000
bytes, which is also the default value for truncating tool outputs shown
to the model. This keeps the size of the rollout file and command
execution items returned over the wire reasonable.

And we also persist `EventMsg::Error` which we can now map back to the
Turn's status and populates the Turn's error metadata.

#### Updates to EventMsgs
To truly make `thread/resume` non-lossy, we also needed to persist the
`status` on `EventMsg::CommandExecutionEndEvent` and
`EventMsg::PatchApplyEndEvent`. Previously it was not obvious whether a
command failed or was declined (similar for apply_patch). These
EventMsgs were never persisted before so I made it a required field.
											
										
										
											2026-02-12 11:34:22 -08:00
+								- Avoid boilerplate tests that only assert experimental field markers for individual
 								  request fields in `common.rs`; rely on schema generation/tests and behavioral coverage instead.