## What changed
- `codex-stdio-to-uds` now tolerates `NotConnected` when
`shutdown(Write)` happens after the peer has already closed.
- The socket test was rewritten to send stdin from a fixture file and to
read an exact request payload length instead of waiting on EOF timing.
## Why this fixes the flake
- This one exposed a real cross-platform runtime edge case: on macOS,
the peer can close first after a successful exchange, and
`shutdown(Write)` can report `NotConnected` even though the interaction
already succeeded.
- Treating that specific ordering as a harmless shutdown condition
removes the production-level false failure.
- The old test compounded the problem by depending on EOF timing, which
varies by platform and scheduler. Exact-length IO makes the test
deterministic and focused on the actual data exchange.
## Scope
- Production logic change with matching test rewrite.
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).
If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:
- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:
```
failures:
suite::compact::manual_compact_twice_preserves_latest_user_messages
suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```
- `just build-for-release` to build release binaries for all
platforms/architectures remotely
To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)
## CI
In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.
The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:
```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```
Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:
- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875
For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.
Release builds will also continue to be done by Cargo for now.
Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504
---------
Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
This PR introduces a `codex-utils-cargo-bin` utility crate that
wraps/replaces our use of `assert_cmd::Command` and
`escargot::CargoBuild`.
As you can infer from the introduction of `buck_project_root()` in this
PR, I am attempting to make it possible to build Codex under
[Buck2](https://buck2.build) as well as `cargo`. With Buck2, I hope to
achieve faster incremental local builds (largely due to Buck2's
[dice](https://buck2.build/docs/insights_and_knowledge/modern_dice/)
build strategy, as well as benefits from its local build daemon) as well
as faster CI builds if we invest in remote execution and caching.
See
https://buck2.build/docs/getting_started/what_is_buck2/#why-use-buck2-key-advantages
for more details about the performance advantages of Buck2.
Buck2 enforces stronger requirements in terms of build and test
isolation. It discourages assumptions about absolute paths (which is key
to enabling remote execution). Because the `CARGO_BIN_EXE_*` environment
variables that Cargo provides are absolute paths (which
`assert_cmd::Command` reads), this is a problem for Buck2, which is why
we need this `codex-utils-cargo-bin` utility.
My WIP-Buck2 setup sets the `CARGO_BIN_EXE_*` environment variables
passed to a `rust_test()` build rule as relative paths.
`codex-utils-cargo-bin` will resolve these values to absolute paths,
when necessary.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/8496).
* #8498
* __->__ #8496