Commit graph

239 commits

Author SHA1 Message Date
Ahmed Ibrahim
69cfc73dc6
change collaboration mode to struct (#9793)
Shouldn't cause behavioral change
2026-01-23 17:00:23 -08:00
JUAN DAVID SALAS CAMARGO
f815fa14ea
Fix execpolicy parsing for multiline quoted args (#9565)
## What
Fix bash command parsing to accept double-quoted strings that contain
literal newlines so execpolicy can match allow rules.

## Why
Allow rules like [git, commit] should still match when commit messages
include a newline in a quoted argument; the parser currently rejects
these strings and falls back to the outer shell invocation.

## How
- Validate double-quoted strings by ensuring all named children are
string_content and then stripping the outer quotes from the raw node
text so embedded newlines are preserved.
- Reuse the helper for concatenated arguments.
- Ensure large SI suffix formatting uses the caller-provided locale
formatter for grouping.
- Add coverage for newline-containing quoted arguments.

Fixes #9541.

## Tests
- cargo test -p codex-core
- just fix -p codex-core
- cargo test -p codex-protocol
- just fix -p codex-protocol
- cargo test --all-features
2026-01-22 22:16:53 -08:00
Dylan Hurd
8b3521ee77
feat(core) update Personality on turn (#9644)
## Summary
Support updating Personality mid-Thread via UserTurn/OverwriteTurn. This
is explicitly unused by the clients so far, to simplify PRs - app-server
and tui implementations will be follow-ups.

## Testing
- [x] added integration tests
2026-01-22 12:04:23 -08:00
pakrym-oai
b511c38ddb
Support end_turn flag (#9698)
Experimental flag that signals the end of the turn.
2026-01-22 17:27:48 +00:00
pakrym-oai
836f0343a3
Add tui.experimental_mode setting (#9656)
To simplify testing
2026-01-22 05:27:57 +00:00
xl-openai
577ba3a4ca
Add UI for skill enable/disable. (#9627)
"/skill" will now allow you to enable/disable skills:
<img width="658" height="199" alt="image"
src="https://github.com/user-attachments/assets/bf8994c8-d6c1-462f-8bbb-f1ee9241caa4"
/>
2026-01-21 18:21:12 -08:00
Dylan Hurd
96a72828be
feat(core) ModelInfo.model_instructions_template (#9597)
## Summary
#9555 is the start of a rename, so I'm starting to standardize here.
Sets up `model_instructions` templating with a strongly-typed object for
injecting a personality block into the model instructions.

## Testing
- [x] Added tests
- [x] Ran locally
2026-01-21 18:11:18 -08:00
Eric Traut
2ca9a56528
Add layered config.toml support to app server (#9510)
This PR adds support for chained (layered) config.toml file merging for
clients that use the app server interface. This feature already exists
for the TUI, but it does not work for GUI clients.

It does the following:
* Changes code paths for new thread, resume thread, and fork thread to
use the effective config based on the cwd.
* Updates the `config/read` API to accept an optional `cwd` parameter.
If specified, the API returns the effective config based on that cwd
path. Also optionally includes all layers including project config
files. If cwd is not specified, the API falls back on its older behavior
where it considers only the global (non-project) config files when
computing the effective config.

The changes in codex_message_processor.rs look deceptively large. They
mostly just involve moving existing blocks of code to a later point in
some functions so it can use the cwd to calculate the config.

This PR builds upon #9509 and should be reviewed and merged after that
PR.

Tested:
* Verified change with (dependent, as-yet-uncommitted) changes to IDE
Extension and confirmed correct behavior

The full fix requires additional changes in the IDE Extension code base,
but they depend on this PR.
2026-01-21 14:21:48 -08:00
charley-oai
fe641f759f
Add collaboration_mode to TurnContextItem (#9583)
## Summary
- add optional `collaboration_mode` to `TurnContextItem` in rollouts
- persist the current collaboration mode when recording turn context
(sampling + compaction)

## Rationale
We already persist turn context data for resume logic. Capturing
collaboration mode in the rollout gives us the mode context for each
turn, enabling follow‑up work to diff mode instructions correctly on
resume.

## Changes
- protocol: add optional `collaboration_mode` field to `TurnContextItem`
- core: persist collaboration mode alongside other turn context settings
in rollouts
2026-01-21 14:14:21 -08:00
Shijie Rao
3fcb40245e
Chore: update plan mode output in prompt (#9592)
### Summary
* Update plan prompt output
* Update requestUserInput response to be a single key value pair
`answer: String`.
2026-01-21 14:12:18 -08:00
charley-oai
531748a080
Prompt Expansion: Preserve Text Elements (#9518)
Summary
- Preserve `text_elements` through custom prompt argument parsing and
expansion (named and numeric placeholders).
- Translate text element ranges through Shlex parsing using sentinel
substitution, and rehydrate text + element ranges per arg.
- Drop image attachments when their placeholder does not survive prompt
expansion, keeping attachments consistent with rendered elements.
- Mirror changes in TUI2 and expand tests for prompt parsing/expansion
edge cases.

Tests
- placeholders with spaces as single tokens (positional + key=value,
quoted + unquoted),
  - prompt expansion with image placeholders,
  - large paste + image arg combinations,
  - unused image arg dropped after expansion.
2026-01-20 18:30:20 -08:00
charley-oai
be9e55c5fc
Add total (non-partial) TextElement placeholder accessors (#9545)
## Summary
- Make `TextElement` placeholders private and add a text-backed accessor
to avoid assuming `Some`.
- Since they are optional in the protocol, we want to make sure any
accessors properly handle the None case (getting the placeholder using
the byte range in the text)
- Preserve placeholders during protocol/app-server conversions using the
accessor fallback.
- Update TUI composer/remap logic and tests to use the new
constructor/accessor.
2026-01-20 14:04:11 -08:00
Dylan Hurd
675f165c56
fix(core) Preserve base_instructions in SessionMeta (#9427)
## Summary
This PR consolidates base_instructions onto SessionMeta /
SessionConfiguration, so we ensure `base_instructions` is set once per
session and should be (mostly) immutable, unless:
- overridden by config on resume / fork
- sub-agent tasks, like review or collab


In a future PR, we should convert all references to `base_instructions`
to consistently used the typed struct, so it's less likely that we put
other strings there. See #9423. However, this PR is already quite
complex, so I'm deferring that to a follow-up.

## Testing
- [x] Added a resume test to assert that instructions are preserved. In
particular, `resume_switches_models_preserves_base_instructions` fails
against main.

Existing test coverage thats assert base instructions are preserved
across multiple requests in a session:
- Manual compact keeps baseline instructions:
core/tests/suite/compact.rs:199
- Auto-compact keeps baseline instructions:
core/tests/suite/compact.rs:1142
- Prompt caching reuses the same instructions across two requests:
core/tests/suite/prompt_caching.rs:150 and
core/tests/suite/prompt_caching.rs:157
- Prompt caching with explicit expected string across two requests:
core/tests/suite/prompt_caching.rs:213 and
core/tests/suite/prompt_caching.rs:222
- Resume with model switch keeps original instructions:
core/tests/suite/resume.rs:136
- Compact/resume/fork uses request 0 instructions for later expected
payloads: core/tests/suite/compact_resume_fork.rs:215
2026-01-19 21:59:36 -08:00
Ahmed Ibrahim
65d3b9e145
Migrate tui to use UserTurn (#9497)
- `tui/` and `tui2/` submit `Op::UserTurn` and own full turn context
(cwd/approval/sandbox/model/etc.).
- `Op::UserInput` is documented as legacy in `codex-protocol` (doc-only;
no `#[deprecated]` to avoid `-D warnings` fallout).
- Remove obsolete `#[allow(deprecated)]` and the unused `ConversationId`
alias/re-export.
2026-01-19 13:40:39 -08:00
Shijie Rao
57ec3a8277
Feat: request user input tool (#9472)
### Summary
* Add `requestUserInput` tool that the model can use for gather
feedback/asking question mid turn.


### Tool input schema
```
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "requestUserInput input",
  "type": "object",
  "additionalProperties": false,
  "required": ["questions"],
  "properties": {
    "questions": {
      "type": "array",
      "description": "Questions to show the user (1-3). Prefer 1 unless multiple independent decisions block progress.",
      "minItems": 1,
      "maxItems": 3,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["id", "header", "question"],
        "properties": {
          "id": {
            "type": "string",
            "description": "Stable identifier for mapping answers (snake_case)."
          },
          "header": {
            "type": "string",
            "description": "Short header label shown in the UI (12 or fewer chars)."
          },
          "question": {
            "type": "string",
            "description": "Single-sentence prompt shown to the user."
          },
          "options": {
            "type": "array",
            "description": "Optional 2-3 mutually exclusive choices. Put the recommended option first and suffix its label with \"(Recommended)\". Only include \"Other\" option if we want to include a free form option. If the question is free form in nature, do not include any option.",
            "minItems": 2,
            "maxItems": 3,
            "items": {
              "type": "object",
              "additionalProperties": false,
              "required": ["value", "label", "description"],
              "properties": {
                "value": {
                  "type": "string",
                  "description": "Machine-readable value (snake_case)."
                },
                "label": {
                  "type": "string",
                  "description": "User-facing label (1-5 words)."
                },
                "description": {
                  "type": "string",
                  "description": "One short sentence explaining impact/tradeoff if selected."
                }
              }
            }
          }
        }
      }
    }
  }
}
```

### Tool output schema
```
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "requestUserInput output",
  "type": "object",
  "additionalProperties": false,
  "required": ["answers"],
  "properties": {
    "answers": {
      "type": "object",
      "description": "Map of question id to user answer.",
      "additionalProperties": {
        "type": "object",
        "additionalProperties": false,
        "required": ["selected"],
        "properties": {
          "selected": {
            "type": "array",
            "items": { "type": "string" }
          },
          "other": {
            "type": ["string", "null"]
          }
        }
      }
    }
  }
}
```
2026-01-19 10:17:30 -08:00
Ahmed Ibrahim
31415ebfcf
Remove unused protocol collaboration mode prompts (#9463)
Delete duplicate collaboration mode markdown under protocol prompts;
core templates remain the single source of truth.
2026-01-19 08:54:19 -08:00
Ahmed Ibrahim
f72f87fbee
Add collaboration modes test prompts (#9443)
# External (non-OpenAI) Pull Request Requirements

Before opening this Pull Request, please read the dedicated
"Contributing" markdown file or your PR may be closed:
https://github.com/openai/codex/blob/main/docs/contributing.md

If your PR conforms to our contribution guidelines, replace this text
with a detailed and high quality description of your changes.

Include a link to a bug report or enhancement request.
2026-01-18 11:39:08 -08:00
Ahmed Ibrahim
1478a88eb0
Add collaboration developer instructions (#9424)
- Add additional instructions when they are available
- Make sure to update them on change either UserInput or UserTurn
2026-01-18 01:31:14 +00:00
Dylan Hurd
80d7a5d7fe
chore(instructions) Remove unread SessionMeta.instructions field (#9423)
### Description
- Remove the now-unused `instructions` field from the session metadata
to simplify SessionMeta and stop propagating transient instruction text
through the rollout recorder API. This was only saving
user_instructions, and was never being read.
- Stop passing user instructions into the rollout writer at session
creation so the rollout header only contains canonical session metadata.

### Testing

- Ran `just fmt` which completed successfully.
- Ran `just fix -p codex-protocol`, `just fix -p codex-core`, `just fix
-p codex-app-server`, `just fix -p codex-tui`, and `just fix -p
codex-tui2` which completed (Clippy fixes applied) as part of
verification.
- Ran `cargo test -p codex-protocol` which passed (28 tests).
- Ran `cargo test -p codex-core` which showed failures in a small set of
tests (not caused by the protocol type change directly):
`default_client::tests::test_create_client_sets_default_headers`,
several `models_manager::manager::tests::refresh_available_models_*`,
and `shell_snapshot::tests::linux_sh_snapshot_includes_sections` (these
tests failed in this CI run).
- Ran `cargo test -p codex-app-server` which reported several failing
integration tests (including
`suite::codex_message_processor_flow::test_codex_jsonrpc_conversation_flow`,
`suite::output_schema::send_user_turn_*`, and
`suite::user_agent::get_user_agent_returns_current_codex_user_agent`).
- `cargo test -p codex-tui` and `cargo test -p codex-tui2` were
attempted but aborted due to disk space exhaustion (`No space left on
device`).

------
[Codex
Task](https://chatgpt.com/codex/tasks/task_i_696bd8ce632483228d298cf07c7eb41c)
2026-01-17 16:02:28 -08:00
Ahmed Ibrahim
146d54cede
Add collaboration_mode override to turns (#9408) 2026-01-16 21:51:25 -08:00
Ahmed Ibrahim
246f506551
Introduce collaboration modes (#9340)
- Merge `model` and `reasoning_effort` under collaboration modes.
- Add additional instructions for custom collaboration mode
- Default to Custom to not change behavior
2026-01-17 00:28:22 +00:00
Anton Panasenko
c26fe64539
feat: show forked from session id in /status (#9330)
Summary:
- Add forked_from to SessionMeta/SessionConfiguredEvent and persist it
for forked sessions.
- Surface forked_from in /status for tui + tui2 and add snapshots.
2026-01-16 13:41:46 -08:00
viyatb-oai
f89a40a849
chore: upgrade to Rust 1.92.0 (#8860)
**Summary**
- Upgrade Rust toolchain used by CI to 1.92.0.
- Address new clippy `derivable_impls` warnings by deriving `Default`
for enums across protocol, core, backend openapi models, and
windows-sandbox setup.
- Tidy up related test/config behavior (originator header handling, env
override cleanup) and remove a now-unused assignment in TUI/TUI2 render
layout.

**Testing**
- `just fmt`
- `just fix -p codex-tui`
- `just fix -p codex-tui2`
- `just fix -p codex-windows-sandbox`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-windows-sandbox`
- `cargo test -p codex-core --test all`
- `cargo test -p codex-app-server --test all`
- `cargo test -p codex-mcp-server --test all`
- `cargo test --all-features`
2026-01-16 11:12:52 -08:00
jif-oai
c576756c81
feat: collab wait multiple IDs (#9294) 2026-01-16 12:05:04 +01:00
charley-oai
1fa8350ae7
Add text element metadata to protocol, app server, and core (#9331)
The second part of breaking up PR
https://github.com/openai/codex/pull/9116

Summary:

- Add `TextElement` / `ByteRange` to protocol user inputs and user
message events with defaults.
- Thread `text_elements` through app-server v1/v2 request handling and
history rebuild.
- Preserve UI metadata only in user input/events (not `ContentItem`)
while keeping local image attachments in user events for rehydration.

Details:

- Protocol: `UserInput::Text` carries `text_elements`;
`UserMessageEvent` carries `text_elements` + `local_images`.
Serialization includes empty vectors for backward compatibility.
- app-server-protocol: v1 defines `V1TextElement` / `V1ByteRange` in
camelCase with conversions; v2 uses its own camelCase wrapper.
- app-server: v1/v2 input mapping includes `text_elements`; thread
history rebuilds include them.
- Core: user event emission preserves UI metadata while model history
stays clean; history replay round-trips the metadata.
2026-01-15 17:26:41 -08:00
xl-openai
42fa4c237f
Support SKILL.toml file. (#9125)
We’re introducing a new SKILL.toml to hold skill metadata so Codex can
deliver a richer Skills experience.

Initial focus is the interface block:
```
[interface]
display_name = "Optional user-facing name"
short_description = "Optional user-facing description"
icon_small = "./assets/small-400px.png"
icon_large = "./assets/large-logo.svg"
brand_color = "#3B82F6"
default_prompt = "Optional surrounding prompt to use the skill with"
```

All fields are exposed via the app server API.
display_name and short_description are consumed by the TUI.
2026-01-15 11:20:04 -08:00
Ahmed Ibrahim
a09711332a
Add migration_markdown in model_info (#9219)
Next step would be to clean Model Upgrade in model presets

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: aibrahim-oai <219906144+aibrahim-oai@users.noreply.github.com>
2026-01-15 01:55:22 +00:00
charley-oai
4a9c2bcc5a
Add text element metadata to types (#9235)
Initial type tweaking PR to make the diff of
https://github.com/openai/codex/pull/9116 smaller

This should not change any behavior, just adds some fields to types
2026-01-14 16:41:50 -08:00
sayan-oai
5e426ac270
add WebSearchMode enum (#9216)
### What
Add `WebSearchMode` enum (disabled, cached live, defaults to cached) to
config + V2 protocol. This enum takes precedence over legacy flags:
`web_search_cached`, `web_search_request`, and `tools.web_search`.

Keep `--search` as live.

### Tests
Added tests
2026-01-14 12:51:42 -08:00
jif-oai
6a939ed7a4
feat: emit events around collab tools (#9095)
Emit the following events around the collab tools. On the `app-server`
this will be under `item/started` and `item/completed`
```
#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnBeginEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
    /// beginning.
    pub prompt: String,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentSpawnEndEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the newly spawned agent, if it was created.
    pub new_thread_id: Option<ThreadId>,
    /// Initial prompt sent to the agent. Can be empty to prevent CoT leaking at the
    /// beginning.
    pub prompt: String,
    /// Last known status of the new agent reported to the sender agent.
    pub status: AgentStatus,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionBeginEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
    /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
    /// leaking at the beginning.
    pub prompt: String,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabAgentInteractionEndEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
    /// Prompt sent from the sender to the receiver. Can be empty to prevent CoT
    /// leaking at the beginning.
    pub prompt: String,
    /// Last known status of the receiver agent reported to the sender agent.
    pub status: AgentStatus,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingBeginEvent {
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
    /// ID of the waiting call.
    pub call_id: String,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabWaitingEndEvent {
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
    /// ID of the waiting call.
    pub call_id: String,
    /// Last known status of the receiver agent reported to the sender agent.
    pub status: AgentStatus,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseBeginEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
}

#[derive(Debug, Clone, Deserialize, Serialize, PartialEq, JsonSchema, TS)]
pub struct CollabCloseEndEvent {
    /// Identifier for the collab tool call.
    pub call_id: String,
    /// Thread ID of the sender.
    pub sender_thread_id: ThreadId,
    /// Thread ID of the receiver.
    pub receiver_thread_id: ThreadId,
    /// Last known status of the receiver agent reported to the sender agent before
    /// the close.
    pub status: AgentStatus,
}
```
2026-01-14 17:55:57 +00:00
viyatb-oai
e1447c3009
feat: add support for read-only bind mounts in the linux sandbox (#9112)
### Motivation

- Landlock alone cannot prevent writes to sensitive in-repo files like
`.git/` when the repo root is writable, so explicit mount restrictions
are required for those paths.
- The sandbox must set up any mounts before calling Landlock so Landlock
can still be applied afterwards and the two mechanisms compose
correctly.

### Description

- Add a new `linux-sandbox` helper `apply_read_only_mounts` in
`linux-sandbox/src/mounts.rs` that: unshares namespaces, maps uids/gids
when required, makes mounts private, bind-mounts targets, and remounts
them read-only.
- Wire the mount step into the sandbox flow by calling
`apply_read_only_mounts(...)` before network/seccomp and before applying
Landlock rules in `linux-sandbox/src/landlock.rs`.
2026-01-14 08:30:46 -08:00
Ahmed Ibrahim
7e33ac7eb6
clean models manager (#9168)
Have only the following Methods:
- `list_models`: getting current available models
- `try_list_models`: sync version no refresh for tui use
- `get_default_model`: get the default model (should be tightened to
core and received on session configuration)
- `get_model_info`: get `ModelInfo` for a specific model (should be
tightened to core but used in tests)
- `refresh_if_new_etag`: trigger refresh on different etags

Also move the cache to its own struct
2026-01-13 16:55:33 -08:00
Ahmed Ibrahim
325ce985f1
Use markdown for migration screen (#8952)
Next steps will be routing this to model info
2026-01-13 07:41:42 +00:00
pakrym-oai
490c1c1fdd
Add model client sessions (#9102)
Maintain a long-running session.
2026-01-13 01:15:56 +00:00
Ahmed Ibrahim
87f7226cca
Assemble sandbox/approval/network prompts dynamically (#8961)
- Add a single builder for developer permissions messaging that accepts
SandboxPolicy and approval policy. This builder now drives the developer
“permissions” message that’s injected at session start and any time
sandbox/approval settings change.
- Trim EnvironmentContext to only include cwd, writable roots, and
shell; removed sandbox/approval/network duplication and adjusted XML
serialization and tests accordingly.

Follow-up: adding a config value to replace the developer permissions
message for custom sandboxes.
2026-01-12 23:12:59 +00:00
Shijie Rao
3e91a95ce1
feat: hot reload mcp servers (#8957)
### Summary
* Added `mcpServer/refresh` command to inform app servers and active
threads to refresh mcpServer on next turn event.
* Added `pending_mcp_server_refresh_config` to codex core so that if the
value is populated, we reinitialize the mcp server manager on the thread
level.
* The config is updated on `mcpServer/refresh` command which we iterate
through threads and provide with the latest config value after last
write.
2026-01-12 11:17:50 -08:00
jif-oai
623707ab58
feat: add wait tool implementation for collab (#9088)
Add implementation for the `wait` tool.

For this we consider all status different from `PendingInit` and
`Running` as terminal. The `wait` tool call will return either after a
given timeout or when the tool reaches a non-terminal status.

A few points to note:
* The usage of a channel is preferred to prevent some races (just
looping on `get_status()` could "miss" a terminal status)
* The order of operations is very important, we need to first subscribe
and then check the last known status to prevent race conditions
* If the channel gets dropped, we return an error on purpose
2026-01-12 12:16:24 +00:00
charley-oai
6709ad8975
Label attached images so agent can understand in-message labels (#8950)
Agent wouldn't "see" attached images and would instead try to use the
view_file tool:
<img width="1516" height="504" alt="image"
src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
/>

In this PR, we wrap image content items in XML tags with the name of
each image (now just a numbered name like `[Image #1]`), so that the
model can understand inline image references (based on name). We also
put the image content items above the user message which the model seems
to prefer (maybe it's more used to definitions being before references).

We also tweak the view_file tool description which seemed to help a bit

Results on a simple eval set of images:

Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>

After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>

```json
[
  {
    "id": "single_describe",
    "prompt": "Describe the attached image in one sentence.",
    "images": ["image_a.png"]
  },
  {
    "id": "single_color",
    "prompt": "What is the dominant color in the image? Answer with a single color word.",
    "images": ["image_b.png"]
  },
  {
    "id": "orientation_check",
    "prompt": "Is the image portrait or landscape? Answer in one sentence.",
    "images": ["image_c.png"]
  },
  {
    "id": "detail_request",
    "prompt": "Look closely at the image and call out any small details you notice.",
    "images": ["image_d.png"]
  },
  {
    "id": "two_images_compare",
    "prompt": "I attached two images. Are they the same or different? Briefly explain.",
    "images": ["image_a.png", "image_b.png"]
  },
  {
    "id": "two_images_captions",
    "prompt": "Provide a short caption for each image (Image 1, Image 2).",
    "images": ["image_c.png", "image_d.png"]
  },
  {
    "id": "multi_image_rank",
    "prompt": "Rank the attached images from most colorful to least colorful.",
    "images": ["image_a.png", "image_b.png", "image_c.png"]
  },
  {
    "id": "multi_image_choice",
    "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
    "images": ["image_b.png", "image_d.png"]
  }
]
```
2026-01-09 21:33:45 -08:00
zbarsky-openai
2a06d64bc9
feat: add support for building with Bazel (#8875)
This PR configures Codex CLI so it can be built with
[Bazel](https://bazel.build) in addition to Cargo. The `.bazelrc`
includes configuration so that remote builds can be done using
[BuildBuddy](https://www.buildbuddy.io).

If you are familiar with Bazel, things should work as you expect, e.g.,
run `bazel test //... --keep-going` to run all the tests in the repo,
but we have also added some new aliases in the `justfile` for
convenience:

- `just bazel-test` to run tests locally
- `just bazel-remote-test` to run tests remotely (currently, the remote
build is for x86_64 Linux regardless of your host platform). Note we are
currently seeing the following test failures in the remote build, so we
still need to figure out what is happening here:

```
failures:
    suite::compact::manual_compact_twice_preserves_latest_user_messages
    suite::compact_resume_fork::compact_resume_after_second_compaction_preserves_history
    suite::compact_resume_fork::compact_resume_and_fork_preserve_model_history_view
```

- `just build-for-release` to build release binaries for all
platforms/architectures remotely

To setup remote execution:
- [Create a buildbuddy account](https://app.buildbuddy.io/) (OpenAI
employees should also request org access at
https://openai.buildbuddy.io/join/ with their `@openai.com` email
address.)
- [Copy your API key](https://app.buildbuddy.io/docs/setup/) to
`~/.bazelrc` (add the line `build
--remote_header=x-buildbuddy-api-key=YOUR_KEY`)
- Use `--config=remote` in your `bazel` invocations (or add `common
--config=remote` to your `~/.bazelrc`, or use the `just` commands)

## CI

In terms of CI, this PR introduces `.github/workflows/bazel.yml`, which
uses Bazel to run the tests _locally_ on Mac and Linux GitHub runners
(we are working on supporting Windows, but that is not ready yet). Note
that the failures we are seeing in `just bazel-remote-test` do not occur
on these GitHub CI jobs, so everything in `.github/workflows/bazel.yml`
is green right now.

The `bazel.yml` uses extra config in `.github/workflows/ci.bazelrc` so
that macOS CI jobs build _remotely_ on Linux hosts (using the
`docker://docker.io/mbolin491/codex-bazel` Docker image declared in the
root `BUILD.bazel`) using cross-compilation to build the macOS
artifacts. Then these artifacts are downloaded locally to GitHub's macOS
runner so the tests can be executed natively. This is the relevant
config that enables this:

```
common:macos --config=remote
common:macos --strategy=remote
common:macos --strategy=TestRunner=darwin-sandbox,local
```

Because of the remote caching benefits we get from BuildBuddy, these new
CI jobs can be extremely fast! For example, consider these two jobs that
ran all the tests on Linux x86_64:

- Bazel 1m37s
https://github.com/openai/codex/actions/runs/20861063212/job/59940545209?pr=8875
- Cargo 9m20s
https://github.com/openai/codex/actions/runs/20861063192/job/59940559592?pr=8875

For now, we will continue to run both the Bazel and Cargo jobs for PRs,
but once we add support for Windows and running Clippy, we should be
able to cutover to using Bazel exclusively for PRs, which should still
speed things up considerably. We will probably continue to run the Cargo
jobs post-merge for commits that land on `main` as a sanity check.

Release builds will also continue to be done by Cargo for now.

Earlier attempt at this PR: https://github.com/openai/codex/pull/8832
Earlier attempt to add support for Buck2, now abandoned:
https://github.com/openai/codex/pull/8504

---------

Co-authored-by: David Zbarsky <dzbarsky@gmail.com>
Co-authored-by: Michael Bolin <mbolin@openai.com>
2026-01-09 11:09:43 -08:00
Helmut Januschka
7daaabc795
fix: add tui.alternate_screen config and --no-alt-screen CLI flag for Zellij scrollback (#8555)
Fixes #2558

Codex uses alternate screen mode (CSI 1049) which, per xterm spec,
doesn't support scrollback. Zellij follows this strictly, so users can't
scroll back through output.

**Changes:**
- Add `tui.alternate_screen` config: `auto` (default), `always`, `never`
- Add `--no-alt-screen` CLI flag
- Auto-detect Zellij and skip alt screen (uses existing `ZELLIJ` env var
detection)

**Usage:**
```bash
# CLI flag
codex --no-alt-screen

# Or in config.toml
[tui]
alternate_screen = "never"
```

With default `auto` mode, Zellij users get working scrollback without
any config changes.

---------

Co-authored-by: Josh McKinney <joshka@openai.com>
2026-01-09 18:38:26 +00:00
jif-oai
1aed01e99f
renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
jif-oai
e2e3f4490e
chore: add approval metric (#8970) 2026-01-09 13:10:31 +00:00
jif-oai
c9c6560685
nit: parse_arguments (#8927) 2026-01-08 19:49:17 +00:00
Ahmed Ibrahim
9179c9deac
Merge Modelfamily into modelinfo (#8763)
- Merge ModelFamily into ModelInfo
- Remove logic for adding instructions to apply patch
- Add compaction limit and visible context window to `ModelInfo`
2026-01-07 10:35:09 -08:00
jif-oai
116059c3a0
chore: unify conversation with thread name (#8830)
Done and verified by Codex + refactor feature of RustRover
2026-01-07 17:04:53 +00:00
Owen Lin
8b7ec31ba7
feat(app-server): thread/rollback API (#8454)
Add `thread/rollback` to app-server to support IDEs undo-ing the last N
turns of a thread.

For context, an IDE partner will be supporting an "undo" capability
where the IDE (the app-server client) will be responsible for reverting
the local changes made during the last turn. To support this well, we
also need a way to drop the last turn (or more generally, the last N
turns) from the agent's context. This is what `thread/rollback` does.

**Core idea**: A Thread rollback is represented as a persisted event
message (EventMsg::ThreadRollback) in the rollout JSONL file, not by
rewriting history. On resume, both the model's context (core replay) and
the UI turn list (app-server v2's thread history builder) apply these
markers so the pruned history is consistent across live conversations
and `thread/resume`.

Implementation notes:
- Rollback only affects agent context and appends to the rollout file;
clients are responsible for reverting files on disk.
- If a thread rollback is currently in progress, subsequent
`thread/rollback` calls are rejected.
- Because we use `CodexConversation::submit` and codex core tracks
active turns, returning an error on concurrent rollbacks is communicated
via an `EventMsg::Error` with a new variant
`CodexErrorInfo::ThreadRollbackFailed`. app-server watches for that and
sends the BAD_REQUEST RPC response.

Tests cover thread rollbacks in both core and app-server, including when
`num_turns` > existing turns (which clears all turns).

**Note**: this explicitly does **not** behave like `/undo` which we just
removed from the CLI, which does the opposite of what `thread/rollback`
does. `/undo` reverts local changes via ghost commits/snapshots and does
not modify the agent's context / conversation history.
2026-01-06 21:23:48 +00:00
jif-oai
188f79afee
feat: drop agent bus and store the agent status in codex directly (#8788) 2026-01-06 19:44:39 +00:00
Anton Panasenko
807f8a43c2
feat: expose outputSchema to user_turn/turn_start app_server API (#8377)
What changed
- Added `outputSchema` support to the app-server APIs, mirroring `codex
exec --output-schema` behavior.
- V1 `sendUserTurn` now accepts `outputSchema` and constrains the final
assistant message for that turn.
- V2 `turn/start` now accepts `outputSchema` and constrains the final
assistant message for that turn (explicitly per-turn only).

Core behavior
- `Op::UserTurn` already supported `final_output_json_schema`; now V1
`sendUserTurn` forwards `outputSchema` into that field.
- `Op::UserInput` now carries `final_output_json_schema` for per-turn
settings updates; core maps it into
`SessionSettingsUpdate.final_output_json_schema` so it applies to the
created turn context.
- V2 `turn/start` does NOT persist the schema via `OverrideTurnContext`
(it’s applied only for the current turn). Other overrides
(cwd/model/etc) keep their existing persistent behavior.

API / docs
- `codex-rs/app-server-protocol/src/protocol/v1.rs`: add `output_schema:
Option<serde_json::Value>` to `SendUserTurnParams` (serialized as
`outputSchema`).
- `codex-rs/app-server-protocol/src/protocol/v2.rs`: add `output_schema:
Option<JsonValue>` to `TurnStartParams` (serialized as `outputSchema`).
- `codex-rs/app-server/README.md`: document `outputSchema` for
`turn/start` and clarify it applies only to the current turn.
- `codex-rs/docs/codex_mcp_interface.md`: document `outputSchema` for v1
`sendUserTurn` and v2 `turn/start`.

Tests added/updated
- New app-server integration tests asserting `outputSchema` is forwarded
into outbound `/responses` requests as `text.format`:
  - `codex-rs/app-server/tests/suite/output_schema.rs`
  - `codex-rs/app-server/tests/suite/v2/output_schema.rs`
- Added per-turn semantics tests (schema does not leak to the next
turn):
  - `send_user_turn_output_schema_is_per_turn_v1`
  - `turn_start_output_schema_is_per_turn_v2`
- Added protocol wire-compat tests for the merged op:
  - serialize omits `final_output_json_schema` when `None`
  - deserialize works when field is missing
  - serialize includes `final_output_json_schema` when `Some(schema)`

Call site updates (high level)
- Updated all `Op::UserInput { .. }` constructions to include
`final_output_json_schema`:
  - `codex-rs/app-server/src/codex_message_processor.rs`
  - `codex-rs/core/src/codex_delegate.rs`
  - `codex-rs/mcp-server/src/codex_tool_runner.rs`
  - `codex-rs/tui/src/chatwidget.rs`
  - `codex-rs/tui2/src/chatwidget.rs`
  - plus impacted core tests.

Validation
- `just fmt`
- `cargo test -p codex-core`
- `cargo test -p codex-app-server`
- `cargo test -p codex-mcp-server`
- `cargo test -p codex-tui`
- `cargo test -p codex-tui2`
- `cargo test -p codex-protocol`
- `cargo clippy --all-features --tests --profile dev --fix -- -D
warnings`
2026-01-05 10:27:00 -08:00
Ahmed Ibrahim
efd2d76484
Account for last token count on resume (#8677)
last token count in context manager is initialized to 0. Gets populated
only on events from server.

This PR populates it on resume so we can decide if we need to compact or
not.
2026-01-02 23:20:20 +00:00
pakrym-oai
7078a0b676
Log compaction request bodies (#8676)
We already log request bodies for normal requests, logging for
compaction helps with debugging.
2026-01-02 11:27:37 -08:00