Commit graph

4710 commits

pakrym-oai
86803ca9bf
Reuse connection between turns (#12294)
Add a pool of one to the model client to reuse connections across turns.
2026-02-20 10:09:46 -08:00
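A minimal sketch of the idea, assuming a reqwest-style HTTP client; `ModelClient`, the pool settings, and the endpoint are illustrative, not the actual codex types:

```rust
use std::time::Duration;

// Sketch only: build one HTTP client up front and reuse it (and its
// connection pool) across turns, instead of redialing per request.
struct ModelClient {
    http: reqwest::Client, // cloned handles share the same pool
}

impl ModelClient {
    fn new() -> reqwest::Result<Self> {
        let http = reqwest::Client::builder()
            .pool_max_idle_per_host(1) // a "pool of one": keep a single idle connection
            .pool_idle_timeout(Duration::from_secs(90))
            .build()?;
        Ok(Self { http })
    }

    async fn turn(&self, body: String) -> reqwest::Result<String> {
        // Each turn reuses the pooled connection when it is still alive.
        self.http
            .post("https://example.invalid/v1/responses") // illustrative URL
            .body(body)
            .send()
            .await?
            .text()
            .await
    }
}
```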
jif-oai
035c4c30bb
fix: nick name at thread/read (#12347) 2026-02-20 17:53:51 +00:00
Yaroslav Volovich
5b71246001
fix: simplify macOS sleep inhibitor FFI (#12340)
Summary
- simplify the macOS sleep inhibitor FFI by replacing `dlopen` / `dlsym`
/ `transmute` with normal IOKit extern calls and `SAFETY` comments
- switch to cfg-selected platform implementations
(`imp::SleepInhibitor`) instead of `Box<dyn ...>`
- check in minimal IOKit bindings generated with `bindgen` and include
them from the macOS backend
- enable direct IOKit linkage in Bazel macOS builds by registering
`IOKit` in the Bazel `osx.framework(...)` toolchain extension list
- update `Cargo.lock` and `MODULE.bazel.lock` after removing the
build-time `bindgen` dependency path

Testing
- `just fmt`
- `cargo clippy -p codex-utils-sleep-inhibitor --all-targets -- -D
warnings`
- `cargo test -p codex-utils-sleep-inhibitor`
- `bazel test //codex-rs/utils/sleep-inhibitor:all --test_output=errors`
- `just bazel-lock-update`
- `just bazel-lock-check`

Context
- follow-up to #11711 addressing Ryan's review comments
- `bindgen` is used to generate the checked-in bindings file, but not at
build time
2026-02-20 09:52:21 -08:00
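A hedged sketch of the cfg-selected backend pattern the PR describes; the module contents are illustrative (the real macOS backend calls the checked-in IOKit bindings), but the shape shows how a concrete `imp::SleepInhibitor` replaces `Box<dyn ...>`:

```rust
#[cfg(target_os = "macos")]
mod imp {
    pub struct SleepInhibitor { /* would hold an IOKit assertion id */ }
    impl SleepInhibitor {
        pub fn new(reason: &str) -> std::io::Result<Self> {
            // The real backend calls IOPMAssertionCreateWithName via the
            // checked-in bindings, with SAFETY comments on the FFI calls.
            let _ = reason;
            Ok(Self {})
        }
    }
}

#[cfg(not(target_os = "macos"))]
mod imp {
    pub struct SleepInhibitor;
    impl SleepInhibitor {
        pub fn new(_reason: &str) -> std::io::Result<Self> {
            Ok(Self) // no-op on other platforms
        }
    }
}

// Callers get one concrete platform type; no trait-object indirection.
pub use imp::SleepInhibitor;
```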
jif-oai
fd67aba114
feat: do not enqueue phase 2 if not necessary (#12344) 2026-02-20 17:21:45 +00:00
jif-oai
5a30cd3f92
feat: better agent picker in TUI (#12332)
<img width="486" height="112" alt="Screenshot 2026-02-20 at 15 04 52"
src="https://github.com/user-attachments/assets/0d744f58-d902-4638-aeaf-27e7389ccd73"
/>
2026-02-20 15:40:34 +00:00
jif-oai
4d60c803ba
feat: cleaner TUI for sub-agents (#12327)
<img width="760" height="496" alt="Screenshot 2026-02-20 at 14 31 25"
src="https://github.com/user-attachments/assets/1983b825-bb47-417e-9925-6f727af56765"
/>
2026-02-20 15:26:33 +00:00
colby-oai
2036a5f5e0
Add MCP server context to otel tool_result logs (#12267)
Summary
- capture the origin for each configured MCP server and expose it via
the connection manager
- plumb MCP server name/origin into tool logging and emit
codex.tool_result events with those fields
- add unit coverage for origin parsing and extend OTEL tests to assert
empty MCP fields for non-MCP tools
- currently not logging full URLs or URL paths, to avoid capturing potentially sensitive data

Testing
- Not run (not requested)
2026-02-20 10:26:19 -05:00
jif-oai
ede561b5d1
disable collab for phase 2 (#12326) 2026-02-20 14:51:17 +00:00
jif-oai
595665de35
chore: better agent names (#12328) 2026-02-20 14:51:09 +00:00
jif-oai
0f9eed3a6f
feat: add nick name to sub-agents (#12320)
Adds a random nickname to each sub-agent, used for UX.

At the same time, also stores and wires the role of the sub-agent.
2026-02-20 14:39:49 +00:00
jif-oai
03ff04cd65
chore: nit explorer (#12315) 2026-02-20 11:24:28 +00:00
jif-oai
a7632f68a6
Set memories phase reasoning effort constants (#12309)
## Summary
- add reasoning effort constants for the memories phase one and phase
two agents
- wire the constants into phase1 request creation and phase2 agent
configuration so the default efforts are always applied

## Testing
- Not run (not requested)
2026-02-20 09:25:35 +00:00
zuxin-oai
e747a8eb74
memories: add rollout_summary_file header to raw memories and tune prompts (#12221)
## Summary
- Add `rollout_summary_file: <generated>.md` to each thread header in
`raw_memories.md` so Phase 2 can reliably reference the canonical
rollout summary filename.
- Update the memory prompts/templates (`stage_one_system`,
`consolidation`, `read_path`) for the new task-oriented raw-memory /
MEMORY.md schema and stronger consolidation guidance.

## Details
- `codex-rs/core/src/memories/storage.rs`
  - Writes the generated `rollout_summary_file` path into the per-thread metadata header when rebuilding `raw_memories.md`.
- `codex-rs/core/src/memories/tests.rs`
  - Verifies the canonical `rollout_summary_file` header is present and ordered after `updated_at`/`cwd` in `raw_memories.md`.
  - Verifies task-structured raw-memory content is preserved while the canonical header is added.
- `codex-rs/core/templates/memories/*.md`
  - Updates the stage-1 raw-memory format to task-grouped sections (`task`, `task_group`, `task_outcome`).
  - Updates Phase 2 consolidation guidance around recency (`updated_at`), task-oriented `MEMORY.md` blocks, and richer evidence-backed consolidation.
  - Tweaks the quick memory pass wording to emphasize topics/workflows in addition to keywords.

## Testing
- `cargo test -p codex-core memories`
2026-02-20 09:13:35 +00:00
Matthew Zeng
18bd6d2d71
[apps] Store apps tool cache in disk to reduce startup time. (#11822)
We now write MCP tools from installed apps to a disk cache so that they can be picked up instantly at startup. We still do a fresh fetch from the remote MCP server, but it's non-blocking unless there's a cache miss.

- [x] Store apps tool cache in disk to reduce startup time.
2026-02-19 22:06:51 -08:00
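A rough sketch of the cache-then-refresh startup flow described above, assuming tokio; the cache format and `fetch_remote_tools` are invented for illustration, not the actual codex code:

```rust
use std::path::PathBuf;

// Serve tools from a disk cache at startup; refresh in the background.
async fn load_tools(cache_path: PathBuf) -> Vec<String> {
    match tokio::fs::read_to_string(&cache_path).await {
        Ok(cached) => {
            // Cache hit: return immediately, refresh without blocking startup.
            tokio::spawn(async move {
                if let Ok(fresh) = fetch_remote_tools().await {
                    let _ = tokio::fs::write(&cache_path, fresh.join("\n")).await;
                }
            });
            cached.lines().map(str::to_owned).collect()
        }
        // Cache miss: the fetch blocks, as described above.
        Err(_) => fetch_remote_tools().await.unwrap_or_default(),
    }
}

async fn fetch_remote_tools() -> std::io::Result<Vec<String>> {
    // Stand-in for an MCP tools/list round trip.
    Ok(vec!["example_tool".to_owned()])
}
```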
Max Johnson
b06f91c4fe
app-server: improve thread resume rejoin flow (#11776)
The thread/resume response includes the latest turn with all items, in band, so no events are stale or lost.

Testing
- e2e tested using app-server-test-client using flow described in
"Testing Thread Rejoin Behavior" in
codex-rs/app-server-test-client/README.md
- e2e tested in codex desktop by reconnecting to a running turn
2026-02-20 05:29:05 +00:00
Michael Bolin
366ecaf17a
app-server: fix flaky list_apps_returns_connectors_with_accessible_flags test (#12286)
## Why

`app/list` emits `app/list/updated` after whichever async load finishes
first (directory connectors or accessible tools). This test assumed the
directory-backed update always arrived first because it injected a tools
delay, but that assumption is not stable when the process-global Codex
Apps tools cache is already warm. In that case the accessible-tools path
can return immediately and the first notification shape flips, which
makes the assertion flaky.

Relevant code paths:

- [`codex-rs/app-server/src/codex_message_processor.rs`](13ec97d72e/codex-rs/app-server/src/codex_message_processor.rs (L4949-L5034)) (concurrent loads + per-load `app/list/updated` notifications)
- [`codex-rs/core/src/mcp_connection_manager.rs`](13ec97d72e/codex-rs/core/src/mcp_connection_manager.rs (L1182-L1197)) (Codex Apps tools cache hit path)

## What Changed

Updated
`suite::v2::app_list::list_apps_returns_connectors_with_accessible_flags`
in `codex-rs/app-server/tests/suite/v2/app_list.rs` to accept either
valid first `app/list/updated` payload:

- the directory-first snapshot
- the accessible-tools-first snapshot

The test still keeps the later assertions strict:

- the second `app/list/updated` notification must be the fully merged
result
- the final `app/list` response must match the same merged result

I also added an inline comment explaining why the first notification is
intentionally order-insensitive.

## Verification

- `cargo test -p codex-app-server`
2026-02-20 02:27:18 +00:00
Michael Bolin
4fa304306b
tests: centralize in-flight turn cleanup helper (#12271)
## Why

Several tests intentionally exercise behavior while a turn is still
active. The cleanup sequence for those tests (`turn/interrupt` + waiting
for `codex/event/turn_aborted`) was duplicated across files, which made
the rationale easy to lose and the pattern easy to apply inconsistently.

This change centralizes that cleanup in one place with a single
explanatory doc comment.

## What Changed

### Added shared helper

In `codex-rs/app-server/tests/common/mcp_process.rs`:

- Added `McpProcess::interrupt_turn_and_wait_for_aborted(...)`.
- Added a doc comment explaining why explicit interrupt + terminal wait
is required for tests that intentionally leave a turn in-flight.

### Migrated call sites

Replaced duplicated interrupt/aborted blocks with the helper in:

- `codex-rs/app-server/tests/suite/v2/thread_resume.rs`
  - `thread_resume_rejects_history_when_thread_is_running`
  - `thread_resume_rejects_mismatched_path_when_thread_is_running`
- `codex-rs/app-server/tests/suite/v2/turn_start_zsh_fork.rs`
  - `turn_start_shell_zsh_fork_executes_command_v2`
  - `turn_start_shell_zsh_fork_subcommand_decline_marks_parent_declined_v2`
- `codex-rs/app-server/tests/suite/v2/turn_steer.rs`
  - `turn_steer_returns_active_turn_id`

### Existing cleanup retained

In `codex-rs/app-server/tests/suite/v2/turn_start.rs`:

- `turn_start_accepts_local_image_input` continues to explicitly wait
for `turn/completed` so the turn lifecycle is fully drained before test
exit.

## Verification

- `cargo test -p codex-app-server`
2026-02-20 01:47:34 +00:00
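A sketch of the shared helper's shape, reconstructed from the PR description; `send_request` and `wait_for_notification` are stand-ins for the harness's real methods:

```rust
use anyhow::Result;

// Minimal stand-in: the real McpProcess wraps a child process and a
// JSON-RPC stream; these two methods are placeholders.
struct McpProcess;

impl McpProcess {
    async fn send_request(&mut self, _method: &str, _params: serde_json::Value) -> Result<()> {
        Ok(())
    }
    async fn wait_for_notification(&mut self, _method: &str) -> Result<()> {
        Ok(())
    }

    /// Tests that intentionally leave a turn in-flight should interrupt it
    /// and wait for the terminal abort event; otherwise teardown races the
    /// still-running turn and can surface as nextest LEAKs.
    async fn interrupt_turn_and_wait_for_aborted(&mut self, turn_id: &str) -> Result<()> {
        self.send_request("turn/interrupt", serde_json::json!({ "turnId": turn_id }))
            .await?;
        self.wait_for_notification("codex/event/turn_aborted").await
    }
}
```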
xl-openai
e4456840f5
skill-creator: lazy-load PyYAML in frontmatter parsing (#12080)
init-skill should work even without PyYAML
2026-02-19 15:09:12 -08:00
mjr-openai
3293538e12
Update pnpm versions to fix cve-2026-24842 (#12009)
Update pnpm versions to resolve CVE-2026-24842
2026-02-19 14:27:55 -08:00
Michael Bolin
7ed3e3760d
tests(thread_resume): interrupt running turns in resume error-path tests (#12269)
## Why

`thread_resume` tests can intentionally create an in-flight turn, assert
a `thread/resume` error path, and return immediately. That leaves turn
work active during teardown, which can surface as intermittent `LEAK`
failures.

Sample output that motivated this investigation (reported during test
runs):

```text
LEAK ... codex-app-server::all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch
```

## What Changed

Updated only `codex-rs/app-server/tests/suite/v2/thread_resume.rs`:

- `thread_resume_rejects_history_when_thread_is_running`
- `thread_resume_rejects_mismatched_path_when_thread_is_running`

Both tests now:

1. capture the running turn id from `TurnStartResponse`
2. assert the expected `thread/resume` error
3. call `turn/interrupt` for that running turn
4. wait for `codex/event/turn_aborted` before returning

## Why This Is The Correct Fix

These tests are specifically validating resume behavior while a turn is
active. They should also own cleanup of that active turn before exiting.
Explicitly interrupting and waiting for the terminal abort notification
removes teardown races and avoids relying on process-drop behavior to
clean up in-flight work.

## Repro / Verification

Repro command used for investigation:

```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 50 --status-level leak --final-status-level fail -E 'test(suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch) | test(suite::v2::thread_resume::thread_resume_rejects_history_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_rejects_mismatched_path_when_thread_is_running) | test(suite::v2::thread_resume::thread_resume_keeps_in_flight_turn_streaming)'
```

Observed before this change: intermittent `LEAK` in
`thread_resume_rejects_history_when_thread_is_running`.

Also verified with:

- `cargo test -p codex-app-server`


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with [ReviewStack](https://reviewstack.dev/openai/codex/pull/12269).
* #12271
* __->__ #12269
2026-02-19 21:51:18 +00:00
viyatb-oai
4edb1441a7
feat(config): add permissions.network proxy config wiring (#12054)
## Summary

Implements `ConfigToml.permissions.network` and uses it to populate `NetworkProxyConfig`. We now parse a new nested permissions/network config shape, which is converted into the proxy's runtime config.

When managed requirements exist, we still apply those constraints on top
of user settings (so managed policy still wins).

* Cleaned up the old constructor path so it now accepts both user config
+ managed constraints directly.
* Updated the reload path so live proxy config reloads respect
[permissions.network] too, while still supporting the existing top-level
[network] format.

### Behavior
- User-defined `[permissions.network]` values are now honored.
- Managed constraints still take effect and are validated against the
resulting policy.
2026-02-19 13:44:55 -08:00
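A minimal sketch of parsing a nested `[permissions.network]` table with serde; the field names are illustrative guesses, not the actual codex schema:

```rust
use serde::Deserialize;

#[derive(Deserialize, Default)]
struct ConfigToml {
    #[serde(default)]
    permissions: Permissions,
}

#[derive(Deserialize, Default)]
struct Permissions {
    network: Option<NetworkPermissions>,
}

#[derive(Deserialize)]
struct NetworkPermissions {
    // Hypothetical field; the real shape feeds NetworkProxyConfig.
    allowed_domains: Option<Vec<String>>,
}

fn main() {
    let cfg: ConfigToml = toml::from_str(
        r#"
        [permissions.network]
        allowed_domains = ["api.example.com"]
        "#,
    )
    .expect("parse config");
    // User settings would then be converted into the proxy's runtime
    // config, with managed constraints applied on top (managed wins).
    println!("{:?}", cfg.permissions.network.map(|n| n.allowed_domains));
}
```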
zbarsky-openai
2668789560
[bazel] Fix proc_macro_dep libs (#12274)
If a first-party proc_macro crate has tests/binaries that would get
autogenerated by the macro, it was being handled incorrectly. Found by
an external OS contributor!
2026-02-19 13:38:37 -08:00
dkumar-oai
1070a0a712
Add configurable MCP OAuth callback URL for MCP login (#11382)
## Summary

Implements a configurable MCP OAuth callback URL override for `codex mcp
login` and app-server OAuth login flows, including support for non-local
callback endpoints (for example, devbox ingress URLs).

## What changed

- Added new config key: `mcp_oauth_callback_url` in
`~/.codex/config.toml`.
- OAuth authorization now uses `mcp_oauth_callback_url` as
`redirect_uri` when set.
- Callback handling validates the callback path against the configured
redirect URI path.
- Listener bind behavior is now host-aware:
  - local callback URL hosts (`localhost`, `127.0.0.1`, `::1`) bind to `127.0.0.1`
  - non-local callback URL hosts bind to `0.0.0.0`
- `mcp_oauth_callback_port` remains supported and is used for the
listener port.
- Wired through:
  - CLI MCP login flow
  - App-server MCP OAuth login flow
  - Skill dependency OAuth login flow
- Updated config schema and config tests.

## Why

Some environments need OAuth callbacks to land on a specific reachable
URL (for example ingress in remote devboxes), not loopback. This change
allows that while preserving local defaults for existing users.

## Backward compatibility

- No behavior change when `mcp_oauth_callback_url` is unset.
- Existing `mcp_oauth_callback_port` behavior remains intact.
- Local callback flows continue binding to loopback by default.

## Testing

- `cargo test -p codex-rmcp-client callback -- --nocapture`
- `cargo test -p codex-core --lib mcp_oauth_callback -- --nocapture`
- `cargo check -p codex-cli -p codex-app-server -p codex-rmcp-client`

## Example config

```toml
mcp_oauth_callback_port = 5555
mcp_oauth_callback_url = "https://<devbox>-<namespace>.gateway.<cluster>.internal.api.openai.org/callback"
```
2026-02-19 13:32:10 -08:00
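A small sketch of the host-aware bind rule described above, using the `url` crate; it mirrors the PR text rather than the exact implementation:

```rust
use std::net::{IpAddr, Ipv4Addr};
use url::Url;

// Loopback-style callback hosts bind to 127.0.0.1; anything else binds
// to 0.0.0.0 so non-local callbacks (e.g. devbox ingress) are reachable.
fn listener_bind_addr(callback_url: &str) -> Option<IpAddr> {
    let url = Url::parse(callback_url).ok()?;
    let host = url.host_str()?;
    let local = matches!(host, "localhost" | "127.0.0.1" | "::1" | "[::1]");
    Some(if local {
        IpAddr::V4(Ipv4Addr::LOCALHOST)
    } else {
        IpAddr::V4(Ipv4Addr::UNSPECIFIED)
    })
}
```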
Alex Kwiatkowski
fe7054a346
fix(bazel): replace askama templates with include_str! in memories (#11778)
## Summary

- The experimental Bazel CI builds fail on all platforms because askama
resolves template paths relative to `CARGO_MANIFEST_DIR`, which points
outside the Bazel sandbox. This produces errors like:
  ```
error: couldn't read
`codex-rs/core/src/memories/../../../../../../../../../../../work/codex/codex/codex-rs/core/templates/memories/consolidation.md`:
No such file or directory
  ```
- Replaced `#[derive(Template)]` + `#[template(path = "...")]` with
`include_str!` + `str::replace()` for the three affected templates
(`consolidation.md`, `stage_one_input.md`, `read_path.md`).
`include_str!` resolves paths relative to the source file, which works
correctly in both Cargo and Bazel builds.
- The templates only use simple `{{ variable }}` substitution with no
control flow or filters, so no askama functionality is lost.
- Removes the `askama` dependency from `codex-core` since it was the
only crate using it. The workspace-level dependency definition is left
in place.
- This matches the existing pattern used throughout the codebase — e.g.
`codex-rs/core/src/memories/mod.rs` already uses
`include_str!("../../templates/memories/stage_one_system.md")` for the
fourth template file.

## Test plan

- [ ] Verify Bazel (experimental) CI passes on all platforms
- [ ] Verify rust-ci (Cargo) builds and tests continue to pass
- [ ] Verify `cargo test -p codex-core` passes locally
2026-02-19 16:29:26 -05:00
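A minimal sketch of the `include_str!` + `str::replace` approach; the placeholder variable name is an assumption:

```rust
// `include_str!` resolves relative to this source file at compile time,
// so it behaves identically under Cargo and Bazel sandboxed builds.
const CONSOLIDATION_TEMPLATE: &str =
    include_str!("../../templates/memories/consolidation.md");

fn render_consolidation(raw_memories: &str) -> String {
    // The templates only use simple `{{ variable }}` substitution, so a
    // plain string replacement suffices; no template engine is needed.
    CONSOLIDATION_TEMPLATE.replace("{{ raw_memories }}", raw_memories)
}
```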
pash-openai
429cc4860e
ws turn metadata via client_metadata (#11953) 2026-02-19 12:28:15 -08:00
Michael Bolin
2f3d0b186b
app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266)
## Why
`cargo nextest` was intermittently reporting `LEAK` for
`codex-app-server` tests even when assertions passed. This adds noise
and flakiness to local/CI signals.

Sample output used as the basis of this investigation:

```text
LEAK [   7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1
LEAK [   7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model
LEAK [   7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests
LEAK [   8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2
LEAK [   8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item
LEAK [   8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2
LEAK [   6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2
LEAK [   6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2
```

## How I Reproduced
I focused on the suspect tests and ran them under `nextest` stress mode
with leak reporting enabled.

```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) | test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) | test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) | test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) | test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) | test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) | test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)'
```

This reproduced intermittent `LEAK` statuses while tests still passed.

## What Changed
In `codex-rs/app-server/tests/common/mcp_process.rs`:

- Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown
can explicitly close stdin.
- In `Drop`, close stdin first to trigger EOF-based graceful shutdown.
- Wait briefly for graceful exit.
- If still running, fall back to `start_kill()` and the existing bounded
`try_wait()` loop.
- Updated send-path handling to bail if stdin is already closed.

## Why This Is the Right Fix
The leak signal was caused by child-process teardown timing, not
test-logic assertion failure. The helper previously relied mostly on
force-kill timing in `Drop`; that can race with nextest leak detection.

Closing stdin first gives `codex-app-server` a deterministic, graceful
shutdown path before force-kill. Keeping the force-kill fallback
preserves robustness if graceful shutdown does not complete in time.

## Verification
- `cargo test -p codex-app-server`
- Re-ran the stress repro above after this change: no `LEAK` statuses
observed.
- Additional high-signal stress run also showed no leaks:

```bash
cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) | test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)'
```
2026-02-19 20:19:42 +00:00
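A sketch of the teardown order described above (close stdin, wait briefly for a graceful exit, then force-kill); the wait intervals are illustrative:

```rust
use std::time::{Duration, Instant};
use tokio::process::{Child, ChildStdin};

struct McpProcess {
    child: Child,
    stdin: Option<ChildStdin>, // Option so Drop can close it explicitly
}

impl Drop for McpProcess {
    fn drop(&mut self) {
        // Dropping stdin closes the pipe, giving the server an EOF-based
        // graceful shutdown path before any force-kill.
        self.stdin.take();

        // Wait briefly for graceful exit.
        let deadline = Instant::now() + Duration::from_millis(500);
        while Instant::now() < deadline {
            if matches!(self.child.try_wait(), Ok(Some(_))) {
                return;
            }
            std::thread::sleep(Duration::from_millis(25));
        }

        // Fall back to force-kill with a bounded wait, preserving
        // robustness if graceful shutdown does not complete in time.
        let _ = self.child.start_kill();
        for _ in 0..20 {
            if matches!(self.child.try_wait(), Ok(Some(_))) {
                return;
            }
            std::thread::sleep(Duration::from_millis(25));
        }
    }
}
```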
Charley Cunningham
c3cb38eafb
Clarify cumulative proposed_plan behavior in Plan mode (#12265)
## Summary
- Require revised `<proposed_plan>` blocks in the same planning session
to be complete replacements, not partial/delta plans.
- Scope that cumulative replacement rule to the current planning session
only.
- Clarify that after leaving Plan mode (for example switching to Default
mode to implement) or when explicitly asked for a new plan, the model
should produce a new self-contained plan without inheriting prior plan
blocks unless requested.

## Testing
- Not run (prompt/template text-only change).
2026-02-19 12:18:23 -08:00
jif-oai
0362e12da6
Skip removed features during metrics emission (#12253)
Summary
- avoid emitting metrics for features marked as `Stage::Removed`
- keep feature metrics aligned with active and planned states only

Testing
- Not run (not requested)
2026-02-19 19:58:46 +00:00
Michael Bolin
425fff7ad6
feat: add Reject approval policy with granular prompt rejection controls (#12087)
## Why

We need a way to auto-reject specific approval prompt categories without
switching all approvals off.

The goal is to let users independently control:
- sandbox escalation approvals,
- execpolicy `prompt` rule approvals,
- MCP elicitation prompts.

## What changed

- Added a new primary approval mode in `protocol/src/protocol.rs`:

```rust
pub enum AskForApproval {
    // ...
    Reject(RejectConfig),
    // ...
}

pub struct RejectConfig {
    pub sandbox_approval: bool,
    pub rules: bool,
    pub mcp_elicitations: bool,
}
```

- Wired `RejectConfig` semantics through approval paths in `core`:
  - `core/src/exec_policy.rs`
    - rejects rule-driven prompts when `rules = true`
    - rejects sandbox/escalation prompts when `sandbox_approval = true`
- preserves rule priority when both rule and sandbox prompt conditions
are present
  - `core/src/tools/sandboxing.rs`
- applies `sandbox_approval` to default exec approval decisions and
sandbox-failure retry gating
  - `core/src/safety.rs`
- keeps `Reject { all false }` behavior aligned with `OnRequest` for
patch safety
    - rejects out-of-root patch approvals when `sandbox_approval = true`
  - `core/src/mcp_connection_manager.rs`
    - auto-declines MCP elicitations when `mcp_elicitations = true`

- Ensured approval policy used by MCP elicitation flow stays in sync
with constrained session policy updates.

- Updated app-server v2 conversions and generated schema/TypeScript
artifacts for the new `Reject` shape.

## Verification

Added focused unit coverage for the new behavior in:
- `core/src/exec_policy.rs`
- `core/src/tools/sandboxing.rs`
- `core/src/mcp_connection_manager.rs`
- `core/src/safety.rs`
- `core/src/tools/runtimes/apply_patch.rs`

Key cases covered include rule-vs-sandbox prompt precedence, MCP
auto-decline behavior, and patch/sandbox retry behavior under
`RejectConfig`.
2026-02-19 11:41:49 -08:00
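A hedged illustration of how `RejectConfig` might gate one prompt category, restating the types from the snippet above with other variants elided; the decision helper is a sketch, not the codex code:

```rust
pub struct RejectConfig {
    pub sandbox_approval: bool,
    pub rules: bool,
    pub mcp_elicitations: bool,
}

pub enum AskForApproval {
    OnRequest,
    Reject(RejectConfig),
}

#[derive(Debug, PartialEq)]
enum Decision {
    Prompt,
    AutoReject,
}

fn sandbox_escalation_decision(policy: &AskForApproval) -> Decision {
    match policy {
        // Only the enabled category is auto-rejected; `Reject` with all
        // fields false stays aligned with `OnRequest` and still prompts.
        AskForApproval::Reject(cfg) if cfg.sandbox_approval => Decision::AutoReject,
        _ => Decision::Prompt,
    }
}
```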
jif-oai
f6c06108b1
try fix 2 (#12264) 2026-02-19 19:36:42 +00:00
Charley Cunningham
abb018383f
Undo stack size Bazel test hack (#12258)
Undo hack from https://github.com/openai/codex/pull/12203/changes
2026-02-19 11:04:45 -08:00
jif-oai
928be5f515
Revert "feat: no timeout mode on ue" (#12256)
Reverts openai/codex#12250
2026-02-19 19:02:29 +00:00
Michael Bolin
7cd2e84026
chore: consolidate new() and initialize() for McpConnectionManager (#12255)
## Why
`McpConnectionManager` used a two-phase setup (`new()` followed by
`initialize()`), which forced call sites to construct placeholder state
and then mutate it asynchronously. That made MCP startup/refresh flows
harder to follow and easier to misuse, especially around cancellation
token ownership.

## What changed
- Replaced the two-phase initialization flow with a single async
constructor: `McpConnectionManager::new(...) -> (Self,
CancellationToken)`.
- Added `McpConnectionManager::new_uninitialized()` for places that need
an empty manager before async startup begins.
- Added `McpConnectionManager::new_mcp_connection_manager_for_tests()`
for test-only construction.
- Updated MCP startup and refresh call sites in
`codex-rs/core/src/codex.rs` to build a fresh manager via `new(...)`,
swap it in, and update the startup cancellation token consistently.
- Updated MCP snapshot/connector call sites in
`codex-rs/core/src/mcp/mod.rs` and `codex-rs/core/src/connectors.rs` to
use the consolidated constructor.
- Removed the now-obsolete `reset_mcp_startup_cancellation_token()`
helper in favor of explicit token replacement at the call sites.

## Testing
- Not run (refactor-only change; no new behavior was intended).
2026-02-19 10:59:51 -08:00
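A sketch of the consolidated constructor shape; the fields and startup body are elided, and only the signature pattern follows the PR description:

```rust
use tokio_util::sync::CancellationToken;

pub struct McpConnectionManager {
    // ... live connections, tool caches, etc. (elided)
}

impl McpConnectionManager {
    /// Single async constructor replacing the old `new()` + `initialize()`
    /// two-phase flow: callers receive the manager and the cancellation
    /// token that governs its startup work in one step.
    pub async fn new(/* server configs */) -> (Self, CancellationToken) {
        let token = CancellationToken::new();
        // ... spawn server connections, honoring `token` for cancellation ...
        (Self {}, token)
    }

    /// For call sites that need an empty manager before async startup begins.
    pub fn new_uninitialized() -> Self {
        Self {}
    }
}
```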
jif-oai
9719dc502c
feat: no timeout mode on ue (#12250) 2026-02-19 18:58:13 +00:00
jif-oai
dae26c9e8b
chore: increase stack size for everyone (#12254) 2026-02-19 18:44:48 +00:00
jif-oai
d87cf7794c
Add configurable agent spawn depth (#12251)
Summary
- expose `agents.max_depth` in config schema and toml parsing, with
defaults and validation
- thread-spawn depth guards and multi-agent handler now respect the
configured limit instead of a hardcoded value
- ensure documentation and helpers account for agent depth limits
2026-02-19 18:40:41 +00:00
sayan-oai
d54999d006
client side modelinfo overrides (#12101)
TL;DR
Add top-level `model_catalog_json` config support so users can supply a
local model catalog override from a JSON file path (including adding new
models) without backend changes.

### Problem
Codex previously had no clean client-side way to replace/overlay model
catalog data for local testing of model metadata and new model entries.

### Fix
- Add top-level `model_catalog_json` config field (JSON file path).
- Apply catalog entries when resolving `ModelInfo`:
  1. Base resolved model metadata (remote/fallback)
  2. Catalog overlay from `model_catalog_json`
3. Existing global top-level overrides (`model_context_window`,
`model_supports_reasoning_summaries`, etc.)

### Note
Will revisit per-field overrides in a follow-up

### Tests
Added tests
2026-02-19 10:38:57 -08:00
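A sketch of the three-layer, field-by-field resolution order described above; the `ModelInfo` fields are illustrative:

```rust
#[derive(Clone, Default)]
struct ModelInfo {
    context_window: Option<u64>,
    supports_reasoning_summaries: Option<bool>,
}

fn resolve_model_info(
    base: ModelInfo,                    // 1. remote/fallback metadata
    catalog_overlay: Option<ModelInfo>, // 2. entry from model_catalog_json
    global_overrides: ModelInfo,        // 3. top-level config overrides
) -> ModelInfo {
    let mut info = base;
    if let Some(overlay) = catalog_overlay {
        info.context_window = overlay.context_window.or(info.context_window);
        info.supports_reasoning_summaries = overlay
            .supports_reasoning_summaries
            .or(info.supports_reasoning_summaries);
    }
    // Later layers win field-by-field.
    info.context_window = global_overrides.context_window.or(info.context_window);
    info.supports_reasoning_summaries = global_overrides
        .supports_reasoning_summaries
        .or(info.supports_reasoning_summaries);
    info
}
```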
Jack Mousseau
3a951f8096
Restore phase when loading from history (#12244) 2026-02-19 09:56:56 -08:00
Charley Cunningham
f2d5842ed1
Move previous turn context tracking into ContextManager history (#12179)
## Summary
- add `previous_context_item: Option<TurnContextItem>` to
`ContextManager`
- expose session/state accessors for reading and updating the stored
previous context item
- switch settings diffing to use `TurnContextItem` instead of
`TurnContext`
- remove submission-loop local `previous_context` and persist the
previous context item in history

## Testing
- `just fmt`
- `just fix -p codex-core`
- `cargo test -p codex-core --test all model_switching::`
- `cargo test -p codex-core --test all collaboration_instructions::`
- `cargo test -p codex-core --test all personality::`
- `cargo test -p codex-core --test all
permissions_messages::permissions_message_not_added_when_no_change`
2026-02-19 09:56:20 -08:00
colby-oai
f6fd4cb3f5
Adjust MCP tool approval handling for custom servers (#11787)
Summary
This PR expands MCP client-side approval behavior beyond codex_apps and
tightens elicitation capability signaling.

- Removed the codex_apps-only gate in MCP tool approval checks, so
local/custom MCP servers are now eligible for the same client-side
approval prompt flow when tool annotations indicate side effects.
- Updated approval memory keying to support tools without a connector ID
(connector_id: Option<String>), allowing “Approve this Session” to be
remembered even when connector metadata is missing.
- Updated prompt text for non-codex_apps tools to identify the origin as "The <server> MCP server" instead of "This app".
- Added MCP initialization capability policy so only codex_apps
advertises MCP elicitation capability; other servers advertise no
elicitation support.
- Added regression tests for:
  - server-specific prompt copy behavior
  - codex-apps-only elicitation capability advertisement

Testing
- Not run (not requested)
2026-02-19 12:52:42 -05:00
jif-oai
547f462385
feat: add configurable write_stdin timeout (#12228)
Adds a configurable max timeout for `write_stdin`; it is only used for empty `write_stdin` calls.

Also increases the default value from 30s to 5 minutes.
2026-02-19 17:22:13 +00:00
viyatb-oai
f595e11723
docs: add codex security policy (#12193)
## Summary
Adds SECURITY.MD with the Codex security policy and Bugcrowd reporting guidance.
2026-02-19 09:12:59 -08:00
jif-oai
743caea3a6
feat: add shell snapshot failure reason (#12233) 2026-02-19 13:49:12 +00:00
jif-oai
2daa3fd44f
feat: sub-agent injection (#12152)
This PR adds parent-thread sub-agent completion notifications and changes the model prompt to prevent it from being confused.
2026-02-19 11:32:10 +00:00
jif-oai
f298c48cc6
Adjust memories rollout defaults (#12231)
Summary
- raise `DEFAULT_MEMORIES_MAX_ROLLOUTS_PER_STARTUP` to 16 so more rollouts are allowed per startup
- lower `DEFAULT_MEMORIES_MIN_ROLLOUT_IDLE_HOURS` to 6 to make rollouts eligible sooner

Testing
- Not run (not requested)
2026-02-19 10:52:43 +00:00
Eric Traut
227352257c
Update docs links for feature flag notice (#12164)
Summary
- replace the stale `docs/config.md#feature-flags` reference in the
legacy feature notice with the canonical published URL
- align the deprecation notice test to expect the new link

This addresses #12123
2026-02-19 00:00:44 -08:00
viyatb-oai
4fe99b086f
fix(linux-sandbox): mount /dev in bwrap sandbox (#12081)
## Summary
- Updates the Linux bubblewrap sandbox args to mount a minimal `/dev` using `--dev /dev` instead of only binding `/dev/null`; with only `/dev/null` bound, tools needing entropy (git, crypto libs, etc.) can fail.

- Changed mount order so `--dev /dev` is added before writable-root `--bind` mounts, preserving writable `/dev/*` submounts like `/dev/shm`.
## Why
Fixes sandboxed command failures when reading `/dev/urandom` (and
similar standard device-node access).


Fixes https://github.com/openai/codex/issues/12056
2026-02-18 23:27:32 -08:00
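A sketch of the argument-ordering point, with an abbreviated bubblewrap argument list (the real sandbox args include many more flags):

```rust
// `--dev /dev` must come before writable-root `--bind` mounts so that
// writable `/dev/*` submounts like `/dev/shm` are layered on top of the
// minimal devtmpfs rather than being shadowed by it.
fn bwrap_args(writable_roots: &[&str]) -> Vec<String> {
    let mut args: Vec<String> = vec!["--dev".into(), "/dev".into()];
    for root in writable_roots {
        args.extend(["--bind".into(), (*root).into(), (*root).into()]);
    }
    args
}
```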
Matthew Zeng
18eb640a47
[apps] Update apps allowlist. (#12211)
- [x] Update apps allowlist.
2026-02-18 23:21:32 -08:00
Charley Cunningham
16c3c47535
Stabilize app-server detached review and running-resume tests (#12203)
## Summary
- stabilize
`thread_resume_rejoins_running_thread_even_with_override_mismatch` by
using a valid delayed second SSE response instead of an intentionally
truncated stream
- set `RUST_MIN_STACK=4194304` for spawned app-server test processes in
`McpProcess` to avoid stack-sensitive CI overflows in detached review
tests

## Why
- the thread-resume assertion could race with a mocked stream-disconnect
error and intermittently observe `systemError`
- detached review startup is stack-sensitive in some CI environments;
pinning a larger stack in the test harness removes that flake without
changing product behavior

## Validation
- `just fmt`
- `cargo test -p codex-app-server --test all
suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch`
- `cargo test -p codex-app-server --test all
suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`
2026-02-18 19:05:35 -08:00
Charley Cunningham
7f3dbaeb25
state: enforce 10 MiB log caps for thread and threadless process logs (#12038)
## Summary
- enforce a 10 MiB cap per `thread_id` in state log storage
- enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS
NULL`) logs
- scope pruning to only keys affected by the current insert batch
- add a cheap per-key `SUM(...)` precheck so windowed prune queries only
run for keys that are currently over the cap
- add SQLite indexes used by the pruning queries
- add focused runtime tests covering both pruning behaviors

## Why
This keeps log growth bounded by the intended partition semantics while
preserving a small, readable implementation localized to the existing
insert path.

## Local Latency Snapshot (No Truncation-Pressure Run)
Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using
local-only (uncommitted) instrumentation, while not specifically
benchmarking the truncation-heavy regime.

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
| `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
| `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
| `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
| `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
| `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
| `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
| `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
| `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 7 |
| `<= 1.000` | 33 |
| `<= 2.000` | 40 |
| `<= 5.000` | 28 |
| `<= 10.000` | 2 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime)
Collected from a run where cap-hit behavior was frequent (`135/180`
insert calls), using local-only (uncommitted) instrumentation and a
temporary local cap of `10_000` bytes for stress testing (not the merged
`10 MiB` cap).

### Percentiles By Query (ms)
| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
| `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
| `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
| `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
| `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
| `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
| `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
| `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
| `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
| `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
| `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |

### `insert_logs.total` Histogram (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 16 |
| `<= 1.000` | 39 |
| `<= 2.000` | 60 |
| `<= 5.000` | 54 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### `insert_logs.total` Histogram When Cap Was Hit (ms)
| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 0 |
| `<= 1.000` | 22 |
| `<= 2.000` | 51 |
| `<= 5.000` | 51 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### Performance Takeaways
- Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total`
stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`).
- Calls that did **not** hit the cap are materially cheaper
(`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls
(`insert_logs.total_cap_hit` p95 `5.547ms`).
- Compared to the earlier non-truncation-pressure run, overall
`insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms`
(+`1.48ms`), indicating bounded overhead when pruning is active.
- This truncation-heavy run used an intentionally low local cap for
stress testing; with the real 10 MiB cap, cap-hit frequency should be
much lower in normal sessions.

## Testing
- `just fmt` (in `codex-rs`)
- `cargo test -p codex-state` (in `codex-rs`)
2026-02-18 17:08:08 -08:00
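A hedged sketch of the precheck + windowed-prune shape described above, assuming rusqlite and a simplified `logs(id, thread_id, payload)` schema; the real schema, cap wiring, and per-`process_uuid` variant differ:

```rust
use rusqlite::{params, Connection};

const CAP_BYTES: i64 = 10 * 1024 * 1024; // 10 MiB per thread_id

fn prune_thread_logs(conn: &Connection, thread_id: &str) -> rusqlite::Result<()> {
    // Cheap per-key SUM precheck: skip the windowed prune query unless
    // this key is currently over the cap.
    let total: i64 = conn.query_row(
        "SELECT COALESCE(SUM(LENGTH(payload)), 0) FROM logs WHERE thread_id = ?1",
        params![thread_id],
        |row| row.get(0),
    )?;
    if total <= CAP_BYTES {
        return Ok(());
    }
    // Keep the newest rows whose running size fits the cap; delete the rest.
    conn.execute(
        "DELETE FROM logs WHERE thread_id = ?1 AND id NOT IN (
             SELECT id FROM (
                 SELECT id, SUM(LENGTH(payload)) OVER (ORDER BY id DESC) AS running
                 FROM logs WHERE thread_id = ?1
             ) WHERE running <= ?2
         )",
        params![thread_id, CAP_BYTES],
    )?;
    Ok(())
}
```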