core-agent-ide

Author	SHA1	Message	Date
Alex Kwiatkowski	fe7054a346	fix(bazel): replace askama templates with include_str! in memories (#11778 ) ## Summary - The experimental Bazel CI builds fail on all platforms because askama resolves template paths relative to `CARGO_MANIFEST_DIR`, which points outside the Bazel sandbox. This produces errors like: ``` error: couldn't read `codex-rs/core/src/memories/../../../../../../../../../../../work/codex/codex/codex-rs/core/templates/memories/consolidation.md`: No such file or directory ``` - Replaced `#[derive(Template)]` + `#[template(path = "...")]` with `include_str!` + `str::replace()` for the three affected templates (`consolidation.md`, `stage_one_input.md`, `read_path.md`). `include_str!` resolves paths relative to the source file, which works correctly in both Cargo and Bazel builds. - The templates only use simple `{{ variable }}` substitution with no control flow or filters, so no askama functionality is lost. - Removes the `askama` dependency from `codex-core` since it was the only crate using it. The workspace-level dependency definition is left in place. - This matches the existing pattern used throughout the codebase — e.g. `codex-rs/core/src/memories/mod.rs` already uses `include_str!("../../templates/memories/stage_one_system.md")` for the fourth template file. ## Test plan - [ ] Verify Bazel (experimental) CI passes on all platforms - [ ] Verify rust-ci (Cargo) builds and tests continue to pass - [ ] Verify `cargo test -p codex-core` passes locally	2026-02-19 16:29:26 -05:00
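A minimal sketch of the `include_str!` + `str::replace()` approach, assuming the templates contain only `{{ variable }}` placeholders. The inline `CONSOLIDATION_TEMPLATE` constant and the `memory_root` / `input` variable names are hypothetical stand-ins. The real code loads the template file with `include_str!`.

```rust
/// Hypothetical stand-in for `include_str!("../../templates/memories/consolidation.md")`,
/// inlined so the sketch compiles without the template file.
const CONSOLIDATION_TEMPLATE: &str = "Memory root: {{ memory_root }}\nInput:\n{{ input }}\n";

/// Replaces every `{{ name }}` placeholder with its value using plain `str::replace`.
/// Unknown placeholders are left untouched.
fn render_template(template: &str, vars: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for &(name, value) in vars {
        // `{{{{` / `}}}}` are escaped braces, so this builds the literal "{{ name }}".
        let placeholder = format!("{{{{ {name} }}}}");
        out = out.replace(&placeholder, value);
    }
    out
}
```

Because the substitutions run one after another, a value that itself contains a `{{ name }}` placeholder could be substituted again by a later pass. That is harmless for templates that, as the commit notes, use only simple substitution with no control flow.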
pash-openai	429cc4860e	ws turn metadata via client_metadata (#11953 )	2026-02-19 12:28:15 -08:00
Michael Bolin	2f3d0b186b	app-server tests: reduce intermittent nextest LEAK via graceful child shutdown (#12266 ) ## Why `cargo nextest` was intermittently reporting `LEAK` for `codex-app-server` tests even when assertions passed. This adds noise and flakiness to local/CI signals. Sample output used as the basis of this investigation: ```text LEAK [ 7.578s] ( 149/3663) codex-app-server::all suite::output_schema::send_user_turn_output_schema_is_per_turn_v1 LEAK [ 7.383s] ( 210/3663) codex-app-server::all suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model LEAK [ 7.768s] ( 213/3663) codex-app-server::all suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests LEAK [ 8.841s] ( 224/3663) codex-app-server::all suite::v2::output_schema::turn_start_accepts_output_schema_v2 LEAK [ 8.151s] ( 225/3663) codex-app-server::all suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item LEAK [ 8.230s] ( 232/3663) codex-app-server::all suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2 LEAK [ 6.472s] ( 273/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2 LEAK [ 6.107s] ( 275/3663) codex-app-server::all suite::v2::turn_start::turn_start_accepts_personality_override_v2 ``` ## How I Reproduced I focused on the suspect tests and ran them under `nextest` stress mode with leak reporting enabled. 
```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 25 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model) \| test(suite::v2::dynamic_tools::thread_start_injects_dynamic_tools_into_model_requests) \| test(suite::v2::output_schema::turn_start_accepts_output_schema_v2) \| test(suite::v2::plan_item::plan_mode_uses_proposed_plan_block_for_plan_item) \| test(suite::v2::safety_check_downgrade::openai_model_header_mismatch_emits_model_rerouted_notification_v2) \| test(suite::v2::turn_start::turn_start_accepts_collaboration_mode_override_v2) \| test(suite::v2::turn_start::turn_start_accepts_personality_override_v2)' ``` This reproduced intermittent `LEAK` statuses while tests still passed. ## What Changed In `codex-rs/app-server/tests/common/mcp_process.rs`: - Changed `stdin: ChildStdin` to `stdin: Option<ChildStdin>` so teardown can explicitly close stdin. - In `Drop`, close stdin first to trigger EOF-based graceful shutdown. - Wait briefly for graceful exit. - If still running, fall back to `start_kill()` and the existing bounded `try_wait()` loop. - Updated send-path handling to bail if stdin is already closed. ## Why This Is the Right Fix The leak signal was caused by child-process teardown timing, not test-logic assertion failure. The helper previously relied mostly on force-kill timing in `Drop`; that can race with nextest leak detection. Closing stdin first gives `codex-app-server` a deterministic, graceful shutdown path before force-kill. Keeping the force-kill fallback preserves robustness if graceful shutdown does not complete in time. ## Verification - `cargo test -p codex-app-server` - Re-ran the stress repro above after this change: no `LEAK` statuses observed. 
- Additional high-signal stress run also showed no leaks: ```bash cargo nextest run -p codex-app-server -j 2 --no-fail-fast --stress-count 100 --status-level leak --final-status-level fail -E 'test(suite::output_schema::send_user_turn_output_schema_is_per_turn_v1) \| test(suite::v2::dynamic_tools::dynamic_tool_call_round_trip_sends_text_content_items_to_model)' ```	2026-02-19 20:19:42 +00:00
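The teardown order described above can be sketched with `std::process` as follows. The real helper in `mcp_process.rs` uses tokio's child API (`start_kill()`). `ManagedChild`, `shutdown`, and the 500 ms / 2 s timeouts are hypothetical choices for illustration.

```rust
use std::process::{Child, ChildStdin};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical wrapper: stdin is an `Option` so teardown can close it explicitly.
struct ManagedChild {
    child: Child,
    stdin: Option<ChildStdin>,
}

impl ManagedChild {
    fn new(mut child: Child) -> Self {
        let stdin = child.stdin.take();
        Self { child, stdin }
    }

    /// Bounded `try_wait` loop: returns true once the child has exited.
    fn wait_until(&mut self, deadline: Instant) -> bool {
        loop {
            match self.child.try_wait() {
                Ok(Some(_status)) => return true,
                Ok(None) if Instant::now() < deadline => thread::sleep(Duration::from_millis(10)),
                _ => return false,
            }
        }
    }

    /// Returns true if the child exited after EOF alone, false if it had to be killed.
    fn shutdown(&mut self) -> bool {
        // 1. Dropping stdin closes the pipe, so a stdin-driven server sees EOF.
        drop(self.stdin.take());
        // 2. Brief grace period for a graceful exit.
        if self.wait_until(Instant::now() + Duration::from_millis(500)) {
            return true;
        }
        // 3. Fallback: force kill, then reap with a bounded wait.
        let _ = self.child.kill();
        let _ = self.wait_until(Instant::now() + Duration::from_secs(2));
        false
    }
}

impl Drop for ManagedChild {
    fn drop(&mut self) {
        let _ = self.shutdown();
    }
}
```

A second `shutdown` call (for example from `Drop` after an explicit call) is cheap, because `try_wait` keeps returning the already-collected exit status.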
Charley Cunningham	c3cb38eafb	Clarify cumulative proposed_plan behavior in Plan mode (#12265 ) ## Summary - Require revised `<proposed_plan>` blocks in the same planning session to be complete replacements, not partial/delta plans. - Scope that cumulative replacement rule to the current planning session only. - Clarify that after leaving Plan mode (for example switching to Default mode to implement) or when explicitly asked for a new plan, the model should produce a new self-contained plan without inheriting prior plan blocks unless requested. ## Testing - Not run (prompt/template text-only change).	2026-02-19 12:18:23 -08:00
jif-oai	0362e12da6	Skip removed features during metrics emission (#12253 ) Summary - avoid emitting metrics for features marked as `Stage::Removed` - keep feature metrics aligned with active and planned states only Testing - Not run (not requested)	2026-02-19 19:58:46 +00:00
Michael Bolin	425fff7ad6	feat: add Reject approval policy with granular prompt rejection controls (#12087 ) ## Why We need a way to auto-reject specific approval prompt categories without switching all approvals off. The goal is to let users independently control: - sandbox escalation approvals, - execpolicy `prompt` rule approvals, - MCP elicitation prompts. ## What changed - Added a new primary approval mode in `protocol/src/protocol.rs`: ```rust pub enum AskForApproval { // ... Reject(RejectConfig), // ... } pub struct RejectConfig { pub sandbox_approval: bool, pub rules: bool, pub mcp_elicitations: bool, } ``` - Wired `RejectConfig` semantics through approval paths in `core`: - `core/src/exec_policy.rs` - rejects rule-driven prompts when `rules = true` - rejects sandbox/escalation prompts when `sandbox_approval = true` - preserves rule priority when both rule and sandbox prompt conditions are present - `core/src/tools/sandboxing.rs` - applies `sandbox_approval` to default exec approval decisions and sandbox-failure retry gating - `core/src/safety.rs` - keeps `Reject { all false }` behavior aligned with `OnRequest` for patch safety - rejects out-of-root patch approvals when `sandbox_approval = true` - `core/src/mcp_connection_manager.rs` - auto-declines MCP elicitations when `mcp_elicitations = true` - Ensured approval policy used by MCP elicitation flow stays in sync with constrained session policy updates. - Updated app-server v2 conversions and generated schema/TypeScript artifacts for the new `Reject` shape. ## Verification Added focused unit coverage for the new behavior in: - `core/src/exec_policy.rs` - `core/src/tools/sandboxing.rs` - `core/src/mcp_connection_manager.rs` - `core/src/safety.rs` - `core/src/tools/runtimes/apply_patch.rs` Key cases covered include rule-vs-sandbox prompt precedence, MCP auto-decline behavior, and patch/sandbox retry behavior under `RejectConfig`.	2026-02-19 11:41:49 -08:00
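One way to read the `RejectConfig` semantics is as a per-category auto-reject switch plus a precedence rule. The sketch below assumes that reading. `PromptKind`, `should_auto_reject`, and `governing_prompt` are hypothetical names, and treating the exec-policy rule as the deciding prompt when both conditions fire is an interpretation of "preserves rule priority", not the confirmed implementation.

```rust
/// Mirrors the fields of `RejectConfig` from the commit message.
#[derive(Clone, Copy, Debug, Default)]
struct RejectConfig {
    sandbox_approval: bool,
    rules: bool,
    mcp_elicitations: bool,
}

/// Hypothetical categories of approval prompt.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PromptKind {
    SandboxEscalation,
    ExecPolicyRule,
    McpElicitation,
}

/// Each flag independently auto-rejects one prompt category.
fn should_auto_reject(config: RejectConfig, kind: PromptKind) -> bool {
    match kind {
        PromptKind::SandboxEscalation => config.sandbox_approval,
        PromptKind::ExecPolicyRule => config.rules,
        PromptKind::McpElicitation => config.mcp_elicitations,
    }
}

/// Assumed precedence: when an execpolicy `prompt` rule and a sandbox escalation
/// both apply to one command, the rule decides.
fn governing_prompt(rule_prompt: bool, sandbox_prompt: bool) -> Option<PromptKind> {
    if rule_prompt {
        Some(PromptKind::ExecPolicyRule)
    } else if sandbox_prompt {
        Some(PromptKind::SandboxEscalation)
    } else {
        None
    }
}
```

With all flags false (`RejectConfig::default()`), nothing is auto-rejected, which matches the commit's note that `Reject { all false }` behaves like `OnRequest` for patch safety.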
jif-oai	f6c06108b1	try fix 2 (#12264 )	2026-02-19 19:36:42 +00:00
Charley Cunningham	abb018383f	Undo stack size Bazel test hack (#12258 ) Undo hack from https://github.com/openai/codex/pull/12203/changes	2026-02-19 11:04:45 -08:00
jif-oai	928be5f515	Revert "feat: no timeout mode on ue" (#12256 ) Reverts openai/codex#12250	2026-02-19 19:02:29 +00:00
Michael Bolin	7cd2e84026	chore: consolidate new() and initialize() for McpConnectionManager (#12255 ) ## Why `McpConnectionManager` used a two-phase setup (`new()` followed by `initialize()`), which forced call sites to construct placeholder state and then mutate it asynchronously. That made MCP startup/refresh flows harder to follow and easier to misuse, especially around cancellation token ownership. ## What changed - Replaced the two-phase initialization flow with a single async constructor: `McpConnectionManager::new(...) -> (Self, CancellationToken)`. - Added `McpConnectionManager::new_uninitialized()` for places that need an empty manager before async startup begins. - Added `McpConnectionManager::new_mcp_connection_manager_for_tests()` for test-only construction. - Updated MCP startup and refresh call sites in `codex-rs/core/src/codex.rs` to build a fresh manager via `new(...)`, swap it in, and update the startup cancellation token consistently. - Updated MCP snapshot/connector call sites in `codex-rs/core/src/mcp/mod.rs` and `codex-rs/core/src/connectors.rs` to use the consolidated constructor. - Removed the now-obsolete `reset_mcp_startup_cancellation_token()` helper in favor of explicit token replacement at the call sites. ## Testing - Not run (refactor-only change; no new behavior was intended).	2026-02-19 10:59:51 -08:00
jif-oai	9719dc502c	feat: no timeout mode on ue (#12250 )	2026-02-19 18:58:13 +00:00
jif-oai	dae26c9e8b	chore: increase stack size for everyone (#12254 )	2026-02-19 18:44:48 +00:00
jif-oai	d87cf7794c	Add configurable agent spawn depth (#12251 ) Summary - expose `agents.max_depth` in config schema and toml parsing, with defaults and validation - thread-spawn depth guards and multi-agent handler now respect the configured limit instead of a hardcoded value - ensure documentation and helpers account for agent depth limits	2026-02-19 18:40:41 +00:00
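A depth guard that respects a configured limit instead of a hardcoded value might look like this sketch. It assumes the root thread is depth 0 and each spawned sub-agent is one level deeper. `SpawnError`, `check_spawn_depth`, and the "must be at least 1" validation rule are hypothetical, since the commit does not state the real default or validation.

```rust
#[derive(Debug, PartialEq)]
enum SpawnError {
    DepthLimitReached { depth: u32, max_depth: u32 },
}

/// Hypothetical config validation for `agents.max_depth`.
fn validate_max_depth(value: u32) -> Result<u32, String> {
    if value == 0 {
        Err("agents.max_depth must be at least 1".to_string())
    } else {
        Ok(value)
    }
}

/// Returns the depth the new agent would have, or an error if it exceeds `max_depth`.
fn check_spawn_depth(parent_depth: u32, max_depth: u32) -> Result<u32, SpawnError> {
    let child_depth = parent_depth.saturating_add(1);
    if child_depth > max_depth {
        Err(SpawnError::DepthLimitReached { depth: child_depth, max_depth })
    } else {
        Ok(child_depth)
    }
}
```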
sayan-oai	d54999d006	client side modelinfo overrides (#12101 ) TL;DR Add top-level `model_catalog_json` config support so users can supply a local model catalog override from a JSON file path (including adding new models) without backend changes. ### Problem Codex previously had no clean client-side way to replace/overlay model catalog data for local testing of model metadata and new model entries. ### Fix - Add top-level `model_catalog_json` config field (JSON file path). - Apply catalog entries when resolving `ModelInfo`: 1. Base resolved model metadata (remote/fallback) 2. Catalog overlay from `model_catalog_json` 3. Existing global top-level overrides (`model_context_window`, `model_supports_reasoning_summaries`, etc.) ### Note Will revisit per-field overrides in a follow-up ### Tests Added tests	2026-02-19 10:38:57 -08:00
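The three-step resolution order can be sketched as follows. The note about revisiting per-field overrides suggests that a catalog entry currently replaces the whole record, so this sketch assumes that. The `ModelInfo` fields shown and the `resolve_model_info` / `GlobalOverrides` names are hypothetical. Only the override names `model_context_window` and `model_supports_reasoning_summaries` come from the commit.

```rust
/// Hypothetical, heavily reduced model metadata record.
#[derive(Clone, Debug, PartialEq)]
struct ModelInfo {
    slug: String,
    context_window: Option<u64>,
    supports_reasoning_summaries: bool,
}

/// Existing global top-level overrides (names from the commit message).
struct GlobalOverrides {
    model_context_window: Option<u64>,
    model_supports_reasoning_summaries: Option<bool>,
}

fn resolve_model_info(
    slug: &str,
    base: Option<ModelInfo>,  // 1. remote/fallback metadata, if any
    catalog: &[ModelInfo],    // 2. entries loaded from `model_catalog_json`
    globals: &GlobalOverrides, // 3. applied last
) -> Option<ModelInfo> {
    // A catalog entry wins over the base record and can also add a brand-new model.
    let mut info = catalog.iter().find(|m| m.slug == slug).cloned().or(base)?;
    if let Some(window) = globals.model_context_window {
        info.context_window = Some(window);
    }
    if let Some(summaries) = globals.model_supports_reasoning_summaries {
        info.supports_reasoning_summaries = summaries;
    }
    Some(info)
}
```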
Jack Mousseau	3a951f8096	Restore phase when loading from history (#12244 )	2026-02-19 09:56:56 -08:00
Charley Cunningham	f2d5842ed1	Move previous turn context tracking into ContextManager history (#12179 ) ## Summary - add `previous_context_item: Option<TurnContextItem>` to `ContextManager` - expose session/state accessors for reading and updating the stored previous context item - switch settings diffing to use `TurnContextItem` instead of `TurnContext` - remove submission-loop local `previous_context` and persist the previous context item in history ## Testing - `just fmt` - `just fix -p codex-core` - `cargo test -p codex-core --test all model_switching::` - `cargo test -p codex-core --test all collaboration_instructions::` - `cargo test -p codex-core --test all personality::` - `cargo test -p codex-core --test all permissions_messages::permissions_message_not_added_when_no_change`	2026-02-19 09:56:20 -08:00
colby-oai	f6fd4cb3f5	Adjust MCP tool approval handling for custom servers (#11787 ) Summary This PR expands MCP client-side approval behavior beyond codex_apps and tightens elicitation capability signaling. - Removed the codex_apps-only gate in MCP tool approval checks, so local/custom MCP servers are now eligible for the same client-side approval prompt flow when tool annotations indicate side effects. - Updated approval memory keying to support tools without a connector ID (`connector_id: Option<String>`), allowing “Approve this Session” to be remembered even when connector metadata is missing. - Updated prompt text for non-codex_apps tools to identify the origin as “The <server> MCP server” instead of “This app”. - Added an MCP initialization capability policy so that only codex_apps advertises MCP elicitation capability; other servers advertise no elicitation support. - Added regression tests for: - server-specific prompt copy behavior - codex-apps-only elicitation capability advertisement Testing - Not run (not requested)	2026-02-19 12:52:42 -05:00
jif-oai	547f462385	feat: add configurable write_stdin timeout (#12228 ) Add max timeout as config for `write_stdin`. This is only used for empty `write_stdin` calls. Also increased the default value from 30s to 5 minutes.	2026-02-19 17:22:13 +00:00
viyatb-oai	f595e11723	docs: add codex security policy (#12193 ) ## Summary Adds SECURITY.MD with Codex security policy and Bugcrowd reporting guidance	2026-02-19 09:12:59 -08:00
jif-oai	743caea3a6	feat: add shell snapshot failure reason (#12233 )	2026-02-19 13:49:12 +00:00
jif-oai	2daa3fd44f	feat: sub-agent injection (#12152 ) This PR adds parent-thread sub-agent completion notifications and changes the model prompt to prevent it from being confused	2026-02-19 11:32:10 +00:00
jif-oai	f298c48cc6	Adjust memories rollout defaults (#12231 ) - Summary - raise `DEFAULT_MEMORIES_MAX_ROLLOUTS_PER_STARTUP` to 16 so more rollouts are allowed per startup - lower `DEFAULT_MEMORIES_MIN_ROLLOUT_IDLE_HOURS` to 6 to make rollouts eligible sooner - Testing - Not run (not requested)	2026-02-19 10:52:43 +00:00
Eric Traut	227352257c	Update docs links for feature flag notice (#12164 ) Summary - replace the stale `docs/config.md#feature-flags` reference in the legacy feature notice with the canonical published URL - align the deprecation notice test to expect the new link This addresses #12123	2026-02-19 00:00:44 -08:00
viyatb-oai	4fe99b086f	fix(linux-sandbox): mount /dev in bwrap sandbox (#12081 ) ## Summary - Updates the Linux bubblewrap sandbox args to mount a minimal `/dev` using `--dev /dev` instead of only binding `/dev/null`; with only `/dev/null` available, tools needing entropy (git, crypto libs, etc.) can fail. - Changed mount order so `--dev /dev` is added before writable-root `--bind` mounts, preserving writable `/dev/*` submounts like `/dev/shm` ## Why Fixes sandboxed command failures when reading `/dev/urandom` (and similar standard device-node access). Fixes https://github.com/openai/codex/issues/12056	2026-02-18 23:27:32 -08:00
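The mount-order point can be illustrated with a hypothetical argument builder. `--ro-bind`, `--dev`, and `--bind` are real bubblewrap options, but `bwrap_mount_args` and the read-only `/` baseline are assumptions for this sketch, not the crate's actual argument list.

```rust
/// Hypothetical builder for the mount portion of a bwrap command line.
fn bwrap_mount_args(writable_roots: &[&str]) -> Vec<String> {
    // Assumed baseline: a read-only view of the host filesystem.
    let mut args: Vec<String> = Vec::from(["--ro-bind", "/", "/"].map(String::from));
    // Minimal /dev with standard device nodes (e.g. /dev/null, /dev/urandom),
    // instead of binding only /dev/null.
    args.extend(["--dev", "/dev"].map(String::from));
    // Writable binds come after --dev, so a root such as /dev/shm is mounted on top
    // of the fresh /dev rather than being hidden by it.
    for &root in writable_roots {
        args.extend(["--bind", root, root].map(String::from));
    }
    args
}
```

Bubblewrap applies mount options in command-line order, which is why emitting `--dev /dev` first keeps later `/dev/*` binds visible.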
Matthew Zeng	18eb640a47	[apps] Update apps allowlist. (#12211 ) - [x] Update apps allowlist.	2026-02-18 23:21:32 -08:00
Charley Cunningham	16c3c47535	Stabilize app-server detached review and running-resume tests (#12203 ) ## Summary - stabilize `thread_resume_rejoins_running_thread_even_with_override_mismatch` by using a valid delayed second SSE response instead of an intentionally truncated stream - set `RUST_MIN_STACK=4194304` for spawned app-server test processes in `McpProcess` to avoid stack-sensitive CI overflows in detached review tests ## Why - the thread-resume assertion could race with a mocked stream-disconnect error and intermittently observe `systemError` - detached review startup is stack-sensitive in some CI environments; pinning a larger stack in the test harness removes that flake without changing product behavior ## Validation - `just fmt` - `cargo test -p codex-app-server --test all suite::v2::thread_resume::thread_resume_rejoins_running_thread_even_with_override_mismatch` - `cargo test -p codex-app-server --test all suite::v2::review::review_start_with_detached_delivery_returns_new_thread_id`	2026-02-18 19:05:35 -08:00
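Pinning the stack size for spawned test processes amounts to setting one environment variable on the child command. `app_server_command` is a hypothetical helper, and the value `4194304` (4 MiB) comes from the commit. `RUST_MIN_STACK` sets the default stack size for threads the child spawns through `std::thread` without an explicit size.

```rust
use std::process::Command;

/// 4 MiB = 4 * 1024 * 1024 bytes, the value pinned in the commit.
const TEST_MIN_STACK_BYTES: &str = "4194304";

/// Hypothetical helper that builds the command for an app-server process under test.
fn app_server_command(program: &str) -> Command {
    let mut cmd = Command::new(program);
    // Only affects the child process; the test runner's own threads are unchanged.
    cmd.env("RUST_MIN_STACK", TEST_MIN_STACK_BYTES);
    cmd
}
```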
Charley Cunningham	7f3dbaeb25	state: enforce 10 MiB log caps for thread and threadless process logs (#12038 ) ## Summary - enforce a 10 MiB cap per `thread_id` in state log storage - enforce a 10 MiB cap per `process_uuid` for threadless (`thread_id IS NULL`) logs - scope pruning to only keys affected by the current insert batch - add a cheap per-key `SUM(...)` precheck so windowed prune queries only run for keys that are currently over the cap - add SQLite indexes used by the pruning queries - add focused runtime tests covering both pruning behaviors ## Why This keeps log growth bounded by the intended partition semantics while preserving a small, readable implementation localized to the existing insert path. ## Local Latency Snapshot (No Truncation-Pressure Run) Collected from session `019c734f-1d16-7002-9e00-c966c9fbbcae` using local-only (uncommitted) instrumentation, while not specifically benchmarking the truncation-heavy regime.

### Percentiles By Query (ms)

| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 110 | 0.332 | 0.999 | 1.811 | 2.978 | 3.493 |
| `insert_logs.precheck.process` | 106 | 0.074 | 0.152 | 0.206 | 0.258 | 0.426 |
| `insert_logs.precheck.thread` | 73 | 0.118 | 0.206 | 0.253 | 1.025 | 1.025 |
| `insert_logs.prune.process` | 58 | 0.291 | 0.576 | 0.607 | 1.088 | 1.088 |
| `insert_logs.prune.thread` | 44 | 0.318 | 0.467 | 0.728 | 0.797 | 0.797 |
| `insert_logs.prune_total` | 110 | 0.488 | 0.976 | 1.237 | 1.593 | 1.684 |
| `insert_logs.total` | 110 | 1.315 | 2.889 | 3.623 | 5.739 | 5.961 |
| `insert_logs.tx_begin` | 110 | 0.133 | 0.235 | 0.282 | 0.412 | 0.546 |
| `insert_logs.tx_commit` | 110 | 0.259 | 0.689 | 0.772 | 1.065 | 1.080 |

### `insert_logs.total` Histogram (ms)

| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 7 |
| `<= 1.000` | 33 |
| `<= 2.000` | 40 |
| `<= 5.000` | 28 |
| `<= 10.000` | 2 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

## Local Latency Snapshot (Truncation-Heavy / Cap-Hit Regime) Collected from a run where cap-hit behavior was frequent (`135/180` insert calls), using local-only (uncommitted) instrumentation and a temporary local cap of `10_000` bytes for stress testing (not the merged `10 MiB` cap).

### Percentiles By Query (ms)

| query | count | p50 | p90 | p95 | p99 | max |
|---|---:|---:|---:|---:|---:|---:|
| `insert_logs.insert_batch` | 180 | 0.524 | 1.645 | 2.163 | 3.424 | 3.777 |
| `insert_logs.precheck.process` | 171 | 0.086 | 0.235 | 0.373 | 0.758 | 1.147 |
| `insert_logs.precheck.thread` | 100 | 0.105 | 0.251 | 0.291 | 1.176 | 1.622 |
| `insert_logs.prune.process` | 109 | 0.386 | 0.839 | 1.146 | 1.548 | 2.588 |
| `insert_logs.prune.thread` | 56 | 0.253 | 0.550 | 1.148 | 2.484 | 2.484 |
| `insert_logs.prune_total` | 180 | 0.511 | 1.221 | 1.695 | 4.548 | 5.512 |
| `insert_logs.total` | 180 | 1.631 | 3.902 | 5.103 | 8.901 | 9.095 |
| `insert_logs.total_cap_hit` | 135 | 1.876 | 4.501 | 5.547 | 8.902 | 9.096 |
| `insert_logs.total_no_cap_hit` | 45 | 0.520 | 1.700 | 2.079 | 3.294 | 3.294 |
| `insert_logs.tx_begin` | 180 | 0.109 | 0.253 | 0.287 | 1.088 | 1.406 |
| `insert_logs.tx_commit` | 180 | 0.267 | 0.813 | 1.170 | 2.497 | 2.574 |

### `insert_logs.total` Histogram (ms)

| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 16 |
| `<= 1.000` | 39 |
| `<= 2.000` | 60 |
| `<= 5.000` | 54 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### `insert_logs.total` Histogram When Cap Was Hit (ms)

| bucket | count |
|---|---:|
| `<= 0.100` | 0 |
| `<= 0.250` | 0 |
| `<= 0.500` | 0 |
| `<= 1.000` | 22 |
| `<= 2.000` | 51 |
| `<= 5.000` | 51 |
| `<= 10.000` | 11 |
| `<= 20.000` | 0 |
| `<= 50.000` | 0 |
| `<= 100.000` | 0 |
| `> 100.000` | 0 |

### Performance Takeaways - Even in a cap-hit-heavy run (`75%` cap-hit calls), `insert_logs.total` stays sub-10ms at p99 (`8.901ms`) and max (`9.095ms`). - Calls that did not hit the cap are materially cheaper (`insert_logs.total_no_cap_hit` p95 `2.079ms`) than cap-hit calls (`insert_logs.total_cap_hit` p95 `5.547ms`). - Compared to the earlier non-truncation-pressure run, overall `insert_logs.total` rose from p95 `3.623ms` to p95 `5.103ms` (+`1.48ms`), indicating bounded overhead when pruning is active. - This truncation-heavy run used an intentionally low local cap for stress testing; with the real 10 MiB cap, cap-hit frequency should be much lower in normal sessions. ## Testing - `just fmt` (in `codex-rs`) - `cargo test -p codex-state` (in `codex-rs`)	2026-02-18 17:08:08 -08:00
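An in-memory sketch of the pruning rule described in this commit follows. It assumes rows are partitioned by `thread_id`, or by `process_uuid` when `thread_id` is NULL, and that a larger `id` means a newer row. The real change runs in SQLite with a `SUM(...)` precheck and a windowed prune query. `LogRow`, `PartitionKey`, and `insert_and_prune` are hypothetical names.

```rust
use std::collections::HashSet;

/// The merged cap: 10 MiB per partition.
const LOG_CAP_BYTES: usize = 10 * 1024 * 1024;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum PartitionKey {
    Thread(String),
    Process(String),
}

#[derive(Clone, Debug)]
struct LogRow {
    id: u64,
    thread_id: Option<String>,
    process_uuid: String,
    bytes: usize,
}

impl LogRow {
    /// Threadless rows (`thread_id IS NULL`) are capped per process instead of per thread.
    fn partition(&self) -> PartitionKey {
        match &self.thread_id {
            Some(thread) => PartitionKey::Thread(thread.clone()),
            None => PartitionKey::Process(self.process_uuid.clone()),
        }
    }
}

fn insert_and_prune(rows: &mut Vec<LogRow>, batch: Vec<LogRow>, cap: usize) {
    // Only partitions touched by this batch are candidates for pruning.
    let touched: HashSet<PartitionKey> = batch.iter().map(LogRow::partition).collect();
    rows.extend(batch);
    for key in touched {
        // Cheap precheck: skip partitions whose total is within the cap.
        let total: usize = rows.iter().filter(|r| r.partition() == key).map(|r| r.bytes).sum();
        if total <= cap {
            continue;
        }
        // Windowed prune: running total from newest to oldest; rows past the cap go.
        let mut members: Vec<(u64, usize)> = rows
            .iter()
            .filter(|r| r.partition() == key)
            .map(|r| (r.id, r.bytes))
            .collect();
        members.sort_by(|a, b| b.0.cmp(&a.0));
        let mut running = 0usize;
        let mut doomed = HashSet::new();
        for (id, bytes) in members {
            running += bytes;
            if running > cap {
                doomed.insert(id);
            }
        }
        rows.retain(|r| !doomed.contains(&r.id));
    }
}
```

The precheck mirrors the commit's design choice: the more expensive windowed pass runs only for partitions that are actually over the cap.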
Ruslan Nigmatullin	1f54496c48	app-server: expose loaded thread status via read/list and notifications (#11786 ) Motivation - Today, a newly connected client has no direct way to determine the current runtime status of threads from read/list responses alone. - This forces clients to infer state from transient events, which can lead to stale or inconsistent UI when reconnecting or attaching late. Changes - Add `status` to `thread/read` responses. - Add `statuses` to `thread/list` responses. - Emit `thread/status/changed` notifications with `threadId` and the new status. - Track runtime status for all loaded threads and default unknown threads to `idle`. - Update protocol/docs/tests/schema fixtures for the revised API. Testing - Validated protocol API changes with automated protocol tests and regenerated schema/type fixtures. - Validated app-server behavior with unit and integration test suites, including status transitions and notifications.	2026-02-18 15:20:03 -08:00
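Tracking status for loaded threads, defaulting unknown threads to `idle`, could be sketched as below. The `Running` variant, the `ThreadStatusTracker` type, and the choice to emit a `thread/status/changed`-style notification only on an actual transition are hypothetical. The commit names only the `idle` default and the `threadId` field.

```rust
use std::collections::HashMap;

/// `Idle` is from the commit; `Running` is a hypothetical second state.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThreadStatus {
    Idle,
    Running,
}

/// Payload shape of a `thread/status/changed` notification (threadId + status).
#[derive(Debug, PartialEq)]
struct StatusChanged {
    thread_id: String,
    status: ThreadStatus,
}

#[derive(Default)]
struct ThreadStatusTracker {
    statuses: HashMap<String, ThreadStatus>,
}

impl ThreadStatusTracker {
    /// Unknown threads report `Idle`.
    fn status(&self, thread_id: &str) -> ThreadStatus {
        self.statuses.get(thread_id).copied().unwrap_or(ThreadStatus::Idle)
    }

    /// Records a status; returns a notification only if the status changed.
    fn set(&mut self, thread_id: &str, status: ThreadStatus) -> Option<StatusChanged> {
        let previous = self.status(thread_id);
        self.statuses.insert(thread_id.to_string(), status);
        (previous != status).then(|| StatusChanged { thread_id: thread_id.to_string(), status })
    }

    /// Statuses in request order, as a `thread/list` response might attach them.
    fn statuses_for(&self, ids: &[&str]) -> Vec<ThreadStatus> {
        ids.iter().map(|id| self.status(id)).collect()
    }
}
```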
Matthew Zeng	216fe7f2ef	[apps] Temporary app block. (#12180 ) - [x] Temporary app block.	2026-02-18 15:09:30 -08:00
zuxin-oai	f8ee18c8cf	fix: Remove citation (#12187 ) Remove citation requirement until we figure out a better visualization	2026-02-18 21:13:33 +00:00
iceweasel-oai	292542616a	app-server support for Windows sandbox setup. (#12025 ) Adds app-server support for initiating Windows sandbox setup. The server responds quickly to the setup request and later makes an RPC call back to the client when setup finishes. The TUI implementation is unaffected, but in a future PR I'll update the TUI to use the shared setup helper (`windows_sandbox.run_windows_sandbox_setup`)	2026-02-18 13:03:16 -08:00
Curtis 'Fjord' Hawthorne	cc248e4681	js_repl: canonicalize paths for node_modules boundary checks (#12177 ) ## Summary Fix `js_repl` package-resolution boundary checks for macOS temp directory path aliasing (`/var` vs `/private/var`). ## Problem `js_repl` verifies that resolved bare-package imports stay inside a configured `node_modules` root. On macOS, temp directories are commonly exposed as `/var/...` but canonicalize to `/private/var/...`. Because the boundary check compared raw paths with `path.relative(...)`, valid resolutions under temp dirs could be misclassified as escaping the allowed base, causing false `Module not found` errors. ## Changes - Add `fs` import in the JS kernel. - Add `canonicalizePath()` using `fs.realpathSync.native(...)` (with safe fallback). - Canonicalize both `base` and `resolvedPath` before running the `node_modules` containment check. ## Impact - Fixes false-negative boundary checks for valid package resolutions in macOS temp-dir scenarios. - Keeps the existing security boundary behavior intact. - Scope is limited to `js_repl` kernel module path validation logic. #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12177 - ⏳ `2` https://github.com/openai/codex/pull/10673	2026-02-18 11:56:45 -08:00
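The fix is in the JavaScript kernel (`fs.realpathSync.native`). As a language-consistent illustration, here is a Rust analogue of "canonicalize both sides, then check containment". `canonicalize_or_self` and `is_within` are hypothetical names. Falling back to the raw path on error mirrors the "safe fallback" the commit mentions.

```rust
use std::path::{Path, PathBuf};

/// Resolves symlinks (e.g. macOS `/var` -> `/private/var`); falls back to the input on error.
fn canonicalize_or_self(path: &Path) -> PathBuf {
    std::fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}

/// True if `resolved` lies inside `base` after both are canonicalized.
fn is_within(base: &Path, resolved: &Path) -> bool {
    let base = canonicalize_or_self(base);
    let resolved = canonicalize_or_self(resolved);
    // `Path::starts_with` compares whole components, so `/a/node_modules2`
    // is not considered inside `/a/node_modules`.
    resolved.starts_with(&base)
}
```

Canonicalizing only one side would reintroduce the false negative, because the two spellings of the same directory would still differ component by component.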
zuxin-oai	82d82d9ca5	memories: bump rollout summary slug cap to 60 (#12167 ) ## Summary Increase the rollout summary filename slug cap from 20 to 60 characters in memory storage. ## What changed - Updated `ROLLOUT_SLUG_MAX_LEN` from `20` to `60` in: - `codex-rs/core/src/memories/storage.rs` - Updated slug truncation test to verify 60-char behavior. ## Why This preserves more semantic context in rollout summary filenames while keeping existing normalization behavior unchanged. ## Testing - `just fmt` - `cargo test -p codex-core memories::storage::tests::rollout_summary_file_stem_sanitizes_and_truncates_slug -- --exact`	2026-02-18 19:15:07 +00:00
jif-oai	f675bf9334	fix: file watcher (#12105 ) The issue was that the file_watcher never unsubscribed a file watch: every watch stayed owned by the ThreadManager. As a result, each newly created thread got a new file watcher that was never released, even after the thread was closed. On Linux, each file watcher holds an `inotify` instance, so over time all available instances were consumed. This PR adds a mechanism to unsubscribe a file watcher when its thread is dropped	2026-02-18 18:28:34 +00:00
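One common way to tie a watch's lifetime to its thread is an RAII guard that unsubscribes in `Drop`. The sketch below assumes that approach. `WatchRegistry`, `WatchSubscription`, and `subscribe` are hypothetical, and the registry only stands in for the real OS-level watcher.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the shared watcher owned by a manager.
#[derive(Default)]
struct WatchRegistry {
    next_id: u64,
    active: HashMap<u64, PathBuf>,
}

/// Guard held by a thread; dropping it removes the watch it registered.
struct WatchSubscription {
    id: u64,
    registry: Arc<Mutex<WatchRegistry>>,
}

fn subscribe(registry: &Arc<Mutex<WatchRegistry>>, path: PathBuf) -> WatchSubscription {
    let mut reg = registry.lock().unwrap();
    let id = reg.next_id;
    reg.next_id += 1;
    reg.active.insert(id, path);
    WatchSubscription { id, registry: Arc::clone(registry) }
}

impl Drop for WatchSubscription {
    fn drop(&mut self) {
        // Release the watch when the owning thread goes away.
        if let Ok(mut reg) = self.registry.lock() {
            reg.active.remove(&self.id);
        }
    }
}

fn active_watches(registry: &Arc<Mutex<WatchRegistry>>) -> usize {
    registry.lock().unwrap().active.len()
}
```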
Eric Traut	999576f7b8	Fixed a hole in token refresh logic for app server (#11802 ) We've continued to receive reports from users that they're seeing the error message "Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again." This PR fixes two holes in the token refresh logic that lead to this condition. Background: A previous change in token refresh introduced the `UnauthorizedRecovery` object. It implements a state machine in the core agent loop that first performs a load of the on-disk auth information guarded by a check for matching account ID. If it finds that the on-disk version has been updated by another instance of codex, it uses the reloaded auth tokens. If the on-disk version hasn't been updated, it issues a refresh request from the token authority. There are two problems that this PR addresses: Problem 1: We weren't doing the same thing for the code path used by the app server interface. This PR effectively replicates the `UnauthorizedRecovery` logic for that code path. Problem 2: The `UnauthorizedRecovery` logic contained a hole in the `ReloadOutcome::Skipped` case. Here's the scenario. A user starts two instances of the CLI. Instance 1 is active (working on a task), instance 2 is idle. Both instances have the same in-memory cached tokens. The user then runs `codex logout` or `codex login` to log in to a separate account, which overwrites the `auth.json` file. Instance 1 receives a 401 and refreshes its token, but it doesn't write the new token to the `auth.json` file because the account ID doesn't match. Instance 2 is later activated and presented with a new task. It immediately hits a 401 and attempts to refresh its token but fails because its cached refresh token is now invalid. To avoid this situation, I've changed the logic to immediately fail a token refresh if the user has since logged out or logged in to another account. 
This will still be seen as an error by the user, but the cause will be clearer. I also took this opportunity to clean up the names of existing functions to make their roles clearer. * `try_refresh_token` is renamed `request_chatgpt_token_refresh` * the existing `refresh_token` is renamed `refresh_token_from_authority` (there's a new higher-level function named `refresh_token` now) * `refresh_tokens` is renamed `refresh_and_persist_chatgpt_token`, and it now implicitly reloads * `update_tokens` is renamed `persist_tokens`	2026-02-18 09:27:04 -08:00
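The recovery decision described in this commit can be modeled as a small function. This is a simplified sketch, not the actual `UnauthorizedRecovery` state machine. `AuthSnapshot`, `RecoveryStep`, and `plan_recovery` are hypothetical, and representing "logged out" as no on-disk auth is an assumption.

```rust
/// Hypothetical snapshot of the auth fields relevant to 401 recovery.
#[derive(Clone, Debug, PartialEq)]
struct AuthSnapshot {
    account_id: String,
    refresh_token: String,
}

#[derive(Debug, PartialEq)]
enum RecoveryStep {
    /// Another instance already refreshed the same account: use the on-disk tokens.
    UseReloaded(AuthSnapshot),
    /// Nothing changed on disk: ask the token authority for a refresh.
    RefreshFromAuthority,
    /// The user logged out or switched accounts since these tokens were cached: fail fast.
    FailAccountChanged,
}

fn plan_recovery(cached: &AuthSnapshot, on_disk: Option<&AuthSnapshot>) -> RecoveryStep {
    match on_disk {
        None => RecoveryStep::FailAccountChanged,
        Some(disk) if disk.account_id != cached.account_id => RecoveryStep::FailAccountChanged,
        Some(disk) if disk != cached => RecoveryStep::UseReloaded(disk.clone()),
        Some(_) => RecoveryStep::RefreshFromAuthority,
    }
}
```

Failing fast on an account mismatch avoids the two-instance scenario above, where the idle instance would otherwise spend an already-rotated refresh token.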
jif-oai	9f5b17de0d	Disable collab tools during review delegation (#12157 ) Summary - prevent delegated review agents from re-enabling blocked tools by explicitly disabling the Collab feature alongside web search and view image controls Testing - Not run (not requested)	2026-02-18 17:02:49 +00:00
jif-oai	18206a9c1e	feat: better slug for rollout summaries (#12135 )	2026-02-18 16:39:38 +00:00
Curtis 'Fjord' Hawthorne	491b4946ae	Stop filtering model tools in js_repl_tools_only mode (#12069 ) ## Summary This change removes tool-list filtering in `js_repl_tools_only` mode and relies on the normal model tool descriptions, while still enforcing that tool execution must go through `js_repl` + `codex.tool(...)`. ## Motivation The previous `js_repl_tools_only` filtering hid most tools from the model request, which diverged from standard tool-list behavior and made signatures less discoverable. I tested that this filtering is not needed, and the model can follow the prompt to only call tools via `js_repl`. ## What Changed - `filter_tools_for_model(...)` in `core/src/tools/spec.rs` is now a pass-through (no filtering when `js_repl_tools_only` is enabled). - Updated tests to assert that model tools are not filtered in `js_repl_tools_only` mode. - Updated dynamic-tool test to assert dynamic tools remain visible in model tool specs. - Removed obsolete test helper used only by the old filtering assertions. ## Safety / Behavior - This commit does not relax execution policy. - Direct model tool calls remain blocked in `js_repl_tools_only` mode (except internal `js_repl` tools), and callers are instructed to use `js_repl` + `codex.tool(...)`. ## Testing - `cargo test -p codex-core js_repl_tools_only` - Manual rollout validation showed the model can follow the `js_repl` routing instructions without needing filtered tool lists. #### [git stack](https://github.com/magus/git-stack-cli) - 👉 `1` https://github.com/openai/codex/pull/12069 - ⏳ `2` https://github.com/openai/codex/pull/10673 - ⏳ `3` https://github.com/openai/codex/pull/10670	2026-02-18 07:31:15 -08:00
jif-oai	cc3bbd7852	nit: change model for phase 1 (#12137 )	2026-02-18 13:55:30 +00:00
jif-oai	7b65b05e87	feat: validate agent config file paths (#12133 )	2026-02-18 13:48:52 +00:00
jif-oai	a9f5f633b2	feat: memory usage metrics (#12120 )	2026-02-18 12:45:19 +00:00
jif-oai	2293ab0e21	feat: phase 2 usage (#12121 )	2026-02-18 11:33:55 +00:00
jif-oai	f0ee2d9f67	feat: phase 1 and phase 2 e2e latencies (#12124 )	2026-02-18 11:30:20 +00:00
jif-oai	0dcf8d9c8f	Enable default status line indicators in TUI config (#12015 ) Default statusline to something [Screenshot 2026-02-17 at 18 16 12]	2026-02-18 09:51:15 +00:00
Leo Shimonaka	1946a4c48b	fix: Restricted Read: /System is too permissive for macOS platform default (#11798 ) Update the list of platform defaults included for `ReadOnlyAccess`. When `ReadOnlyAccess::Restricted::include_platform_defaults` is `true`, the policy defined in `codex-rs/core/src/seatbelt_platform_defaults.sbpl` is appended to enable macOS programs to function properly.	2026-02-17 23:56:35 -08:00
aaronl-openai	f600453699	[js_repl] paths for node module resolution can be specified for js_repl (#11944 ) In `js_repl` mode, module resolution currently starts from `js_repl_kernel.js`, which is written to a per-kernel temp dir. This effectively means that bare imports will not resolve. This PR adds a new config option, `js_repl_node_module_dirs`, which is a list of dirs that are used (in order) to resolve a bare import. If none of those work, the current working directory of the thread is used. For example: ```toml js_repl_node_module_dirs = [ "/path/to/node_modules/", "/other/path/to/node_modules/", ] ```	2026-02-17 23:29:49 -08:00
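The lookup order (configured directories first, in order, then the thread's working directory) can be sketched as follows. This deliberately simplifies Node's real resolution algorithm: it ignores `package.json` entry points and parent-directory walking, and it treats the cwd fallback as `<cwd>/node_modules`. `resolve_bare_import` is a hypothetical name.

```rust
use std::path::{Path, PathBuf};

/// Returns the first existing `<dir>/<package>`, trying `node_module_dirs` in order
/// and then `<cwd>/node_modules` (a simplified stand-in for the cwd fallback).
fn resolve_bare_import(package: &str, node_module_dirs: &[PathBuf], cwd: &Path) -> Option<PathBuf> {
    node_module_dirs
        .iter()
        .cloned()
        .chain(std::iter::once(cwd.join("node_modules")))
        .map(|dir| dir.join(package))
        .find(|candidate| candidate.exists())
}
```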
Eric Traut	57f4e37539	Updated issue labeler script to include safety-check label (#12096 ) Also deleted obsolete prompt file	2026-02-17 22:44:42 -08:00
Charley Cunningham	c16f9daaaf	Add model-visible context layout snapshot tests (#12073 ) ## Summary - add a dedicated `core/tests/suite/model_visible_layout.rs` snapshot suite to materialize model-visible request layout in high-value scenarios - add three reviewer-focused snapshot scenarios: - turn-level context updates (cwd / permissions / personality) - first post-resume turn with model hydration + personality change - first post-resume turn where pre-turn model override matches rollout model - wire the new suite into `core/tests/suite/mod.rs` - commit generated `insta` snapshots under `core/tests/suite/snapshots/` ## Why This creates a stable, reviewable baseline of model-visible context layout against `main` before follow-on context-management refactors. It lets subsequent PRs show focused snapshot diffs for behavior changes instead of introducing the test surface and behavior changes at once. ## Testing - `just fmt` - `INSTA_UPDATE=always cargo test -p codex-core model_visible_layout`	2026-02-17 22:30:29 -08:00
Ahmed Ibrahim	03ce01e71f	codex-api: realtime websocket session.create + typed inbound events (#12036 ) ## Summary - add realtime websocket client transport in codex-api - send session.create on connect with backend prompt and optional conversation_id - keep session.update for prompt changes after connect - switch inbound event parsing to a tagged enum (typed variants instead of optional field bag) - add a websocket e2e integration test in codex-rs/codex-api/tests/realtime_websocket_e2e.rs ## Why This moves the realtime transport to an explicit session-create handshake and improves protocol safety with typed inbound events. ## Testing - Added e2e integration test coverage for session create + event flow in the API crate.	2026-02-17 22:17:01 -08:00
won-openai	189f592014	got rid of experimental_mode for configtoml (#12077 )	2026-02-17 21:10:30 -08:00

3837 commits