Users were reporting that when they were actively editing a skill file,
they would see frequent errors (one per second) across all of their
active session until they fixed all frontmatter parse errors. This
change will reduce the chatter at the expense of a slightly longer delay
before skills are updated in the UI.
This addresses #11385
1. Move Windows Sandbox NUX to right after trust directory screen
2. Don't offer read-only as an option in Sandbox NUX.
Elevated/Legacy/Quit
3. Don't allow new untrusted directories. It's trust or quit
4. move experimental sandbox features to `[windows]
sandbox="elevated|unelevatd"`
5. Copy tweaks = elevated -> default, non-elevated -> non-admin
## Summary
- Increases `PASTE_BURST_CHAR_INTERVAL` from 8ms to 30ms on Windows to
fix multi-line paste issues in VS Code integrated terminal
- Follows existing pattern of platform-specific timing (like
`PASTE_BURST_ACTIVE_IDLE_TIMEOUT`)
## Problem
When pasting multi-line text in Codex CLI on Windows (especially VS Code
integrated terminal), only the first portion is captured before
auto-submit. The rest arrives as a separate message.
**Root cause**: VS Code's terminal emulation adds latency (~10-15ms per
character) between key events. The 8ms `PASTE_BURST_CHAR_INTERVAL`
threshold is too tight - characters arrive slower than expected, so
burst detection fails and Enter submits instead of inserting a newline.
## Solution
Use Windows-specific timing (30ms) for `PASTE_BURST_CHAR_INTERVAL`,
following the same pattern already used for
`PASTE_BURST_ACTIVE_IDLE_TIMEOUT` (60ms on Windows vs 8ms on Unix).
30ms is still fast enough to distinguish paste from typing (humans type
~200ms between keystrokes).
## Test plan
- [x] All existing paste_burst tests pass
- [ ] Test multi-line paste in VS Code integrated PowerShell on Windows
- [ ] Test multi-line paste in standalone Windows PowerShell
- [ ] Verify no regression on macOS/Linux
Fixes#2137
Co-authored-by: Josh McKinney <joshka@openai.com>
Reapply "Add app-server transport layer with websocket support" with
additional fixes from https://github.com/openai/codex/pull/11313/changes
to avoid deadlocking.
This reverts commit 47356ff83c.
## Summary
To avoid deadlocking when queues are full, we maintain separate tokio
tasks dedicated to incoming vs outgoing event handling
- split the app-server main loop into two tasks in
`run_main_with_transport`
- inbound handling (`transport_event_rx`)
- outbound handling (`outgoing_rx` + `thread_created_rx`)
- separate incoming and outgoing websocket tasks
## Validation
Integration tests, testing thoroughly e2e in codex app w/ >10 concurrent
requests
<img width="1365" height="979" alt="Screenshot 2026-02-10 at 2 54 22 PM"
src="https://github.com/user-attachments/assets/47ca2c13-f322-4e5c-bedd-25859cbdc45f"
/>
---------
Co-authored-by: jif-oai <jif@openai.com>
`codex-core` had accumulated config loading, requirements parsing,
constraint logic, and config-layer state handling in a single crate.
This change extracts that subsystem into `codex-config` to reduce
`codex-core` rebuild/test surface area and isolate future config work.
## What Changed
### Added `codex-config`
- Added new workspace crate `codex-rs/config` (`codex-config`).
- Added workspace/build wiring in:
- `codex-rs/Cargo.toml`
- `codex-rs/config/Cargo.toml`
- `codex-rs/config/BUILD.bazel`
- Updated lockfiles (`codex-rs/Cargo.lock`, `MODULE.bazel.lock`).
- Added `codex-core` -> `codex-config` dependency in
`codex-rs/core/Cargo.toml`.
### Moved config internals from `core` into `config`
Moved modules to `codex-rs/config/src/`:
- `core/src/config/constraint.rs` -> `config/src/constraint.rs`
- `core/src/config_loader/cloud_requirements.rs` ->
`config/src/cloud_requirements.rs`
- `core/src/config_loader/config_requirements.rs` ->
`config/src/config_requirements.rs`
- `core/src/config_loader/fingerprint.rs` -> `config/src/fingerprint.rs`
- `core/src/config_loader/merge.rs` -> `config/src/merge.rs`
- `core/src/config_loader/overrides.rs` -> `config/src/overrides.rs`
- `core/src/config_loader/requirements_exec_policy.rs` ->
`config/src/requirements_exec_policy.rs`
- `core/src/config_loader/state.rs` -> `config/src/state.rs`
`codex-config` now re-exports this surface from `config/src/lib.rs` at
the crate top level.
### Updated `core` to consume/re-export `codex-config`
- `core/src/config_loader/mod.rs` now imports/re-exports config-loader
types/functions from top-level `codex_config::*`.
- Local moved modules were removed from `core/src/config_loader/`.
- `core/src/config/mod.rs` now re-exports constraint types from
`codex_config`.
We're loading these from the web on every startup. This puts them in a
local file with a 1hr TTL.
We sign the downloaded requirements with a key compiled into the Codex
CLI to prevent unsophisticated tampering (determined circumvention is
outside of our threat model: after all, one could just compile Codex
without any of these checks).
If any of the following are true, we ignore the local cache and re-fetch
from Cloud:
* The signature is invalid for the payload (== requirements, sign time,
ttl, user identity)
* The identity does not match the auth'd user's identity
* The TTL has expired
* We cannot parse requirements.toml from the payload
We are removing feature-gated shared crates from the `codex-rs`
workspace. `codex-common` grouped several unrelated utilities behind
`[features]`, which made dependency boundaries harder to reason about
and worked against the ongoing effort to eliminate feature flags from
workspace crates.
Splitting these utilities into dedicated crates under `utils/` aligns
this area with existing workspace structure and keeps each dependency
explicit at the crate boundary.
## What changed
- Removed `codex-rs/common` (`codex-common`) from workspace members and
workspace dependencies.
- Added six new utility crates under `codex-rs/utils/`:
- `codex-utils-cli`
- `codex-utils-elapsed`
- `codex-utils-sandbox-summary`
- `codex-utils-approval-presets`
- `codex-utils-oss`
- `codex-utils-fuzzy-match`
- Migrated the corresponding modules out of `codex-common` into these
crates (with tests), and added matching `BUILD.bazel` targets.
- Updated direct consumers to use the new crates instead of
`codex-common`:
- `codex-rs/cli`
- `codex-rs/tui`
- `codex-rs/exec`
- `codex-rs/app-server`
- `codex-rs/mcp-server`
- `codex-rs/chatgpt`
- `codex-rs/cloud-tasks`
- Updated workspace lockfile entries to reflect the new dependency graph
and removal of `codex-common`.
Improve listing by doing:
1. List using the rollout file system
2. Upsert the result in the DB (if present)
3. Return the result of a DB listing
4. Fallback on the result of 1
+ some metrics on top of this
stage1_concurrent_claims_respect_running_cap was flaky due to SQLite
lock contention, not cap logic correctness. The claim flow used deferred
transactions (BEGIN) with read-then-write behavior, which can fail under
concurrency with SQLITE_BUSY_SNAPSHOT/database is locked when upgrading
a read transaction to a write transaction. We fixed this by using BEGIN
IMMEDIATE for stage1 and phase2 claim paths, so lock acquisition happens
up front and contenders serialize cleanly instead of failing during
upgrade. After the change, codex-state tests pass and stress reruns of
the flaky path no longer reproduced the failure.
## Why
`codex-core` was being built in multiple feature-resolved permutations
because test-only behavior was modeled as crate features. For a large
crate, those permutations increase compile cost and reduce cache reuse.
## Net Change
- Removed the `test-support` crate feature and related feature wiring so
`codex-core` no longer needs separate feature shapes for test consumers.
- Standardized cross-crate test-only access behind
`codex_core::test_support`.
- External test code now imports helpers from
`codex_core::test_support`.
- Underlying implementation hooks are kept internal (`pub(crate)`)
instead of broadly public.
## Outcome
- Fewer `codex-core` build permutations.
- Better incremental cache reuse across test targets.
- No intended production behavior change.
The debug output listed non-file-backed layers such as session flags and
MDM managed config, but it did not show their values. That made it
difficult to explain unexpected effective settings because users could
not inspect those layers on disk.
Now `/debug-config` might include output like this:
```
Config layer stack (lowest precedence first):
1. system (/etc/codex/config.toml) (enabled)
2. user (/Users/mbolin/.codex/config.toml) (enabled)
3. legacy managed_config.toml (mdm) (enabled)
MDM value:
# Production Codex configuration file.
[otel]
log_user_prompt = true
environment = "prod"
exporter = { otlp-http = {
endpoint = "https://example.com/otel",
protocol = "binary"
}}
```
Added multi-limit support end-to-end by carrying limit_name in
rate-limit snapshots and handling multiple buckets instead of only
codex.
Extended /usage client parsing to consume additional_rate_limits
Updated TUI /status and in-memory state to store/render per-limit
snapshots
Extended app-server rate-limit read response: kept rate_limits and added
rate_limits_by_name.
Adjusted usage-limit error messaging for non-default codex limit buckets
Problem:
1. turn id is constructed in-memory;
2. on resuming threads, turn_id might not be unique;
3. client cannot no the boundary of a turn from rollout files easily.
This PR does three things:
1. persist `task_started` and `task_complete` events;
1. persist `turn_id` in rollout turn events;
5. generate turn_id as unique uuids instead of incrementing it in
memory.
This helps us resolve the issue of clients wanting to have unique turn
ids for resuming a thread, and knowing the boundry of each turn in
rollout files.
example debug logs
```
2026-02-11T00:32:10.746876Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=8 turn=Turn { id: "019c4a07-d809-74c3-bc4b-fd9618487b4b", items: [UserMessage { id: "item-24", content: [Text { text: "hi", text_elements: [] }] }, AgentMessage { id: "item-25", text: "Hi. I’m in the workspace with your current changes loaded and ready. Send the next task and I’ll execute it end-to-end." }], status: Completed, error: None }
2026-02-11T00:32:10.746888Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=9 turn=Turn { id: "019c4a18-1004-76c0-a0fb-a77610f6a9b8", items: [UserMessage { id: "item-26", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-27", text: "Hello. Ready for the next change in `codex-rs`; I can continue from the current in-progress diff or start a new task." }], status: Completed, error: None }
2026-02-11T00:32:10.746899Z DEBUG codex_app_server_protocol::protocol::thread_history: built turn from rollout items turn_index=10 turn=Turn { id: "019c4a19-41f0-7db0-ad78-74f1503baeb8", items: [UserMessage { id: "item-28", content: [Text { text: "hello", text_elements: [] }] }, AgentMessage { id: "item-29", text: "Hello. Send the specific change you want in `codex-rs`, and I’ll implement it and run the required checks." }], status: Completed, error: None }
```
backward compatibility:
if you try to resume an old session without task_started and
task_complete event populated, the following happens:
- If you resume and do nothing: those reconstructed historical IDs can
differ next time you resume.
- If you resume and send a new turn: the new turn gets a fresh UUID from
live submission flow and is persisted, so that new turn’s ID is stable
on later resumes.
I think this behavior is fine, because we only care about deterministic
turn id once a turn is triggered.
## Summary
This should rarely, if ever, happen in practice. But regardless, we
should never provide an empty list of `commands` to ExecPolicy. This PR
is almost entirely adding test around these cases.
## Testing
- [x] Adds a bunch of unit tests for this
## Why
`codex-core` enabled `deterministic_process_ids` through a self
dev-dependency.
That forced a second feature-resolved build of the same crate, which
increased
compile time and test latency.
## What Changed
- Removed the `deterministic_process_ids` feature from
`codex-rs/core/Cargo.toml`.
- Removed the self dev-dependency on `codex-core` that enabled that
feature.
- Removed the Bazel `deterministic_process_ids` crate feature for
`codex-core`.
- Added a test-only `AtomicBool` override in unified exec process-id
allocation.
- Added a test-support setter for that override and re-exported it from
`codex-core`.
- Enabled deterministic process IDs in integration tests via
`core_test_support` ctor.
## Behavior
- Production behavior remains random process IDs.
- Unit tests remain deterministic via `cfg(test)`.
- Integration tests remain deterministic via explicit test-support
initialization.
## Validation
- `just fmt`
- `cargo test -p codex-core unified_exec::`
- `cargo test -p codex-core --test all unified_exec -- --test-threads=1`
- `cargo tree -p codex-core -e features` (verified the removed feature
path)
## Summary
This PR fixes TUI transcript-sync behavior for
`EventMsg::ThreadRolledBack` and makes rollback application order
deterministic.
Previously, rollback handling depended on `pending_rollback`:
- if `pending_rollback` was set (local backtrack), TUI trimmed correctly
- otherwise, replayed/external rollbacks were either ignored or could be
applied at the wrong time relative to queued transcript inserts
This change keeps the local backtrack path intact and routes non-pending
rollbacks through the app event queue so rollback trims are applied in
FIFO order with transcript cell inserts.
## What changed
- Added/used `trim_transcript_cells_drop_last_n_user_turns(...)` for
rollback-by-`num_turns` semantics.
- Renamed rollback app event:
- `AppEvent::ApplyReplayedThreadRollback` ->
`AppEvent::ApplyThreadRollback`
- Replay path (`ChatWidget`) now emits `ApplyThreadRollback`.
- Live non-pending rollback path (`App::handle_backtrack_event`) now
emits `ApplyThreadRollback` instead of trimming immediately.
- App-level event handler applies `ApplyThreadRollback` after queued
`InsertHistoryCell` events and schedules redraw only when a trim
occurred.
- When a trim occurs with an overlay open, TUI now syncs transcript
overlay committed cells, clamps backtrack preview selection, and clears
stale `deferred_history_lines` so closed overlays do not re-append
rolled-back lines.
- Clarified inline comments around the `pending_rollback` branch so
future readers can reason about why there are two paths.
## Why queueing matters
During resume/replay, transcript cells are populated via queued
`InsertHistoryCell` app events. If a rollback is applied immediately
outside that queue, it can run against an incomplete transcript and
under-trim. Queueing non-pending rollbacks ensures consistent ordering
and correct final transcript state.
## Behavior by rollback source
- `pending_rollback = Some(...)` (local backtrack requested by this
TUI):
- use `finish_pending_backtrack()` and the stored selection boundary
- `pending_rollback = None` (replay/external/non-local rollback):
- enqueue `AppEvent::ApplyThreadRollback { num_turns }` and trim in
app-event order
## Tests
Added/updated tests covering ordering and semantics:
-
`app_backtrack::tests::trim_drop_last_n_user_turns_applies_rollback_semantics`
- `app_backtrack::tests::trim_drop_last_n_user_turns_allows_overflow`
- `app::tests::replayed_initial_messages_apply_rollback_in_queue_order`
-
`app::tests::live_rollback_during_replay_is_applied_in_app_event_order`
-
`app::tests::queued_rollback_syncs_overlay_and_clears_deferred_history`
- `chatwidget::tests::replayed_thread_rollback_emits_ordered_app_event`
Validation run:
- `just fmt`
- `cargo test -p codex-tui`
Summary
- add a `prefer_websockets` field to `ModelInfo`, defaulting to `false`
in all fixtures and constructors
- wire the new flag into websocket selection so models that opt in
always use websocket transport even when the feature gate is off
Testing
- Not run (not requested)
## Summary
Add a DB-backed lease to prevent duplicate `.sqlite` backfill workers
from running concurrently.
### What changed
- Added StateRuntime::try_claim_backfill(lease_seconds) that atomically
claims backfill only when:
- backfill is not complete, and
- no fresh running worker currently owns it.
- Updated backfill_sessions to use the claim API and exit early when
another worker already holds the lease.
- Added runtime tests covering:
- singleton claim behavior,
- stale lease takeover,
- claim blocked after complete.
- Set backfill lease to 900s in production and 1s in tests.
### Why
This avoids duplicate backfill work and reduces backfill status churn
under concurrent startup, while preserving
current best-effort fallback behavior.
## Summary
- enable local-use defaults in network proxy settings: SOCKS5 on, SOCKS5
UDP on, upstream proxying on, and local binding on
- add a regression test that asserts the full
`NetworkProxySettings::default()` baseline
- Fixed managed listener reservation behavior.
Before: we always reserved a loopback SOCKS listener, even when
enable_socks5 = false.
Now: SOCKS listener is only reserved when SOCKS is enabled.
- Fixed /debug-config env output for SOCKS-disabled sessions.
ALL_PROXY now shows the HTTP proxy URL when SOCKS is disabled (instead
of incorrectly showing socks5h://...).
## Validation
- just fmt
- cargo test -p codex-network-proxy
- cargo clippy -p codex-network-proxy --all-targets
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
`codex-core` had accumulated command parsing and command safety logic
(`bash`, `powershell`, `parse_command`, and `command_safety`) that is
logically cohesive but orthogonal to most core session/runtime logic.
Keeping this code in `codex-core` made the crate increasingly monolithic
and raised iteration cost for unrelated core changes.
This change extracts that surface into a dedicated crate,
`codex-command`, while preserving existing `codex_core::...` call sites
via re-exports.
## Why this refactor
During analysis, command parsing/safety stood out as a good first split
because it has:
- a clear domain boundary (shell parsing + safety classification)
- relatively self-contained dependencies (notably `tree-sitter` /
`tree-sitter-bash`)
- a meaningful standalone test surface (`134` tests moved with the
crate)
- many downstream uses that benefit from independent compilation and
caching
The practical problem was build latency from a large `codex-core`
compile/test graph. Clean-build timings before and after this split
showed measurable wins:
- `cargo check -p codex-core`: `57.08s` -> `53.54s` (~`6.2%` faster)
- `cargo test -p codex-core --no-run`: `2m39.9s` -> `2m20s` (~`12.4%`
faster)
- `codex-core lib` compile unit: `57.18s` -> `49.67s` (~`13.1%` faster)
- `codex-core lib(test)` compile unit: `60.87s` -> `53.21s` (~`12.6%`
faster)
This gives a concrete reduction in core build overhead without changing
behavior.
## What changed
### New crate
- Added `codex-rs/command` as workspace crate `codex-command`.
- Added:
- `command/src/lib.rs`
- `command/src/bash.rs`
- `command/src/powershell.rs`
- `command/src/parse_command.rs`
- `command/src/command_safety/*`
- `command/src/shell_detect.rs`
- `command/BUILD.bazel`
### Code moved out of `codex-core`
- Moved modules from `core/src` into `command/src`:
- `bash.rs`
- `powershell.rs`
- `parse_command.rs`
- `command_safety/*`
### Dependency graph updates
- Added workspace member/dependency entries for `codex-command` in
`codex-rs/Cargo.toml`.
- Added `codex-command` dependency to `codex-rs/core/Cargo.toml`.
- Removed `tree-sitter` and `tree-sitter-bash` from `codex-core` direct
deps (now owned by `codex-command`).
### API compatibility for callers
To avoid immediate downstream churn, `codex-core` now re-exports the
moved modules/functions:
- `codex_command::bash`
- `codex_command::powershell`
- `codex_command::parse_command`
- `codex_command::is_safe_command`
- `codex_command::is_dangerous_command`
This keeps existing `codex_core::...` paths working while enabling
gradual migration to direct `codex-command` usage.
### Internal decoupling detail
- Added `command::shell_detect` so moved `bash`/`powershell` logic no
longer depends on core shell internals.
- Adjusted PowerShell helper visibility in `codex-command` for existing
core test usage (`UTF8` prefix helper + executable discovery functions).
## Validation
- `just fmt`
- `just fix -p codex-command -p codex-core`
- `cargo test -p codex-command` (`134` passed)
- `cargo test -p codex-core --no-run`
- `cargo test -p codex-core shell_command_handler`
## Notes / follow-up
This commit intentionally prioritizes boundary extraction and
compatibility. A follow-up can migrate downstream crates to depend
directly on `codex-command` (instead of through `codex-core` re-exports)
to realize additional incremental build wins.
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.