## Summary
This is a purely mechanical refactor of `OtelManager` ->
`SessionTelemetry` to better convey what the struct is doing. No
behavior change.
## Why
`OtelManager` ended up sounding much broader than what this type
actually does. It doesn't manage OTEL globally; it's the session-scoped
telemetry surface for emitting log/trace events and recording metrics
with consistent session metadata (`app_version`, `model`, `slug`,
`originator`, etc.).
`SessionTelemetry` is a more accurate name, and updating the call sites
makes that boundary a lot easier to follow.
## Validation
- `just fmt`
- `cargo test -p codex-otel`
- `cargo test -p codex-core`
This branch:
* Avoid flushing DB when not necessary
* Filter events for which we perfom an `upsert` into the DB
* Add a dedicated update function of the `thread:updated_at` that is
lighter
This should significantly reduce the DB lock contention. If it is not
sufficient, we can de-sync the flush of the DB for `updated_at`
## Summary
- move sqlite log reads and writes onto a dedicated `logs_1.sqlite`
database to reduce lock contention with the main state DB
- add a dedicated logs migrator and route `codex-state-logs` to the new
database path
- leave the old `logs` table in the existing state DB untouched for now
## Testing
- just fmt
- cargo test -p codex-state
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- keep the SQLite schema unchanged (no migrations)
- add timestamps to SQLite-backed `/feedback` log exports
- keep the existing SQL-side byte cap behavior and newline handling
- document the remaining fidelity gap (span prefixes + structured
fields) with TODOs
## Details
- update `query_feedback_logs` to format each exported line as:
- `YYYY-MM-DDTHH:MM:SS.ffffffZ {level} {message}`
- continue scoping rows to requested-thread + same-process threadless
logs
- continue capping in SQL before returning rows
- keep the existing fallback behavior unchanged when SQLite returns no
rows
- update parity tests to normalize away the new timestamp prefix while
we still only store `message`
## Follow-up
- TODO already in code: persist enough span/event metadata in SQLite to
reproduce span prefixes and structured fields in `/feedback` exports
## Testing
- `cargo test -p codex-state`
- `just fmt`
---------
Co-authored-by: Codex <noreply@openai.com>
## Summary
- add the v2 `thread/metadata/update` API, including
protocol/schema/TypeScript exports and app-server docs
- patch stored thread `gitInfo` in sqlite without resuming the thread,
with validation plus support for explicit `null` clears
- repair missing sqlite thread rows from rollout data before patching,
and make those repairs safe by inserting only when absent and updating
only git columns so newer metadata is not clobbered
- keep sqlite authoritative for mutable thread git metadata by
preserving existing sqlite git fields during reconcile/backfill and only
using rollout `SessionMeta` git fields to fill gaps
- add regression coverage for the endpoint, repair paths, concurrent
sqlite writes, clearing git fields, and rollout/backfill reconciliation
- fix the login server shutdown race so cancelling before the waiter
starts still terminates `block_until_done()` correctly
## Testing
- `cargo test -p codex-state
apply_rollout_items_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-state
update_thread_git_info_preserves_newer_non_git_metadata`
- `cargo test -p codex-core
backfill_sessions_preserves_existing_git_branch_and_fills_missing_git_fields`
- `cargo test -p codex-app-server thread_metadata_update`
- `cargo test`
- currently fails in existing `codex-core` grep-files tests with
`unsupported call: grep_files`:
- `suite::grep_files::grep_files_tool_collects_matches`
- `suite::grep_files::grep_files_tool_reports_empty_results`
## Summary
- write app-server SQLite logs at TRACE level when SQLite is enabled
- source app-server `/feedback` log attachments from SQLite for the
requested thread when available
- flush buffered SQLite log writes before `/feedback` queries them so
newly emitted events are not lost behind the async inserter
- include same-process threadless SQLite rows in those `/feedback` logs
so the attachment matches the process-wide feedback buffer more closely
- keep the existing in-memory ring buffer fallback unchanged, including
when the SQLite query returns no rows
## Details
- add a byte-bounded `query_feedback_logs` helper in `codex-state` so
`/feedback` does not fetch all rows before truncating
- scope SQLite feedback logs to the requested thread plus threadless
rows from the same `process_uuid`
- format exported SQLite feedback lines with the log level prefix to
better match the in-memory feedback formatter
- add an explicit `LogDbLayer::flush()` control path and await it in
app-server before querying SQLite for feedback logs
- pass optional SQLite log bytes through `codex-feedback` as the
`codex-logs.log` attachment override
- leave TUI behavior unchanged apart from the updated `upload_feedback`
call signature
- add regression coverage for:
- newest-within-budget ordering
- excluding oversized newest rows
- including same-process threadless rows
- keeping the newest suffix across mixed thread and threadless rows
- matching the feedback formatter shape aside from span prefixes
- falling back to the in-memory snapshot when SQLite returns no logs
- flushing buffered SQLite rows before querying
## Follow-up
- SQLite feedback exports still do not reproduce span prefixes like
`feedback-thread{thread_id=...}:`; there is a `TODO(ccunningham)` in
`codex-rs/state/src/log_db.rs` for that follow-up.
## Testing
- `cd codex-rs && cargo test -p codex-state`
- `cd codex-rs && cargo test -p codex-app-server`
- `cd codex-rs && just fmt`
#### what
adds a `codex debug clear-memories` command to help with clearing all
memories state from disk, sqlite db, and marking threads as
`memory_mode=disabled` so they don't get resummarized when the
`memories` feature is re-enabled.
#### tests
add tests
stage1_concurrent_claims_respect_running_cap was flaky due to SQLite
lock contention, not cap logic correctness. The claim flow used deferred
transactions (BEGIN) with read-then-write behavior, which can fail under
concurrency with SQLITE_BUSY_SNAPSHOT/database is locked when upgrading
a read transaction to a write transaction. We fixed this by using BEGIN
IMMEDIATE for stage1 and phase2 claim paths, so lock acquisition happens
up front and contenders serialize cleanly instead of failing during
upgrade. After the change, codex-state tests pass and stress reruns of
the flaky path no longer reproduced the failure.
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.
# Memories migration plan (simplified global workflow)
## Target behavior
- One shared memory root only: `~/.codex/memories/`.
- No per-cwd memory buckets, no cwd hash handling.
- Phase 1 candidate rules:
- Not currently being processed unless the job lease is stale.
- Rollout updated within the max-age window (currently 30 days).
- Rollout idle for at least 12 hours (new constant).
- Global cap: at most 64 stage-1 jobs in `running` state at any time
(new invariant).
- Stage-1 model output shape (new):
- `rollout_slug` (accepted but ignored for now).
- `rollout_summary`.
- `raw_memory`.
- Phase-1 artifacts written under the shared root:
- `rollout_summaries/<thread_id>.md` for each rollout summary.
- `raw_memories.md` containing appended/merged raw memory paragraphs.
- Phase 2 runs one consolidation agent for the shared `memories/`
directory.
- Phase-2 lock is DB-backed with 1 hour lease and heartbeat/expiry.
## Current code map
- Core startup pipeline: `core/src/memories/startup/mod.rs`.
- Stage-1 request+parse: `core/src/memories/startup/extract.rs`,
`core/src/memories/stage_one.rs`, templates in
`core/templates/memories/`.
- File materialization: `core/src/memories/storage.rs`,
`core/src/memories/layout.rs`.
- Scope routing (cwd/user): `core/src/memories/scope.rs`,
`core/src/memories/startup/mod.rs`.
- DB job lifecycle and scope queueing: `state/src/runtime/memory.rs`.
## PR plan
## PR 1: Correct phase-1 selection invariants (no behavior-breaking
layout changes yet)
- Add `PHASE_ONE_MIN_ROLLOUT_IDLE_HOURS: i64 = 12` in
`core/src/memories/mod.rs`.
- Thread this into `state::claim_stage1_jobs_for_startup(...)`.
- Enforce idle-time filter in DB selection logic (not only in-memory
filtering after `scan_limit`) so eligible threads are not starved by
very recent threads.
- Enforce global running cap of 64 at claim time in DB logic:
- Count fresh `memory_stage1` running jobs.
- Only allow new claims while count < cap.
- Keep stale-lease takeover behavior intact.
- Add/adjust tests in `state/src/runtime.rs`:
- Idle filter inclusion/exclusion around 12h boundary.
- Global running-cap guarantee.
- Existing stale/fresh ownership behavior still passes.
Acceptance criteria:
- Startup never creates more than 64 fresh `memory_stage1` running jobs.
- Threads updated <12h ago are skipped.
- Threads older than 30d are skipped.
## PR 2: Stage-1 output contract + storage artifacts
(forward-compatible)
- Update parser/types to accept the new structured output while keeping
backward compatibility:
- Add `rollout_slug` (optional for now).
- Add `rollout_summary`.
- Keep alias support for legacy `summary` and `rawMemory` until prompt
swap completes.
- Update stage-1 schema generator in `core/src/memories/stage_one.rs` to
include the new keys.
- Update prompt templates:
- `core/templates/memories/stage_one_system.md`.
- `core/templates/memories/stage_one_input.md`.
- Replace storage model in `core/src/memories/storage.rs`:
- Introduce `rollout_summaries/` directory writer (`<thread_id>.md`
files).
- Introduce `raw_memories.md` aggregator writer from DB rows.
- Keep deterministic rebuild behavior from DB outputs so files can
always be regenerated.
- Update consolidation prompt template to reference `rollout_summaries/`
+ `raw_memories.md` inputs.
Acceptance criteria:
- Stage-1 accepts both old and new output keys during migration.
- Phase-1 artifacts are generated in new format from DB state.
- No dependence on per-thread files in `raw_memories/`.
## PR 3: Remove per-cwd memories and move to one global memory root
- Simplify layout in `core/src/memories/layout.rs`:
- Single root: `codex_home/memories`.
- Remove cwd-hash bucket helpers and normalization logic used only for
memory pathing.
- Remove scope branching from startup phase-2 dispatch path:
- No cwd/user mapping in `core/src/memories/startup/mod.rs`.
- One target root for consolidation.
- In `state/src/runtime/memory.rs`, stop enqueueing/handling cwd
consolidation scope.
- Keep one logical consolidation scope/job key (global/user) to avoid a
risky schema rewrite in same PR.
- Add one-time migration helper (core side) to preserve current shared
memory output:
- If `~/.codex/memories/user/memory` exists and new root is empty,
move/copy contents into `~/.codex/memories`.
- Leave old hashed cwd buckets untouched for now (safe/no-destructive
migration).
Acceptance criteria:
- New runs only read/write `~/.codex/memories`.
- No new cwd-scoped consolidation jobs are enqueued.
- Existing user-shared memory content is preserved.
## PR 4: Phase-2 global lock simplification and cleanup
- Replace multi-scope dispatch with a single global consolidation claim
path:
- Either reuse jobs table with one fixed key, or add a tiny dedicated
lock helper; keep 1h lease.
- Ensure at most one consolidation agent can run at once.
- Keep heartbeat + stale lock recovery semantics in
`core/src/memories/startup/watch.rs`.
- Remove dead scope code and legacy constants no longer used.
- Update tests:
- One-agent-at-a-time behavior.
- Lock expiry allows takeover after stale lease.
Acceptance criteria:
- Exactly one phase-2 consolidation agent can be active cluster-wide
(per local DB).
- Stale lock recovers automatically.
## PR 5: Final cleanup and docs
- Remove legacy artifacts and references:
- `raw_memories/` and `memory_summary.md` assumptions from
prompts/comments/tests.
- Scope constants for cwd memory pathing in core/state if fully unused.
- Update docs under `docs/` for memory workflow and directory layout.
- Add a brief operator note for rollout: compatibility window for old
stage-1 JSON keys and when to remove aliases.
Acceptance criteria:
- Code and docs reflect only the simplified global workflow.
- No stale references to per-cwd memory buckets.
## Notes on sequencing
- PR 1 is safest first because it improves correctness without changing
external artifact layout.
- PR 2 keeps parser compatibility so prompt deployment can happen
independently.
- PR 3 and PR 4 split filesystem/scope simplification from locking
simplification to reduce blast radius.
- PR 5 is intentionally cleanup-only.