core-agent-ide/docs/tui2/performance-testing.md

# Performance testing (`codex-tui2`)

This doc captures a repeatable workflow for investigating `codex-tui2` performance issues
(especially high idle CPU and high CPU while streaming) and validating optimizations to the draw
hot path.

## Scope (this round)

The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering
the same visible transcript lines via Ratatui’s `Line::render_ref` (notably grapheme segmentation
and span layout).

The intended mitigation is a **rasterization cache**: render a wrapped transcript `Line` into a
row of `Cell`s once, cache it, and on subsequent redraws copy cached cells into the frame buffer.

Key invariants:

- The cache is width-scoped (invalidate on terminal width changes).
- The cache stores **base content** only; selection highlight and copy affordances are applied
  after rendering, so they don’t pollute cached rows.

## Roles

- Human: runs `codex-tui2` in an interactive terminal (e.g. Ghostty), triggers “idle” and
  “streaming” scenarios, and captures profiles.
- Assistant (or a script): reads profile output and extracts hotspots and deltas.

## Baseline setup

Build from a clean checkout:

```sh
cd codex-rs
cargo build -p codex-tui2
```

Run `codex-tui2` in a terminal and get a PID (macOS):

```sh
pgrep -n codex-tui2
```

Track CPU quickly while reproducing:

```sh
top -pid "$(pgrep -n codex-tui2)"
```

## Capture profiles (macOS)

Capture both an “idle” and a “streaming” profile so hotspots are not conflated:

```sh
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt
```

For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so
the stream runs long enough to sample.

## Quick hotspot extraction

These `rg` patterns keep the investigation grounded in the data:

```sh
# Buffer diff hot path (idle)
rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80

# Transcript rendering hot path (streaming)
rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
```

## Rasterization-cache validation checklist

After implementing a transcript rasterization cache, re-run the same scenarios and confirm:

- Streaming sample shifts away from `unicode_segmentation::grapheme::GraphemeCursor::next_boundary`
  stacks dominating the main thread.
- CPU during streaming drops materially vs baseline for the same streaming load.
- Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure
  both idle and streaming).

## Notes to record per run

- Terminal size: width × height
- Scenario: idle vs streaming (prompt + approximate response length)
- CPU snapshot: `top` (directional)
- Profile excerpt: 20–50 relevant lines for the dominant stacks

## Code pointers

- `codex-rs/tui2/src/transcript_view_cache.rs`: wrapped transcript memoization + per-line
  rasterization cache (cached `Cell` rows).
- `codex-rs/tui2/src/transcript_render.rs`: incremental helper used by the wrapped-line cache
  (`append_wrapped_transcript_cell`).
- `codex-rs/tui2/src/app.rs`: wiring in `App::render_transcript_cells` (uses cached rows instead of
  calling `Line::render_ref` every frame).
-												perf(tui2): cache transcript view rendering (#8693)

The transcript viewport draws every frame. Ratatui's Line::render_ref
does grapheme segmentation and span layout, so repeated redraws can burn
CPU during streaming even when the visible transcript hasn't changed.

Introduce TranscriptViewCache to reduce per-frame work:
- WrappedTranscriptCache memoizes flattened+wrapped transcript lines per
width, appends incrementally as new cells arrive, and rebuilds on width
change, truncation (backtrack), or transcript replacement.
- TranscriptRasterCache caches rasterized rows (Vec<Cell>) per line
index and user-row styling; redraws copy cells instead of rerendering
spans.

The caches are width-scoped and store base transcript content only;
selection highlighting and copy affordances are applied after drawing.
User rows include the row-wide base style in the cached raster.

Refactor transcript_render to expose append_wrapped_transcript_cell for
incremental building and add a test that incremental append matches the
full build.

Add docs/tui2/performance-testing.md as a playbook for macOS sample
profiles and hotspot greps.

Expand transcript_view_cache tests to cover rebuild conditions, raster
equivalence vs direct rendering, user-row caching, and eviction.

Test: cargo test -p codex-tui2
											
										
										
											2026-01-03 11:44:27 -08:00
+								# Performance testing (`codex-tui2`)
 								This doc captures a repeatable workflow for investigating `codex-tui2` performance issues
 								(especially high idle CPU and high CPU while streaming) and validating optimizations to the draw
 								hot path.
 								## Scope (this round)
 								The current focus is the transcript draw hot path, specifically the cost of repeatedly rendering
 								the same visible transcript lines via Ratatui’s `Line::render_ref` (notably grapheme segmentation
 								and span layout).
 								The intended mitigation is a **rasterization cache**: render a wrapped transcript `Line` into a
 								row of `Cell`s once, cache it, and on subsequent redraws copy cached cells into the frame buffer.
 								Key invariants:
 								- The cache is width-scoped (invalidate on terminal width changes).
 								- The cache stores **base content** only; selection highlight and copy affordances are applied
 								  after rendering, so they don’t pollute cached rows.
 								## Roles
 								- Human: runs `codex-tui2` in an interactive terminal (e.g. Ghostty), triggers “idle” and
 								  “streaming” scenarios, and captures profiles.
 								- Assistant (or a script): reads profile output and extracts hotspots and deltas.
 								## Baseline setup
 								Build from a clean checkout:
 								```sh
 								cd codex-rs
 								cargo build -p codex-tui2
 								```
 								Run `codex-tui2` in a terminal and get a PID (macOS):
 								```sh
 								pgrep -n codex-tui2
 								```
 								Track CPU quickly while reproducing:
 								```sh
 								top -pid "$(pgrep -n codex-tui2)"
 								```
 								## Capture profiles (macOS)
 								Capture both an “idle” and a “streaming” profile so hotspots are not conflated:
 								```sh
 								sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.idle.sample.txt
 								sample "$(pgrep -n codex-tui2)" 1 -file /tmp/tui2.streaming.sample.txt
 								```
 								For the streaming sample, trigger a response that emits many deltas (e.g. “Tell me a story”) so
 								the stream runs long enough to sample.
 								## Quick hotspot extraction
 								These `rg` patterns keep the investigation grounded in the data:
 								```sh
 								# Buffer diff hot path (idle)
 								rg -n "custom_terminal::diff_buffers|diff_buffers" /tmp/tui2.*.sample.txt | head -n 80
 								# Transcript rendering hot path (streaming)
 								rg -n "App::render_transcript_cells|Line::render|render_spans|styled_graphemes|GraphemeCursor::next_boundary" /tmp/tui2.*.sample.txt | head -n 120
 								```
 								## Rasterization-cache validation checklist
 								After implementing a transcript rasterization cache, re-run the same scenarios and confirm:
 								- Streaming sample shifts away from `unicode_segmentation::grapheme::GraphemeCursor::next_boundary`
 								  stacks dominating the main thread.
 								- CPU during streaming drops materially vs baseline for the same streaming load.
 								- Idle CPU does not regress (redraw gating changes can mask rendering improvements; always measure
 								  both idle and streaming).
 								## Notes to record per run
 								- Terminal size: width × height
 								- Scenario: idle vs streaming (prompt + approximate response length)
 								- CPU snapshot: `top` (directional)
 								- Profile excerpt: 20–50 relevant lines for the dominant stacks
 								## Code pointers
 								- `codex-rs/tui2/src/transcript_view_cache.rs`: wrapped transcript memoization + per-line
 								  rasterization cache (cached `Cell` rows).
 								- `codex-rs/tui2/src/transcript_render.rs`: incremental helper used by the wrapped-line cache
 								  (`append_wrapped_transcript_cell`).
 								- `codex-rs/tui2/src/app.rs`: wiring in `App::render_transcript_cells` (uses cached rows instead of
 								  calling `Line::render_ref` every frame).