core-agent-ide/codex-rs/utils/stream-parser/README.md

# codex-utils-stream-parser

Small, dependency-free utilities for parsing streamed text incrementally.

**Disclaimer**: This code is pretty complex and Codex did not manage to write it so before updating the code, make
sure to deeply understand it and don't blindly trust Codex on it. Feel free to update the documentation as you
modify the code

## What it provides

- `StreamTextParser`: trait for incremental parsers that consume string chunks
- `InlineHiddenTagParser<T>`: generic parser that hides inline tags and extracts their contents
- `CitationStreamParser`: convenience wrapper for `<oai-mem-citation>...</oai-mem-citation>`
- `strip_citations(...)`: one-shot helper for non-streamed strings
- `Utf8StreamParser<P>`: adapter for raw `&[u8]` streams that may split UTF-8 code points

## Why this exists

Some model outputs arrive as a stream and may contain hidden markup (for example
`<oai-mem-citation>...</oai-mem-citation>`) split across chunk boundaries. Parsing each chunk
independently is incorrect because tags can be split (`<oai-mem-` + `citation>`).

This crate keeps parser state across chunks, returns visible text safe to render
immediately, and extracts hidden payloads separately.

## Example: citation streaming

```rust
use codex_utils_stream_parser::CitationStreamParser;
use codex_utils_stream_parser::StreamTextParser;

let mut parser = CitationStreamParser::new();

let first = parser.push_str("Hello <oai-mem-");
assert_eq!(first.visible_text, "Hello ");
assert!(first.extracted.is_empty());

let second = parser.push_str("citation>doc A</oai-mem-citation> world");
assert_eq!(second.visible_text, " world");
assert_eq!(second.extracted, vec!["doc A".to_string()]);

let tail = parser.finish();
assert!(tail.visible_text.is_empty());
assert!(tail.extracted.is_empty());
```

## Example: raw byte streaming with split UTF-8 code points

```rust
use codex_utils_stream_parser::CitationStreamParser;
use codex_utils_stream_parser::Utf8StreamParser;

# fn demo() -> Result<(), codex_utils_stream_parser::Utf8StreamParserError> {
let mut parser = Utf8StreamParser::new(CitationStreamParser::new());

// "é" split across chunks: 0xC3 + 0xA9
let first = parser.push_bytes(&[b'H', 0xC3])?;
assert_eq!(first.visible_text, "H");

let second = parser.push_bytes(&[0xA9, b'!'])?;
assert_eq!(second.visible_text, "é!");

let tail = parser.finish()?;
assert!(tail.visible_text.is_empty());
# Ok(())
# }
```

## Example: custom hidden tags

```rust
use codex_utils_stream_parser::InlineHiddenTagParser;
use codex_utils_stream_parser::InlineTagSpec;
use codex_utils_stream_parser::StreamTextParser;

#[derive(Clone, Debug, PartialEq, Eq)]
enum Tag {
    Secret,
}

let mut parser = InlineHiddenTagParser::new(vec![InlineTagSpec {
    tag: Tag::Secret,
    open: "<secret>",
    close: "</secret>",
}]);

let out = parser.push_str("a<secret>x</secret>b");
assert_eq!(out.visible_text, "ab");
assert_eq!(out.extracted.len(), 1);
assert_eq!(out.extracted[0].content, "x");
```

## Known limitations

- Tags are matched literally and case-sensitively
- No nested tag support
- A stream can return empty objects.
feat: adding stream parser (#12666) Add a stream parser to extract citations (and others) from a stream. This support cases where markers are split in differen tokens. Codex never manage to make this code work so everything was done manually. Please review correctly and do not touch this part of the code without a very clear understanding of it 2026-02-25 13:27:58 +00:00			`# codex-utils-stream-parser`

			`Small, dependency-free utilities for parsing streamed text incrementally.`

			`Disclaimer: This code is pretty complex and Codex did not manage to write it so before updating the code, make`
			`sure to deeply understand it and don't blindly trust Codex on it. Feel free to update the documentation as you`
			`modify the code`

			`## What it provides`

			- `StreamTextParser`: trait for incremental parsers that consume string chunks
			- `InlineHiddenTagParser<T>`: generic parser that hides inline tags and extracts their contents
			- `CitationStreamParser`: convenience wrapper for `<oai-mem-citation>...</oai-mem-citation>`
			- `strip_citations(...)`: one-shot helper for non-streamed strings
			- `Utf8StreamParser<P>`: adapter for raw `&[u8]` streams that may split UTF-8 code points

			`## Why this exists`

			`Some model outputs arrive as a stream and may contain hidden markup (for example`
			`<oai-mem-citation>...</oai-mem-citation>`) split across chunk boundaries. Parsing each chunk
			independently is incorrect because tags can be split (`<oai-mem-` + `citation>`).

			`This crate keeps parser state across chunks, returns visible text safe to render`
			`immediately, and extracts hidden payloads separately.`

			`## Example: citation streaming`

			```rust
			`use codex_utils_stream_parser::CitationStreamParser;`
			`use codex_utils_stream_parser::StreamTextParser;`

			`let mut parser = CitationStreamParser::new();`

			`let first = parser.push_str("Hello <oai-mem-");`
			`assert_eq!(first.visible_text, "Hello ");`
			`assert!(first.extracted.is_empty());`

			`let second = parser.push_str("citation>doc A</oai-mem-citation> world");`
			`assert_eq!(second.visible_text, " world");`
			`assert_eq!(second.extracted, vec!["doc A".to_string()]);`

			`let tail = parser.finish();`
			`assert!(tail.visible_text.is_empty());`
			`assert!(tail.extracted.is_empty());`
			```

			`## Example: raw byte streaming with split UTF-8 code points`

			```rust
			`use codex_utils_stream_parser::CitationStreamParser;`
			`use codex_utils_stream_parser::Utf8StreamParser;`

			`# fn demo() -> Result<(), codex_utils_stream_parser::Utf8StreamParserError> {`
			`let mut parser = Utf8StreamParser::new(CitationStreamParser::new());`

			`// "é" split across chunks: 0xC3 + 0xA9`
			`let first = parser.push_bytes(&[b'H', 0xC3])?;`
			`assert_eq!(first.visible_text, "H");`

			`let second = parser.push_bytes(&[0xA9, b'!'])?;`
			`assert_eq!(second.visible_text, "é!");`

			`let tail = parser.finish()?;`
			`assert!(tail.visible_text.is_empty());`
			`# Ok(())`
			`# }`
			```

			`## Example: custom hidden tags`

			```rust
			`use codex_utils_stream_parser::InlineHiddenTagParser;`
			`use codex_utils_stream_parser::InlineTagSpec;`
			`use codex_utils_stream_parser::StreamTextParser;`

			`#[derive(Clone, Debug, PartialEq, Eq)]`
			`enum Tag {`
			`Secret,`
			`}`

			`let mut parser = InlineHiddenTagParser::new(vec![InlineTagSpec {`
			`tag: Tag::Secret,`
			`open: "<secret>",`
			`close: "</secret>",`
			`}]);`

			`let out = parser.push_str("a<secret>x</secret>b");`
			`assert_eq!(out.visible_text, "ab");`
			`assert_eq!(out.extracted.len(), 1);`
			`assert_eq!(out.extracted[0].content, "x");`
			```

			`## Known limitations`

			`- Tags are matched literally and case-sensitively`
			`- No nested tag support`
			`- A stream can return empty objects.`