core-agent-ide/codex-rs/protocol/src/items.rs
Andrei Eternal 267499bed8
[hooks] use a user message > developer message for prompt continuation (#14867)
## Summary

Persist Stop-hook continuation prompts as `user` messages instead of
hidden `developer` messages + some requested integration tests

This is a followup to @pakrym 's comment in
https://github.com/openai/codex/pull/14532 to make sure stop-block
continuation prompts match training for turn loops

- Stop continuation now writes `<hook_prompt hook_run_id="...">stop
hook's user prompt<hook_prompt>`
- Introduces quick-xml dependency, though we already indirectly depended
on it anyway via syntect
- This PR only has about 500 lines of actual logic changes, the rest is
tests/schema

## Testing

Example run (with a sessionstart hook and 3 stop hooks) - this shows
context added by session start, then two stop hooks sending their own
additional prompts in a new turn. The model responds with a single
message addressing both. Then when that turn ends, the hooks detect that
they just ran using `stop_hook_active` and decide not to infinite loop

test files for this (unzip, move codex -> .codex):
[codex.zip](https://github.com/user-attachments/files/26075806/codex.zip)

```
› cats


• Running SessionStart hook: lighting the observatory

SessionStart hook (completed)
  warning: Hi, I'm a session start hook for wizard-tower (startup).
  hook context: A wimboltine stonpet is an exotic cuisine from hyperspace

• Cats are tiny zen wizards, my friend: equal parts nap, mystery, and chaos. If you want, we can talk
  cat facts, cat breeds, cat names, or build something cat-themed in this repo.

• Running Stop hook: checking the tower wards

• Running Stop hook: sacking the guards

• Running Stop hook: hiring the guards

Stop hook (completed)
  warning: Wizard Tower Stop hook reviewed the completed reply (177 chars).

Stop hook (blocked)
  warning: Wizard Tower Stop hook continuing conversation
  feedback: cook the stonpet

Stop hook (blocked)
  warning: Wizard Tower Stop hook continuing conversation
  feedback: eat the cooked stonpet

• Stonpet’s cooked, aloha style: flash-seared over a blue quasiflame, glazed with nebula salt, and
  rested until the hyperspace juices settle.

  Now we eat with gratitude, my friend. One mindful bite in, and the flavor is pure cosmic surf:
  smoky, bright, and totally out of this dimension.

• Running Stop hook: checking the tower wards

• Running Stop hook: sacking the guards

• Running Stop hook: hiring the guards

Stop hook (completed)
  warning: Wizard Tower Stop hook reviewed the completed reply (285 chars).

Stop hook (completed)
  warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.

Stop hook (completed)
  warning: Wizard Tower Stop hook saw a second pass and stayed calm to avoid a loop.
```
2026-03-19 10:53:08 -07:00

448 lines
13 KiB
Rust

use crate::memory_citation::MemoryCitation;
use crate::models::ContentItem;
use crate::models::MessagePhase;
use crate::models::ResponseItem;
use crate::models::WebSearchAction;
use crate::protocol::AgentMessageEvent;
use crate::protocol::AgentReasoningEvent;
use crate::protocol::AgentReasoningRawContentEvent;
use crate::protocol::ContextCompactedEvent;
use crate::protocol::EventMsg;
use crate::protocol::ImageGenerationEndEvent;
use crate::protocol::UserMessageEvent;
use crate::protocol::WebSearchEndEvent;
use crate::user_input::ByteRange;
use crate::user_input::TextElement;
use crate::user_input::UserInput;
use quick_xml::de::from_str as from_xml_str;
use quick_xml::se::to_string as to_xml_string;
use schemars::JsonSchema;
use serde::Deserialize;
use serde::Serialize;
use ts_rs::TS;
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
#[serde(tag = "type")]
#[ts(tag = "type")]
pub enum TurnItem {
UserMessage(UserMessageItem),
HookPrompt(HookPromptItem),
AgentMessage(AgentMessageItem),
Plan(PlanItem),
Reasoning(ReasoningItem),
WebSearch(WebSearchItem),
ImageGeneration(ImageGenerationItem),
ContextCompaction(ContextCompactionItem),
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct UserMessageItem {
pub id: String,
pub content: Vec<UserInput>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema, PartialEq, Eq)]
pub struct HookPromptItem {
pub id: String,
pub fragments: Vec<HookPromptFragment>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema, PartialEq, Eq)]
#[serde(rename_all = "camelCase")]
#[ts(rename_all = "camelCase")]
pub struct HookPromptFragment {
pub text: String,
pub hook_run_id: String,
}
#[derive(Debug, Deserialize, Serialize)]
#[serde(rename = "hook_prompt")]
struct HookPromptXml {
#[serde(rename = "@hook_run_id")]
hook_run_id: String,
#[serde(rename = "$text")]
text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
#[serde(tag = "type")]
#[ts(tag = "type")]
pub enum AgentMessageContent {
Text { text: String },
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
/// Assistant-authored message payload used in turn-item streams.
///
/// `phase` is optional because not all providers/models emit it. Consumers
/// should use it when present, but retain legacy completion semantics when it
/// is `None`.
pub struct AgentMessageItem {
pub id: String,
pub content: Vec<AgentMessageContent>,
/// Optional phase metadata carried through from `ResponseItem::Message`.
///
/// This is currently used by TUI rendering to distinguish mid-turn
/// commentary from a final answer and avoid status-indicator jitter.
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub phase: Option<MessagePhase>,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub memory_citation: Option<MemoryCitation>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct PlanItem {
pub id: String,
pub text: String,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ReasoningItem {
pub id: String,
pub summary_text: Vec<String>,
#[serde(default)]
pub raw_content: Vec<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema, PartialEq)]
pub struct WebSearchItem {
pub id: String,
pub query: String,
pub action: WebSearchAction,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema, PartialEq)]
pub struct ImageGenerationItem {
pub id: String,
pub status: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub revised_prompt: Option<String>,
pub result: String,
#[serde(default, skip_serializing_if = "Option::is_none")]
#[ts(optional)]
pub saved_path: Option<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize, TS, JsonSchema)]
pub struct ContextCompactionItem {
pub id: String,
}
impl ContextCompactionItem {
pub fn new() -> Self {
Self {
id: uuid::Uuid::new_v4().to_string(),
}
}
pub fn as_legacy_event(&self) -> EventMsg {
EventMsg::ContextCompacted(ContextCompactedEvent {})
}
}
impl Default for ContextCompactionItem {
fn default() -> Self {
Self::new()
}
}
impl UserMessageItem {
pub fn new(content: &[UserInput]) -> Self {
Self {
id: uuid::Uuid::new_v4().to_string(),
content: content.to_vec(),
}
}
pub fn as_legacy_event(&self) -> EventMsg {
// Legacy user-message events flatten only text inputs into `message` and
// rebase text element ranges onto that concatenated text.
EventMsg::UserMessage(UserMessageEvent {
message: self.message(),
images: Some(self.image_urls()),
local_images: self.local_image_paths(),
text_elements: self.text_elements(),
})
}
pub fn message(&self) -> String {
self.content
.iter()
.map(|c| match c {
UserInput::Text { text, .. } => text.clone(),
_ => String::new(),
})
.collect::<Vec<String>>()
.join("")
}
pub fn text_elements(&self) -> Vec<TextElement> {
let mut out = Vec::new();
let mut offset = 0usize;
for input in &self.content {
if let UserInput::Text {
text,
text_elements,
} = input
{
// Text element ranges are relative to each text chunk; offset them so they align
// with the concatenated message returned by `message()`.
for elem in text_elements {
let byte_range = ByteRange {
start: offset + elem.byte_range.start,
end: offset + elem.byte_range.end,
};
out.push(TextElement::new(
byte_range,
elem.placeholder(text).map(str::to_string),
));
}
offset += text.len();
}
}
out
}
pub fn image_urls(&self) -> Vec<String> {
self.content
.iter()
.filter_map(|c| match c {
UserInput::Image { image_url } => Some(image_url.clone()),
_ => None,
})
.collect()
}
pub fn local_image_paths(&self) -> Vec<std::path::PathBuf> {
self.content
.iter()
.filter_map(|c| match c {
UserInput::LocalImage { path } => Some(path.clone()),
_ => None,
})
.collect()
}
}
impl HookPromptItem {
pub fn from_fragments(id: Option<&String>, fragments: Vec<HookPromptFragment>) -> Self {
Self {
id: id
.cloned()
.unwrap_or_else(|| uuid::Uuid::new_v4().to_string()),
fragments,
}
}
}
impl HookPromptFragment {
pub fn from_single_hook(text: impl Into<String>, hook_run_id: impl Into<String>) -> Self {
Self {
text: text.into(),
hook_run_id: hook_run_id.into(),
}
}
}
pub fn build_hook_prompt_message(fragments: &[HookPromptFragment]) -> Option<ResponseItem> {
let content = fragments
.iter()
.filter(|fragment| !fragment.hook_run_id.trim().is_empty())
.filter_map(|fragment| {
serialize_hook_prompt_fragment(&fragment.text, &fragment.hook_run_id)
.map(|text| ContentItem::InputText { text })
})
.collect::<Vec<_>>();
if content.is_empty() {
return None;
}
Some(ResponseItem::Message {
id: Some(uuid::Uuid::new_v4().to_string()),
role: "user".to_string(),
content,
end_turn: None,
phase: None,
})
}
pub fn parse_hook_prompt_message(
id: Option<&String>,
content: &[ContentItem],
) -> Option<HookPromptItem> {
let fragments = content
.iter()
.map(|content_item| {
let ContentItem::InputText { text } = content_item else {
return None;
};
parse_hook_prompt_fragment(text)
})
.collect::<Option<Vec<_>>>()?;
if fragments.is_empty() {
return None;
}
Some(HookPromptItem::from_fragments(id, fragments))
}
pub fn parse_hook_prompt_fragment(text: &str) -> Option<HookPromptFragment> {
let trimmed = text.trim();
let HookPromptXml { text, hook_run_id } = from_xml_str::<HookPromptXml>(trimmed).ok()?;
if hook_run_id.trim().is_empty() {
return None;
}
Some(HookPromptFragment { text, hook_run_id })
}
fn serialize_hook_prompt_fragment(text: &str, hook_run_id: &str) -> Option<String> {
if hook_run_id.trim().is_empty() {
return None;
}
to_xml_string(&HookPromptXml {
text: text.to_string(),
hook_run_id: hook_run_id.to_string(),
})
.ok()
}
impl AgentMessageItem {
pub fn new(content: &[AgentMessageContent]) -> Self {
Self {
id: uuid::Uuid::new_v4().to_string(),
content: content.to_vec(),
phase: None,
memory_citation: None,
}
}
pub fn as_legacy_events(&self) -> Vec<EventMsg> {
self.content
.iter()
.map(|c| match c {
AgentMessageContent::Text { text } => EventMsg::AgentMessage(AgentMessageEvent {
message: text.clone(),
phase: self.phase.clone(),
memory_citation: self.memory_citation.clone(),
}),
})
.collect()
}
}
impl ReasoningItem {
pub fn as_legacy_events(&self, show_raw_agent_reasoning: bool) -> Vec<EventMsg> {
let mut events = Vec::new();
for summary in &self.summary_text {
events.push(EventMsg::AgentReasoning(AgentReasoningEvent {
text: summary.clone(),
}));
}
if show_raw_agent_reasoning {
for entry in &self.raw_content {
events.push(EventMsg::AgentReasoningRawContent(
AgentReasoningRawContentEvent {
text: entry.clone(),
},
));
}
}
events
}
}
impl WebSearchItem {
pub fn as_legacy_event(&self) -> EventMsg {
EventMsg::WebSearchEnd(WebSearchEndEvent {
call_id: self.id.clone(),
query: self.query.clone(),
action: self.action.clone(),
})
}
}
impl ImageGenerationItem {
pub fn as_legacy_event(&self) -> EventMsg {
EventMsg::ImageGenerationEnd(ImageGenerationEndEvent {
call_id: self.id.clone(),
status: self.status.clone(),
revised_prompt: self.revised_prompt.clone(),
result: self.result.clone(),
saved_path: self.saved_path.clone(),
})
}
}
impl TurnItem {
pub fn id(&self) -> String {
match self {
TurnItem::UserMessage(item) => item.id.clone(),
TurnItem::HookPrompt(item) => item.id.clone(),
TurnItem::AgentMessage(item) => item.id.clone(),
TurnItem::Plan(item) => item.id.clone(),
TurnItem::Reasoning(item) => item.id.clone(),
TurnItem::WebSearch(item) => item.id.clone(),
TurnItem::ImageGeneration(item) => item.id.clone(),
TurnItem::ContextCompaction(item) => item.id.clone(),
}
}
pub fn as_legacy_events(&self, show_raw_agent_reasoning: bool) -> Vec<EventMsg> {
match self {
TurnItem::UserMessage(item) => vec![item.as_legacy_event()],
TurnItem::HookPrompt(_) => Vec::new(),
TurnItem::AgentMessage(item) => item.as_legacy_events(),
TurnItem::Plan(_) => Vec::new(),
TurnItem::WebSearch(item) => vec![item.as_legacy_event()],
TurnItem::ImageGeneration(item) => vec![item.as_legacy_event()],
TurnItem::Reasoning(item) => item.as_legacy_events(show_raw_agent_reasoning),
TurnItem::ContextCompaction(item) => vec![item.as_legacy_event()],
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use pretty_assertions::assert_eq;
#[test]
fn hook_prompt_roundtrips_multiple_fragments() {
let original = vec![
HookPromptFragment::from_single_hook("Retry with care & joy.", "hook-run-1"),
HookPromptFragment::from_single_hook("Then summarize cleanly.", "hook-run-2"),
];
let message = build_hook_prompt_message(&original).expect("hook prompt");
let ResponseItem::Message { content, .. } = message else {
panic!("expected hook prompt message");
};
let parsed = parse_hook_prompt_message(None, &content).expect("parsed hook prompt");
assert_eq!(parsed.fragments, original);
}
#[test]
fn hook_prompt_parses_legacy_single_hook_run_id() {
let parsed = parse_hook_prompt_fragment(
r#"<hook_prompt hook_run_id="hook-run-1">Retry with tests.</hook_prompt>"#,
)
.expect("legacy hook prompt");
assert_eq!(
parsed,
HookPromptFragment {
text: "Retry with tests.".to_string(),
hook_run_id: "hook-run-1".to_string(),
}
);
}
}