core-agent-ide/codex-rs/protocol/src
charley-oai 6709ad8975
Label attached images so agent can understand in-message labels (#8950)
Agent wouldn't "see" attached images and would instead try to use the
view_file tool:
<img width="1516" height="504" alt="image"
src="https://github.com/user-attachments/assets/68a705bb-f962-4fc1-9087-e932a6859b12"
/>

In this PR, we wrap image content items in XML tags with the name of
each image (now just a numbered name like `[Image #1]`), so that the
model can understand inline image references (based on name). We also
put the image content items above the user message which the model seems
to prefer (maybe it's more used to definitions being before references).

We also tweak the view_file tool description which seemed to help a bit

Results on a simple eval set of images:

Before
<img width="980" height="310" alt="image"
src="https://github.com/user-attachments/assets/ba838651-2565-4684-a12e-81a36641bf86"
/>

After
<img width="918" height="322" alt="image"
src="https://github.com/user-attachments/assets/10a81951-7ee6-415e-a27e-e7a3fd0aee6f"
/>

```json
[
  {
    "id": "single_describe",
    "prompt": "Describe the attached image in one sentence.",
    "images": ["image_a.png"]
  },
  {
    "id": "single_color",
    "prompt": "What is the dominant color in the image? Answer with a single color word.",
    "images": ["image_b.png"]
  },
  {
    "id": "orientation_check",
    "prompt": "Is the image portrait or landscape? Answer in one sentence.",
    "images": ["image_c.png"]
  },
  {
    "id": "detail_request",
    "prompt": "Look closely at the image and call out any small details you notice.",
    "images": ["image_d.png"]
  },
  {
    "id": "two_images_compare",
    "prompt": "I attached two images. Are they the same or different? Briefly explain.",
    "images": ["image_a.png", "image_b.png"]
  },
  {
    "id": "two_images_captions",
    "prompt": "Provide a short caption for each image (Image 1, Image 2).",
    "images": ["image_c.png", "image_d.png"]
  },
  {
    "id": "multi_image_rank",
    "prompt": "Rank the attached images from most colorful to least colorful.",
    "images": ["image_a.png", "image_b.png", "image_c.png"]
  },
  {
    "id": "multi_image_choice",
    "prompt": "Which image looks more vibrant? Answer with 'Image 1' or 'Image 2'.",
    "images": ["image_b.png", "image_d.png"]
  }
]
```
2026-01-09 21:33:45 -08:00
..
account.rs fix: move account struct to app-server-protocol and use camelCase (#5829) 2025-10-27 14:06:13 -07:00
approvals.rs Removed experimental "command risk assessment" feature (#7799) 2025-12-10 09:48:11 -08:00
config_types.rs fix: add tui.alternate_screen config and --no-alt-screen CLI flag for Zellij scrollback (#8555) 2026-01-09 18:38:26 +00:00
custom_prompts.rs [app-server] remove serde(skip_serializing_if = "Option::is_none") annotations (#5939) 2025-10-30 18:18:53 +00:00
items.rs [app-server] small fixes for JSON schema export and one-of types (#6614) 2025-11-13 16:25:17 -08:00
lib.rs chore: unify conversation with thread name (#8830) 2026-01-07 17:04:53 +00:00
message_history.rs Generate JSON schema for app-server protocol (#5063) 2025-10-20 11:45:11 -07:00
models.rs Label attached images so agent can understand in-message labels (#8950) 2026-01-09 21:33:45 -08:00
num_format.rs Auto compact at ~90% (#5292) 2025-10-20 11:29:49 -07:00
openai_models.rs Merge Modelfamily into modelinfo (#8763) 2026-01-07 10:35:09 -08:00
parse_command.rs [app-server] remove serde(skip_serializing_if = "Option::is_none") annotations (#5939) 2025-10-30 18:18:53 +00:00
plan_tool.rs [app-server] remove serde(skip_serializing_if = "Option::is_none") annotations (#5939) 2025-10-30 18:18:53 +00:00
protocol.rs renaming: task to turn (#8963) 2026-01-09 17:31:17 +00:00
thread_id.rs chore: unify conversation with thread name (#8830) 2026-01-07 17:04:53 +00:00
user_input.rs Inject SKILL.md when it's explicitly mentioned. (#7763) 2025-12-10 13:59:17 -08:00