core-agent-ide/codex-rs/protocol/src
Eric Traut f8af4f5c8d
Added model summary and risk assessment for commands that violate sandbox policy (#5536)
This PR adds support for a model-based summary and risk assessment for
commands that violate the sandbox policy and require user approval. This
aids the user in evaluating whether the command should be approved.

The feature works by passing the failed command back to the model and
asking it to summarize the command and assign it a risk level (low,
medium, or high) and a risk category (e.g. "data deletion" or "data
exfiltration"). It uses a new conversation thread so that the context in
the existing thread doesn't influence the answer. If the call to the
model fails or takes longer than 5 seconds, it falls back to the current
behavior.

For now, this is an experimental feature and is gated by a config key
`experimental_sandbox_command_assessment`.
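
The flow described above (config gate, model assessment, 5-second limit, fallback) could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in this PR: `RiskLevel`, `Assessment`, and `assess_with_fallback` are hypothetical names, the `assess` closure stands in for the model call on a fresh conversation thread, and `enabled` stands in for the value of `experimental_sandbox_command_assessment`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Risk level the model is asked to assign (hypothetical type, not the protocol's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RiskLevel {
    Low,
    Medium,
    High,
}

/// Hypothetical shape of an assessment: a summary plus risk level and category.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Assessment {
    summary: String,
    risk_level: RiskLevel,
    risk_category: String,
}

/// Runs `assess` on a worker thread and waits at most `timeout` for its result.
/// Returns `None` when the feature is disabled, the call fails, or the deadline
/// passes; the caller then shows the plain approval prompt (the current behavior).
fn assess_with_fallback<F>(
    enabled: bool,
    command: &str,
    timeout: Duration,
    assess: F,
) -> Option<Assessment>
where
    F: FnOnce(&str) -> Result<Assessment, String> + Send + 'static,
{
    if !enabled {
        return None;
    }
    let (tx, rx) = mpsc::channel();
    let command = command.to_owned();
    thread::spawn(move || {
        // A send error only means the receiver already gave up; ignore it.
        let _ = tx.send(assess(&command));
    });
    match rx.recv_timeout(timeout) {
        Ok(Ok(assessment)) => Some(assessment),
        // Model error, timeout, or a panicked worker: fall back.
        _ => None,
    }
}
```

A caller would pass `Duration::from_secs(5)` as the timeout and render the summary and risk fields in the approval prompt only when the result is `Some`. The worker thread is not cancelled on timeout in this sketch; its late result is simply dropped.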

Here is a screenshot of the approval prompt showing the risk assessment
and summary.

<img width="723" height="282" alt="image"
src="https://github.com/user-attachments/assets/4597dd7c-d5a0-4e9f-9d13-414bd082fd6b"
/>
2025-10-24 15:23:44 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| account.rs | [app-server] add new account method API stubs (#5527) | 2025-10-22 15:36:11 -07:00 |
| approvals.rs | Added model summary and risk assessment for commands that violate sandbox policy (#5536) | 2025-10-24 15:23:44 -07:00 |
| config_types.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| conversation_id.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| custom_prompts.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| items.rs | Add new thread items and rewire event parsing to use them (#5418) | 2025-10-22 10:14:50 -07:00 |
| lib.rs | Added model summary and risk assessment for commands that violate sandbox policy (#5536) | 2025-10-24 15:23:44 -07:00 |
| message_history.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| models.rs | Add ItemStarted/ItemCompleted events for UserInputItem (#5306) | 2025-10-20 13:34:44 -07:00 |
| num_format.rs | Auto compact at ~90% (#5292) | 2025-10-20 11:29:49 -07:00 |
| parse_command.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| plan_tool.rs | Generate JSON schema for app-server protocol (#5063) | 2025-10-20 11:45:11 -07:00 |
| protocol.rs | Added model summary and risk assessment for commands that violate sandbox policy (#5536) | 2025-10-24 15:23:44 -07:00 |
| user_input.rs | Add ItemStarted/ItemCompleted events for UserInputItem (#5306) | 2025-10-20 13:34:44 -07:00 |