Stabilize approval matrix write-file command (#14968)
## What is flaky The approval-matrix `WriteFile` scenario is flaky. It sometimes fails in CI even though the approval logic is unchanged, because the test delegates the file write and readback to shell parsing instead of deterministic file I/O. ## Why it was flaky The test generated a command shaped like `printf ... > file && cat file`. That means the scenario depended on shell quoting, redirection, newline handling, and encoding behavior in addition to the approval system it was actually trying to validate. If the shell interpreted the payload differently, the test would report an approval failure even though the product logic was fine. That also made failures hard to diagnose, because the test did not log the exact generated command or the parsed result payload. ## How this PR fixes it This PR replaces the shell-redirection path with a deterministic `python3 -c` script that writes the file with `Path.write_text(..., encoding='utf-8')` and then reads it back with the same UTF-8 path. It also logs the generated command and the resulting exit code/stdout for the approval scenario so any future failure is directly attributable. ## Why this fix fixes the flakiness The scenario no longer depends on shell parsing and redirection semantics. The file contents are produced and read through explicit UTF-8 file I/O, so the approval test is measuring approval behavior instead of shell behavior. The added diagnostics mean a future failure will show the exact command/result pair instead of looking like a generic intermittent mismatch. Co-authored-by: Ahmed Ibrahim <219906144+aibrahim-oai@users.noreply.github.com> Co-authored-by: Codex <noreply@openai.com>
This commit is contained in:
parent
23a44ddbe8
commit
4d9d4b7b0f
1 changed files with 12 additions and 1 deletions
|
|
@ -122,7 +122,11 @@ impl ActionKind {
|
|||
ActionKind::WriteFile { target, content } => {
|
||||
let (path, _) = target.resolve_for_patch(test);
|
||||
let _ = fs::remove_file(&path);
|
||||
let command = format!("printf {content:?} > {path:?} && cat {path:?}");
|
||||
let path_str = path.display().to_string();
|
||||
let script = format!(
|
||||
"from pathlib import Path; path = Path({path_str:?}); content = {content:?}; path.write_text(content, encoding='utf-8'); print(path.read_text(encoding='utf-8'), end='')",
|
||||
);
|
||||
let command = format!("python3 -c {script:?}");
|
||||
let event = shell_event(call_id, &command, 5_000, sandbox_permissions)?;
|
||||
Ok((event, Some(command)))
|
||||
}
|
||||
|
|
@ -1611,6 +1615,9 @@ async fn run_scenario(scenario: &ScenarioSpec) -> Result<()> {
|
|||
.action
|
||||
.prepare(&test, &server, call_id, scenario.sandbox_permissions)
|
||||
.await?;
|
||||
if let Some(command) = expected_command.as_deref() {
|
||||
eprintln!("approval scenario {} command: {command}", scenario.name);
|
||||
}
|
||||
|
||||
let _ = mount_sse_once(
|
||||
&server,
|
||||
|
|
@ -1692,6 +1699,10 @@ async fn run_scenario(scenario: &ScenarioSpec) -> Result<()> {
|
|||
|
||||
let output_item = results_mock.single_request().function_call_output(call_id);
|
||||
let result = parse_result(&output_item);
|
||||
eprintln!(
|
||||
"approval scenario {} result: exit_code={:?} stdout={:?}",
|
||||
scenario.name, result.exit_code, result.stdout
|
||||
);
|
||||
scenario.expectation.verify(&test, &result)?;
|
||||
|
||||
Ok(())
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue