core-agent-ide/codex-cli
Tomas Cupr e307d007aa
fix: retry on OpenAI server_error even without status code (#814)
Fix: retry on server_error responses that lack an HTTP status code

### What happened

1. An OpenAI endpoint returned a **5xx** (transient server-side
failure).
2. The SDK surfaced it as an `APIError` with

{ "type": "server_error", "message": "...", "status": undefined }

           (The SDK does not always populate `status` for these cases.)
3. Our retry logic in `src/utils/agent/agent-loop.ts` determined

isServerError = typeof status === "number" && status >= 500;

Because `status` was *undefined*, the error was **not** recognised as
retriable, the exception bubbled out, and the CLI crashed with a stack
           trace similar to:

               Error: An error occurred while processing the request.
                   at .../cli.js:474:1514

### Root cause

The transient-error detector ignored the semantic flag type ===
"server_error" that the SDK provides when the numeric status is missing.

#### Fix (1 loc + comment)

Extend the check:

const status = errCtx?.status ?? errCtx?.httpStatus ??
errCtx?.statusCode;

const isServerError = (typeof status === "number" && status >= 500) ||
// classic 5xx
errCtx?.type === "server_error";                   // <-- NEW

Now the agent:

* Retries up to **5** times (existing logic) when the backend reports a
transient failure, even if `status` is absent.
* If all retries fail, surfaces the existing friendly system message
instead of an uncaught exception.

### Tests & validation

pnpm test # all suites green (17 agent-level tests now include this
path)
pnpm run lint    # 0 errors / warnings
pnpm run typecheck

A new unit-test file isn’t required—the behaviour is already covered by
tests/agent-server-retry.test.ts, which stubs type: "server_error" and
now passes with the updated logic.

### Impact

* No API-surface changes.
* Prevents CLI crashes on intermittent OpenAI outages.
* Adds robust handling for other providers that may follow the same
error-shape.
2025-05-10 15:43:03 -07:00
..
bin fix: /bug report command, thinking indicator (#381) 2025-04-18 18:13:34 -07:00
examples fix: typos in prompts and comments (#195) 2025-04-17 07:12:39 -07:00
scripts chore: make build process a single script to run (#757) 2025-05-01 08:36:07 -07:00
src fix: retry on OpenAI server_error even without status code (#814) 2025-05-10 15:43:03 -07:00
tests fix: increase output limits for truncating collector (#575) 2025-05-05 10:26:55 -07:00
.dockerignore (fix) update Docker container scripts (#47) 2025-04-16 12:02:41 -07:00
.editorconfig Initial commit 2025-04-16 12:56:08 -04:00
.eslintrc.cjs Initial commit 2025-04-16 12:56:08 -04:00
.gitignore chore: make build process a single script to run (#757) 2025-05-01 08:36:07 -07:00
build.mjs chore(build): cleanup dist before build (#477) 2025-04-21 12:35:25 -04:00
Dockerfile fix: only allow running without sandbox if explicitly marked in safe container (#699) 2025-04-28 07:48:38 -07:00
HUSKY.md Feat/add husky (#223) 2025-04-17 07:18:43 -07:00
ignore-react-devtools-plugin.js Initial commit 2025-04-16 12:56:08 -04:00
package.json Configure HTTPS agent for proxies (#775) 2025-05-02 12:08:13 -07:00
require-shim.js Initial commit 2025-04-16 12:56:08 -04:00
tsconfig.json fix: /bug report command, thinking indicator (#381) 2025-04-18 18:13:34 -07:00
vite.config.ts fix: add empty vite config file to prevent resolving to parent (#273) 2025-04-17 17:03:15 -07:00