Haeminway haeminway

External Calls Fail Sometimes: Exponential Backoff and a Retry Budget

Transient failures such as 429, 503, and timeouts can usually be recovered by retrying. But never retry a non-idempotent write without an idempotency key, and stop retrying before the run hits the 6-minute execution limit.

External and service calls fail intermittently: build retries in from the start. But don’t retry just anything. First classify the operation: idempotent (safe) / safe-with-key / non-idempotent (risky).

Why it matters

Without retries, a momentary 429 fails the whole job. Retry a non-idempotent write blindly and the same payment or the same email goes out twice. Both are expensive.

Transient failures only, with backoff

Retry on: 408, 429, 500, 502, 503, 504, and network timeouts. Everything else (400, 401, 403) will fail the same way, so stop immediately.
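The rule above can be sketched as a small classifier. The error shape is an assumption for illustration: it expects the caller to attach a numeric `status` for HTTP failures and an `isTimeout` flag for network timeouts, neither of which exists on a plain exception by default.

```javascript
// Status codes the text lists as transient.
const TRANSIENT_STATUSES = new Set([408, 429, 500, 502, 503, 504]);

// Hypothetical error shape: { status?: number, isTimeout?: boolean }.
function isTransient(err) {
  if (!err) return false;
  if (err.isTimeout === true) return true; // network timeout
  return TRANSIENT_STATUSES.has(err.status); // false for 400, 401, 403, or no status
}
```

An error with no recognizable status is treated as non-transient here, so unknown failures stop immediately instead of being retried.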

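// Retry fn() on transient errors with capped exponential backoff plus jitter.
// Assumes isTransient(err) returns true only for the retryable cases listed above.
function withBackoff(fn, { max = 5 } = {}) {
  let wait = 400; // initial delay in ms
  for (let attempt = 1; ; attempt++) {
    try {
      return fn();
    } catch (err) {
      // Give up on the last attempt or on a non-transient error.
      if (attempt >= max || !isTransient(err)) throw err;
      Utilities.sleep(wait + Math.floor(Math.random() * 200)); // jitter
      wait = Math.min(wait * 2, 8000); // exponential, capped
    }
  }
}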
  • Add jitter so multiple runs don’t retry at the same instant.
  • Keep the total elapsed time under the 6-minute limit. If a run might exceed it, save a cursor and continue on the next run.
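With the defaults in withBackoff, the sleeps alone add up to at most about 6.8 seconds (400 + 800 + 1600 + 3200 ms, plus up to 199 ms of jitter each), so most of the budget goes to the calls themselves. One way to apply the cursor idea is sketched below. The helper name `processWithBudget`, the 5-minute default, and the injectable `now` clock are assumptions made for illustration, not part of any library.

```javascript
// Hypothetical helper: handle items until the time budget runs out,
// then report where to resume. `now` is injectable so the sketch can be tested.
function processWithBudget(items, startIndex, handle,
                           { budgetMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const startedAt = now();
  let i = startIndex;
  while (i < items.length) {
    if (now() - startedAt >= budgetMs) {
      return { done: false, cursor: i }; // save the cursor, continue on the next run
    }
    handle(items[i]);
    i++;
  }
  return { done: true, cursor: i };
}
```

The caller would persist `cursor` somewhere durable (script properties are one option) and pass it back as `startIndex` on the next run. The budget is checked only between items, so it should leave enough headroom for one full item including its retries.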

Non-idempotent writes need an idempotency key

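// Process once even if the same request arrives twice.
const props = PropertiesService.getScriptProperties();
const cached = props.getProperty("done:" + idemKey);
if (cached !== null) return JSON.parse(cached); // already done: reuse the stored result
const result = doWrite();
props.setProperty("done:" + idemKey, JSON.stringify(result));
return result;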
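To see the effect of the key, here is a minimal sketch of the same check wrapped in a function. `runOnce` and the `store` interface (`get`/`set`, backed by a `Map` here) are hypothetical names chosen so the example runs outside Apps Script; a real store would need to be durable across runs.

```javascript
// Hypothetical wrapper: `store` is any object with get(key) and set(key, value).
function runOnce(store, idemKey, doWrite) {
  const k = "done:" + idemKey;
  const cached = store.get(k);
  if (cached !== undefined) return JSON.parse(cached); // replay the stored result
  const result = doWrite();
  store.set(k, JSON.stringify(result));
  return result;
}

// Usage: the second call with the same key does not run the write again.
const store = new Map();
let writes = 0;
const charge = () => ({ chargeId: ++writes });
const first = runOnce(store, "order-42", charge);
const second = runOnce(store, "order-42", charge);
```

The check and the write are not atomic in this sketch, so two runs executing at the same moment could both pass the check. Concurrent callers would additionally need a lock around the function.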

Deeper: normalize partial failure

Batch calls with fetchAll and some of them may fail. Normalize each result as { ok, status, body, error }; instead of failing the whole batch, proceed with the successes and retry only the failures. Log a correlation id, not the raw payload.
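The normalization step might look like the sketch below. The input shape is an assumption: each outcome is either `{ status, body }` for a response or `{ error }` for a call that threw. In Apps Script those values would typically be read from each response's `getResponseCode()` and `getContentText()` after calling fetchAll with `muteHttpExceptions: true`. Treating a missing status as retryable is also an assumption.

```javascript
// Normalize one outcome into { ok, status, body, error }.
function normalize(outcome) {
  if (outcome.error) {
    return { ok: false, status: null, body: null, error: String(outcome.error) };
  }
  const ok = outcome.status >= 200 && outcome.status < 300;
  return {
    ok,
    status: outcome.status,
    body: outcome.body,
    error: ok ? null : "HTTP " + outcome.status,
  };
}

// Split normalized results into successes, indexes to retry, and permanent failures.
function partition(results) {
  const retryable = new Set([408, 429, 500, 502, 503, 504]);
  const succeeded = [];
  const retryIndexes = [];
  const failed = [];
  results.forEach((r, i) => {
    if (r.ok) succeeded.push(r);
    else if (r.status === null || retryable.has(r.status)) retryIndexes.push(i);
    else failed.push(i);
  });
  return { succeeded, retryIndexes, failed };
}
```

Keeping indexes rather than the results themselves makes it straightforward to rebuild only the failed requests for the next attempt.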

One line to keep: retry only transient failures with backoff + jitter, and guard non-idempotent writes with an idempotency key.

Frequently asked questions

Which HTTP status codes should trigger a retry?
Retry only on transient failures: 408, 429, 500, 502, 503, 504, and network timeouts. Client errors like 400, 401, and 403 will fail the same way on retry, so stop immediately.
Why is retrying a non-idempotent write dangerous?
The same payment or the same email can be sent twice. Use an idempotency key to check whether the operation was already completed before executing the write.
Why add jitter to exponential backoff?
Without jitter, multiple failed runs retry at exactly the same moment and hammer the server again. Adding a small random delay spreads the retries out over time.