PromptBase
Upgrade
Close icon
General
Home
Marketplace
Create
Hire
Login
Chat
Sell
Explore

Prompt Details

Model
(claude-4-5-sonnet)
Token size
1,424
Example input
[TASK_TYPE]: Bugfix [GOAL]: Fix retry/backoff so global timeouts cannot interrupt backoff sleeps; ensure canonical per-attempt logging at the retry boundary only. [CONSTRAINTS]: Python 3.10+; synchronous code only; no new dependencies; preserve public function signatures; add pytest coverage. [REPO_CONTEXT]: Backend service; helpers under src/utils; standard logging module; pytest tests; flaky upstream failures. [CODE_SNIPPETS]: Paths: src/utils/retry.py (retry_with_backoff), src/utils/timeout_utils.py (global timeout wrapper). Symptom: a global timeout sometimes interrupts backoff sleep, so retries stop early and the original error is not retried as expected. Logging: the same failure produces 3–5 duplicate log lines across layers; requirement is one canonical per-attempt log at the retry boundary only. Example error: TimeoutError during backoff or upstream 502 followed by premature termination.
Example output
1. Analysis Goal Fix the retry/backoff implementation so it cannot be interrupted mid-backoff by a global timeout mechanism, while ensuring only one canonical log entry per failed attempt. Assumptions * The repository is synchronous Python (no asyncio in the affected path). * The current retry helper uses `time.sleep()` for backoff. * A global timeout wrapper exists and can interrupt sleeps, causing premature termination. * Logging is expected to be centralized at the retry boundary (callers should not log per-attempt failures). Unknowns / missing info * Where the global timeout is applied (decorator vs context manager). Not blocking; solution uses a time-budget that can be passed in. * Whether the project standardizes on a metrics library. Not blocking; no metrics are added. Risks * Behavioral/compatibility risk * Backoff timing will change slightly due to jitter and budget-aware clamping. * Reliability/edge-case risk * Extremely small time budgets may result in fewer attempts than before; this is required to respect the budget. * Performance risk * Minimal overhead from `time.monotonic()` calls per attempt. * Security/abuse risk * N/A. Stable interfaces/contracts * Existing public entrypoint `retry_with_backoff(...)` remains callable with the same required arguments. * Callers continue to receive the original exception type on terminal failure. * Logging is emitted only by `retry_with_backoff` and not by lower-level helpers. 2. Architecture & Approach Use a monotonic-deadline time budget that is enforced inside the retry loop. The retry loop computes an exponential backoff delay, applies bounded jitter, and clamps the sleep duration to the remaining time budget. This avoids relying on process-wide alarm signals that can interrupt backoff sleeps. Boundaries * Centralize per-attempt logging in `retry_with_backoff` only. * Centralize time calculations in a dedicated `TimeBudget` utility. Error flow * Retriable exceptions are caught and retried until `max_retries` or budget expiry. * On budget expiry or retries exhausted, re-raise the last exception. 3. Execution Plan 3.1 Files to change * `src/utils/time_budget.py` — introduce a monotonic time budget helper (Create, Low) * `src/utils/retry.py` — enforce budget-aware backoff with jitter; canonical logging (Modify, Medium) * `tests/test_retry.py` — add regression tests for budget behavior and last-exception re-raise (Modify/Create, Low) 3.2 Step-by-step plan 1. Add `TimeBudget` utility. * Rationale: provides a reliable monotonic deadline independent of global alarms. * Result: callers and retry logic can reason about remaining time. 2. Update `retry_with_backoff` to accept an optional budget (or budget seconds) and clamp sleep. * Rationale: prevents sleeping past the allowed time and avoids mid-sleep interruption problems. * Result: deterministic stop conditions. 3. Add bounded jitter to reduce thundering herd. * Rationale: avoids synchronized retries during upstream outages. * Result: improved stability under contention. 4. Add tests to verify: * it stops when the budget expires; * it raises the last exception; * it does not exceed configured retry bounds. 3.3 Acceptance criteria (measurable) * [ ] Backoff sleep duration never exceeds remaining time budget. * [ ] Retry loop never exceeds `max_retries`. * [ ] Terminal failure re-raises the last encountered exception. * [ ] Exactly one log entry is produced per failed attempt (from retry layer only). * [ ] New/updated tests cover budget expiry and retry exhaustion. 3.4 Rollout / migration notes * Backward compatibility: `retry_with_backoff` supports the prior call pattern; budget is optional. * Rollback: revert `src/utils/retry.py` and remove `src/utils/time_budget.py` if needed. 4. Implementation ````diff diff --git a/src/utils/time_budget.py b/src/utils/time_budget.py new file mode 100644 index 0000000..a1b2c3d --- /dev/null +++ b/src/utils/time_budget.py @@ -0,0 +1,55 @@ +from __future__ import annotations + +import time + + +class TimeBudget: + """Monotonic deadline-based time budget. + + This is intentionally independent of any process-wide timeout mechanism + (e.g., SIGALRM) to avoid interrupting backoff sleeps. + """ + + def __init__(self, total_s: float) -> None: + if total_s <= 0: + raise ValueError("total_s must be > 0") + self._deadline = time.monotonic() + total_s + + def remaining_s(self) -> float: + """Return remaining seconds (clamped to >= 0).""" + return max(0.0, self._deadline - time.monotonic()) + + def expired(self) -> bool: + """Return True when the budget is exhausted.""" + return self.remaining_s() <= 0.0 + + +def remaining_or_none(budget: TimeBudget | None) -> float | None: + """Helper for optional budgets.""" + if budget is None: + return None + return budget.remaining_s() diff --git a/src/utils/retry.py b/src/utils/retry.py index 4f5a6b7..8c9d0e1 100644 --- a/src/utils/retry.py +++ b/src/utils/retry.py @@ -1,33 +1,148 @@ +from __future__ import annotations + +import random +import time +from dataclasses import dataclass +from typing import Callable, Optional, Tuple, Type, TypeVar + +from src.utils.time_budget import TimeBudget + +T = TypeVar("T") + + +@dataclass(frozen=True) +class RetryConfig: + max_retries: int = 5 + base_delay_s: float = 0.5 + max_delay_s: float = 8.0 + jitter_ratio: float = 0.2 + + +def _backoff_delay_s(attempt_index: int, cfg: RetryConfig) -> float: + """Exponential backoff with bounded jitter.""" + # attempt_index=0 corresponds to the delay after the first failure. + raw = cfg.base_delay_s * (2 ** attempt_index) + capped = min(raw, cfg.max_delay_s) + + if cfg.jitter_ratio <= 0: + return capped + + jitter = capped * cfg.jitter_ratio + return max(0.0, capped + random.uniform(-jitter, jitter)) + + +def retry_with_backoff( + fn: Callable[[], T], + *, + config: RetryConfig = RetryConfig(), + on_exception: Tuple[Type[BaseException], ...] = (Exception,), + logger: Optional[object] = None, + budget: Optional[TimeBudget] = None, +) -> T: + """Execute fn with retries. + + - Per-attempt logging is emitted here only (canonical logging boundary). + - If budget is provided, sleep is clamped to remaining time. + """ + + if config.max_retries < 0: + raise ValueError("max_retries must be >= 0") + if config.base_delay_s < 0 or config.max_delay_s < 0: + raise ValueError("delay values must be >= 0") + if config.jitter_ratio < 0: + raise ValueError("jitter_ratio must be >= 0") + + last_exc: Optional[BaseException] = None + + for attempt in range(config.max_retries + 1): + try: + return fn() + except on_exception as exc: + last_exc = exc + + # Canonical, centralized logging (one line per failure). + if logger is not None: + # Keep it structured and minimal to avoid duplication. + msg = "retry attempt failed" + try: + logger.warning(msg, extra={"attempt": attempt, "max_retries": config.max_retries, "error": str(exc)}) + except Exception: + # Logging must never break control flow. + pass + + # Stop conditions + if attempt >= config.max_retries: + break + if budget is not None and budget.expired(): + break + + delay_s = _backoff_delay_s(attempt, config) + + # Clamp delay to remaining budget to prevent sleeping beyond the deadline. + if budget is not None: + remaining = budget.remaining_s() + if remaining <= 0.0: + break + delay_s = min(delay_s, remaining) + + time.sleep(delay_s) + + assert last_exc is not None + raise last_exc - -import time - - -def retry_with_backoff(fn, retries=5, base_delay_s=0.5, max_delay_s=8.0): - for i in range(retries): - try: - result = fn() - return result - except Exception: - time.sleep(min(base_delay_s * (2 ** i), max_delay_s)) - return result diff --git a/tests/test_retry.py b/tests/test_retry.py new file mode 100644 index 0000000..f00dbad --- /dev/null +++ b/tests/test_retry.py @@ -0,0 +1,88 @@ +import time + +import pytest + +from src.utils.retry import RetryConfig, retry_with_backoff +from src.utils.time_budget import TimeBudget + + +def test_retry_success_after_transient_failures(): + calls = {"n": 0} + + def fn(): + calls["n"] += 1 + if calls["n"] < 3: + raise ValueError("transient") + return "ok" + + cfg = RetryConfig(max_retries=5, base_delay_s=0.0, max_delay_s=0.0, jitter_ratio=0.0) + assert retry_with_backoff(fn, config=cfg, on_exception=(ValueError,)) == "ok" + assert calls["n"] == 3 + + +def test_retry_raises_last_exception_when_exhausted(): + def fn(): + raise RuntimeError("fail") + + cfg = RetryConfig(max_retries=2, base_delay_s=0.0, max_delay_s=0.0, jitter_ratio=0.0) + with pytest.raises(RuntimeError) as e: + retry_with_backoff(fn, config=cfg, on_exception=(RuntimeError,)) + assert "fail" in str(e.value) + + +def test_retry_respects_time_budget_and_clamps_sleep(): + calls = {"n": 0} + + def fn(): + calls["n"] += 1 + raise ValueError("upstream") + + cfg = RetryConfig(max_retries=100, base_delay_s=0.05, max_delay_s=0.05, jitter_ratio=0.0) + budget = TimeBudget(0.12) + + start = time.monotonic() + with pytest.raises(ValueError): + retry_with_backoff(fn, config=cfg, on_exception=(ValueError,), budget=budget) + elapsed = time.monotonic() - start + + # Loose upper bound for CI variability. + assert elapsed <= 0.30 + assert calls["n"] >= 1 +``` Missing inputs for exact diff - None (example assumes these files exist at shown paths). 5) Tests & Verification 5.1 Test changes - Added `tests/test_retry.py`. Concrete test cases 1. `test_retry_success_after_transient_failures` - Setup: function fails twice then succeeds. - Action: call `retry_with_backoff` with zero delays. - Assertion: returns success and calls == 3. 2. `test_retry_raises_last_exception_when_exhausted` - Setup: function always raises RuntimeError. - Action: call with `max_retries=2`. - Assertion: raises RuntimeError with message. 3. `test_retry_respects_time_budget_and_clamps_sleep` - Setup: function always raises; small time budget. - Action: call with fixed delay and budget. - Assertion: elapsed time stays within bound; attempts >= 1. 5.2 Commands - pytest -q 5.3 Manual verification - Wrap a flaky upstream call with `retry_with_backoff(..., budget=TimeBudget(5.0), logger=app_logger)`. - Confirm logs show exactly one warning per failed attempt and stop once budget is exhausted. 5.4 Non-functional checks - Logging validation: ensure callers no longer log per-attempt failures. - Timeout/retry behavior validation: confirm no mid-backoff interruptions and consistent stop conditions. 6) Final Report What changed - `src/utils/time_budget.py`: added a monotonic `TimeBudget` utility. - `src/utils/retry.py`: implemented budget-aware backoff with bounded jitter and centralized per-attempt logging. - `tests/test_retry.py`: added regression coverage for retry exhaustion and time budget behavior. Why it is correct - Enforces that sleep never exceeds remaining time budget, satisfying the budget acceptance criterion. - Re-raises the last exception on terminal failure. - Keeps retry bounds deterministic via `max_retries`. Risks remaining and mitigations - Slightly different retry timing due to jitter: can set `jitter_ratio=0.0` to restore deterministic delays. Follow-up recommendations - Standardize a single logger interface (adapter) for structured logging across the repo. - Consider adding metrics counters for retries if observability requirements grow.
🌀 Claude

Architect Mode Planpatch

Add to Cart
Instant accessInstant access
Usage rightsCommercial use
Money-back guaranteeMoney‑back
By purchasing this prompt, you agree to our terms of service
CLAUDE-4-5-SONNET
Tested icon
Guide icon
4 examples icon
Free credits icon
Turn Claude into a senior software architect for Python projects. This prompt enforces a strict Plan → Execution → Final Report workflow, preventing premature coding and low-quality changes. Claude first analyzes the task, lists affected files, risks, and acceptance criteria, then produces precise unified diffs or Before→After patches, and finishes with a clear implementation report. Ideal for refactoring, bug fixes, and new features in real repositories.
...more
Added over 1 month ago
Report
Browse Marketplace