How to retry requests on a specific exception

I've been using my own "retry" logic so far, and I want to keep retrying until the request succeeds. In some scenarios, if I hit any 5xx status, I should retry after a long delay.

If I hit certain status codes, for example 200 or 404, it should not raise for the status code; otherwise it should.

So I did something like this:

import time

import requests
from bs4 import BeautifulSoup
from requests import (
    RequestException,
    Timeout
)


def do_request():
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")

        soup = BeautifulSoup(response.text, 'html.parser')

        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        print(f"Retry due to timed out: {err}")

    except RequestException as err:
        raise RequestException("Unexpected request error") from err


# ----------------------------------------------------#

if __name__ == '__main__':
    for found_links in do_request():
        print(found_links)

My problem now is that I deliberately set the timeout to 0.1 to trigger the Timeout exception, and what I'd like to happen is for the request to be retried until it succeeds.

Currently it just stops there, so what should I do to retry the request again when it hits a timeout, without raising the error?

In your case you can call the function recursively from itself, but be careful about unexpected edge cases:

def do_request(retry: int = 3):
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")

        soup = BeautifulSoup(response.text, 'html.parser')

        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        if retry:
            print(f"Retry due to timed out: {err}")
            yield from do_request(retry=retry - 1)
        else:
            raise

    except RequestException as err:
        raise RequestException("Unexpected request error") from err

This will retry 3 times (or however many times you set in the parameter) until retry reaches 0 or until another error is encountered.
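As a quick usage sketch (assuming the recursive do_request above; the retry count of 5 is just an illustrative value), the retry budget can simply be passed in at the call site:

if __name__ == '__main__':
    # Allow up to 5 retries on Timeout instead of the default 3.
    for found_links in do_request(retry=5):
        print(found_links)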

I would put it in a while loop and break out of the loop once the operation is done.

Sample:

def do_request():
    while True:
        try:
            # There are some scenarios where I would use my own proxies by doing
            # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
            while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
                print("sleeping")
                time.sleep(20)

            if response.status_code not in (200, 404):
                response.raise_for_status()

            print("Successful requests!")

            soup = BeautifulSoup(response.text, 'html.parser')

            for link in soup.find_all("a", {"class": "media__link"}):
                yield link.get('href')
            break
        except Timeout as err:
            print(f"Retry due to timed out: {err}")

        except RequestException as err:
            raise RequestException("Unexpected request error") from err

You can also add a time.sleep(0.1) between attempts; a minimal sketch of that is shown below.
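A self-contained sketch of that idea (the helper name fetch_with_pause and the 0.1 second pause are illustrative assumptions, not part of the answer above):

import time

import requests
from requests import Timeout


def fetch_with_pause(url: str, pause: float = 0.1) -> requests.Response:
    # Keep trying until the request stops timing out,
    # sleeping briefly between attempts.
    while True:
        try:
            return requests.get(url, timeout=0.1)
        except Timeout as err:
            print(f"Retry due to timed out: {err}")
            time.sleep(pause)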

tenacity solves all kinds of retry problems elegantly.

For your problem, just add a decorator like this:

import time

import requests
from bs4 import BeautifulSoup
from requests import Timeout
from tenacity import retry, retry_if_exception_type


@retry(retry=retry_if_exception_type(Timeout))
def do_request():
    while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
        print("sleeping")
        time.sleep(20)

    if response.status_code not in (200, 404):
        response.raise_for_status()

    print("Successful requests!")

    soup = BeautifulSoup(response.text, 'html.parser')

    for link in soup.find_all("a", {"class": "media__link"}):
        yield link.get('href')
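One caveat: because do_request is a generator, calling it only creates the generator object, and the Timeout is raised later while iterating, so the decorator may never get a chance to retry. A hedged sketch of one way around this, assuming the HTTP call is split into its own decorated helper (the name fetch_page, the 0.1 second wait and the 5-attempt stop are illustrative choices, not from the answer above):

import requests
from bs4 import BeautifulSoup
from requests import Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed


@retry(retry=retry_if_exception_type(Timeout),
       wait=wait_fixed(0.1),
       stop=stop_after_attempt(5))
def fetch_page() -> requests.Response:
    # Only the HTTP call lives here, so tenacity sees the Timeout and re-runs it.
    response = requests.get("https://www.bbc.com/", timeout=0.1)
    if response.status_code not in (200, 404):
        response.raise_for_status()
    return response


def do_request():
    soup = BeautifulSoup(fetch_page().text, 'html.parser')
    for link in soup.find_all("a", {"class": "media__link"}):
        yield link.get('href')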