How to retry requests on specific exception
I have been using my own "retry" function, and I want to retry until the request is valid. In some scenarios, if I hit any 5xx status, I should retry after a long delay.
If I hit a specific status code such as 200 or 404, it should not raise for the status code; otherwise it should raise.
So I did something like this:
import time

import requests
from bs4 import BeautifulSoup
from requests import (
    RequestException,
    Timeout
)


def do_request():
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")
        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        print(f"Retry due to timed out: {err}")

    except RequestException as err:
        raise RequestException("Unexpected request error")


# ----------------------------------------------------#
if __name__ == '__main__':
    for found_links in do_request():
        print(found_links)
My problem right now is that I have intentionally set the timeout to 0.1 to trigger the Timeout exception, and what I want to happen is for the request to be retried until it succeeds.
Currently it just stops there, so what should I do to retry the request again when it hits a timeout, without raising the error?
In your case you could call the function recursively from itself, but watch out for unexpected edge cases:
def do_request(retry: int = 3):
    try:
        # There are some scenarios where I would use my own proxies by doing
        # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
        while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
            print("sleeping")
            time.sleep(20)

        if response.status_code not in (200, 404):
            response.raise_for_status()

        print("Successful requests!")
        soup = BeautifulSoup(response.text, 'html.parser')
        for link in soup.find_all("a", {"class": "media__link"}):
            yield link.get('href')

    except Timeout as err:
        if retry:
            print(f"Retry due to timed out: {err}")
            yield from do_request(retry=retry - 1)
        else:
            raise

    except RequestException as err:
        raise RequestException("Unexpected request error")
This will retry 3 times (or however many times you set in the parameter), until retry reaches 0 or until another error is encountered.
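As a quick usage sketch, the retry budget is just a keyword argument, so the caller can pass a different value (the 5 below is only an example figure):

# Allow up to 5 timed-out attempts before the Timeout propagates to the caller.
for found_link in do_request(retry=5):
    print(found_link)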
I would put it in a while loop and break out of the loop once the work is done.
Sample:
def do_request():
    while True:
        try:
            # There are some scenarios where I would use my own proxies by doing
            # requests.get("https://www.bbc.com/", timeout=0.1, proxies={'https': 'xxx.xxxx.xxx.xx'})
            while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
                print("sleeping")
                time.sleep(20)

            if response.status_code not in (200, 404):
                response.raise_for_status()

            print("Successful requests!")
            soup = BeautifulSoup(response.text, 'html.parser')
            for link in soup.find_all("a", {"class": "media__link"}):
                yield link.get('href')
            break

        except Timeout as err:
            print(f"Retry due to timed out: {err}")

        except RequestException as err:
            raise RequestException("Unexpected request error")
You could also add a time.sleep(0.1) between each attempt.
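For example, a minimal standalone sketch of that loop-and-sleep pattern, using the same requests/Timeout imports as above (fetch_with_pause and the pause parameter are names made up here for illustration):

import time

import requests
from requests import Timeout


def fetch_with_pause(url: str, pause: float = 0.1) -> requests.Response:
    # Keep retrying on Timeout, pausing briefly between attempts,
    # mirroring the while True / try / except structure above.
    while True:
        try:
            return requests.get(url, timeout=0.1)
        except Timeout as err:
            print(f"Retry due to timed out: {err}")
            time.sleep(pause)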
The tenacity package solves all kinds of retry problems elegantly.
For your problem, just add a decorator like this:
from tenacity import retry, retry_if_exception_type


@retry(retry=retry_if_exception_type(Timeout))
def do_request():
    while (response := requests.get("https://www.bbc.com/", timeout=0.1)).status_code >= 500:
        print("sleeping")
        time.sleep(20)

    if response.status_code not in (200, 404):
        response.raise_for_status()

    print("Successful requests!")
    soup = BeautifulSoup(response.text, 'html.parser')
    for link in soup.find_all("a", {"class": "media__link"}):
        yield link.get('href')
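If you also want the pause between attempts and an upper bound on retries handled for you, the same decorator accepts wait and stop strategies; a minimal sketch, where the 2-second wait and the 5-attempt cap are just example values:

import requests
from requests import Timeout
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed


@retry(
    retry=retry_if_exception_type(Timeout),  # only retry on requests' Timeout
    wait=wait_fixed(2),                      # pause 2 seconds between attempts
    stop=stop_after_attempt(5),              # stop after 5 failed attempts (raises tenacity.RetryError by default)
)
def fetch(url: str) -> requests.Response:
    return requests.get(url, timeout=0.1)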