How do I insert a backoff script into my web scrape?
I want to use the "backoff" package in my web scrape, but I can't get it to work. Where do I insert it, and how do I make sure "r = requests..." is still recognized?
I've tried placing the statement into my code in various ways, but it doesn't work. I'd like to be able to use it for the package's intended purpose. Thanks!
Code to insert:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(what goes here?):
    return requests.get(what goes here?)
Existing code:
import os
import requests
import re
import backoff
from bs4 import BeautifulSoup

asin_list = ['B079QHML21']
urls = []
print('Scrape Started')
for asin in asin_list:
    product_url = f'https://www.amazon.com/dp/{asin}'
    urls.append(product_url)
base_search_url = 'https://www.amazon.com'
scraper_url = 'http://api.scraperapi.com'
while len(urls) > 0:
    url = urls.pop(0)
    payload = {key: url}  # --specific parameters
    r = requests.get(scraper_url, params=payload)
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')
    # Scraping below
I'd like the "backoff" code to work as designed above, retrying on 500 errors instead of failing.
Instead of calling requests.get(scraper_url, params=payload) directly, change get_url to do exactly that, and then call get_url:
@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(scraper_url, payload):
    return requests.get(scraper_url, params=payload)
And in your code, instead of:
r = requests.get(scraper_url, params=payload)
do:
r = get_url(scraper_url, payload)
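One caveat, since the goal is to retry on 500 errors: requests.get does not raise an exception for an HTTP error status on its own, so a 500 reply would come back as a normal response and never trigger the RequestException handler. Calling r.raise_for_status() inside get_url converts 4xx/5xx responses into requests.exceptions.HTTPError (a RequestException subclass), which the decorator then retries. Here is a minimal end-to-end sketch putting the pieces together; it assumes ScraperAPI's usual api_key/url query parameters, with YOUR_API_KEY as a placeholder you must fill in:

import requests
import backoff
from bs4 import BeautifulSoup

API_KEY = 'YOUR_API_KEY'  # placeholder, not a real key
scraper_url = 'http://api.scraperapi.com'

@backoff.on_exception(backoff.expo,
                      requests.exceptions.RequestException,
                      max_time=60)
def get_url(scraper_url, payload):
    r = requests.get(scraper_url, params=payload)
    # requests does not raise on a 500 by itself; raise_for_status() turns
    # 4xx/5xx responses into HTTPError, a RequestException subclass, so the
    # decorator above retries them.
    r.raise_for_status()
    return r

asin_list = ['B079QHML21']
urls = [f'https://www.amazon.com/dp/{asin}' for asin in asin_list]

while urls:
    url = urls.pop(0)
    payload = {'api_key': API_KEY, 'url': url}
    r = get_url(scraper_url, payload)
    print("we got a {} response code from {}".format(r.status_code, url))
    soup = BeautifulSoup(r.text, 'lxml')
    # Scraping below

With backoff.expo the wait between attempts grows exponentially, and max_time=60 stops retrying once roughly 60 seconds have elapsed, after which the last exception propagates.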