How to skip a page if the request response is 404 or 505
I wrote a crawler in Python. Unfortunately, when the crawler hits a 404 or 505 page, it stops working. How can I skip these pages in my loop to avoid this problem?
Here is my code:
import requests
from bs4 import BeautifulSoup
import time

c = int(40622)
a = 10
for a in range(10):
    url = 'https://example.com/rockery/'+str(c)
    c = int(c) + 1
    print('-------------------------------------------------------------------------------------')
    print(url)
    print(c)
    time.sleep(5)
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find('a', attrs={'class': 'name-hyperlink'})
    name_final = name.text
    name_details = soup.find('div', attrs={'class': 'post-text'})
    name_details_final = name_details.text
    name_taglist = soup.find('div', attrs={'class': 'post-taglist'})
    name_taglist_final = name_taglist.text
    name_accepted_tmp = soup.find('div', attrs={'class': 'accepted-name'})
    name_accepted = name_accepted_tmp.find('div', attrs={'class': 'post-text'})
    name_accepted_final = name_accepted.text
    print('q_title=',name_final,'\nq_details=',name_details,'\nq_answer=',name_accepted)
    print('-------------------------------------------------------------------------------------')
This is the error I get when the crawler hits a 404 or 505 page:
Traceback (most recent call last):
  File "scrab.py", line 18, in <module>
    name_final = name.text
AttributeError: 'NoneType' object has no attribute 'text'
Check the response's status code. If it is not 200 (OK), skip the page by using the continue statement to move on to the next iteration of the loop:
    response = requests.get(url)
    if response.status_code != 200:  # could also compare against requests.codes.ok
        continue
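To show where the check sits in the whole loop, here is a minimal sketch rather than a definitive implementation. The `crawl` function, its `fetch` and `delay` parameters, and the `pages` dictionary are hypothetical names introduced for illustration. `fetch` is assumed to be any callable that takes a URL and returns an object with `status_code` and `content` attributes, which is what `requests.get` does.

```python
import time


def crawl(start_id, count, fetch, delay=5.0):
    """Fetch `count` consecutive pages and skip any whose status is not 200.

    `fetch` is a hypothetical parameter: any callable that takes a URL and
    returns an object with `status_code` and `content` attributes,
    for example `requests.get`.
    """
    pages = {}
    for page_id in range(start_id, start_id + count):
        url = 'https://example.com/rockery/' + str(page_id)
        time.sleep(delay)  # pause between requests, as in the question
        response = fetch(url)
        if response.status_code != 200:
            # 404, 5xx and so on: report the page and go to the next one
            print('skipping', url, 'status', response.status_code)
            continue
        pages[url] = response.content  # only successful responses reach here
    return pages
```

With `requests` imported, the call would be `pages = crawl(40622, 10, requests.get)`, and each stored body can then be passed to `BeautifulSoup(html, "html.parser")` as before. Note that a page returning 200 can still lack the element you are looking for, in which case `soup.find` returns `None`; a guard such as `if name is None: continue` before reading `name.text` covers that case too.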