How to ignore HTTP errors while using requests with for loop?
Here is my code, which checks multiple URLs for a specific keyword and writes to an output file whether or not the keyword was found.
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    res = requests.get(url_1)
    finalresult = print(keyword in res.text)
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
However, as soon as any one of my URLs is down and raises an HTTP error, the script stops with the following error:
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='argos-yoga.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x122582d90>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
How can I ignore such errors and let my script keep scanning? Can someone help me? Thanks.
Simply put, you can use a try-except block.
Example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:
        res = requests.get(url_1)
        finalresult = keyword in res.text
        print(finalresult)
        if finalresult == False:
            myList.append("NOT OK")
        else:
            myList.append("OK")
    except Exception as e:
        print(f"There was an error, error = {e}")
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
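As a side note, a bare `except Exception` also swallows unrelated bugs (typos, attribute errors, etc.). A narrower sketch, assuming you only want network-level failures ignored, catches `requests.exceptions.RequestException`, which is the common base class of `ConnectionError`, `Timeout`, and friends (the `check_keyword` helper name is mine, for illustration):

```python
import requests

def check_keyword(url, keyword, timeout=5):
    """Return 'OK', 'NOT OK', or 'Down' for a single URL.

    DNS failures, refused connections, and timeouts all inherit from
    requests.exceptions.RequestException, so one except clause covers them
    without hiding unrelated programming errors.
    """
    try:
        res = requests.get(url, timeout=timeout)
        return "OK" if keyword in res.text else "NOT OK"
    except requests.exceptions.RequestException:
        return "Down"

# A malformed URL raises MissingSchema (a RequestException subclass),
# so this demonstrates the error path without any network access:
print(check_keyword("not-a-valid-url", "myKeyword"))  # → Down
```

Passing `timeout=` is also worth doing in the original loop: without it, a hanging server can stall the whole scan indefinitely.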
Try to put the try..except only around requests.get() and res.text.
For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

df = pd.read_csv('/path/to/input.csv')
urls = df.T.values.tolist()[2]
myList = []
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        finalresult = keyword in res.text  # <-- remove print()
    except:
        finalresult = False
    if finalresult == False:
        myList.append("NOT OK")
    else:
        myList.append("OK")
df["myList"] = pd.DataFrame(myList, columns=['myList'])
df.to_csv('/path/to/output.csv', index=False)
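One detail worth knowing: by default `requests` only raises for connection-level problems (like the DNS failure in the question); a response with a 404 or 500 status code returns normally and does not enter the `except` branch. If you also want bad status codes treated as failures (an assumption, since the question only mentions connection errors), call `res.raise_for_status()` inside the `try`. A minimal sketch (the `status_of` helper name is mine):

```python
import requests

def status_of(res):
    """Classify a completed response: 4xx/5xx counts as 'Down', else 'OK'."""
    try:
        res.raise_for_status()  # raises HTTPError on 4xx/5xx status codes
        return "OK"
    except requests.exceptions.HTTPError:
        return "Down"

# Build a Response by hand to demonstrate without any network access:
res = requests.models.Response()
res.status_code = 503
print(status_of(res))  # → Down
```

In the loop above, that means adding `res.raise_for_status()` right after `requests.get(url_1)`, so an error page never gets its body scanned for the keyword.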
Edit: to put Down in the list when an error occurs:
for url in urls:
    url_1 = url
    keyword = 'myKeyword'
    try:  # <-- put try..except here
        res = requests.get(url_1)
        if keyword in res.text:
            myList.append("OK")
        else:
            myList.append("NOT OK")
    except:
        myList.append("Down")
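As a design note (not part of the original answer): assigning `df["myList"]` only works cleanly when exactly one entry is appended per URL, which the loop above guarantees. An alternative that keeps results aligned with rows automatically is `Series.apply`; a sketch with a placeholder classifier (replace `check` with the real requests logic):

```python
import pandas as pd

def check(url):
    # Placeholder classifier standing in for the requests-based check above.
    return "OK" if url.startswith("https") else "Down"

df = pd.DataFrame({"url": ["https://example.com", "ftp://bad"]})
# .apply produces one result per row, so the lengths can never drift apart
df["myList"] = df["url"].apply(check)
print(df["myList"].tolist())  # → ['OK', 'Down']
```

This also removes the need for the separate `myList` accumulator and the `pd.DataFrame(myList, ...)` wrapper before writing the CSV.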