BeautifulSoup error handling when find returns NoneType
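I am scraping search results from a website where each result is contained in a div and has a range of data associated with it. Some of these data values are missing, however, and when they are, the code fails with the error "'NoneType' object has no attribute 'text'".
I have put in a try/except block. Currently, when any one of the values is missing, the entire search result is skipped. What can I do so that a missing value is instead written as "" (blank) in the xls file I am saving to?
My code is below: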
divs = soup.find_all("div", class_="result-item standard") + soup.find_all("div", class_="result-item standard basic-ad")
for div in divs:
    try:
        #item_title = " ".join(div.h2.a.text.split())
        item = div.h2.a.text.split()
        item_year = item[0]
        item_make = item[1]
        item_model = ""
        for i in range (2,len(item)):
            item_model = item_model + item[i] + " "
        item_eng = div.find("li", "item-engine").text
        item_trans = div.find("li", "item-transmission").text
        item_body = div.find("li", "item-body").text
        item_odostr = div.find("li", "item-odometer").text
        item_odo = ''.join(c for c in item_odostr if c.isdigit())
        item_pricestr = " ".join(div.find("div", "primary-price").text.split())
        item_price = ''.join(c for c in item_pricestr if c.isdigit())
        item_adtype = div.find("div", "ad-type").span.text
        #item_distance = div.find("a", "distance-from-me-link").text
        item_loc = div.find("div", "call-to-action").p.text
        item_row = (str(x),item_year,item_make,item_model,item_eng,item_trans,item_body,item_odo,item_price,item_adtype,item_loc)
        print ",".join(item_row)
        print(" ")
        for i in range(len(item_row)):
            ws.write(x,i,item_row[i])
        if x % 500 == 0 :
            wb.save("data.xls")
    except AttributeError as e:
        with open("error"+str(x)+".txt", "w+") as error_file:
            error_file.write(div.text.encode("utf-8"))
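You can check whether find returned an element before reading .text, and fall back to an empty string when it did not. For example: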
item_eng = div.find("li", "item-engine").text if div.find("li", "item-engine") else ''
Or:
item_eng = div.find("li", "item-engine").text if len(div.find_all("li", "item-engine"))!=0 else ''
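With this many fields, repeating the find call inside every conditional gets noisy. Below is a minimal sketch of one way to factor the check out. The names text_or_blank and nested_text_or_blank are hypothetical helpers made up for illustration and are not part of BeautifulSoup. The sketch relies only on find (and attribute access such as .span) giving None when the element is missing.

```python
def text_or_blank(node, default=""):
    """Return node.text, or `default` when node is None (element not found)."""
    if node is None:
        return default
    return node.text


def nested_text_or_blank(node, attrs, default=""):
    """Follow child attributes such as "span" or "p" one at a time.

    Stops and returns `default` as soon as any step is missing, so a chain
    like div.find(...).span.text cannot raise AttributeError on None.
    """
    for attr in attrs:
        if node is None:
            return default
        # On a BeautifulSoup Tag, tag.span is None when there is no <span> child.
        node = getattr(node, attr, None)
    return text_or_blank(node, default)
```

In the loop from the question this could be used as item_eng = text_or_blank(div.find("li", "item-engine")) and item_adtype = nested_text_or_blank(div.find("div", "ad-type"), ("span",)). Each missing field then becomes "" in item_row, and the rest of the row is still written.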