Why am I receiving an attribute error for my BeautifulSoup code when the variable in question has a value?
I'm using Python 3.9.1 with Selenium and BeautifulSoup to create my first web scraper, for Tesco's website (a small self-teaching project). However, when I run the code shown below, I get an AttributeError:
Traceback (most recent call last):
File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\Tesco.py", line 37, in <module>
clean_product_data = process_products(html)
File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\Tesco.py", line 23, in process_products
weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
AttributeError: 'NoneType' object has no attribute 'find'
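I'm not sure what's going wrong. The title and URL parts work fine, but the weight and price parts raise this error. When I print the product_price and product_price_weight variables, they return the values I expect (I won't post them here; it's just a very long stretch of HTML).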
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
import time
from bs4 import BeautifulSoup

driver = webdriver.Chrome(ChromeDriverManager().install())


def process_products(html):
    clean_product_list = []
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all("div",{"class":"product-tile-wrapper"})

    for product in products:
        data_dict = {}
        product_details = product.find("div",{"class":"product-details--content"})
        product_price = product.find("div",{"class":"price-control-wrapper"})
        product_price_weight = product.find("div",{"class":"price-per-quantity-weight"})

        data_dict['title'] = product_details.find('a').text.strip()
        data_dict['product_url'] = ('tesco.com') + (product_details.find('a')['href'])
        weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
        data_dict['price'] = product_price.find("span",{"class":"value"}).text.strip()
        data_dict['price'+weight] = product_price_weight.find("span",{"class":"value"}).text.strip()

        clean_product_list.append(data_dict)
    return clean_product_list


master_list = []
for i in range (1,3):
    print (i)
    driver.get(f"https://www.tesco.com/groceries/en-GB/shop/fresh-food/all?page={i}&count=48")
    html = driver.page_source
    driver.maximize_window()
    clean_product_data = process_products(html)
    master_list.extend(clean_product_data)

print (master_list)
Any help is much appreciated.
Many thanks,
You can try updating your process_products function. Note that in some cases the .find() calls you make return None, which simply means they did not find any element matching the arguments you passed to .find().
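To see that return-value behaviour in isolation, here is a minimal sketch. The markup is invented for illustration (it is an assumption, not Tesco's real page): the first tile has a price-per-quantity block and the second does not, the way an unavailable product might not.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first tile has a price-per-quantity block,
# the second (e.g. an unavailable product) does not.
html = """
<div class="product-tile-wrapper">
  <div class="price-per-quantity-weight"><span class="weight">/kg</span></div>
</div>
<div class="product-tile-wrapper">
  <p>Currently unavailable</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
tiles = soup.find_all("div", {"class": "product-tile-wrapper"})

# find() returns a Tag when something matches and None when nothing does
found = [
    type(tile.find("div", {"class": "price-per-quantity-weight"})).__name__
    for tile in tiles
]
print(found)  # ['Tag', 'NoneType']
```

The lookup is identical for both tiles; only the markup differs, and that alone decides whether you get a Tag or None.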
For example, suppose this part of the code has run:
product_details = product.find("div",{"class":"product-details--content"})
If it finds an element matching that tag and class, it returns a bs4 object; if not, it returns None. Say it returned None.
Then your product_details variable is None, so when your code reaches the next line (with product_details still None), it effectively does this:
data_dict['title'] = product_details.find('a').text.strip()
# Another way of putting it:
# data_dict['title'] = None.find('a').text.strip()  # clearly an error
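This also explains why printing the variables looked fine. The error only needs one tile on the page to be missing the div, so inspecting a single tile proves nothing about the others. Below is a sketch of a quick check that lists the offending tiles and reproduces the traceback's message. The markup is hypothetical, as before.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the middle tile has no price-per-quantity block
html = """
<div class="product-tile-wrapper"><div class="price-per-quantity-weight">
  <span class="weight">/kg</span></div></div>
<div class="product-tile-wrapper"><p>Currently unavailable</p></div>
<div class="product-tile-wrapper"><div class="price-per-quantity-weight">
  <span class="weight">/each</span></div></div>
"""

soup = BeautifulSoup(html, "html.parser")
tiles = soup.find_all("div", {"class": "product-tile-wrapper"})

# Indices of tiles where the lookup comes back as None
missing = [
    i for i, tile in enumerate(tiles)
    if tile.find("div", {"class": "price-per-quantity-weight"}) is None
]
print(missing)  # [1]

# Chaining .find() onto that None reproduces the error from the traceback
block = tiles[missing[0]].find("div", {"class": "price-per-quantity-weight"})
message = ""
try:
    block.find("span", {"class": "weight"})
except AttributeError as exc:
    message = str(exc)
print(message)  # 'NoneType' object has no attribute 'find'
```

Running the same missing check against driver.page_source should show which real tiles lack the block, assuming the class names in the question match the live page.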
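So what I did here is put these lookups inside try/except. This simply catches those errors and gives you empty strings, indicating that a .find() probably returned None or that some other lookup failed (the point is that no relevant data came back). That's why I used try/except. You could also write it as if/else, but I think try/except is neater here.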
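For comparison, here is a sketch of the if/else alternative mentioned above. It checks for None explicitly instead of catching the exception. text_or_empty is a hypothetical helper name, and the markup is invented for illustration.

```python
from bs4 import BeautifulSoup


def text_or_empty(parent, name, cls):
    """Hypothetical helper: stripped text of parent.find(name, class=cls),
    or '' when either the parent or the match is None."""
    if parent is None:
        return ""
    tag = parent.find(name, {"class": cls})
    if tag is None:
        return ""
    return tag.text.strip()


# Invented tile with a price block but no price-per-quantity block
tile = BeautifulSoup(
    '<div class="price-control-wrapper"><span class="value"> £1.50 </span></div>',
    "html.parser",
)
price_block = tile.find("div", {"class": "price-control-wrapper"})
weight_block = tile.find("div", {"class": "price-per-quantity-weight"})  # None here

print(repr(text_or_empty(price_block, "span", "value")))    # '£1.50'
print(repr(text_or_empty(weight_block, "span", "weight")))  # ''
```

The explicit checks only cover the "nothing found" case. Any other unexpected error still surfaces instead of being converted into an empty string.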
from bs4 import BeautifulSoup


def process_products(html):
    clean_product_list = []
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all("div", {"class": "product-tile-wrapper"})
    for product in products:
        data_dict = {}
        product_details = product.find("div", {"class": "product-details--content"})
        product_price = product.find("div", {"class": "price-control-wrapper"})
        product_price_weight = product.find("div", {"class": "price-per-quantity-weight"})
        try:
            data_dict['title'] = product_details.find('a').text.strip()
            data_dict['product_url'] = 'tesco.com' + product_details.find('a')['href']
        except (AttributeError, KeyError):
            # product_details (or its <a>) is None, or the link has no href,
            # so set your data to empty strings instead of crashing.
            # Catching specific errors (not BaseException) still lets Ctrl+C stop the script.
            data_dict['title'] = ''
            data_dict['product_url'] = ''
        try:
            data_dict['price'] = product_price.find("span", {"class": "value"}).text.strip()
        except AttributeError:
            # Same here: no price block (or no value span) for this tile
            data_dict['price'] = ''
        try:
            weight = product_price_weight.find("span", {"class": "weight"}).text.strip()
            data_dict['price' + weight] = product_price_weight.find("span", {"class": "value"}).text.strip()
        except AttributeError:
            # Same here again. 'price' + '' would be 'price' and would overwrite
            # the price set above, so record the miss under its own key.
            data_dict['price_per_weight'] = ''
        clean_product_list.append(data_dict)
    return clean_product_list
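Building the dictionary key as 'price' + weight has one pitfall worth checking. If weight ends up as an empty string (for example in a fallback branch), the key collapses to 'price' and silently overwrites the price stored earlier. Here is a minimal illustration with hypothetical values:

```python
data_dict = {}
data_dict["price"] = "£1.50"       # set by the price lookup
weight = ""                        # fallback value when the weight lookup failed
data_dict["price" + weight] = ""   # "price" + "" == "price", so this overwrites it
print(data_dict)  # {'price': ''}
```

Storing the per-weight figure under a separate, fixed key (for example a hypothetical price_per_weight), with the unit in its own field, avoids the collision.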