bs4: skipping AttributeError in for loop

Question

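This is my first time web scraping and I've run into a problem. I need to get the price of certain products (the url in the code), but when a product is discounted the script raises an error. This is my current code (I removed a few lines in the middle, but this is how it works):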

import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd

links = []


url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for link in soup.select('div[class="product-card-portrait_content__2xN-b"] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
   
        #GETTING THE PRICE
        p_price = []
        req4 = requests.get(abs_url)
        soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text

The output looks like this:

runcell(0, '/Users/eva/Desktop/MDDD/TestingProduct.py')
0.55
0.35
2.99
0.65
Traceback (most recent call last):

  File "/Applications/Spyder.app/Contents/Resources/lib/python3.9/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "/Users/eva/Desktop/MDDD/TestingProduct.py", line 22, in <module>
    p_price = (soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')).text

AttributeError: 'NoneType' object has no attribute 'text'

So it gives me the first four prices and then the error. I tried to fix it by adding this:

 if p_price != AttributeError:continue

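But that didn't work. I don't mind if the discounted products are left out of the dataset. Any tips on how to keep the for loop running, i.e. dropping the prices that raise the error?

Thanks!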

Answer 1

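You get the NoneType error because not every product page contains an element with that price class, so soup.find() returns None for those products. To avoid the error, use a conditional expression (value.text if the element exists, else None):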

import requests
from bs4 import BeautifulSoup
#import csv
#import pandas as pd

links = []


url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'
for page in range(1,2):
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for link in soup.select('div[class="product-card-portrait_content__2xN-b"] a'):
        abs_url = 'https://www.ah.nl' + link.get('href')
        #print(abs_url)
   
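        # Fetch the product page and look for the current price element
        req4 = requests.get(abs_url)
        product_soup = BeautifulSoup(req4.content, 'html.parser')
        p_price = product_soup.find(class_='price-amount_root__37xv2 product-card-hero-price_now__PlF9u')
        # find() returns None when nothing matches, so only read .text if an element was found
        p_price = p_price.text if p_price is not None else None
        print(p_price)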

Output:

0.55
0.35
2.99
0.65
None
0.49
0.65
0.59
0.92
0.99
0.79
0.55
0.65
2.19
3.00
2.00
0.89
0.89
0.66
0.89
0.89
1.25
1.99
1.19
0.99
0.79
1.79
1.99
1.79
6.29
1.19
1.39
2.19
0.65
1.95
0.79
None
None
None
None
2.00
2.29
0.49
1.29
1.55
1.59
1.39
2.99
2.00
0.99
1.39
1.65
1.19
0.99
0.99
2.29
1.99
2.69
0.49
0.99
0.79
2.19
2.00
3.69
0.89
2.29
0.45
1.85
2.00
2.00
5.99
1.09
2.79
1.19
3.29
0.95
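
If the products without a price should be left out entirely rather than recorded as None, one option is to catch the AttributeError and continue. The check attempted in the question (p_price != AttributeError) cannot work, because the exception is raised on the .text access before any comparison runs. The following is only a minimal sketch: PAGES holds hypothetical HTML snippets standing in for the downloaded product pages so it runs offline, and collect_prices is an illustrative name, not part of bs4.

```python
from bs4 import BeautifulSoup

PRICE_CLASS = 'price-amount_root__37xv2 product-card-hero-price_now__PlF9u'

# Hypothetical stand-ins for downloaded product pages (assumption, not real
# ah.nl markup); the second one lacks the price element being searched for.
PAGES = [
    '<div class="price-amount_root__37xv2 product-card-hero-price_now__PlF9u">0.55</div>',
    '<div class="some-other-price-class">0.99</div>',
    '<div class="price-amount_root__37xv2 product-card-hero-price_now__PlF9u">2.99</div>',
]


def collect_prices(pages):
    """Return the price text of every page that has the price element."""
    prices = []
    for html in pages:
        soup = BeautifulSoup(html, 'html.parser')
        try:
            # find() returns None when nothing matches, so .text raises here
            price = soup.find(class_=PRICE_CLASS).text
        except AttributeError:
            # Skip this product and move on to the next one
            continue
        prices.append(price)
    return prices


print(collect_prices(PAGES))
```

Keeping the try block to that single line matters: a wider block would also swallow unrelated AttributeErrors from elsewhere in the loop.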

Answer 2

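You can get the prices from the results page without visiting each individual link. In addition, with judicious use of the css :not() pseudo-class selector you can exclude the old price that appears on discounted products, which eliminates the error: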

import requests
from bs4 import BeautifulSoup

url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'

for page in range(1, 3):
    print(f"Page: {page}")
    print()
    req = requests.get(url.format(page=page))
    soup = BeautifulSoup(req.content, 'html.parser')

    for product in soup.select('[data-testhook="product-card"]'):
        print(product.select_one('[data-testhook="product-title"]').get_text(strip=True))
        print(product.select_one('[data-testhook="price-amount"]:not([class*=price_was])').text)
    print()
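
To see what the :not() selector does without hitting the site, here is a small offline sketch. The HTML string is hypothetical markup that only imitates the data-testhook attributes and the "price_was" class fragment used above; the real page structure may differ. current_prices is an illustrative helper name, and it also skips any card where select_one() finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): a discounted card with old and new price,
# a regular card, and a card with no price at all.
HTML = """
<article data-testhook="product-card">
  <span data-testhook="product-title">Pasta A</span>
  <div data-testhook="price-amount" class="price-amount_root product-card-price_was__a1">1.29</div>
  <div data-testhook="price-amount" class="price-amount_root product-card-price_now__b2">0.99</div>
</article>
<article data-testhook="product-card">
  <span data-testhook="product-title">Rice B</span>
  <div data-testhook="price-amount" class="price-amount_root product-card-price_now__b2">2.19</div>
</article>
<article data-testhook="product-card">
  <span data-testhook="product-title">Noodles C</span>
</article>
"""

# Price elements whose class attribute does not contain "price_was"
PRICE_NOW = '[data-testhook="price-amount"]:not([class*=price_was])'


def current_prices(html):
    """Return (title, current price) pairs, skipping cards missing either."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for product in soup.select('[data-testhook="product-card"]'):
        title = product.select_one('[data-testhook="product-title"]')
        price = product.select_one(PRICE_NOW)
        if title is None or price is None:
            continue  # select_one() returns None when nothing matches
        rows.append((title.get_text(strip=True), price.text))
    return rows


print(current_prices(HTML))
```

For the discounted card only the current price is returned, and the card without any price is dropped instead of raising an AttributeError.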

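If you want to check for gluten, some product pages carry a gluten-free icon. Be wary of misleading results: for example, the single can of AH Tomatenblokjes gesneden is marked gluten free, but the AH Tomatenblokjes gesneden 4-pack carries no such label. Same product, different labelling.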

import requests
from bs4 import BeautifulSoup

url='https://www.ah.nl/producten/pasta-rijst-en-wereldkeuken?page={page}'

with requests.Session() as s:
    for page in range(1, 3):
        print(f"Page: {page}")
        print()
        req = s.get(url.format(page=page))
        soup = BeautifulSoup(req.content, 'html.parser')

        for product in soup.select('[data-testhook="product-card"]'):
            print(product.select_one('[data-testhook="product-title"]').get_text(strip=True))
            print(product.select_one('[data-testhook="price-amount"]:not([class*=price_was])').text)
            print(f"Gluten free: {'svg--glutenvrij' in s.get('https://www.ah.nl' + product.select_one('a[class^=link_root]')['href']).text}")
            print()
